Public Terabyte Dataset Contact Form

Public Terabyte Dataset

The Public Terabyte Dataset project is a large-scale crawl of top domains, built with Bixolab’s elastic web mining platform, Amazon’s Elastic MapReduce (EMR) web service, and Concurrent’s Cascading workflow API.

The dataset should be available before the end of 2009 and will be free to anyone running jobs in Amazon’s Elastic Compute Cloud (EC2).

Interested in this dataset? More details are available here. You can also use the form below to ask questions and provide input on the project.
