Free large datasets to experiment with Hadoop
- by Sundar
Do you know of any large datasets that are free or low cost and suitable for experimenting with Hadoop?
Any related pointers or links would be appreciated.
Preferences:
At least 1 GB of data.
Production log data from a web server.
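For context, here is a rough sketch (Python 3, standard library only) of the kind of Hadoop Streaming job I would like to run on such logs. It counts requests per HTTP status code. It assumes the logs use the Apache Common Log Format (the Combined format only appends fields after these). The script name `log_status.py` and its `map`/`reduce` arguments are my own placeholders, not anything Hadoop defines.

```python
import re
import sys
from itertools import groupby

# Matches the Common Log Format prefix; Combined format adds referer/user agent after it.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def mapper(lines):
    """Emit 'status<TAB>1' for every parseable log line; skip malformed lines."""
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            yield f"{fields['status']}\t1"


def reducer(lines):
    """Sum the counts per key. Input must be sorted by key (the shuffle does this)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: "log_status.py map" or "log_status.py reduce".
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The same pipeline can be tried locally before touching a cluster, with `sort` standing in for Hadoop's shuffle: `cat access.log | python log_status.py map | sort | python log_status.py reduce`.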
A few that I have found so far:
http://dumps.wikimedia.org/enwiki/20100130/
http://wiki.freebase.com/wiki/Data_dumps
http://aws.amazon.com/publicdatasets/
Also, can we run our own crawler to gather data from sites such as Wikipedia? Any pointers on how to do this would be appreciated as well.
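To make the crawler question concrete, this is roughly what I have in mind: a minimal breadth-first sketch (Python 3, standard library only) that checks `robots.txt`, stays on one host, sleeps between requests, and extracts links from each page. The user-agent string, page limit, and delay are arbitrary placeholders of mine, not recommended values. For Wikipedia in particular, the dump link above is probably the gentler route than crawling.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _ = urldefrag(urljoin(self.base_url, href))
                self.links.append(absolute)


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl(start_url, robots, max_pages=10, delay=1.0, agent="hadoop-dataset-bot"):
    """Breadth-first crawl restricted to start_url's host; yields (url, html).

    `robots` is a RobotFileParser already loaded for that host; the default
    agent name and limits are placeholders.
    """
    host = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # respect robots.txt rules
        request = Request(url, headers={"User-Agent": agent})
        with urlopen(request, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
        fetched += 1
        yield url, html
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # be polite to the server between requests
```

Loading the rules would look like `robots = RobotFileParser(); robots.set_url("https://en.wikipedia.org/robots.txt"); robots.read()`, after which the fetched pages could be written out as input files for a Hadoop job.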