3 years ago
Free Big Data Sources Everyone Should Know
I was on LinkedIn today and saw this useful list of big data sources shared by Bernard Marr. All of them offer completely free data that you can use for data mining and other forms of data analysis. I was pretty excited to read it because I'm currently working on a data analysis project and was looking for a data set to work with.

• Data.gov (http://data.gov/): Over 134,101 free government data sets to choose from, covering everything from agriculture to finance to education.
• US Census Bureau (http://www.census.gov/data.html): US population, geographic, and education datasets.
• European Union Open Data Portal (http://open-data.europa.eu/en/data/): Similar datasets, but for the European Union.
• Data.gov.uk (http://data.gov.uk/): Data from the UK government, including the British National Bibliography.
• The CIA World Factbook (https://www.cia.gov/library/publications/the-world-factbook/): Data on history, population, economy, government, and more.
• Healthdata.gov (https://www.healthdata.gov/): US healthcare data.
• NHS Health and Social Care Information Centre (http://www.hscic.gov.uk/home): UK healthcare data.
• Amazon Web Services public datasets (http://aws.amazon.com/datasets): The 1000 Genomes Project, a NASA database of satellite imagery of Earth, and more.
• Facebook Graph (https://developers.facebook.com/docs/graph-api): Basically all the user data Facebook can get away with sharing. Those privacy statements are something to think about.
• Gapminder (http://www.gapminder.org/data/): Data from the World Health Organization and the World Bank.
• Google Trends (http://www.google.com/trends/explore): Stats on search volume for given terms since 2004. Pretty fascinating.
• Google Finance (https://www.google.com/finance)
• Google Books Ngrams (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html): You can search and analyze the full texts of millions of books.
• National Climatic Data Center (http://www.ncdc.noaa.gov/data-access/quick-links#loc-clim): Environmental, meteorological, and climate data sets.
• DBpedia (http://wiki.dbpedia.org/About): Wikipedia data.
• Topsy (http://topsy.com/): Social media data.
• Likebutton (http://likebutton.com/): Mines public Facebook data.
• New York Times (http://developer.nytimes.com/docs): Searchable, indexed archives of news articles dating back to 1851.
• Freebase (http://www.freebase.com/): Community-compiled database of structured data (over 45 million entries).
• Million Song Data Set (http://aws.amazon.com/datasets/6468931156960467): Metadata on over a million songs and pieces of art.

Hope you find these data sources useful! (via LinkedIn)

@AreliCanales you might want to save this one!
@TechatHeart what kind of data project are you working on?
3 years ago·Reply
@Goyo time series analysis. I wanted to use a cool dataset but it looks like I'm sticking to earthquakes. Well, not that earthquakes aren't interesting haha
3 years ago·Reply
Time series analysis? Of what? Sounds like you are trying to do a regression analysis to find the current state of something from its past... am I correct?
3 years ago·Reply
@Goyo Yes, you are correct, I could use regression analysis :-)
3 years ago·Reply
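For anyone curious what "regressing the current state on its past" might look like in practice, here is a minimal sketch of a first-order autoregression fitted by ordinary least squares. It is only an illustration: the series is synthetic and noise-free (not real earthquake data), and `fit_ar1` and `predict_next` are hypothetical helper names, not something from the post or from any library.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + slope * x[t-1]."""
    past = series[:-1]     # x[t-1], the predictor
    current = series[1:]   # x[t], the value being explained
    n = len(past)
    mean_past = sum(past) / n
    mean_current = sum(current) / n
    # Simple linear regression: slope = covariance / variance of the predictor.
    cov = sum((p - mean_past) * (c - mean_current) for p, c in zip(past, current))
    var = sum((p - mean_past) ** 2 for p in past)
    slope = cov / var
    intercept = mean_current - slope * mean_past
    return intercept, slope


def predict_next(series, intercept, slope):
    """One-step-ahead forecast from the most recent observation."""
    return intercept + slope * series[-1]


# Synthetic series (an assumption for illustration only):
# each value is 2.0 + 0.5 * previous value, with no noise.
series = [1.0]
for _ in range(30):
    series.append(2.0 + 0.5 * series[-1])

intercept, slope = fit_ar1(series)
forecast = predict_next(series, intercept, slope)
print(intercept, slope, forecast)
```

With real data the fit would not be exact, and a longer lag window or a dedicated time series library would probably be a better choice; this only shows the basic idea.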