Dream to Learn is shutting down...

We are very sorry to say that Dream to Learn will be shutting down as of December 28th, 2019. If you have content that you wish to keep, you should make a copy of it before that date.


1001 Datasets and Data repositories ( List of lists of lists )



Archive here:  https://github.com/rustyoldrake/1001-Data-Sets

This is a LIST of "lists of lists": a messy presentation that pulls together raw datasets for my hacks. Suggestions to add? Message me or post a comment.

CTRL-F to "FIND" is your best bet - e.g. CTRL-F "food" or "population"
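If you keep a plain-text copy of this list (for example, from the GitHub archive), the same CTRL-F idea can be scripted. This is only a rough sketch: the function name `find_lines` and the short sample text are made up for illustration and are not part of the original list.

```python
def find_lines(text, keyword):
    """Return (line_number, line) pairs whose text contains keyword, ignoring case."""
    needle = keyword.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


# Tiny stand-in for a saved copy of the list (hypothetical sample, not the full page).
sample = """Food Environment Atlas: Contains data on how local food choices affect diet in the US.
School system finances: A survey of the finances of school systems in the US.
Amazon Fine Food Reviews [Kaggle]: consists of 568,454 food reviews."""

for number, line in find_lines(sample, "food"):
    print(number, line)
```

To search a real copy, read the file into a string first and pass it in place of `sample`.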

100% of the links below are from external sources (not mine)


Follow me on Twitter: https://twitter.com/ryan77anderson

Need code or a pattern once you find the data? Try here: https://dreamtolearn.com/ryan/r_journey_to_watson/13




Source: IBM Data Asset eXchange - Explore useful data sets for enterprise data science




Source: IBM Model Asset eXchange - Free, deployable, and trainable code.

A place for developers to find and use free and open source deep learning models.


  • Audio Classification
  • Audio Feature Extraction
  • Audio Modeling
  • Facial Recognition
  • Image Classification
  • Image Feature Extraction
  • Image-to-Image Translation or Transformation
  • Image-to-Text Translation
  • Language Modeling
  • Named Entity Recognition
  • Natural Language Processing
  • Object Detection in Images
  • Security
  • Text Classification
  • Text Feature Extraction
  • Text-to-Image Translation
  • Time Series Prediction
  • Video Classification


Source: Quora


Cross-disciplinary data repositories, data collections and data search engines:

  1. http://datasource.kapsarc.org
  2. https://www.kaggle.com/datasets
  3. http://www.assetmacro.com
  4. http://usgovxml.com
  5. http://aws.amazon.com/datasets
  6. http://databib.org
  7. http://datacite.org
  8. http://figshare.com
  9. http://linkeddata.org
  10. http://reddit.com/r/datasets
  11. http://thewebminer.com/
  12. http://thedatahub.org alias http://ckan.net
  13. http://quandl.com
  14. Social Network Analysis Interactive Dataset Library (Social Network Datasets)
  15. Datasets for Data Mining
  16. Enigma Public
  17. http://www.ufindthem.com/
  18. http://NetworkRepository.com - The First Interactive Network Data Repository
  19. http://MLvis.com
  20. Open Data Inception - A Comprehensive List of 2500+ Open Data Portals in the World
  21. http://data.opendatasoft.com OpenDataSoft catalog

Single datasets and data repositories

  1. http://archive.ics.uci.edu/ml/
  2. http://crawdad.org/
  3. http://data.austintexas.gov
  4. http://data.cityofchicago.org
  5. http://data.govloop.com
  6. http://data.gov.uk/
  7. data.gov.in
  8. http://data.medicare.gov
  9. http://data.seattle.gov
  10. http://data.sfgov.org
  11. http://data.sunlightlabs.com
  12. https://datamarket.azure.com/
  13. http://developer.yahoo.com/geo/g...
  14. http://econ.worldbank.org/datasets
  15. http://en.wikipedia.org/wiki/Wik...
  16. http://factfinder.census.gov/ser...
  17. http://ftp.ncbi.nih.gov/
  18. http://gettingpastgo.socrata.com
  19. http://googleresearch.blogspot.c...
  20. http://books.google.com/ngrams/
  21. http://medihal.archives-ouvertes.fr
  22. http://public.resource.org/
  23. http://rechercheisidore.fr
  24. http://snap.stanford.edu/data/in...
  25. http://timetric.com/public-data/
  26. https://wist.echo.nasa.gov/~wist...
  27. http://www2.jpl.nasa.gov/srtm
  28. http://www.archives.gov/research...
  29. http://www.bls.gov/
  30. http://www.crunchbase.com/
  31. http://www.dartmouthatlas.org/
  32. http://www.data.gov/
  33. http://www.datakc.org
  34. http://dbpedia.org
  35. http://www.delicious.com/jbaldwi...
  36. http://www.faa.gov/data_research/
  37. http://www.factual.com/
  38. http://research.stlouisfed.org/f...
  39. http://www.freebase.com/
  40. http://www.google.com/publicdata...
  41. http://www.guardian.co.uk/news/d...
  42. http://www.infochimps.com
  43. http://www.kaggle.com/
  44. http://build.kiva.org/
  45. http://www.nationalarchives.gov....
  46. http://www.nyc.gov/html/datamine...
  47. http://www.ordnancesurvey.co.uk/...
  48. http://www.philwhln.com/how-to-g...
  49. http://www.imdb.com/interfaces
  50. http://imat-relpred.yandex.ru/en...
  51. http://www.dados.gov.pt/pt/catal...
  52. http://knoema.com
  53. http://daten.berlin.de/
  54. http://www.qunb.com
  55. http://databib.org/
  56. http://datacite.org/
  57. http://data.reegle.info/
  58. http://data.wien.gv.at/
  59. http://data.gov.bc.ca
  60. https://pslcdatashop.web.cmu.edu/ (interaction data in learning environments)
  61. http://www.icpsr.umich.edu/icpsrweb/CPES/ - Collaborative Psychiatric Epidemiology Surveys: (A collection of three national surveys focused on each of the major ethnic groups to study psychiatric illnesses and health services use)
  62. http://www.dati.gov.it
  63. http://dati.trentino.it
  64. http://www.databagg.com/
  65. http://networkrepository.com - Network/ML data repository w/ visual interactive analytics
  66. United Nations Environment Programme GRID-Geneva: a lot of GIS datasets



Source: Quora - Alan Morrison PWC





Image Processing

Machine Learning


Natural Language


Public Domains

Search Engines

Social Sciences


Time Series


Complementary Collections

Source: Xiaming's Github caesar0301/awesome-public-datasets, January 2015. Please go to Github for this and other updated lists.


International Historical Statistics (by Brian Mitchell)

  • Data: Aggregate trade (current value), bilateral trade with main trading partners (current value), and major commodity exports by main exporting countries. No data on trade as share of GDP is readily available.
  • Geographical coverage: Countries around the world
  • Time span: Long time series with annual observations – from 19th century up to today (2010)
  • Available at: The books are published in three volumes covering more than 5000 pages. At some universities you can access the online version of the books, where data tables can be downloaded as ePDFs and Excel files. The online access is here.
  • Data from the 19th century onwards for countries around the world is available in the International Historical Statistics (IHS). These statistics – originally published under the editorial leadership of Brian Mitchell (since 1983) – are a collection of data sets taken from many primary sources, including both official national and international abstracts.

Penn World Tables

  • Data: Real and PPP-adjusted GDP in millions of US dollars, national accounts (household consumption, investment, government consumption, exports and imports), exchange rates and population figures.
  • Geographical coverage: Countries around the world
  • Time span: 1950-2011 (version 8.1)
  • Available at: Online here
  • Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), “The Next Generation of the Penn World Table”, forthcoming in American Economic Review, available for download at www.ggdc.net/pwt

Correlates of War Bilateral Trade

  • Data: Total national trade and bilateral trade flows between states. Total imports and exports of each country, and bilateral flows, all in millions of current US dollars
  • Geographical coverage: Single countries around the world
  • Time span: from 1870-2009
  • Available at: Online at www.correlatesofwar.org
  • This data set is hosted by Katherine Barbieri, University of South Carolina, and Omar Keshk, Ohio State University.

World Bank – World Development Indicators

  • Data: Trade (% of GDP) and many more specific series: trade in merchandise, trade in services, trade in high-technology, trade in ICT goods, trade in ICT services – always exports and imports separately. Also export and import value index and volume index.
  • Geographical coverage: Countries and world regions
  • Time span: Annual since 1960
  • Available at: Online at http://data.worldbank.org

UN Comtrade

  • Data: Bilateral trade flows by commodity
  • Geographical coverage: Countries around the world
  • Time span: 1962-2013
  • Available at: Online here


  • Data: Many different measures, including trade by volumes and value
  • Geographical coverage: Countries around the world
  • Time span: For some series, data is available since 1948 – mostly annual, sometimes quarterly.
  • Available at: Online here

Eurostat – COMEXT

  • Data: Trade flows (also by commodity)
  • Geographical coverage: Europe (EU and EFTA)
  • Time span: Mostly since 1988
  • Available at: Online here
  • Also, the Eurostat website ‘Statistics Explained’ publishes up-to-date statistical information on international trade in goods and services.

World Trade Organization – WTO

  • Data: Many series on tariffs and trade flows
  • Geographical coverage: Countries around the world
  • Time span: Since 1948 for some series
  • Available at: Online here

CEPII database on the World Economy

  • Data: Many different data sets related to international trade, including trade flows by commodity, geographical variables, and variables to estimate gravity models
  • Geographical coverage: Countries around the world
  • Time span: Some series go back to the 1990s.
  • Available at: Online here

NBER-United Nations Trade Data, 1962-2000

  • Data: Export and import values and volumes by commodity
  • Geographical coverage: Single countries
  • Time span: 1962-2000
  • Available at: Online here
  • This data is also available from the Center for International Data.

Smaller historical trade data sets

  • Data on UK bilateral trade for 1870-1913 was collected by David S. Jacks. It is downloadable in Excel format here.
  • For 1870-1913, 21,000 bilateral trade observations can be found in Mitchener and Weidenmier (2008), Trade and empire, available in the Economic Journal here.
  • Data on the UK, Germany, France, and the US from the mid-19th to the 20th century can be found here.
  • Data on developing country exports in 1840, 1860, 1880 and 1900, by John Hanson, is available here.
  • Data on trade between England and Africa during the period 1699-1808 is available on the Dutch Data Archiving and Networked Services. It was compiled by Marion Johnson.

Applying these same sources to Education quality in developing countries:



Source: Tableau - How to find the best sources for free, public data sets


  • FiveThirtyEight - A goldmine of over 100 data sets on sports and politics. Examples: March Madness predictions, political polling, the Bachelorette show, etc.
  • The Pudding - This data journalism website aims to explain hotly-debated cultural events with visual essays, sourced from original data sets and primary research. Their GitHub is a hub for pop culture data. Examples: Women’s vs. men’s pants pockets, weather conditions on Mars, etc.
  • Buzzfeed - If you know Buzzfeed, you know that their news site covers a variety of topics in politics, sports, and current events. They also have a rich list of data sets on GitHub. Examples: Trump’s tweets, the text of every State of the Union address, etc.
  • Washington Post - The Washington Post is a respected news source and their list of open data sets contains topics like NCAA financials and transportation data. Examples: School shootings, police shootings, NFL arrests, etc.
  • Viz for Social Good - A hackathon style project that connects the community with non-profit organizations. Examples: Advocating for fatherless boys in Africa, increasing awareness of child refugees, supporting black male entrepreneurs.
  • Makeover Monday - A weekly, social-data project to create a discussion around improving data visualizations. Each Sunday, the team posts a link to a visualization and a data set. Your challenge is to create a better version of the visualization in your own creative way. Their weekly data sets are diverse and stay on the site for reuse, so it is a great place to start in your search for clean data. Examples: Wind energy by state, minimum wage, NHL attendance.
  • Sports Viz Sunday - A community-led project to create, share, and promote visualizations from the world of sports. Sports Viz Sunday hosts a monthly challenge based on a topical sports theme, regularly sharing updates from the sports visualization world and providing rich data sets across a wide range of sports. Examples: World Cup, the Masters, Formula 1 racing.
  • Iron Quest - A project aimed at preparing people for Iron Viz qualifier competitions, offering opportunities to practice finding your own data sets.
  • Twitter data - Twitter has an API that allows you to get data about hashtags, keywords, or accounts. Here’s a guide on how to connect to Twitter data directly in Tableau. If you’re more comfortable working with APIs, you can query to get JSON data, which is a supported data type in Tableau. Here is the complete API documentation. Visualization example: Pulse of Super Bowl LIII.
  • Netflix data - Download your viewing data by going to netflix.com/viewingactivity. Visualization example: I have created a dashboard that compares people’s binges and visualizes Netflix viewing activity over time.
  • Spotify streaming data - Did you know that you can request your personal listening data from Spotify? If you are familiar with working with APIs, you can use the Spotify Web API to get data about music artists, albums, and tracks, directly from the Spotify Data Catalogue.
  • Others
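The Twitter and Spotify entries above mention APIs that return JSON. As a rough sketch of what you might do with such a response before charting it, the snippet below flattens nested JSON records into flat rows. The payload and its field names are hypothetical and do not reflect the real shape of either API's responses.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical payload; real API responses will have different fields.
payload = '[{"id": 1, "text": "hello", "user": {"name": "a", "followers": 10}}]'
rows = [flatten(item) for item in json.loads(payload)]
print(rows)
```

Flat rows like these are easier to write out as CSV or load into a table than nested objects.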




Source: Medium: The 50 Best Public Datasets for Machine Learning

"What are some open datasets for machine learning? After scraping the web for hours, we have created a great cheat sheet for high-quality and diverse machine learning datasets."


Dataset Finders

Kaggle: A data science site that contains a variety of interesting, externally contributed datasets. You can find all kinds of niche datasets in its master list, from ramen ratings to basketball data and even Seattle pet licenses.

UCI Machine Learning Repository: One of the oldest sources of datasets on the web, and a great first stop when looking for interesting datasets. Although the data sets are user-contributed, and thus have varying levels of cleanliness, the vast majority are clean. You can download data directly from the UCI Machine Learning repository, without registration.

VisualData: Discover computer vision datasets by category; it supports searchable queries.

General Datasets

Public Government datasets

Data.gov: This site makes it possible to download data from multiple US government agencies. Data can range from government budgets to school performance scores. Be warned though: much of the data requires additional research.

Food Environment Atlas: Contains data on how local food choices affect diet in the US.

School system finances: A survey of the finances of school systems in the US.

Chronic disease data: Data on chronic disease indicators in areas across the US.

The US National Center for Education Statistics: Data on educational institutions and education demographics from the US and around the world.

The UK Data Service: The UK’s largest collection of social, economic and population data.

Data USA: A comprehensive visualization of US public data.

Finance & Economics

Quandl: A good source for economic and financial data — useful for building models to predict economic indicators or stock prices.

World Bank Open Data: Datasets covering population demographics and a huge number of economic and development indicators from across the world.

IMF Data: The International Monetary Fund publishes data on international finances, debt rates, foreign exchange reserves, commodity prices and investments.

Financial Times Market Data: Up to date information on financial markets from around the world, including stock price indexes, commodities and foreign exchange.

Google Trends: Examine and analyze data on internet search activity and trending news stories around the world.

American Economic Association (AEA): A good source to find US macroeconomic data.

Machine Learning Datasets:


Labelme: A large dataset of annotated images.

ImageNet: The de-facto image dataset for new algorithms, organized according to the WordNet hierarchy, in which hundreds and thousands of images depict each node of the hierarchy.

LSUN: Scene understanding with many ancillary tasks (room layout estimation, saliency prediction, etc.)

MS COCO: Generic image understanding and captioning.

COIL100: 100 different objects imaged at every angle in a 360-degree rotation.

Visual Genome: Very detailed visual knowledge base with captioning of ~100K images.

Google’s Open Images: A collection of 9 million URLs to images “that have been annotated with labels spanning over 6,000 categories” under Creative Commons.

Labelled Faces in the Wild: 13,000 labeled images of human faces, for use in developing applications that involve facial recognition.

Stanford Dogs Dataset: Contains 20,580 images and 120 different dog breed categories.

Indoor Scene Recognition: A very specific but very useful dataset, as most scene recognition models perform better on outdoor scenes. Contains 67 indoor categories and 15,620 images.

Sentiment Analysis

Multidomain sentiment analysis dataset: A slightly older dataset that features product reviews from Amazon.

IMDB reviews: An older, relatively small dataset for binary sentiment classification, featuring 25,000 movie reviews.

Stanford Sentiment Treebank: Standard sentiment dataset with sentiment annotations.

Sentiment140: A popular dataset of 1.6 million tweets with emoticons pre-removed.

Twitter US Airline Sentiment: Twitter data on US airlines from February 2015, classified as positive, negative, and neutral tweets

Natural Language Processing

HotpotQA Dataset: Question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems.

Enron Dataset: Email data from the senior management of Enron, organized into folders.

Amazon Reviews: Contains around 35 million reviews from Amazon spanning 18 years. Data include product and user information, ratings, and the plaintext review.

Google Books Ngrams: A collection of words from Google books.

Blogger Corpus: A collection of 681,288 blog posts gathered from blogger.com. Each blog contains a minimum of 200 occurrences of commonly used English words.

Wikipedia Links data: The full text of Wikipedia. The dataset contains almost 1.9 billion words from more than 4 million articles. You can search by word, phrase or part of a paragraph itself.

Gutenberg eBooks List: Annotated list of ebooks from Project Gutenberg.

Hansards text chunks of Canadian Parliament: 1.3 million pairs of texts from the records of the 36th Canadian Parliament.

Jeopardy: Archive of more than 200,000 questions from the quiz show Jeopardy.

SMS Spam Collection in English: A dataset that consists of 5,574 English SMS messages, tagged as legitimate (ham) or spam.

Yelp Reviews: An open dataset released by Yelp, contains more than 5 million reviews.

UCI’s Spambase: A large spam email dataset, useful for spam filtering.


Berkeley DeepDrive BDD100k: Currently the largest dataset for self-driving AI. Contains over 100,000 videos covering more than 1,100 hours of driving across different times of the day and weather conditions. The annotated images come from the New York and San Francisco areas.

Baidu Apolloscapes: Large dataset that defines 26 different semantic items such as cars, bicycles, pedestrians, buildings, streetlights, etc.

Comma.ai: More than 7 hours of highway driving. Details include car’s speed, acceleration, steering angle, and GPS coordinates.

Oxford’s Robotic Car: Over 100 repetitions of the same route through Oxford, UK, captured over a period of a year. The dataset captures different combinations of weather, traffic and pedestrians, along with long-term changes such as construction and roadworks.

Cityscape Dataset: A large dataset that records urban street scenes in 50 different cities.

CSSAD Dataset: This dataset is useful for perception and navigation of autonomous vehicles. The dataset skews heavily on roads found in the developed world.

KUL Belgium Traffic Sign Dataset: More than 10,000 traffic sign annotations from thousands of physically distinct traffic signs in the Flanders region in Belgium.

MIT AGE Lab: A sample of the 1,000+ hours of multi-sensor driving datasets collected at AgeLab.

LISA: Laboratory for Intelligent & Safe Automobiles, UC San Diego Datasets: This dataset includes traffic signs, vehicle detection, traffic lights, and trajectory patterns.

Bosch Small Traffic Light Dataset: Dataset for small traffic lights for deep learning.

LaRa Traffic Light Recognition: Another dataset for traffic lights, captured in Paris.

WPI datasets: Datasets for traffic lights, pedestrian and lane detection.


MIMIC-III: Openly available dataset developed by the MIT Lab for Computational Physiology, comprising de-identified health data associated with ~40,000 critical care patients. It includes demographics, vital signs, laboratory tests, medications, and more.



Source: GeoPlatform Data.gov Search - Geospatial Platform

The GeoPlatform provides shared and trusted geospatial data, services, and applications for use by the public and by government agencies and partners to meet their mission needs.


NLP Datasets - Source: Niderhoff Github nlp-datasets

https://github.com/niderhoff/nlp-datasets  Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP). Most stuff here is just raw unstructured text data; if you are looking for annotated corpora or Treebanks, refer to the sources at the bottom.

  • Apache Software Foundation Public Mail Archives: all publicly available Apache Software Foundation mail archives as of July 11, 2011 (200 GB)

  • Blog Authorship Corpus: consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. 681,288 posts and over 140 million words. (298 MB)

  • Amazon Fine Food Reviews [Kaggle]: consists of 568,454 food reviews Amazon users left up to October 2012. Paper. (240 MB)

  • Amazon Reviews: Stanford collection of 35 million amazon reviews. (11 GB)

  • ArXiv: All the papers on arXiv as fulltext (270 GB) + sourcefiles (190 GB).

  • ASAP Automated Essay Scoring [Kaggle]: For this competition, there are eight essay sets. Each of the sets of essays was generated from a single prompt. Selected essays range from an average length of 150 to 550 words per response. Some of the essays are dependent upon source information and others are not. All responses were written by students ranging in grade levels from Grade 7 to Grade 10. All essays were hand graded and were double-scored. (100 MB)

  • ASAP Short Answer Scoring [Kaggle]: Each of the data sets was generated from a single prompt. Selected responses have an average length of 50 words per response. Some of the essays are dependent upon source information and others are not. All responses were written by students primarily in Grade 10. All responses were hand graded and were double-scored. (35 MB)

  • Classification of political social media: Social media messages from politicians classified by content. (4 MB)

  • CLiPS Stylometry Investigation (CSI) Corpus: a yearly expanded corpus of student texts in two genres: essays and reviews. The purpose of this corpus lies primarily in stylometric research, but other applications are possible. (on request)

  • ClueWeb09 FACC: ClueWeb09 with Freebase annotations (72 GB)

  • ClueWeb11 FACC: ClueWeb11 with Freebase annotations (92 GB)

  • Common Crawl Corpus: web crawl data composed of over 5 billion web pages (541 TB)

  • Cornell Movie Dialog Corpus: contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts: 220,579 conversational exchanges between 10,292 pairs of movie characters, 617 movies (9.5 MB)

  • Corporate messaging: A data categorization job concerning what corporations actually talk about on social media. Contributors were asked to classify statements as information (objective statements about the company or its activities), dialog (replies to users, etc.), or action (messages that ask for votes or ask users to click on links, etc.). (600 KB)

  • Crosswikis: English-phrase-to-associated-Wikipedia-article database. Paper. (11 GB)

  • DBpedia: a community effort to extract structured information from Wikipedia and to make this information available on the Web (17 GB)

  • Death Row: last words of every inmate executed since 1984 online (HTML table)

  • Del.icio.us: 1.25 million bookmarks on delicious.com

  • Disasters on social media: 10,000 tweets with annotations whether the tweet referred to a disaster event (2 MB).

  • Economic News Article Tone and Relevance: News articles judged if relevant to the US economy and, if so, what the tone of the article was. Dates range from 1951 to 2014. (12 MB)

  • Enron Email Data: consists of 1,227,255 emails with 493,384 attachments covering 151 custodians (210 GB)

  • Event Registry: Free tool that gives real-time access to news articles by 100,000 news publishers worldwide. Has API. (query tool)

  • Examiner.com - Spam Clickbait News Headlines [Kaggle]: 3 Million crowdsourced News headlines published by now defunct clickbait website The Examiner from 2010 to 2015. (200 MB)

  • Federal Contracts from the Federal Procurement Data Center (USASpending.gov): data dump of all federal contracts from the Federal Procurement Data Center found at USASpending.gov (180 GB)

  • Flickr Personal Taxonomies: Tree dataset of personal tags (40 MB)

  • Freebase Data Dump: data dump of all the current facts and assertions in Freebase (26 GB)

  • Freebase Simple Topic Dump: data dump of the basic identifying facts about every topic in Freebase (5 GB)

  • Freebase Quad Dump: data dump of all the current facts and assertions in Freebase (35 GB)

  • GigaOM Wordpress Challenge [Kaggle]: blog posts, meta data, user likes (1.5 GB)

  • Google Books Ngrams: available also in hadoop format on amazon s3 (2.2 TB)

  • Google Web 5gram: contains English word n-grams and their observed frequency counts (24 GB)

  • Gutenberg Ebook List: annotated list of ebooks (2 MB)

  • Hansards text chunks of Canadian Parliament: 1.3 million pairs of aligned text chunks (sentences or smaller fragments) from the official records (Hansards) of the 36th Canadian Parliament. (82 MB)

  • Harvard Library: over 12 million bibliographic records for materials held by the Harvard Library, including books, journals, electronic resources, manuscripts, archival materials, scores, audio, video and other materials. (4 GB)

  • Hate speech identification: Contributors viewed short text and identified if it a) contained hate speech, b) was offensive but without hate speech, or c) was not offensive at all. Contains nearly 15K rows with three contributor judgments per text string. (3 MB)

  • Hillary Clinton Emails [Kaggle]: nearly 7,000 pages of Clinton's heavily redacted emails (12 MB)

  • Home Depot Product Search Relevance [Kaggle]: contains a number of products and real customer search terms from Home Depot's website. The challenge is to predict a relevance score for the provided combinations of search terms and products. To create the ground truth labels, Home Depot has crowdsourced the search/product pairs to multiple human raters. (65 MB)

  • Identifying key phrases in text: Question/Answer pairs + context; context was judged if relevant to question/answer. (8 MB)

  • Jeopardy: archive of 216,930 past Jeopardy questions (53 MB)

  • 200k English plaintext jokes: archive of 208,000 plaintext jokes from various sources.

  • Machine Translation of European Languages: (612 MB)

  • Material Safety Datasheets: 230,000 Material Safety Data Sheets. (3 GB)

  • Million News Headlines - ABC Australia [Kaggle]: 1.3 Million News headlines published by ABC News Australia from 2003 to 2017. (56 MB)

  • MCTest: a freely available set of 660 stories and associated questions intended for research on the machine comprehension of text; for question answering (1 MB)

  • NEGRA: A Syntactically Annotated Corpus of German Newspaper Texts. Available for free for all Universities and non-profit organizations. Need to sign and send form to obtain. (on request)

  • News Headlines of India - Times of India [Kaggle]: 2.7 Million News Headlines with category published by Times of India from 2001 to 2017. (185 MB)

  • News article / Wikipedia page pairings: Contributors read a short article and were asked which of two Wikipedia articles it matched most closely. (6 MB)

  • NIPS2015 Papers (version 2) [Kaggle]: full text of all NIPS2015 papers (335 MB)

  • NYTimes Facebook Data: all the NYTimes facebook posts (5 MB)

  • One Week of Global News Feeds [Kaggle]: News Event Dataset of 1.4 Million Articles published globally in 20 languages over one week of August 2017. (115 MB)

  • Objective truths of sentences/concept pairs: Contributors read a sentence with two concepts. For example “a dog is a kind of animal” or “captain can have the same meaning as master.” They were then asked if the sentence could be true and ranked it on a 1-5 scale. (700 KB)

  • Open Library Data Dumps: dump of all revisions of all the records in Open Library. (16 GB)

  • Personae Corpus: collected for experiments in Authorship Attribution and Personality Prediction. It consists of 145 Dutch-language essays by 145 different students. (on request)

  • Reddit Comments: every publicly available Reddit comment as of July 2015. 1.7 billion comments (250 GB)

  • Reddit Comments (May ‘15) [Kaggle]: subset of above dataset (8 GB)

  • Reddit Submission Corpus: all publicly available Reddit submissions from January 2006 - August 31, 2015. (42 GB)

  • Reuters Corpus: a large collection of Reuters News stories for use in research and development of natural language processing, information retrieval, and machine learning systems. This corpus, known as "Reuters Corpus, Volume 1" or RCV1, is significantly larger than the older, well-known Reuters-21578 collection heavily used in the text classification community. You need to sign an agreement and send it by post to obtain the corpus. (2.5 GB)

  • SaudiNewsNet: 31,030 Arabic newspaper articles along with metadata, extracted from various online Saudi newspapers. (2 MB)

  • SMS Spam Collection: 5,574 English, real and non-encoded SMS messages, tagged as being legitimate (ham) or spam. (200 KB)

  • SouthparkData: .csv files containing script information including: season, episode, character, & line. (3.6 MB)

  • Stackoverflow: 7.3 million stackoverflow questions + other stackexchanges (query tool)

  • Twitter Cheng-Caverlee-Lee Scrape: Tweets from September 2009 - January 2010, geolocated. (400 MB)

  • Twitter New England Patriots Deflategate sentiment: Before the 2015 Super Bowl, there was a great deal of chatter around deflated footballs and whether the Patriots cheated. This data set looks at Twitter sentiment on important days during the scandal to gauge public sentiment about the whole ordeal. (2 MB)

  • Twitter Progressive issues sentiment analysis: tweets regarding a variety of left-leaning issues like legalization of abortion, feminism, Hillary Clinton, etc. classified if the tweets in question were for, against, or neutral on the issue (with an option for none of the above). (600 KB)

  • Twitter Sentiment140: Tweets related to brands/keywords. Website includes papers and research ideas. (77 MB)

  • Twitter sentiment analysis: Self-driving cars: contributors read tweets and classified them as very positive, slightly positive, neutral, slightly negative, or very negative. They were also asked to mark if the tweet was not relevant to self-driving cars. (1 MB)

  • Twitter Tokyo Geolocated Tweets: 200K tweets from Tokyo. (47 MB)

  • Twitter UK Geolocated Tweets: 170K tweets from UK. (47 MB)

  • Twitter USA Geolocated Tweets: 200K tweets from the US. (45 MB)

  • Twitter US Airline Sentiment [Kaggle]: A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service"). (2.5 MB)

  • U.S. economic performance based on news articles: News article headlines and excerpts rated by whether they are relevant to the U.S. economy. (5 MB)

  • Urban Dictionary Words and Definitions [Kaggle]: Cleaned CSV corpus of 2.6 million Urban Dictionary words, definitions, authors, and votes as of May 2016. (238 MB)

  • Westbury Lab Usenet Corpus: anonymized compilation of postings from 47,860 English-language newsgroups from 2005-2010. (40 GB)

  • Westbury Lab Wikipedia Corpus: snapshot of all the articles in the English part of Wikipedia, taken in April 2010. It was processed to remove all links and irrelevant material (navigation text, etc.). The corpus is untagged, raw text. Used by Stanford NLP. (1.8 GB)

  • Wikipedia Extraction (WEX): a processed dump of English-language Wikipedia. (66 GB)

  • Wikipedia XML Data: complete copy of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML. (500 GB)

  • Yahoo! Answers Comprehensive Questions and Answers: Yahoo! Answers corpus as of 10/25/2007. Contains 4,483,032 questions and their answers. (3.6 GB)

  • Yahoo! Answers consisting of questions asked in French: Subset of the Yahoo! Answers corpus from 2006 to 2015 consisting of 1.7 million questions posed in French, and their corresponding answers. (3.8 GB)

  • Yahoo! Answers Manner Questions: subset of the Yahoo! Answers corpus from a 10/25/2007 dump, selected for their linguistic properties. Contains 142,627 questions and their answers. (104 MB)

  • Yahoo! HTML Forms Extracted from Publicly Available Webpages: a small sample of pages that contain complex HTML forms; 2.67 million complex forms in total. (50+ GB)

  • Yahoo! Metadata Extracted from Publicly Available Web Pages: 100 million triples of RDF data (2 GB)

  • Yahoo N-Gram Representations: This dataset contains n-gram representations. The data may serve as a testbed for the query rewriting task, a common problem in IR research, as well as for word and sentence similarity tasks, which are common in NLP research. (2.6 GB)

  • Yahoo! N-Grams, version 2.0: n-grams (n = 1 to 5), extracted from a corpus of 14.6 million documents (126 million unique sentences, 3.4 billion running words) crawled from over 12,000 news-oriented sites. (12 GB)

  • Yahoo! Search Logs with Relevance Judgments: Anonymized Yahoo! Search Logs with Relevance Judgments. (1.3 GB)

  • Yahoo! Semantically Annotated Snapshot of the English Wikipedia: English Wikipedia dated from 2006-11-04 processed with a number of publicly-available NLP tools. 1,490,688 entries. (6 GB)

  • Yelp: including restaurant rankings and 2.2M reviews (on request)

  • YouTube: 1.7 million YouTube video descriptions (torrent)
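
Several entries in the list above (the Yahoo! N-Grams sets, for example) ship pre-computed n-grams for n = 1 to 5. For anyone wanting to build comparable counts from one of the raw text corpora instead, here is a minimal sketch. It assumes plain whitespace tokenization and lowercasing, which is almost certainly simpler than what the published datasets used; the function names are illustrative only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences, max_n=5):
    """Count n-grams for n = 1..max_n over whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: lowercase + split on whitespace is "good enough" here.
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    sample = ["the cat sat on the mat", "the cat slept"]
    counts = count_ngrams(sample, max_n=2)
    print(counts.most_common(3))
```

N-grams never cross sentence boundaries in this sketch, because each sentence is tokenized and counted on its own.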






Free Public Data Sets for Your First Data Science Project


    1. United States Census Data: The U.S. Census Bureau publishes reams of demographic data at the state, city, and even zip code level. The data set is fantastic for creating geographic data visualizations and can be accessed on the Census Bureau website. Alternatively, the data can be accessed via an API. One convenient way to use that API is through the choroplethr R package. In general, this data is very clean and very comprehensive.
    2. FBI Crime Data: The FBI crime data set is fascinating. If you’re interested in analyzing time series data, you can use it to chart changes in crime rates at the national level over a 20-year period. Alternatively, you can look at the data geographically.
    3. CDC Cause of Death: The Centers for Disease Control and Prevention maintains a database on cause of death. The data can be segmented in almost every way imaginable: age, race, year, and so on.
    4. Medicare Hospital Quality: The Centers for Medicare & Medicaid Services maintains a database on quality of care at more than 4,000 Medicare-certified hospitals across the U.S., providing for interesting comparisons.
    5. SEER Cancer Incidence: The U.S. government also has data about cancer incidence, again segmented by age, race, gender, year, and other factors. It comes from the National Cancer Institute’s Surveillance, Epidemiology, and End Results Program.
    6. Bureau of Labor Statistics: Many important economic indicators for the United States (like unemployment and inflation) can be found on the Bureau of Labor Statistics website. Most of the data can be segmented both by time and by geography.
    7. Bureau of Economic Analysis: The Bureau of Economic Analysis also has national and regional economic data, including gross domestic product and exchange rates.
    8. IMF Economic Data: For access to global financial statistics and other data, check out the International Monetary Fund’s website.
    9. Dow Jones Weekly Returns: Predicting stock prices is a major application of data analysis and machine learning. One relevant data set to explore is the weekly returns of the Dow Jones Index from the Center for Machine Learning and Intelligent Systems at the University of California, Irvine.
    10. Data.gov.uk: The British government’s official data portal offers access to tens of thousands of data sets on topics such as crime, education, transportation, and health.
    11. Enron Emails: After the collapse of Enron, a data set of roughly 500,000 emails with message text and metadata was released. The data set is now famous and provides an excellent testing ground for text-related analysis. You also can explore other research uses of this data set through the page.
    12. Google Books Ngrams: If you’re interested in truly massive data, the Ngram viewer data set counts the frequency of words and phrases by year across a huge number of text sources. The resulting file is 2.2 TB.
    13. UNICEF: If data about the lives of children around the world is of interest, UNICEF is the most credible source. The organization’s public data sets touch upon nutrition, immunization, and education, among others.
    14. Reddit Comments: Reddit released a data set of every comment that has ever been made on the site. That’s over a terabyte of data uncompressed, so if you want a smaller data set to work with, Kaggle has hosted the comments from May 2015 on its site.
    15. Wikipedia: Wikipedia provides instructions for downloading the text of English-language articles, in addition to other projects from the Wikimedia Foundation.
    16. Lending Club: Lending Club provides data about loan applications it has rejected as well as the performance of loans that it issued. The data set lends itself both to categorization techniques (will a given loan default) as well as regressions (how much will be paid back on a given loan).
    17. Walmart: Walmart has released historical sales data for 45 stores located in different regions across the United States.
    18. Airbnb: Inside Airbnb offers different data sets related to Airbnb listings in dozens of cities around the world.
    19. Yelp: Yelp maintains a dataset for use in personal, educational, and academic purposes. It includes 6 million reviews spanning 189,000 businesses in 10 metropolitan areas. Students are welcome to participate in Yelp’s dataset challenge.
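
Item 1 above notes that Census data can be accessed via an API. As a rough sketch of what that looks like without choroplethr, the Python snippet below builds a query URL and reshapes the response. The endpoint, the dataset (2019 ACS 5-year), and the variable code B01001_001E are assumptions that do not come from the list above, so check the Census Bureau's API documentation before relying on them. The one thing the reshaping step depends on is that the API returns a JSON array of rows whose first row is the header.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: 2019 American Community Survey, 5-year estimates.
BASE_URL = "https://api.census.gov/data/2019/acs/acs5"


def build_url(variables, geography="state:*"):
    """Build a query URL asking for the given variables for each geography."""
    query = urlencode({"get": ",".join(variables), "for": geography}, safe=",:*")
    return f"{BASE_URL}?{query}"


def rows_to_records(rows):
    """Turn [[header...], [row...], ...] into a list of dicts keyed by header."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def fetch(variables, geography="state:*"):
    """Download and reshape one query (needs network access)."""
    with urlopen(build_url(variables, geography)) as resp:
        return rows_to_records(json.load(resp))


if __name__ == "__main__":
    # B01001_001E is assumed to be the total-population estimate.
    print(build_url(["NAME", "B01001_001E"]))
```

Calling fetch(["NAME", "B01001_001E"]) would then give one dict per state, ready to load into a data frame. Heavier use may require an API key, which this sketch does not handle.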


Economic Datasets

Each year since 1978, the Federal Reserve Bank of Kansas City has sponsored a symposium on an important economic issue facing the U.S. and world economies. Symposium participants include prominent central bankers, finance ministers, academics, and financial market participants from around the world. Papers, commentary, and discussion.



Data From Figure 8


Image URLs, the matched word, whether the pair matched, and a confidence score for each

Judge emotions about nuclear energy from Twitter
Decide whether two English sentences are related

Evaluate how similar two sets of words are on a seven-point scale

Sentiment Analysis Global Warming/Climate Change

Judge Emotion About Brands
Tweets that mention Claritin from October 2012

Sentence plausibility: sentences ranked on a scale from implausible to plausible

National Park locations

Company categorizations

How beautiful is this image? (Buildings and Architecture)

How beautiful is this image? (Animals)

Gender breakdown of Time Magazine covers

Judge the relatedness of familiar words and made-up ones



    Audio Content Analysis

    Source: Alexander Lerch / Audio Content Analysis



    AWS Public Data Sets

    Source: AWS Public Data Sets  https://aws.amazon.com/public-datasets/

    Learn more about working with geospatial data on AWS at Earth on AWS.

    • Landsat on AWS: An ongoing collection of satellite imagery of all land on Earth produced by the Landsat 8 satellite.
    • Sentinel-2 on AWS: An ongoing collection of satellite imagery of all land on Earth produced by the Sentinel-2 satellite.
    • GOES on AWS: GOES provides continuous weather imagery and monitoring of meteorological and space environment data across North America.
    • SpaceNet on AWS: A corpus of commercial satellite imagery and labeled training data to foster innovation in the development of computer vision algorithms.
    • OpenStreetMap on AWS: OSM is a free, editable map of the world, created and maintained by volunteers. Regular OSM data archives are made available in Amazon S3.
    • MODIS on AWS: Select products from the Moderate Resolution Imaging Spectroradiometer (MODIS) managed by the U.S. Geological Survey and NASA.
    • Terrain Tiles: A global dataset providing bare-earth terrain heights, tiled for easy usage and provided on S3.
    • NAIP: 1 meter aerial imagery captured during the agricultural growing seasons in the continental U.S.
    • NEXRAD on AWS: Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.
    • NASA NEX: A collection of Earth science datasets maintained by NASA, including climate change projections and satellite images of the Earth's surface.
    • District of Columbia LiDAR: LiDAR point cloud data for Washington, DC.
    • EPA Risk-Screening Environmental Indicators: detailed air model results from EPA’s Risk-Screening Environmental Indicators (RSEI) model.
    • HIRLAM Weather Model: HIRLAM (High Resolution Limited Area Model) is an operational synoptic and mesoscale weather prediction model managed by the Finnish Meteorological Institute.

        Learn more about genomics in the cloud.

        • 1000 Genomes Project: A detailed map of human genetic variation.
        • TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer Genome Atlas (TCGA) available to qualified researchers via the Cancer Genomics Cloud.
        • ICGC on AWS: Whole genome sequence data available to qualified researchers via The International Cancer Genome Consortium (ICGC).
        • 3000 Rice Genome on AWS: Genome sequence of 3,024 rice varieties.
        • Genome in a Bottle (GIAB): Several reference genomes to enable translation of whole human genome sequencing to clinical practice.

            Learn more about artificial intelligence and machine learning on AWS.

            • Common Crawl: A corpus of web crawl data composed of over 5 billion web pages.
            • Amazon Bin Image Dataset: Over 500,000 bin JPEG images and corresponding JSON metadata files describing products in an operating Amazon Fulfillment Center.
            • GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from nearly every corner of every country, updated daily.
            • Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features and annotations.
            • Google Books Ngrams: A dataset containing Google Books n-gram corpuses.
            • SpaceNet on AWS: A corpus of commercial satellite imagery and labeled training data to foster innovation in the development of computer vision algorithms.
            • IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present.
            • ACS PUMS on AWS: U.S. Census American Community Survey (ACS) Public Use Microdata Sample (PUMS) is available in a linked data format using the Resource Description Framework (RDF) data model.
            • USAspending.gov on AWS: USAspending.gov database, which includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more.  


            Source:  CRAN

            Provides functions to download data from UK Parliament (Text/speeches)



            Source:  https://deeplearning4j.org/opendata


            Source:  Quora:


            Cross-disciplinary data repositories, data collections and data search engines:

            1. http://datasource.kapsarc.org
            2. https://www.kaggle.com/datasets
            3. http://www.assetmacro.com
            4. http://usgovxml.com
            5. http://aws.amazon.com/datasets
            6. http://databib.org
            7. http://datacite.org
            8. http://figshare.com
            9. http://linkeddata.org
            10. http://reddit.com/r/datasets
            11. http://thewebminer.com/
            12. http://thedatahub.org alias http://ckan.net
            13. http://quandl.com
            14. Social Network Analysis Interactive Dataset Library (Social Network Datasets)
            15. Datasets for Data Mining
            16. Enigma Public
            17. http://www.ufindthem.com/
            18. http://NetworkRepository.com - The First Interactive Network Data Repository
            19. http://MLvis.com
            20. Open Data Inception - A Comprehensive List of 2500+ Open Data Portals in the World
            21. http://data.opendatasoft.com OpenDataSoft catalog


            Single datasets and data repositories

            1. http://archive.ics.uci.edu/ml/
            2. http://crawdad.org/
            3. http://data.austintexas.gov
            4. http://data.cityofchicago.org
            5. http://data.govloop.com
            6. http://data.gov.uk/
            7. data.gov.in
            8. http://data.medicare.gov
            9. http://data.seattle.gov
            10. http://data.sfgov.org
            11. http://data.sunlightlabs.com
            12. https://datamarket.azure.com/
            13. http://developer.yahoo.com/geo/g...
            14. http://econ.worldbank.org/datasets
            15. http://en.wikipedia.org/wiki/Wik...
            16. http://factfinder.census.gov/ser...
            17. http://ftp.ncbi.nih.gov/
            18. http://gettingpastgo.socrata.com
            19. http://googleresearch.blogspot.c...
            20. http://books.google.com/ngrams/
            21. http://medihal.archives-ouvertes.fr
            22. http://public.resource.org/
            23. http://rechercheisidore.fr
            24. http://snap.stanford.edu/data/in...
            25. http://timetric.com/public-data/
            26. https://wist.echo.nasa.gov/~wist...
            27. http://www2.jpl.nasa.gov/srtm
            28. http://www.archives.gov/research...
            29. http://www.bls.gov/
            30. http://www.crunchbase.com/
            31. http://www.dartmouthatlas.org/
            32. http://www.data.gov/
            33. http://www.datakc.org
            34. http://dbpedia.org
            35. http://www.delicious.com/jbaldwi...
            36. http://www.faa.gov/data_research/
            37. http://www.factual.com/
            38. http://research.stlouisfed.org/f...
            39. http://www.freebase.com/
            40. http://www.google.com/publicdata...
            41. http://www.guardian.co.uk/news/d...
            42. http://www.infochimps.com
            43. http://www.kaggle.com/
            44. http://build.kiva.org/
            45. http://www.nationalarchives.gov....
            46. http://www.nyc.gov/html/datamine...
            47. http://www.ordnancesurvey.co.uk/...
            48. http://www.philwhln.com/how-to-g...
            49. http://www.imdb.com/interfaces
            50. http://imat-relpred.yandex.ru/en...
            51. http://www.dados.gov.pt/pt/catal...
            52. http://knoema.com
            53. http://daten.berlin.de/
            54. http://www.qunb.com
            55. http://databib.org/
            56. http://datacite.org/
            57. http://data.reegle.info/
            58. http://data.wien.gv.at/
            59. http://data.gov.bc.ca
            60. https://pslcdatashop.web.cmu.edu/ (interaction data in learning environments)
            61. http://www.icpsr.umich.edu/icpsrweb/CPES/ - Collaborative Psychiatric Epidemiology Surveys: (A collection of three national surveys focused on each of the major ethnic groups to study psychiatric illnesses and health services use)
            62. http://www.dati.gov.it
            63. http://dati.trentino.it
            64. http://www.databagg.com/
            65. http://networkrepository.com - Network/ML data repository w/ visual interactive analytics
            66. Home (United Nations Environment Programme GRID-Geneva): a lot of GIS datasets

            Source: Google Search

            r-directory > Reference Links > Free Data Sets  https://r-dir.com/reference/datasets.html
            Big Data Made Simple - 70 WebSites - http://bigdata-madesimple.com/70-websites-to-get-large-data-repositories-for-free/
            18 places to find data sets for data science projects https://www.dataquest.io/blog/free-datasets-for-projects/

            Source:  IBM -  https://apsportal.ibm.com/community


            Source: http://www.data.gov/

            Check out Data.gov’s new Metrics Page - July 31, 2017, by Data.gov



            SOURCE -




            1. Data.gov http://data.gov The US Government pledged last year to make all government data available freely online. This site is the first stage and acts as a portal to all sorts of amazing information on everything from climate to crime.
            2. US Census Bureau http://www.census.gov/data.html A wealth of information on the lives of US citizens covering population data, geographic data and education.
            3. Socrata is another interesting place to explore government-related data, with some visualisation tools built-in.
            4. European Union Open Data Portal http://open-data.europa.eu/en/data/ As the above, but based on data from European Union institutions.
            5. Data.gov.uk http://data.gov.uk/ Data from the UK Government, including the British National Bibliography – metadata on all UK books and publications since 1950.
            6. Canada Open Data is a pilot project with many government and geospatial datasets.
            7. Datacatalogs.org offers open government data from US, EU, Canada, CKAN, and more.
            8. The CIA World Factbook https://www.cia.gov/library/publications/the-world-factbook/ Information on history, population, economy, government, infrastructure and military of 267 countries.
            9. Healthdata.gov https://www.healthdata.gov/ 125 years of US healthcare data including claim-level Medicare data, epidemiology and population statistics.
            10. NHS Health and Social Care Information Centre http://www.hscic.gov.uk/home Health data sets from the UK National Health Service.
            11. UNICEF offers statistics on the situation of women and children worldwide.
            12. World Health Organization offers world hunger, health, and disease statistics.
            13. Amazon Web Services public datasets http://aws.amazon.com/datasets Huge resource of public data, including the 1000 Genome Project, an attempt to build the most comprehensive database of human genetic information, and NASA’s database of satellite imagery of Earth.
            14. Facebook Graph https://developers.facebook.com/docs/graph-api Although much of the information on users’ Facebook profile is private, a lot isn’t – Facebook provide the Graph API as a way of querying the huge amount of information that its users are happy to share with the world (or can’t hide because they haven’t worked out how the privacy settings work).
            15. Face.com: A fascinating tool for facial recognition data.
            16. UCLA makes some of the data from its courses public.
            17. Data Market is a place to check out data related to economics, healthcare, food and agriculture, and the automotive industry.
            18. Google Public data explorer includes data from world development indicators, OECD, and human development indicators, mostly related to economics data and the world.
            19. Junar is a data scraping service that also includes data feeds.
            20. Buzzdata is a social data sharing service that allows you to upload your own data and connect with others who are uploading their data.
            21. Gapminder http://www.gapminder.org/data/ Compilation of data from sources including the World Health Organization and World Bank covering economic, medical and social statistics from around the world.
            22. Google Trends http://www.google.com/trends/explore Statistics on search volume (as a proportion of total search) for any given term, since 2004.
            23. Google Finance https://www.google.com/finance 40 years’ worth of stock market data, updated in real time.
            24. Google Books Ngrams http://storage.googleapis.com/books/ngrams/books/datasetsv2.html Search and analyze the full text of any of the millions of books digitised as part of the Google Books project.
            25. National Climatic Data Center http://www.ncdc.noaa.gov/data-access/quick-links#loc-clim Huge collection of environmental, meteorological and climate data sets from the US National Climatic Data Center. The world’s largest archive of weather data.
            26. DBPedia http://wiki.dbpedia.org Wikipedia is comprised of millions of pieces of data, structured and unstructured, on every subject under the sun. DBPedia is an ambitious project to catalogue and create a public, freely distributable database allowing anyone to analyze this data.
            27. New York Times http://developer.nytimes.com/docs Searchable, indexed archive of news articles going back to 1851.
            28. Freebase http://www.freebase.com/ A community-compiled database of structured data about people, places and things, with over 45 million entries.
            29. Million Song Data Set http://aws.amazon.com/datasets/6468931156960467 Metadata on over a million songs and pieces of music. Part of Amazon Web Services.
            30. UCI Machine Learning Repository is a collection of datasets specifically pre-processed for machine learning.
            31. Financial Data Finder at OSU offers a large catalog of financial data sets.
            32. Pew Research Center offers its raw data from its fascinating research into American life.
            33. The BROAD Institute offers a number of cancer-related datasets.



            Source: Caesar0301 Awesome Data Sets









            Complex Networks


            Computer Networks


            Data Challenges


            Earth Science
















            Image Processing


            Machine Learning




            Natural Language








            Public Domains


            Search Engines


            Social Networks


            Social Sciences






            Time Series




            Source: United Nations http://data.un.org/DataMartInfo.aspx

            Source: http://www.kdnuggets.com/datasets/index.html

            1. AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
            2. BigML big list of public data sources.
            3. Bioassay data, described in Virtual screening of bioassay data, by Amanda Schierz, J. of Cheminformatics, with 21 Bioassay datasets (Active / Inactive compounds) available for download.
            4. Bitly 1.usa.gov data, anonymized clicks on gov links.
            5. Canada Open Data, pilot project with many government and geospatial datasets.
            6. Causality Workbench data repository.
            7. Corral Big Data repository at Texas Advanced Computing Center, supporting data-centric science.
            8. Data Source Handbook, A Guide to Public Data, by Pete Warden, O'Reilly (Jan 2011).
            9. Datacatalogs.org, open government data from US, EU, Canada, CKAN, and more.
            10. Data.gov.uk, publicly available data from UK (also London datastore).
            11. Data.gov/Education, central guide for education data resources including high-value data sets, data visualization tools, resources for the classroom, applications created from open data and more.
            12. DataMarket, visualize the world's economy, societies, nature, and industries, with 100 million time series from UN, World Bank, Eurostat and other important data providers.
            13. Datamob, public data put to good use.
            14. DataSF.org, a clearinghouse of datasets available from the City & County of San Francisco, CA.
            15. DataFerrett, a data mining tool that accesses and manipulates TheDataWeb, a collection of many on-line US Government datasets.
            16. Delve, Data for Evaluating Learning in Valid Experiments.
            17. EconData, thousands of economic time series, produced by a number of US Government agencies.
            18. Enron Email Dataset, data from about 150 users, mostly senior management of Enron.
            19. Europeana Data, contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage content.
            20. FEDSTATS (updated), comprehensive source of US statistics and more: https://www.usa.gov/statistics
            21. FIMI repository for frequent itemset mining, implementations and datasets.
            22. Financial Data Finder at OSU, a large catalog of financial data sets.
            23. GDELT: The Global Data on Events, Location and Tone, described by Guardian as "a big data history of life, the universe and everything."
            24. GEO (Gene Expression Omnibus), a gene expression/molecular abundance repository supporting MIAME compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval.
            25. GeoDa Center, geographical and spatial data.
            26. Google ngrams datasets, text from millions of books scanned by Google.
            27. Grain Market Research, financial data including stocks, futures, etc.
            28. Hilary Mason research-quality Big Data sets collection - many text and image datasets.
            29. HitCompanies Datasets, comprehensive data on random 10,000 UK companies sampled from HitCompanies, updated automatically using AI/Machine Learning.
            30. ICWSM-2009 dataset contains 44 million blog posts made between August 1st and October 1st, 2008.
            31. Infochimps, an open catalog and marketplace for data. You can share, sell, curate, and download data about anything and everything.
            32. Investor Links, includes financial data.
            33. KDD Cup center, with all data, tasks, and results.
            34. Kevin Chai list of datasets, for text, SNA, and other fields.
            35. KONECT, the Koblenz Network Collection, with large network datasets of all types in order to perform research in the area of network mining.
            36. Linking Open Data project, aimed at making data freely available to everyone.
            37. Million Song Dataset.
            38. MIT Cancer Genomics gene expression datasets and publications, from MIT Whitehead Center for Genome Research.
            39. ML Data, the data repository of the EU Pascal2 networks.
            40. NASDAQ Data Store, provides access to market data.
            41. National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
            42. National Space Science Data Center (NSSDC), NASA data sets from planetary exploration, space and solar physics, life sciences, astrophysics, and more.
            43. Open Data Census, assesses the state of open data around the world.
            44. OpenData from Socrata, access to over 10,000 datasets including business, education, government, and fun.
            45. Open Source Sports, many sports databases, including Baseball, Football, Basketball, and Hockey.
            46. Peter Skomoroch dataset Bookmarks.
            47. PubGene(TM) Gene Database and Tools, genomic-related publications database.
            48. Quandl, a collaboratively curated portal to millions of financial and economic time-series datasets.
            49. qunb, a platform to find and visualize quantitative data.
            50. Robert Shiller data on housing, stock market, and more from his book Irrational Exuberance.
            51. SMD: Stanford Microarray Database, stores raw and normalized data from microarray experiments.
            52. Jerry Smith dataset collection, with Finance, Government, Machine Learning, Science, and other data.
            53. SourceForge.net Research Data, includes historic and status statistics on approximately 100,000 projects and over 1 million registered users' activities at the project management web site.
            54. StatLib, CMU Datasets Archive.
            55. STATOO Datasets part 1 and STATOO Datasets part 2.
            56. Time Series Data Library.
            57. Visual Analytics Benchmark Repository.
            58. UCI KDD Database Repository for large datasets used in machine learning and knowledge discovery research.
            59. UCI Machine Learning Repository.
            60. UCR Time Series Data Archive, offering datasets, papers, links, and code.
            61. United States Census Bureau.
            62. Wikiposit, a (virtual) amalgamation of (mostly financial) data from many different sites, allowing users to merge data from different sources.
            63. Wolfram Alpha disease and patient level data.
            64. Yahoo Sandbox datasets, Language, Graph, Ratings, Advertising and Marketing, Competition.
            65. Yelp Academic Dataset, all the data and reviews of the 250 closest businesses for 30 universities for students and academics to explore and research.

            Source: http://www.kdnuggets.com/datasets/government-local-public.html

            Public data catalogs, portals, and services

            • AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
            • Datacatalogs.org, open government data from US, EU, Canada, CKAN, and more.
            • DataMarket, visualize the world's economy, societies, nature, and industries, with 100 million time series from UN, World Bank, Eurostat and other important data providers.
            • datamob, Public data put to good use.
            • Enigma, "Google for public data", provides easy access to government, NGO, and other public domain datasets.
            • Freebase, a community-curated database of well-known people, places, and things.
            • Google Public Data, with dynamic visualization and exploration tools.
            • Knoema World Data Atlas, over 1000 indicators on all countries
            • National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
            • Open Data Census, assesses the state of open data around the world.
            • Open Data Institute, catalysing the evolution of open data culture to create economic, environmental, and social value.
            • Socrata OpenData, provides social data discovery services for opening government, healthcare, energy, education, or environment data.
            • Visualising Data, a big collection of sites and services for accessing data.

            Global, International, UN

            • The World Bank, a comprehensive set of data about development in countries around the globe.
            • UN data, a data access system to UN databases
            • UNICEF statistics, data analysis and other data about UNICEF work.


            USA: Federal


            USA: State, City, and Local



            Europe

            • Europeana Data, contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage content.
            • Eurostat, the leading provider of high quality statistics on Europe.
            • OECD Data Lab, data visualisations and European data downloads.
            • PublicData.eu, access to open, freely reusable datasets from local, regional and national public bodies across Europe.
            • Data Publica, the directory of data in France ("l'annuaire des données en France"), public data about France.
            • Paris data.







            Asia

            • Census India, data on population, economic activity, literacy, education, housing, urbanisation, fertility, mortality, and more.

            Australia, NZ, and Pacific

            • Data.gov.au provides an easy way to find, access and reuse public datasets from the Australian Government.
            • Australian Bureau of Statistics, access to the full range of ABS statistical and reference information.
            • Wiki New Zealand, a collaborative website making data about New Zealand accessible for everyone.


            Africa

            • Open Data for Africa, supporting statistical development in Africa as a sound basis for designing and managing effective development policies for reducing poverty on the continent.




            Source: http://aws.amazon.com/publicdatasets/


            Available Public Data Sets on AWS

            Click here for the detailed list of available data sets. Here are some examples of popular Public Data Sets:

            • NASA NEX: A collection of Earth science data sets maintained by NASA, including climate change projections and satellite images of the Earth's surface
            • Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages
            • 1000 Genomes Project: A detailed map of human genetic variation
            • Google Books Ngrams: A data set containing Google Books n-gram corpora
            • US Census Data: US demographic data from 1980, 1990, and 2000 US Censuses
            • Freebase Data Dump: A data dump of all the current facts and assertions in the Freebase system, an open database covering millions of topics
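For readers new to the term used in the Google Books Ngrams entry: an n-gram is a run of n consecutive tokens, and an n-gram corpus records how often each run occurs. Here is a minimal sketch of that counting step using only the Python standard library. The function name `count_ngrams` and the sample sentence are illustrative assumptions, and the sketch says nothing about the actual file layout of the data set hosted on AWS.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count runs of n consecutive whitespace-separated, lowercased tokens."""
    tokens = text.lower().split()
    # zip over n staggered views of the token list to form each n-gram
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Toy input, invented for illustration only
sample = "the quick fox and the quick dog"
bigrams = count_ngrams(sample, 2)
print(bigrams[("the", "quick")])
```

Splitting on whitespace is the simplest possible tokenizer; a real corpus would likely need punctuation handling as well.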


            Source: http://kevinchai.net/datasets


            Blog articles which provide dataset directories

            http://conflate.net/inductio/2008/02/a-meta-index-of-data-sets/ – excellent article listing available data sets in the area of machine learning and inference
            http://www.daniel-lemire.com/blog/data-for-data-mining/ – has blog, tag cloud, wiki dataset categories
            http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php – Article containing a list of available dataset websites

            Dataset directories

            http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public – Public datasets listed on a Quora Q&A thread.
            http://caw2.barcelonamedia.org/node/7 – Content Analysis for the Web 2.0 (CAW 2.0) Workshop – part of 18th International Conference of the World Wide Web. Contains training and test datasets from Twitter, MySpace, Slashdot, Ciao and Kongregate.
            http://kdd.ics.uci.edu/ – has a machine learning repository
            http://archive.ics.uci.edu/ml/datasets.html – UCI Machine Learning Repository dataset listing
            http://ckan.net/ – listing of links to various datasets
            http://www.ldc.upenn.edu/Obtaining/ – Linguistic data consortium catalog
            http://googleresearch.blogspot.com/ – Google Research has stated that http://research.google.com will soon host open-source scientific datasets – http://blog.wired.com/wiredscience/2008/01/google-to-provi.html – watch this space.
            http://tunedit.org/search?q=arff – 800 datasets in ARFF format for different problems and application domains
            http://gsociology.icaap.org/dataupload.html – The Global Social Change Research Project – social, political and economic datasets
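ARFF, mentioned in the tunedit.org entry, is the plain-text format used by Weka: a header of `@relation` and `@attribute` lines, then an `@data` section of comma-separated rows, with `%` starting a comment. A rough sketch of reading such a file follows. It assumes dense rows and unquoted names and values; `parse_arff` and the sample data are hypothetical. For real files a maintained reader such as `scipy.io.arff.loadarff` is probably the safer choice.

```python
def parse_arff(text):
    """Parse a simple dense ARFF document into (attribute names, rows).

    Assumes unquoted names and values and no sparse-format rows.
    """
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])  # second token is the name
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


# Invented sample, for illustration only
SAMPLE = """% toy weather data
@relation weather
@attribute outlook {sunny,rainy}
@attribute temperature numeric
@data
sunny,85
rainy,70
"""

names, rows = parse_arff(SAMPLE)
print(names)
print(rows)
```

Values come back as strings; converting numeric attributes is left out to keep the sketch short.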

            Data sets for a specific field

            http://kaggle.com/ – machine learning competitions with data provided by organisations with prize money
            http://theinfo.org/get/data – good list here – pay attention to web/news/blogs and Text/Language categories as well as trust network data
            http://research.microsoft.com/nlp/ – look under data sets
            http://nlp.stanford.edu/links/statnlp.html – look under corpora
            http://trec.nist.gov/data/reuters/reuters.html – Reuters Corpora – contains large collection of news stories for use in Natural Language Processing, Information Retrieval and Machine Learning Systems (need to order CDs)

            http://trec.nist.gov/data.html – Text retrieval. Has spam, web, question answering, blog and ad hoc (e.g. relevance judgement) tracks
            http://plg.uwaterloo.ca/~gvcormac/treccorpus/ (300MB) – Spam Corpus 2005
            http://plg.uwaterloo.ca/~gvcormac/treccorpus06/ (75MB English, 60MB Chinese) – Spam Corpus 2006
            http://trec.nist.gov/data/reljudge_eng.html – Relevance Judgement
            http://ir.dcs.gla.ac.uk/test_collections/blog06info.html (25GB – costs 400 GBP) – Blog 06 data
            http://trec.nist.gov/data/qamain.html – Question Answering (many tracks)
            http://trec.nist.gov/data/novelty.html – Novelty (some relevance)

            http://infochimps.org/tag/language/datasets – languages
            http://infochimps.org/tag/lexicon/datasets – lexicon
            http://infochimps.org/tag/lexical/datasets – lexical

            http://wordnet.princeton.edu/ – Lexical database that is handy for computational linguistics and natural language processing
            http://www.dmoz.org/Computers/Artificial_Intelligence/Machine_Learning/Datasets/ – Machine learning datasets
            http://cervisia.org/machine_learning_data.php – Machine learning datasets; benchmark data for comparing different classifier algorithms is recommended from http://www.ci.tuwien.ac.at/~meyer/benchdata/
            http://www.trustlet.org/wiki/Trust_network_datasets#Released_datasets – Trust datasets – includes Epinions
            http://stuff.metafilter.com/infodump/ – Metafilter – contains posts, comments, tags, favourites, contact and user data
            http://an.kaist.ac.kr/traces/IMC2007.html – YouTube dataset
            http://socialnetworks.mpi-sws.mpg.de/ – social network dataset
            http://people.csail.mit.edu/jrennie/20Newsgroups/ – newsgroup dataset
            http://www.yr-bcn.es/webspam/datasets/ – Webspam datasets

            Link Analysis


            Recommender systems

            http://www.grouplens.org/ – MovieLens
            http://www.ieor.berkeley.edu/~goldberg/jester-data/ – Jester
            http://www.netflixprize.com/ – Netflix
            http://www.informatik.uni-freiburg.de/~cziegler/BX/ – Book Crossing


            http://weimo.de/node/642 – Nabble.com + user ratings of posts


            http://ebiquity.umbc.edu/resource/html/id/212/Splog-Blog-Dataset – Spam blogs (splogs)
            http://www.icwsm.org/data.html – 14 million posts, 3 million weblogs – apparently no longer available since Dec 8, 2006
            http://ir.dcs.gla.ac.uk/test_collections/blog06info.html – but costs 400 GBP!


            http://labs.systemone.at/wikipedia3 – Wikipedia 3, providing Wikipedia datasets
            http://download.wikipedia.org/ – official Wikipedia database dumps (very large)
            http://download.freebase.com/wex/ – English Wikipedia articles that have been transformed into XML – all files ~ 55GB
            http://dbpedia.org/About – structured information from Wikipedia – dataset of this is available


            http://www.archive.org/web/web.php – 85 billion webpages archived since 1996


            http://opentick.com/ – Stock data
            http://lib.stat.cmu.edu/datasets/ – miscellaneous datasets
            http://lib.stat.cmu.edu/jasadata/ – datasets from Journal of the American Statistical Association
            http://musicbrainz.org/ – music dataset
            http://www.jigsaw.com/ – directory of company & business professional dataset
            http://www.librarything.com/ – library catalogue
            http://www.imeem.com/developers – media library
            http://www.scribd.com/doc/9582/integrating-wikipediawordnet – article talking about integrating Wordnet and Wikipedia with YAGO (an extensible and light-weight ontology)
            http://wiki.openstreetmap.org/index.php/Potential_Datasources – country maps
            http://rdf.dmoz.org/ – open directory project dataset



            Source: http://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public




            Source: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html


            datasets-package: The R Datasets Package

            -- A --

            ability.cov: Ability and Intelligence Tests
            airmiles: Passenger Miles on Commercial US Airlines, 1937-1960
            AirPassengers: Monthly Airline Passenger Numbers 1949-1960
            airquality: New York Air Quality Measurements
            anscombe: Anscombe's Quartet of 'Identical' Simple Linear Regressions
            attenu: The Joyner-Boore Attenuation Data
            attitude: The Chatterjee-Price Attitude Data
            austres: Quarterly Time Series of the Number of Australian Residents

            -- B --

            beaver1: Body Temperature Series of Two Beavers
            beaver2: Body Temperature Series of Two Beavers
            beavers: Body Temperature Series of Two Beavers
            BJsales: Sales Data with Leading Indicator
            BJsales.lead: Sales Data with Leading Indicator
            BOD: Biochemical Oxygen Demand

            -- C --

            cars: Speed and Stopping Distances of Cars
            ChickWeight: Weight versus age of chicks on different diets
            chickwts: Chicken Weights by Feed Type
            CO2: Carbon Dioxide Uptake in Grass Plants
            co2: Mauna Loa Atmospheric CO2 Concentration
            crimtab: Student's 3000 Criminals Data

            -- D --

            datasets: The R Datasets Package
            discoveries: Yearly Numbers of Important Discoveries
            DNase: Elisa assay of DNase

            -- E --

            esoph: Smoking, Alcohol and (O)esophageal Cancer
            euro: Conversion Rates of Euro Currencies
            euro.cross: Conversion Rates of Euro Currencies
            eurodist: Distances Between European Cities
            EuStockMarkets: Daily Closing Prices of Major European Stock Indices, 1991-1998

            -- F --

            faithful: Old Faithful Geyser Data
            fdeaths: Monthly Deaths from Lung Diseases in the UK
            Formaldehyde: Determination of Formaldehyde
            freeny: Freeny's Revenue Data
            freeny.x: Freeny's Revenue Data
            freeny.y: Freeny's Revenue Data

            -- H --

            HairEyeColor: Hair and Eye Color of Statistics Students
            Harman23.cor: Harman Example 2.3
            Harman74.cor: Harman Example 7.4

            -- I --

            Indometh: Pharmacokinetics of Indomethacin
            infert: Infertility after Spontaneous and Induced Abortion
            InsectSprays: Effectiveness of Insect Sprays
            iris: Edgar Anderson's Iris Data
            iris3: Edgar Anderson's Iris Data
            islands: Areas of the World's Major Landmasses

            -- J --

            JohnsonJohnson: Quarterly Earnings per Johnson & Johnson Share

            -- L --

            LakeHuron: Level of Lake Huron 1875-1972
            ldeaths: Monthly Deaths from Lung Diseases in the UK
            lh: Luteinizing Hormone in Blood Samples
            LifeCycleSavings: Intercountry Life-Cycle Savings Data
            Loblolly: Growth of Loblolly pine trees
            longley: Longley's Economic Regression Data
            lynx: Annual Canadian Lynx trappings 1821-1934

            -- M --

            mdeaths: Monthly Deaths from Lung Diseases in the UK
            morley: Michelson Speed of Light Data
            mtcars: Motor Trend Car Road Tests

            -- N --

            nhtemp: Average Yearly Temperatures in New Haven
            Nile: Flow of the River Nile
            nottem: Average Monthly Temperatures at Nottingham, 1920-1939
            npk: Classical N, P, K Factorial Experiment

            -- O --

            occupationalStatus: Occupational Status of Fathers and their Sons
            Orange: Growth of Orange Trees
            OrchardSprays: Potency of Orchard Sprays

            -- P --

            PlantGrowth: Results from an Experiment on Plant Growth
            precip: Annual Precipitation in US Cities
            presidents: Quarterly Approval Ratings of US Presidents
            pressure: Vapor Pressure of Mercury as a Function of Temperature
            Puromycin: Reaction Velocity of an Enzymatic Reaction

            -- Q --

            quakes: Locations of Earthquakes off Fiji

            -- R --

            randu: Random Numbers from Congruential Generator RANDU
            rivers: Lengths of Major North American Rivers
            rock: Measurements on Petroleum Rock Samples

            -- S --

            Seatbelts: Road Casualties in Great Britain 1969-84
            sleep: Student's Sleep Data
            stack.loss: Brownlee's Stack Loss Plant Data
            stack.x: Brownlee's Stack Loss Plant Data
            stackloss: Brownlee's Stack Loss Plant Data
            state: US State Facts and Figures
            state.abb: US State Facts and Figures
            state.area: US State Facts and Figures
            state.center: US State Facts and Figures
            state.division: US State Facts and Figures
            state.name: US State Facts and Figures
            state.region: US State Facts and Figures
            state.x77: US State Facts and Figures
            sunspot.month: Monthly Sunspot Data, from 1749 to "Present"
            sunspot.year: Yearly Sunspot Data, 1700-1988
            sunspots: Monthly Sunspot Numbers, 1749-1983
            swiss: Swiss Fertility and Socioeconomic Indicators (1888) Data

            -- T --

            Theoph: Pharmacokinetics of Theophylline
            Titanic: Survival of passengers on the Titanic
            ToothGrowth: The Effect of Vitamin C on Tooth Growth in Guinea Pigs
            treering: Yearly Treering Data, -6000-1979
            trees: Girth, Height and Volume for Black Cherry Trees

            -- U --

            UCBAdmissions: Student Admissions at UC Berkeley
            UKDriverDeaths: Road Casualties in Great Britain 1969-84
            UKgas: UK Quarterly Gas Consumption
            UKLungDeaths: Monthly Deaths from Lung Diseases in the UK
            USAccDeaths: Accidental Deaths in the US 1973-1978
            USArrests: Violent Crime Rates by US State
            USJudgeRatings: Lawyers' Ratings of State Judges in the US Superior Court
            USPersonalExpenditure: Personal Expenditure Data
            uspop: Populations Recorded by the US Census

            -- V --

            VADeaths: Death Rates in Virginia (1940)
            volcano: Topographic Information on Auckland's Maunga Whau Volcano

            -- W --

            warpbreaks: The Number of Breaks in Yarn during Weaving
            women: Average Heights and Weights for American Women
            WorldPhones: The World's Telephones
            WWWusage: Internet Usage per Minute


            Source: http://www.reddit.com/r/datasets/


            Source: http://www.nber.org/data/


            Official Business Cycle Dates

            "The American Business Cycle: Continuity and Change"   Historic Data Tables

            Experimental Coincident, Leading and Recession Indexes
            Stock, Watson

            Index of African Governance
            Rotberg, Gisselquist

            Penn-World Tables
            Feenstra, Inklaar, Timmer

            Barro, Lee

            Cross-country Historical Adoption of Technology (CHAT) data
            Comin, Hobijn

            Economic Policy Uncertainty
            Baker, Bloom, Davis

            A History of U.S. Foreign-Exchange-Market Interventions
            Bordo, Humpage, Schwartz

            Occupational Wages around the World
            Freeman, Oostendorp

            Macro History Database

            Savings, Investment, and Gold in 13 countries (1850-1945)
            Jones, Obstfeld

            Social Security Pension Reform in Europe
            Feldstein, Siebert

            Historical Cross-Country Technological Adoption: Dataset
            Comin, Hobijn

            Facts and Fantasies about Commodity Futures
            Gorton, Rouwenhorst

            US Industrial Production Index 1790 - 1915

            Industry, Productivity, and Digitization Data

            Job Creation and Destruction Data
            Haltiwanger et al

            Management Practices Data
            Bloom, Van Reenen

            Manufacturing Industry Productivity Database
            Becker, Gray, Marvakov

            Internet and Economy Digitization Report

            Public Sector Collective Bargaining Law Data
            Valletta, Freeman

            Form 990 data on tax exempt organizations

            International Trade Data

            Price Quantity Indexes and Values for U.S. Exports and Imports, 1879-1923

            SITC Rev 2 and NAICS (1997)

            U.S. Trade by 1972-SIC category, 1958-1994

            U.S. Trade by 1987-SIC, 1972-2005; NAICS 1989-2005; HS 1989-2008
            Concordance between HS and SIC/NAICS; Concordance of HS codes over time
            Pierce and Schott

            U.S. Imports by TSUSA, HS, SITC, 1972-2001

            U.S. Imports by SAS and Stata, 1972-2001

            U.S. Exports by TSUSA, HS, SITC, 1972-2001

            U.S. Exports by SAS and Stata, 1972-2001

            U.S. Tariffs, 1989-2001

            U.S. Antidumping Database and Links

            World Trade Data ( choose World Import and Export Data )
            Feenstra, Lipsey

            Individual Data

            Angrist Archive
            Joshua Angrist

            Boston Youth Labor (Market) Survey, 1980, 1989
            Freeman, Katz

            Collaborative Perinatal (CPP)

            Consumer Expenditure Survey Extracts
            Harris, Sabelhaus (CBO)

            Current Population Survey

            Fatality Analysis Reporting System (FARS) Data

            Gould Sample

            National Health and Nutrition Examination Survey (NHANES)

            Reading National Health Interview Survey (NHIS) Data with SAS, SPSS, or Stata

            Survey of Economic Expectations
            Dominitz, Manski

            Survey of Income and Program Participation

            Survey of Program Dynamics

            Thorndike, Hagen

            Union Army Data Set

            Worker Representation survey
            Freeman, Rogers

            Hospital/Provider Data

            CMS' Prospective Payment System (PPS)

            Reading CMS' Healthcare Cost Report Information System (HCRIS) datasets using SAS

            CMS's National Plan and Provider Enumeration System (NPPES) Files

            CMS' National Provider Identifier (NPI) to Unique Physician Identification Number (UPIN) Crosswalk

            CMS' National Provider Identifier (NPI) to State License Crosswalk

            CMS' Provider of Service (POS) files

            CMS' Medicare Provider Charge Data

            CMS' ICD-9-CM to and from ICD-10-CM and ICD-10-PCS Crosswalk or General Equivalence Mappings

            CMS's CBSA, MSA, and State Wage Index Files

            CMS' SSA to FIPS CBSA and MSA County Crosswalks

            CMS' SSA to FIPS State and County Crosswalks

            Demographic and Vital Statistics

            Vital Statistics Books ( Historical )

            Vital Statistics Births

            Interactive index to Vital Statistics Births 1931-1968

            Reading SEER U.S. County Population Data with SAS, SPSS, or Stata 1969-on

            Vital Statistics Births and Infant Mortality 1920-1945
            Cutler, Norberg, Norton

            Vital Statistics Births 1940-1968
            Finkelstein, Heidi Williams

            Vital Statistics Mortality Data

            Vital Statistics Deaths - Historical 1900 - 1936
            Grant Miller

            Vital Statistics Marriage and Divorce

            US Decennial Population by County and State 1900-1990

            US Intercensal Population by County and State 1970-2009
            Roth, James Wang

            US Intercensal Population by State, Age and Sex 1970-1999

            Work-Family Policies and Other Data
            Waldfogel, Han, Ruhm


            Patent and Scientific Papers Data


            U.S. Patents
            Hall, Jaffe, Trajtenberg

            NBER-Rensselaer Polytechnic Institute Scientific Papers Database
            Adams, Clemmons

            Nobel Laureate Data
            Jones, Weinberg

            Other Data


            • NBER
            • NCES
            • Feenberg
            • Cutler, Glaeser, Vigdor
            • Wallis
            • Lichtenberg
            • Borenstein
            • Roth
            • Roth
            • Roth
            • Olken
            • Lahey
            • NBER
            • Norberg


            Source: http://www.wto.org/english/res_e/statis_e/data_pub_e.htm



            Source: http://www.imf.org/external/data.htm

            • World Economic Outlook Databases (WEO)
            • International Financial Statistics (IFS)
            • Principal Global Indicators (PGI)
            • Balance of Payments Statistics (BOPS)
            • Coordinated Direct Investment Survey (CDIS)
            • Coordinated Portfolio Investment Survey (CPIS)
            • Currency Composition of Official Foreign Exchange Reserves (COFER)
            • Data Template on International Reserves and Foreign Currency Liquidity
            • Financial Access Survey (FAS)
            • Financial Soundness Indicators (FSIs)
            • G-20 Surveillance Notes
            • Joint External Debt Hub
            • Monitoring of Fund Arrangements Database (MONA)
            • Primary Commodity Prices
            • Public Sector Debt Statistics Online Centralized Database
            • Quarterly External Debt Statistics (QEDS)


            Source: http://blog.visual.ly/data-sources/

            1. Government and political data

            • Data.gov: This is the go-to resource for government-related data. It claims to have up to 400,000 data sets, both raw and geospatial, in a variety of formats. The only caveat in using the data sets is that you have to clean them first, since many have missing values and characters.
            • Socrata is another good place to explore government-related data. One great thing about Socrata is they have some visualization tools that make exploring the data easier.
            • City-specific government data: Some cities have their own data portals set up for browsing city-related data. For example, at San Francisco Data you can browse through everything from crime statistics to the parking spots available in the city.
            • The UN and UN-related sites like UNICEF and the World Health Organization are rich with all kinds of data, from mortality rates to world hunger statistics.
            • The Census Bureau houses a ton of information about our lives around income, race, education, population and business.
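The Data.gov caveat about cleaning can be made concrete with a small sketch. It strips stray whitespace and drops any row with a missing field, using only the Python standard library. The sample CSV, the `clean_rows` helper, and the set of markers treated as "missing" are all assumptions for illustration; real government files will need their own rules.

```python
import csv
import io

MISSING = {"", "na", "n/a", "null", "-"}  # assumed markers for missing values


def clean_rows(csv_text):
    """Return dict rows with whitespace stripped, skipping incomplete rows.

    Assumes no row has more fields than the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        # DictReader fills absent trailing fields with None
        values = {key: (value or "").strip() for key, value in row.items()}
        if any(value.lower() in MISSING for value in values.values()):
            continue  # drop rows that have any missing field
        cleaned.append(values)
    return cleaned


# Invented sample, for illustration only
SAMPLE = "city,population\nSpringfield, 30000 \nShelbyville,N/A\nOgdenville,\n"
rows = clean_rows(SAMPLE)
print(rows)
```

Dropping incomplete rows is the bluntest option; depending on the analysis, filling in a default or flagging the row may be a better fit.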

            2. Data aggregators

            These are the places that house data from all kinds of sources. Sometimes it’s easier to find something here related to a specific category.

            • Programmable Web: A really useful resource for exploring APIs and mashups of different APIs.
            • Infochimps has a data marketplace that offers thousands of public and proprietary data sets for download and API access, in a wide range of categories, from historical Twitter and OK Cupid data to geolocation data, in different formats. You can even upload your own data if you like.
            • Data Market is a good place to explore data related to economics, healthcare, food and agriculture, and the automotive industry.
            • Google Public data explorer houses a lot of data from world development indicators, OECD and human development indicators, mostly related to economics data and the world.
            • Junar is a great data scraping service that also houses data feeds.
            • Buzzdata is a social data sharing service that allows you to upload your own data and connect and follow others who are uploading their own data.

            3. Social data

            Usually, the best place to get social data is the API of the site itself: Instagram, GetGlue, Foursquare; pretty much all social media sites have their own APIs. Here are more details on the most popular ones.

            • Twitter: Access to the Twitter API for historical uses is fairly limited, to 3200 tweets. For more, check out PeopleBrowsr, Gnip (which also offers historical access to the WP Automattic data feed), DataSift, Infochimps, and Topsy.
            • Foursquare: They have their own API and you can get it through Infochimps, as well.
            • Facebook: The Facebook Graph API is the best resource for Facebook.
            • Face.com: A great tool for facial recognition data.

            4. Weather data

            • Wunderground has detailed weather information and also lets you search historical data by zip code or city. It gives temperature, wind, precipitation and hourly observations for that day.
            • Weatherbase has detailed weather stats on temperature, rain and humidity of nearly 27,000 cities.

            5. Sports data

            These three sites have comprehensive information on teams, players, coaches and leaders by season.

            ESPN recently came up with its own API, too. You have to be a partner to get access to their data. 

            6. Universities and research

            Searching the work of academics who specialize in a particular area is always a great place to find some interesting data.

            If you come across specific data that you would like to use, say, in a research paper, the best way to go is to contact the professor directly. (That is how we got the data for our What are the Odds piece, which is one of the most-viewed infographics on the web.)

            One university that makes some of the datasets used in its courses publicly available is UCLA.

            7. News data

            The New York Times has a great API and a really good explorer to access any article in the publication. The data is returned in JSON format.
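Since the responses come back as JSON, a few lines of standard-library Python are enough to pull fields out of one. The payload below is invented for illustration: the `results` and `title` keys and the `extract_titles` helper are assumptions, not the actual New York Times schema, so check the API's own documentation for the real field names.

```python
import json

# Invented payload for illustration; the real API's field names may differ
payload = '{"results": [{"title": "First story"}, {"title": "Second story"}]}'


def extract_titles(json_text):
    """Return the title of every item under the assumed 'results' key."""
    data = json.loads(json_text)
    return [item["title"] for item in data.get("results", [])]


titles = extract_titles(payload)
print(titles)
```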

            The Guardian Data Blog regularly posts visualizations and makes data available through a Google Docs format. The great thing about this is that the data has already been cleaned.

            CDC Data - Source: http://www.cdc.gov/ncbddd/disabilityandhealth/datasets.html

            Behavioral Risk Factor Surveillance System (BRFSS)
            The BRFSS is a telephone survey that tracks national and state-specific health risk behaviors of adults, 18 years of age or older, residing in the United States. The BRFSS is conducted by the 50 states, the District of Columbia, and three territories (Guam, Puerto Rico, and the U.S. Virgin Islands) and is administered and supported by the Division of Adult and Community Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention (CDC).

            National Health Interview Survey (NHIS)
            The NHIS is a multi-purpose, nationwide household health survey of the U.S. civilian noninstitutionalized population conducted annually by the National Center for Health Statistics (NCHS), CDC, to produce national estimates for a variety of health indicators. In 1994 and 1995, the NHIS included a special supplement on disability.

            National Health and Nutrition Examination Survey (NHANES) 
            NHANES is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines information from interviews and physical examinations.

            National Survey of Family Growth (NSFG)
            The NSFG gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men's and women's health. The survey results are used by the U.S. Department of Health and Human Services and others to plan health services and health education programs, and to do statistical studies of families, fertility, and health.

            American Community Survey (ACS)
            The ACS is a mail survey that provides demographic, socioeconomic, and housing information about communities in between the 10-year census. The ACS is conducted by the U.S. Census Bureau. The survey is sent to a sample of households in the United States. The ACS identifies serious difficulty in four basic areas of functioning: vision, hearing, ambulation, and cognition. The ACS also includes two questions to identify people with difficulties that might affect their ability to live independently.

            Medical Expenditure Panel Survey (MEPS) 
            The MEPS comprise a set of large-scale surveys of families and individuals, their medical providers, and employers across the United States. The MEPS is the most complete source of data on the cost and use of health care and health insurance coverage.

            Survey of Income and Program Participation (SIPP)
            The SIPP is a multipanel, longitudinal survey conducted by the U.S. Census Bureau. The SIPP covers the civilian, noninstitutionalized population of residents of the United States, and collects data on the sources and amount of individual income, labor force information, program participation and eligibility data, and general demographic characteristics. The SIPP also includes disability supplements that ask questions to determine individual disability status.

            Current Population Survey (CPS) 
            The CPS is a monthly survey of about 50,000 households conducted by the U.S. Bureau of the Census for the Bureau of Labor Statistics. The survey has been conducted for more than 50 years. In June 2008, questions were added to the CPS to identify people with a disability among the civilian noninstitutional population 16 years of age or older. Monthly labor force data are released from the CPS for people with a disability. The collection of these data is sponsored by the Department of Labor’s Office of Disability Employment.

            Personality Testing - Source: http://personality-testing.info/_rawdata/

            Date | Description | Variables | N | Dataset
            5/14/2014 | Answers to Cattell's 16 Personality Factors Test with items from the IPIP | 163 Likert rated items, gender, age, country and accuracy | 49159 | 16PF
            9/6/2012 | Answers to the Narcissistic Personality Inventory, constructed with the version from Raskin and Terry (1988) | 40 multiple choice, gender, age, time elapsed | 11243 | NPI
            6/18/2012 | Answers to the Machiavellianism Test, a version of the MACH-IV from Christie and Geis (1970) | 20 Likert rated items, gender, age, time elapsed | 13156 | MACH2
            5/18/2014 | Answers to the Big Five Personality Test, constructed with items from the International Personality Item Pool | 50 Likert rated statements, gender, age, race, native language, country | 19719 | BIG5
            7/22/2012 | Answers to the Taylor Manifest Anxiety Scale, from Taylor (1953) | 50 true/false statements, gender, age | 5410 | TMA
            9/6/2012 | Answers to the Humor Styles Questionnaire, from Martin et al. (2003) | 32 Likert rated items, gender, age, self-rated accuracy | 1071 | HSQ
            7/16/2012 | Answers to the Empathizing-Systemizing Test, a combined version of Simon Baron-Cohen's empathizing and systemizing quotients | 120 Likert rated items, gender, age, self-rated accuracy | 13256 | EQSQ
            8/5/2013 | Answers to the Holland Code (RIASEC) Test, constructed with public domain items from the Interest Item Pool | 48 Likert rated statements, gender, age, country, time elapsed and self-rated accuracy | 8855 | RIASEC
            7/16/2012 | Answers to the Sexual Compulsivity Scale from Kalichman and Rompa (1995) | 10 Likert rated statements, gender, age | 3376 | SCS
            7/18/2012 | Answers to the IPIP Assertiveness, Social confidence, Adventurousness, and Dominance scales used as part of an experimental personality test | 40 Likert rated items, gender, age | 1005 | AS+SC+AD+DO
            2/15/2014 | Answers to the Rosenberg Self-Esteem Scale | 10 scale rated items, gender, age, country | 47974 | RSE
            5/25/2012 | Answers to an experimental IQ Test previously offered on this website | 25 questions/answers, age, gender | 400 | IQ1
            5/25/2012 | Answers to a sentence completion survey appended to the Holland Code and Big Five personality tests; at the completion of either, test takers were invited to participate (most did) | 6 incomplete sentence responses, gender, age, and Big Five or RIASEC traits | 1425 | SENTANCES1
            8/6/2013 | Answers to the Experiences in Close Relationships Scale | 36 Likert rated items, gender, age, country | 17386 | ECR
            9/26/2012 | Answers to the Consideration of Future Consequences Scale | 12 Likert rated items, gender, age, self-rated accuracy | 614 | CFCS
            8/7/2012 | Answers to the Kentucky Inventory of Mindfulness Skills from Baer, Smith and Allen (2004) | 39 Likert rated items, gender, age | 601 | KIMS
            9/6/2012 | Answers to the Multidimensional Sexual Self-Concept Questionnaire | 100 Likert rated items, gender, age and context | 289 | MSSCQ
            8/8/2013 | Answers to the Woodworth Psychoneurotic Inventory | 116 yes/no questions, gender, age and country | 6019 | WPI
            12/8/2013 | Answers to the Hypersensitive Narcissism Scale and The Dirty Dozen | 22 scale rated items, gender, age, accuracy and country | 53981 | HSNS+DD
            3/8/2014 | Answers to the Short Dark Triad by Paulhus and Jones (2011) | 27 scale rated items and country | 18192 | SD3
            4/21/2014 | Answers to the Feminist Perspectives Scale, from Henley, N.; Meng, K.; O'Brien, D.; McCarthy, W.; Sockloskie, R. (1998). "Developing a Scale to Measure the Diversity of Feminist Attitudes". Psychology of Women Quarterly, 22(2), 317-348 | 60 scale rated items, gender, age, country | 13477 | FPS
            5/21/2014 | Answers to the Wagner Preference Inventory, from Wagner, Rudolph F., and Kelly A. Wells. "A refined neurobehavioral inventory of hemispheric preference." Journal of clinical psychology 41.5 (1985): 671-676 | 12 multiple choice questions, country | 13502 | Wagner
            5/23/2014 | A user generated corpus of personality test items from a short survey where users were prompted to generate descriptions of what was unique about their personality | 3 free response, age, gender, native language, country | 2722 | itemsgen
            6/21/2014 | Answers to the IPIP HEXACO equivalent scales | 240 scale rated items, country | 22786 | HEXACO



            Source: Awesome Public Datasets

            Complex Networks

            Computer Networks

            Contextual Data

            Data Challenges

            Image Processing

            Machine Learning

            Natural Language

            Public Domains

            Search Engines

            Social Networks

            Social Sciences

            Time Series

            Complementary Collections

            Source: Neo4J


            Source: Vincent Arel-Bundock
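
Each row that follows starts with an R package name and a dataset name, then a title and CSV and DOC links. As a minimal sketch, and assuming these rows come from the Rdatasets collection hosted at vincentarelbundock.github.io (this list names only the author, so the base URL and path layout below are assumptions), the two links for a row could be rebuilt like this. `rdatasets_urls` is a hypothetical helper written for illustration, not part of any library.

```python
from urllib.parse import quote

# Assumed base URL for the Rdatasets collection; it is not stated in this list.
BASE_URL = "https://vincentarelbundock.github.io/Rdatasets"


def rdatasets_urls(package: str, item: str) -> dict:
    """Build the assumed CSV and DOC URLs for one (package, item) row."""
    # Percent-encode both parts so unusual characters cannot break the path.
    pkg, name = quote(package), quote(item)
    return {
        "csv": f"{BASE_URL}/csv/{pkg}/{name}.csv",
        "doc": f"{BASE_URL}/doc/{pkg}/{name}.html",
    }


# Example: the first row of the list (package "datasets", item "AirPassengers").
urls = rdatasets_urls("datasets", "AirPassengers")
print(urls["csv"])
print(urls["doc"])
```

If the assumed layout holds, the CSV URL can be passed to any CSV reader (for example `pandas.read_csv`), which needs network access and is therefore left out of the sketch.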




            datasetsAirPassengersMonthly Airline Passenger Numbers 1949-1960CSV DOC
            datasetsBJsalesSales Data with Leading IndicatorCSV DOC
            datasetsBODBiochemical Oxygen DemandCSV DOC
            datasetsCO2Carbon Dioxide Uptake in Grass PlantsCSV DOC
            datasetsFormaldehydeDetermination of FormaldehydeCSV DOC
            datasetsHairEyeColorHair and Eye Color of Statistics StudentsCSV DOC
            datasetsInsectSpraysEffectiveness of Insect SpraysCSV DOC
            datasetsJohnsonJohnsonQuarterly Earnings per Johnson & Johnson ShareCSV DOC
            datasetsLakeHuronLevel of Lake Huron 1875-1972CSV DOC
            datasetsLifeCycleSavingsIntercountry Life-Cycle Savings DataCSV DOC
            datasetsNileFlow of the River NileCSV DOC
            datasetsOrchardSpraysPotency of Orchard SpraysCSV DOC
            datasetsPlantGrowthResults from an Experiment on Plant GrowthCSV DOC
            datasetsPuromycinReaction Velocity of an Enzymatic ReactionCSV DOC
            datasetsTitanicSurvival of passengers on the TitanicCSV DOC
            datasetsToothGrowthThe Effect of Vitamin C on Tooth Growth in Guinea PigsCSV DOC
            datasetsUCBAdmissionsStudent Admissions at UC BerkeleyCSV DOC
            datasetsUKDriverDeathsRoad Casualties in Great Britain 1969-84CSV DOC
            datasetsUKgasUK Quarterly Gas ConsumptionCSV DOC
            datasetsUSAccDeathsAccidental Deaths in the US 1973-1978CSV DOC
            datasetsUSArrestsViolent Crime Rates by US StateCSV DOC
            datasetsUSJudgeRatingsLawyers' Ratings of State Judges in the US Superior CourtCSV DOC
            datasetsUSPersonalExpenditurePersonal Expenditure DataCSV DOC
            datasetsVADeathsDeath Rates in Virginia (1940)CSV DOC
            datasetsWWWusageInternet Usage per MinuteCSV DOC
            datasetsWorldPhonesThe World's TelephonesCSV DOC
            datasetsairmilesPassenger Miles on Commercial US Airlines, 1937-1960CSV DOC
            datasetsairqualityNew York Air Quality MeasurementsCSV DOC
            datasetsanscombeAnscombe's Quartet of 'Identical' Simple Linear RegressionsCSV DOC
            datasetsattenuThe Joyner-Boore Attenuation DataCSV DOC
            datasetsattitudeThe Chatterjee-Price Attitude DataCSV DOC
            datasetsaustresQuarterly Time Series of the Number of Australian ResidentsCSV DOC
            datasetscarsSpeed and Stopping Distances of CarsCSV DOC
            datasetschickwtsChicken Weights by Feed TypeCSV DOC
            datasetsco2Mauna Loa Atmospheric CO2 ConcentrationCSV DOC
            datasetscrimtabStudent's 3000 Criminals DataCSV DOC
            datasetsdiscoveriesYearly Numbers of Important DiscoveriesCSV DOC
            datasetsesophSmoking, Alcohol and (O)esophageal CancerCSV DOC
            datasetseuroConversion Rates of Euro CurrenciesCSV DOC
            datasetsfaithfulOld Faithful Geyser DataCSV DOC
            datasetsfreenyFreeny's Revenue DataCSV DOC
            datasetsinfertInfertility after Spontaneous and Induced AbortionCSV DOC
            datasetsirisEdgar Anderson's Iris DataCSV DOC
            datasetsislandsAreas of the World's Major LandmassesCSV DOC
            datasetslhLuteinizing Hormone in Blood SamplesCSV DOC
            datasetslongleyLongley's Economic Regression DataCSV DOC
            datasetslynxAnnual Canadian Lynx trappings 1821-1934CSV DOC
            datasetsmorleyMichelson Speed of Light DataCSV DOC
            datasetsmtcarsMotor Trend Car Road TestsCSV DOC
            datasetsnhtempAverage Yearly Temperatures in New HavenCSV DOC
            datasetsnottemAverage Monthly Temperatures at Nottingham, 1920-1939CSV DOC
            datasetsnpkClassical N, P, K Factorial ExperimentCSV DOC
            datasetsoccupationalStatusOccupational Status of Fathers and their SonsCSV DOC
            datasetsprecipAnnual Precipitation in US CitiesCSV DOC
            datasetspresidentsQuarterly Approval Ratings of US PresidentsCSV DOC
            datasetspressureVapor Pressure of Mercury as a Function of TemperatureCSV DOC
            datasetsquakesLocations of Earthquakes off FijiCSV DOC
            datasetsranduRandom Numbers from Congruential Generator RANDUCSV DOC
            datasetsriversLengths of Major North American RiversCSV DOC
            datasetsrockMeasurements on Petroleum Rock SamplesCSV DOC
            datasetssleepStudent's Sleep DataCSV DOC
            datasetsstacklossBrownlee's Stack Loss Plant DataCSV DOC
            datasetssunspot.monthMonthly Sunspot Data, from 1749 to "Present"CSV DOC
            datasetssunspot.yearYearly Sunspot Data, 1700-1988CSV DOC
            datasetssunspotsMonthly Sunspot Numbers, 1749-1983CSV DOC
            datasetsswissSwiss Fertility and Socioeconomic Indicators (1888) DataCSV DOC
            datasetstreeringYearly Treering Data, -6000-1979CSV DOC
            datasetstreesGirth, Height and Volume for Black Cherry TreesCSV DOC
            datasetsuspopPopulations Recorded by the US CensusCSV DOC
            datasetsvolcanoTopographic Information on Auckland's Maunga Whau VolcanoCSV DOC
            datasetswarpbreaksThe Number of Breaks in Yarn during WeavingCSV DOC
            datasetswomenAverage Heights and Weights for American WomenCSV DOC
            bootacmeMonthly Excess ReturnsCSV DOC
            bootaidsDelay in AIDS Reporting in England and WalesCSV DOC
            bootairconditFailures of Air-conditioning EquipmentCSV DOC
            bootaircondit7Failures of Air-conditioning EquipmentCSV DOC
            bootamisCar Speeding and Warning SignsCSV DOC
            bootamlRemission Times for Acute Myelogenous LeukaemiaCSV DOC
            bootbigcityPopulation of U.S. CitiesCSV DOC
            bootbramblesSpatial Location of Bramble CanesCSV DOC
            bootbreslowSmoking Deaths Among DoctorsCSV DOC
            bootcalciumCalcium Uptake DataCSV DOC
            bootcaneSugar-cane Disease DataCSV DOC
            bootcapabilitySimulated Manufacturing Process DataCSV DOC
            bootcatsMWeight Data for Domestic CatsCSV DOC
            bootcavPosition of Muscle CaveolaeCSV DOC
            bootcd4CD4 Counts for HIV-Positive PatientsCSV DOC
            bootchanningChanning House DataCSV DOC
            bootcityPopulation of U.S. CitiesCSV DOC
            bootclaridgeGenetic Links to Left-handednessCSV DOC
            bootclothNumber of Flaws in ClothCSV DOC
            bootco.transferCarbon Monoxide TransferCSV DOC
            bootcoalDates of Coal Mining DisastersCSV DOC
            bootdarwinDarwin's Plant Height DifferencesCSV DOC
            bootdogsCardiac Data for Domestic DogsCSV DOC
            bootdowns.bcIncidence of Down's Syndrome in British ColumbiaCSV DOC
            bootducksBehavioral and Plumage Characteristics of Hybrid DucksCSV DOC
            bootfirCounts of Balsam-fir SeedlingsCSV DOC
            bootfretsHead Dimensions in BrothersCSV DOC
            bootgravAcceleration Due to GravityCSV DOC
            bootgravityAcceleration Due to GravityCSV DOC
            boothiroseFailure Time of PET FilmCSV DOC
            bootislayJura Quartzite Azimuths on IslayCSV DOC
            bootmanausAverage Heights of the Rio Negro river at ManausCSV DOC
            bootmelanomaSurvival from Malignant MelanomaCSV DOC
            bootmotorData from a Simulated Motorcycle AccidentCSV DOC
            bootneuroNeurophysiological Point Process DataCSV DOC
            bootnitrofenToxicity of Nitrofen in Aquatic SystemsCSV DOC
            bootnodalNodal Involvement in Prostate CancerCSV DOC
            bootnuclearNuclear Power Station Construction DataCSV DOC
            bootpaulsenNeurotransmission in Guinea Pig BrainsCSV DOC
            bootpoisonsAnimal Survival TimesCSV DOC
            bootpolarPole Positions of New Caledonian LateritesCSV DOC
            bootremissionCancer Remission and Cell ActivityCSV DOC
            bootsalinityWater Salinity and River DischargeCSV DOC
            bootsurvivalSurvival of Rats after Radiation DosesCSV DOC
            boottauTau Particle Decay ModesCSV DOC
            boottunaTuna Sighting DataCSV DOC
            booturineUrine Analysis DataCSV DOC
            bootwoolAustralian Relative Wool PricesCSV DOC
            KMsurvaidsdata from Section 1.19CSV DOC
            KMsurvalloautodata from Section 1.9CSV DOC
            KMsurvallograftdata from Exercise 13.1, p418CSV DOC
            KMsurvaztdata from Exercise 4.7, p122CSV DOC
            KMsurvbaboondata from Exercise 5.8, p147CSV DOC
            KMsurvbcdeterdata from Section 1.18CSV DOC
            KMsurvbfeeddata from Section 1.14CSV DOC
            KMsurvbmtdata from Section 1.3CSV DOC
            KMsurvbnctdata from Exercise 7.7, p223CSV DOC
            KMsurvbtrialdata from Section 1.5CSV DOC
            KMsurvburndata from Section 1.6CSV DOC
            KMsurvchanningdata from Section 1.16CSV DOC
            KMsurvdrug6mpdata from Section 1.2CSV DOC
            KMsurvdrughivdata from Exercise 7.6, p222CSV DOC
            KMsurvhodgdata from Section 1.10CSV DOC
            KMsurvkidneydata from Section 1.4CSV DOC
            KMsurvkidrecurrData on 38 individuals using a kidney dialysis machineCSV DOC
            KMsurvkidtrandata from Section 1.7CSV DOC
            KMsurvlarynxdata from Section 1.8CSV DOC
            KMsurvlungdata from Exercise 4.4, p120CSV DOC
            KMsurvpneumondata from Section 1.13CSV DOC
            KMsurvpsychdata from Section 1.15CSV DOC
            KMsurvratsdata from Exercise 7.13, p225CSV DOC
            KMsurvstddata from Section 1.12CSV DOC
            KMsurvstddiagdata from Exercise 5.6, p146CSV DOC
            KMsurvtonguedata from Section 1.11CSV DOC
            KMsurvtwinsdata from Exercise 7.14, p225CSV DOC
            robustbaseAnimals2Brain and Body Weights for 65 Species of Land AnimalsCSV DOC
            robustbaseCrohnDCrohn's Disease Adverse Events DataCSV DOC
            robustbaseNOxEmissionsNOx Air Pollution DataCSV DOC
            robustbaseSiegelsExSiegel's Exact Fit Example DataCSV DOC
            robustbaseaircraftAircraft DataCSV DOC
            robustbaseairmayAir Quality DataCSV DOC
            robustbasealcoholAlcohol Solubility in Water DataCSV DOC
            robustbaseambientNOxCHDaily Means of NOx (mono-nitrogen oxides) in airCSV DOC
            robustbasebiomassTillBiomass Tillage DataCSV DOC
            robustbasebushfireCampbell Bushfire DataCSV DOC
            robustbasecarrotsInsect Damages on CarrotsCSV DOC
            robustbasecloudCloud point of a LiquidCSV DOC
            robustbasecolemanColeman Data SetCSV DOC
            robustbasecondrozCondroz DataCSV DOC
            robustbasecushnyCushny and Peebles Prolongation of Sleep DataCSV DOC
            robustbasedeliveryDelivery Time DataCSV DOC
            robustbaseeducationEducation Expenditure DataCSV DOC
            robustbaseepilepsyEpilepsy Attacks Data SetCSV DOC
            robustbaseexAMExample Data of Antille and May - for Simple RegressionCSV DOC
            robustbasefoodstampFood Stamp Program ParticipationCSV DOC
            robustbasehbkHawkins, Bradu, Kass's Artificial DataCSV DOC
            robustbaseheartHeart Catherization DataCSV DOC
            robustbasekootenayWaterflow Measurements of Kootenay River in Libby and NewgateCSV DOC
            robustbaselacticLactic Acid Concentration Measurement DataCSV DOC
            robustbasemilkDaudin's Milk Composition DataCSV DOC
            robustbasepensionPension Funds DataCSV DOC
            robustbasephosphorPhosphorus Content DataCSV DOC
            robustbasepilotPilot-Plant DataCSV DOC
            robustbasepossumDivPossum Diversity DataCSV DOC
            robustbasepulpfiberPulp Fiber and Paper DataCSV DOC
            robustbaseradarImageSatellite Radar Image Data from near MunichCSV DOC
            robustbasesalinitySalinity DataCSV DOC
            robustbasestarsCYGHertzsprung-Russell Diagram Data of Star Cluster CYG OB1CSV DOC
            robustbasetelefNumber of International Calls from BelgiumCSV DOC
            robustbasetoxicityToxicity of Carboxylic Acids DataCSV DOC
            robustbasevasoVaso Constriction Skin Data SetCSV DOC
            robustbasewagnerGrowthWagner's Hannover Employment Growth DataCSV DOC
            robustbasewoodModified Data on Wood Specific GravityCSV DOC
            carAMSsurveyAmerican Math Society Survey DataCSV DOC
            carAdlerExperimenter ExpectationsCSV DOC
            carAngellMoral Integration of American CitiesCSV DOC
            carAnscombeU. S. State Public-School ExpendituresCSV DOC
            carBaumannMethods of Teaching Reading ComprehensionCSV DOC
            carBfoxCanadian Women's Labour-Force ParticipationCSV DOC
            carBlackmoreExercise Histories of Eating-Disordered and Control SubjectsCSV DOC
            carBurtFraudulent Data on IQs of Twins Raised ApartCSV DOC
            carCanPopCanadian Population DataCSV DOC
            carChileVoting Intentions in the 1988 Chilean PlebisciteCSV DOC
            carChirotThe 1907 Romanian Peasant RebellionCSV DOC
            carCowlesCowles and Davis's Data on VolunteeringCSV DOC
            carDavisSelf-Reports of Height and WeightCSV DOC
            carDavisThinDavis's Data on Drive for ThinnessCSV DOC
            carDepredationsMinnesota Wolf Depredation DataCSV DOC
            carDuncanDuncan's Occupational Prestige DataCSV DOC
            carEricksenThe 1980 U.S. Census UndercountCSV DOC
            carFloridaFlorida County VotingCSV DOC
            carFreedmanCrowding and Crime in U. S. Metropolitan AreasCSV DOC
            carFriendlyFormat Effects on RecallCSV DOC
            carGinzbergData on DepressionCSV DOC
            carGreeneRefugee AppealsCSV DOC
            carGuyerAnonymity and CooperationCSV DOC
            carHartnagelCanadian Crime-Rates Time SeriesCSV DOC
            carHighway1Highway AccidentsCSV DOC
            carKosteckiDillonTreatment of Migraine HeadachesCSV DOC
            carLeinhardtData on Infant-MortalityCSV DOC
            carLoBDCancer drug data use to provide an example of the use of the skew power distributions.CSV DOC
            carMandelContrived Collinear DataCSV DOC
            carMigrationCanadian Interprovincial Migration DataCSV DOC
            carMooreStatus, Authoritarianism, and ConformityCSV DOC
            carMrozU.S. Women's Labor-Force ParticipationCSV DOC
            carOBrienKaiserO'Brien and Kaiser's Repeated-Measures DataCSV DOC
            carOrnsteinInterlocking Directorates Among Major Canadian FirmsCSV DOC
            carPotteryChemical Composition of PotteryCSV DOC
            carPrestigePrestige of Canadian OccupationsCSV DOC
            carQuartetFour Regression DatasetsCSV DOC
            carRobeyFertility and ContraceptionCSV DOC
            carSLIDSurvey of Labour and Income DynamicsCSV DOC
            carSahlinsAgricultural Production in Mazulu VillageCSV DOC
            carSalariesSalaries for ProfessorsCSV DOC
            carSoilsSoil Compositions of Physical and Chemical CharacteristicsCSV DOC
            carStatesEducation and Related Statistics for the U.S. StatesCSV DOC
            carTransactTransaction dataCSV DOC
            carUNGDP and Infant MortalityCSV DOC
            carUSPopPopulation of the United StatesCSV DOC
            carVocabVocabulary and EducationCSV DOC
            carWeightLossWeight Loss DataCSV DOC
            carWomenlfCanadian Women's Labour-Force ParticipationCSV DOC
            carWongPost-Coma Recovery of IQCSV DOC
            carWoolWool dataCSV DOC
            clusteragricultureEuropean Union Agricultural WorkforcesCSV DOC
            clusteranimalsAttributes of AnimalsCSV DOC
            clusterchorSubSubset of C-horizon of Kola DataCSV DOC
            clusterflowerFlower CharacteristicsCSV DOC
            clusterplantTraitsPlant Species Traits DataCSV DOC
            clusterplutonIsotopic Composition Plutonium BatchesCSV DOC
            clusterruspiniRuspini DataCSV DOC
            clustervotes.repubVotes for Republican Candidate in Presidential ElectionsCSV DOC
            clusterxclaraBivariate Data Set with 3 ClustersCSV DOC
            COUNTaffairsaffairsCSV DOC
            COUNTazcabgptcaazcabgptcaCSV DOC
            COUNTazdrg112azdrg112CSV DOC
            COUNTazproazproCSV DOC
            COUNTazprocedureazprocedureCSV DOC
            COUNTbadhealthbadhealthCSV DOC
            COUNTfasttrakgfasttrakgCSV DOC
            COUNTfishingfishingCSV DOC
            COUNTlbwlbwCSV DOC
            COUNTlbwgrplbwgrpCSV DOC
            COUNTloomisloomisCSV DOC
            COUNTmdvismdvisCSV DOC
            COUNTmedparmedparCSV DOC
            COUNTnutsnutsCSV DOC
            COUNTrwmrwmCSV DOC
            COUNTrwm1984rwm1984CSV DOC
            COUNTrwm5yrrwm5yrCSV DOC
            COUNTshipsshipsCSV DOC
            COUNTsmokingsmokingCSV DOC
            COUNTtitanictitanicCSV DOC
            COUNTtitanicgrptitanicgrpCSV DOC
            EcdatAccidentShip AccidentsCSV DOC
            EcdatAirlineCost for U.S. AirlinesCSV DOC
            EcdatAirqAir Quality for Californian Metropolitan AreasCSV DOC
            EcdatBenefitsUnemployement of Blue Collar WorkersCSV DOC
            EcdatBidsBids Received By U.S. FirmsCSV DOC
            EcdatBudgetFoodBudget Share of Food for Spanish HouseholdsCSV DOC
            EcdatBudgetItalyBudget Shares for Italian HouseholdsCSV DOC
            EcdatBudgetUKBudget Shares of British HouseholdsCSV DOC
            EcdatBwagesWages in BelgiumCSV DOC
            EcdatCPSch3Earnings from the Current Population SurveyCSV DOC
            EcdatCRANpackagesGrowth of CRANCSV DOC
            EcdatCapmStock Market DataCSV DOC
            EcdatCarStated Preferences for Car ChoiceCSV DOC
            EcdatCaschoolThe California Test Score Data SetCSV DOC
            EcdatCatsupChoice of Brand for CatsupCSV DOC
            EcdatCigarCigarette ConsumptionCSV DOC
            EcdatCigaretteThe Cigarette Consumption Panel Data SetCSV DOC
            EcdatClothingSales Data of Men's Fashion StoresCSV DOC
            EcdatComputersPrices of Personal ComputersCSV DOC
            EcdatCrackerChoice of Brand for CrakersCSV DOC
            EcdatCrimeCrime in North CarolinaCSV DOC
            EcdatDMDM Dollar Exchange RateCSV DOC
            EcdatDiamondPricing the C's of Diamond StonesCSV DOC
            EcdatDoctorNumber of Doctor VisitsCSV DOC
            EcdatDoctorAUSDoctor Visits in AustraliaCSV DOC
            EcdatDoctorContactsContacts With Medical DoctorCSV DOC
            EcdatEarningsEarnings for Three Age GroupsCSV DOC
            EcdatElectricityCost Function for Electricity ProducersCSV DOC
            EcdatFairExtramarital Affairs DataCSV DOC
            EcdatFatalityDrunk Driving Laws and Traffic DeathsCSV DOC
            EcdatFishingChoice of Fishing ModeCSV DOC
            EcdatForwardExchange Rates of US Dollar Against Other CurrenciesCSV DOC
            EcdatFriendFoeData from the Television Game Show Friend Or Foe ?CSV DOC
            EcdatGarchDaily Observations on Exchange Rates of the US Dollar Against Other CurrenciesCSV DOC
            EcdatGasolineGasoline ConsumptionCSV DOC
            EcdatGrilichesWage DatasCSV DOC
            EcdatGrunfeldGrunfeld Investment DataCSV DOC
            EcdatHCHeating and Cooling System Choice in Newly Built Houses in CaliforniaCSV DOC
            EcdatHHSCyberSecurityBreachesCybersecurity breaches reported to the US Department of Health and Human ServicesCSV DOC
            EcdatHIHealth Insurance and Hours Worked By WivesCSV DOC
            EcdatHdmaThe Boston HDMA Data SetCSV DOC
            EcdatHeatingHeating System Choice in California HousesCSV DOC
            EcdatHedonicHedonic Prices of Cencus Tracts in BostonCSV DOC
            EcdatHousingSales Prices of Houses in the City of WindsorCSV DOC
            EcdatIcecreamIce Cream ConsumptionCSV DOC
            EcdatJournalsEconomic Journals Dat SetCSV DOC
            EcdatKakaduWillingness to Pay for the Preservation of the Kakadu National ParkCSV DOC
            EcdatKetchupChoice of Brand for KetchupCSV DOC
            EcdatKleinKlein's Model ICSV DOC
            EcdatLaborSupplyWages and Hours WorkedCSV DOC
            EcdatLabourBelgian FirmsCSV DOC
            EcdatMCASThe Massashusets Test Score Data SetCSV DOC
            EcdatMalesWages and Education of Young MalesCSV DOC
            EcdatMathlevelLevel of Calculus Attained for Students Taking Advanced Micro-economicsCSV DOC
            EcdatMedExpStructure of Demand for Medical CareCSV DOC
            EcdatMetalProduction for SIC 33CSV DOC
            EcdatModeMode ChoiceCSV DOC
            EcdatModeChoiceData to Study Travel Mode ChoiceCSV DOC
            EcdatMofaInternational Expansion of U.S. Mofa's (majority-owned Foreign Affiliates in Fire (finance, Insurance and Real Estate)CSV DOC
            EcdatMrozLabor Supply DataCSV DOC
            EcdatMunExpMunicipal Expenditure DataCSV DOC
            EcdatNaturalParkWillingness to Pay for the Preservation of the Alentejo Natural ParkCSV DOC
            EcdatNerloveCost Function for Electricity Producers, 1955CSV DOC
            EcdatOFPVisits to Physician OfficeCSV DOC
            EcdatOilOil InvestmentCSV DOC
            EcdatPSIDPanel Survey of Income DynamicsCSV DOC
            EcdatParticipationLabor Force ParticipationCSV DOC
            EcdatPatentsHGHDynamic Relation Between Patents and R&DCSV DOC
            EcdatPatentsRDPatents, R&D and Technological Spillovers for a Panel of FirmsCSV DOC
            EcdatPoundPound-dollar Exchange RateCSV DOC
            EcdatProducUs States ProductionCSV DOC
            EcdatRetSchoolReturn to SchoolingCSV DOC
            EcdatSP500Returns on Standard & Poor's 500 IndexCSV DOC
            EcdatSchoolingWages and SchoolingCSV DOC
            EcdatSomervilleVisits to Lake SomervilleCSV DOC
            EcdatStarEffects on Learning of Small Class SizesCSV DOC
            EcdatStrikeStrike Duration DataCSV DOC
            EcdatStrikeDurStrikes DurationCSV DOC
            EcdatStrikeNbNumber of Strikes in Us ManufacturingCSV DOC
            EcdatSumHesThe Penn TableCSV DOC
            EcdatTobaccoHouseholds Tobacco Budget ShareCSV DOC
            EcdatTrainStated Preferences for Train TravelingCSV DOC
            EcdatTranspEqStatewide Data on Transportation Equipment ManufacturingCSV DOC
            EcdatTreatmentEvaluating Treatment Effect of Training on EarningsCSV DOC
            EcdatTunaChoice of Brand for TunaCSV DOC
            EcdatUSFinanceIndustryUS Finance Industry ProfitsCSV DOC
            EcdatUSclassifiedDocumentsOfficial Secrecy of the United States GovernmentCSV DOC
            EcdatUSstateAbbreviationsStandard abbreviations for states of the United StatesCSV DOC
            EcdatUStaxWordsNumber of Words in US Tax LawCSV DOC
            EcdatUnempDurUnemployment DurationCSV DOC
            EcdatUnemploymentUnemployment DurationCSV DOC
            EcdatUniversityProvision of University Teaching and ResearchCSV DOC
            EcdatVietNamHMedical Expenses in Viet-nam (household Level)CSV DOC
            EcdatVietNamIMedical Expenses in Viet-nam (individual Level)CSV DOC
            EcdatWagesPanel Datas of Individual WagesCSV DOC
            EcdatWages1Wages, Experience and SchoolingCSV DOC
            EcdatWorkinghoursWife Working HoursCSV DOC
            EcdatYenYen-dollar Exchange RateCSV DOC
            EcdatYogurtChoice of Brand for YogurtsCSV DOC
            EcdatbankingCrisesCountries in Banking CrisesCSV DOC
            EcdatbreachesCyber Security BreachesCSV DOC
            EcdatincomeInequalityIncome Inequality in the USCSV DOC
            EcdatnonEnglishNamesNames with Character Set ProblemsCSV DOC
            EcdatpoliticalKnowledgePolitical knowledge in the US and EuropeCSV DOC
            gapPDA study of Parkinson's disease and APOE, LRRK2, SNCA makersCSV DOC
            gapaldh2ALDH2 markers and AlcoholismCSV DOC
            gapapoeapocAPOE/APOC1 markers and Alzheimer'sCSV DOC
            gapcfCystic fibrosis dataCSV DOC
            gapcrohnCrohn's disease dataCSV DOC
            gapfaFriedreich Ataxia dataCSV DOC
            gapfsnpsA case-control data involving four SNPs with missing genotypeCSV DOC
            gaphlaThe HLA dataCSV DOC
            gaphr1420An example data for Manhattan plot with annotationCSV DOC
            gapl51An example pedigree dataCSV DOC
            gaplukasAn example pedigreeCSV DOC
            gapmaoA study of Parkinson's disease and MAO geneCSV DOC
            gapmeyerA pedigree data on 282 animals deriving from two generationsCSV DOC
            gapmfblongExample data for ACEnucfamCSV DOC
            gapmhtdataAn example data for Manhattan plotCSV DOC
            gapnep499A study of Alzheimer's disease with eight SNPs and APOECSV DOC
            ggplot2luv_colours'colors()' in Luv space.CSV DOC
            HistDataArbuthnotArbuthnot's data on male and female birth ratios in London from 1629-1710.CSV DOC
            HistDataArmadaLa Felicisima ArmadaCSV DOC
            HistDataBowleyBowley's data on values of British and Irish trade, 1855-1899CSV DOC
            HistDataCavendishCavendish's Determinations of the Density of the EarthCSV DOC
            HistDataChestSizesChest measurements of 5738 Scottish MilitiamenCSV DOC
            HistData | CushnyPeebles | Cushny-Peebles Data: Soporific Effects of Scopolamine Derivatives | CSV DOC
            HistData | CushnyPeeblesN | Cushny-Peebles Data: Soporific Effects of Scopolamine Derivatives | CSV DOC
            HistData | Dactyl | Edgeworth's counts of dactyls in Virgil's Aeneid | CSV DOC
            HistData | DrinksWages | Elderton and Pearson's (1910) data on drinking and wages | CSV DOC
            HistData | Fingerprints | Waite's data on Patterns in Fingerprints | CSV DOC
            HistData | Galton | Galton's data on the heights of parents and their children | CSV DOC
            HistData | GaltonFamilies | Galton's data on the heights of parents and their children, by child | CSV DOC
            HistData | Guerry | Data from A.-M. Guerry, "Essay on the Moral Statistics of France" | CSV DOC
            HistData | Jevons | W. Stanley Jevons' data on numerical discrimination | CSV DOC
            HistData | Langren.all | van Langren's Data on Longitude Distance between Toledo and Rome | CSV DOC
            HistData | Langren1644 | van Langren's Data on Longitude Distance between Toledo and Rome | CSV DOC
            HistData | Macdonell | Macdonell's Data on Height and Finger Length of Criminals, used by Gosset (1908) | CSV DOC
            HistData | MacdonellDF | Macdonell's Data on Height and Finger Length of Criminals, used by Gosset (1908) | CSV DOC
            HistData | Michelson | Michelson's Determinations of the Velocity of Light | CSV DOC
            HistData | MichelsonSets | Michelson's Determinations of the Velocity of Light | CSV DOC
            HistData | Minard.cities | Data from Minard's famous graphic map of Napoleon's march on Moscow | CSV DOC
            HistData | Minard.temp | Data from Minard's famous graphic map of Napoleon's march on Moscow | CSV DOC
            HistData | Minard.troops | Data from Minard's famous graphic map of Napoleon's march on Moscow | CSV DOC
            HistData | Nightingale | Florence Nightingale's data on deaths from various causes in the Crimean War | CSV DOC
            HistData | OldMaps | Latitudes and Longitudes of 39 Points in 11 Old Maps | CSV DOC
            HistData | PearsonLee | Pearson and Lee's data on the heights of parents and children classified by gender | CSV DOC
            HistData | PolioTrials | Polio Field Trials Data | CSV DOC
            HistData | Prostitutes | Parent-Duchatelet's time-series data on the number of prostitutes in Paris | CSV DOC
            HistData | Pyx | Trial of the Pyx | CSV DOC
            HistData | Quarrels | Statistics of Deadly Quarrels | CSV DOC
            HistData | Snow.deaths | John Snow's map and data on the 1854 London Cholera outbreak | CSV DOC
            HistData | Snow.deaths2 | John Snow's map and data on the 1854 London Cholera outbreak | CSV DOC
            HistData | Snow.polygons | John Snow's map and data on the 1854 London Cholera outbreak | CSV DOC
            HistData | Snow.pumps | John Snow's map and data on the 1854 London Cholera outbreak | CSV DOC
            HistData | Snow.streets | John Snow's map and data on the 1854 London Cholera outbreak | CSV DOC
            HistData | Wheat | Playfair's Data on Wages and the Price of Wheat | CSV DOC
            HistData | Wheat.monarchs | Playfair's Data on Wages and the Price of Wheat | CSV DOC
            HistData | Yeast | Student's (1906) Yeast Cell Counts | CSV DOC
            HistData | YeastD.mat | Student's (1906) Yeast Cell Counts | CSV DOC
            HistData | ZeaMays | Darwin's Heights of Cross- and Self-fertilized Zea May Pairs | CSV DOC
            lattice | barley | Yield data from a Minnesota barley trial | CSV DOC
            lattice | environmental | Atmospheric environmental conditions in New York City | CSV DOC
            lattice | ethanol | Engine exhaust fumes from burning ethanol | CSV DOC
            lattice | melanoma | Melanoma skin cancer incidence | CSV DOC
            lattice | singer | Heights of New York Choral Society singers | CSV DOC
            MASS | Aids2 | Australian AIDS Survival Data | CSV DOC
            MASS | Animals | Brain and Body Weights for 28 Species | CSV DOC
            MASS | Boston | Housing Values in Suburbs of Boston | CSV DOC
            MASS | Cars93 | Data from 93 Cars on Sale in the USA in 1993 | CSV DOC
            MASS | Cushings | Diagnostic Tests on Patients with Cushing's Syndrome | CSV DOC
            MASS | DDT | DDT in Kale | CSV DOC
            MASS | GAGurine | Level of GAG in Urine of Children | CSV DOC
            MASS | Insurance | Numbers of Car Insurance claims | CSV DOC
            MASS | Melanoma | Survival from Malignant Melanoma | CSV DOC
            MASS | OME | Tests of Auditory Perception in Children with OME | CSV DOC
            MASS | Pima.te | Diabetes in Pima Indian Women | CSV DOC
            MASS | Pima.tr | Diabetes in Pima Indian Women | CSV DOC
            MASS | Pima.tr2 | Diabetes in Pima Indian Women | CSV DOC
            MASS | Rabbit | Blood Pressure in Rabbits | CSV DOC
            MASS | Rubber | Accelerated Testing of Tyre Rubber | CSV DOC
            MASS | SP500 | Returns of the Standard and Poors 500 | CSV DOC
            MASS | Sitka | Growth Curves for Sitka Spruce Trees in 1988 | CSV DOC
            MASS | Sitka89 | Growth Curves for Sitka Spruce Trees in 1989 | CSV DOC
            MASS | Skye | AFM Compositions of Aphyric Skye Lavas | CSV DOC
            MASS | Traffic | Effect of Swedish Speed Limits on Accidents | CSV DOC
            MASS | UScereal | Nutritional and Marketing Information on US Cereals | CSV DOC
            MASS | UScrime | The Effect of Punishment Regimes on Crime Rates | CSV DOC
            MASS | VA | Veteran's Administration Lung Cancer Trial | CSV DOC
            MASS | abbey | Determinations of Nickel Content | CSV DOC
            MASS | accdeaths | Accidental Deaths in the US 1973-1978 | CSV DOC
            MASS | anorexia | Anorexia Data on Weight Change | CSV DOC
            MASS | bacteria | Presence of Bacteria after Drug Treatments | CSV DOC
            MASS | beav1 | Body Temperature Series of Beaver 1 | CSV DOC
            MASS | beav2 | Body Temperature Series of Beaver 2 | CSV DOC
            MASS | biopsy | Biopsy Data on Breast Cancer Patients | CSV DOC
            MASS | birthwt | Risk Factors Associated with Low Infant Birth Weight | CSV DOC
            MASS | cabbages | Data from a cabbage field trial | CSV DOC
            MASS | caith | Colours of Eyes and Hair of People in Caithness | CSV DOC
            MASS | cats | Anatomical Data from Domestic Cats | CSV DOC
            MASS | cement | Heat Evolved by Setting Cements | CSV DOC
            MASS | chem | Copper in Wholemeal Flour | CSV DOC
            MASS | coop | Co-operative Trial in Analytical Chemistry | CSV DOC
            MASS | cpus | Performance of Computer CPUs | CSV DOC
            MASS | crabs | Morphological Measurements on Leptograpsus Crabs | CSV DOC
            MASS | deaths | Monthly Deaths from Lung Diseases in the UK | CSV DOC
            MASS | drivers | Deaths of Car Drivers in Great Britain 1969-84 | CSV DOC
            MASS | eagles | Foraging Ecology of Bald Eagles | CSV DOC
            MASS | epil | Seizure Counts for Epileptics | CSV DOC
            MASS | farms | Ecological Factors in Farm Management | CSV DOC
            MASS | fgl | Measurements of Forensic Glass Fragments | CSV DOC
            MASS | forbes | Forbes' Data on Boiling Points in the Alps | CSV DOC
            MASS | galaxies | Velocities for 82 Galaxies | CSV DOC
            MASS | gehan | Remission Times of Leukaemia Patients | CSV DOC
            MASS | genotype | Rat Genotype Data | CSV DOC
            MASS | geyser | Old Faithful Geyser Data | CSV DOC
            MASS | gilgais | Line Transect of Soil in Gilgai Territory | CSV DOC
            MASS | hills | Record Times in Scottish Hill Races | CSV DOC
            MASS | housing | Frequency Table from a Copenhagen Housing Conditions Survey | CSV DOC
            MASS | immer | Yields from a Barley Field Trial | CSV DOC
            MASS | leuk | Survival Times and White Blood Counts for Leukaemia Patients | CSV DOC
            MASS | mammals | Brain and Body Weights for 62 Species of Land Mammals | CSV DOC
            MASS | mcycle | Data from a Simulated Motorcycle Accident | CSV DOC
            MASS | menarche | Age of Menarche in Warsaw | CSV DOC
            MASS | michelson | Michelson's Speed of Light Data | CSV DOC
            MASS | minn38 | Minnesota High School Graduates of 1938 | CSV DOC
            MASS | motors | Accelerated Life Testing of Motorettes | CSV DOC
            MASS | muscle | Effect of Calcium Chloride on Muscle Contraction in Rat Hearts | CSV DOC
            MASS | newcomb | Newcomb's Measurements of the Passage Time of Light | CSV DOC
            MASS | nlschools | Eighth-Grade Pupils in the Netherlands | CSV DOC
            MASS | npk | Classical N, P, K Factorial Experiment | CSV DOC
            MASS | npr1 | US Naval Petroleum Reserve No. 1 data | CSV DOC
            MASS | oats | Data from an Oats Field Trial | CSV DOC
            MASS | painters | The Painter's Data of de Piles | CSV DOC
            MASS | petrol | N. L. Prater's Petrol Refinery Data | CSV DOC
            MASS | quine | Absenteeism from School in Rural New South Wales | CSV DOC
            MASS | road | Road Accident Deaths in US States | CSV DOC
            MASS | rotifer | Numbers of Rotifers by Fluid Density | CSV DOC
            MASS | ships | Ships Damage Data | CSV DOC
            MASS | shrimp | Percentage of Shrimp in Shrimp Cocktail | CSV DOC
            MASS | shuttle | Space Shuttle Autolander Problem | CSV DOC
            MASS | snails | Snail Mortality Data | CSV DOC
            MASS | steam | The Saturated Steam Pressure Data | CSV DOC
            MASS | stormer | The Stormer Viscometer Data | CSV DOC
            MASS | survey | Student Survey Data | CSV DOC
            MASS | synth.te | Synthetic Classification Problem | CSV DOC
            MASS | synth.tr | Synthetic Classification Problem | CSV DOC
            MASS | topo | Spatial Topographic Data | CSV DOC
            MASS | waders | Counts of Waders at 15 Sites in South Africa | CSV DOC
            MASS | whiteside | House Insulation: Whiteside's Data | CSV DOC
            MASS | wtloss | Weight Loss Data from an Obese Patient | CSV DOC
            plm | Cigar | Cigarette Consumption | CSV DOC
            plm | Crime | Crime in North Carolina | CSV DOC
            plm | EmplUK | Employment and Wages in the United Kingdom | CSV DOC
            plm | Gasoline | Gasoline Consumption | CSV DOC
            plm | Grunfeld | Grunfeld's Investment Data | CSV DOC
            plm | Hedonic | Hedonic Prices of Census Tracts in the Boston Area | CSV DOC
            plm | LaborSupply | Wages and Hours Worked | CSV DOC
            plm | Males | Wages and Education of Young Males | CSV DOC
            plm | Parity | Purchasing Power Parity and other parity relationships | CSV DOC
            plm | Produc | US States Production | CSV DOC
            plm | RiceFarms | Production of Rice in India | CSV DOC
            plm | Snmesp | Employment and Wages in Spain | CSV DOC
            plm | SumHes | The Penn World Table, v. 5 | CSV DOC
            plm | Wages | Panel Data of Individual Wages | CSV DOC
            plyr | baseball | Yearly batting records for all major league baseball players | CSV DOC
            pscl | AustralianElectionPolling | Political opinion polls in Australia, 2004-07 | CSV DOC
            pscl | AustralianElections | elections to Australian House of Representatives, 1949-2007 | CSV DOC
            pscl | EfronMorris | Batting Averages for 18 major league baseball players, 1970 | CSV DOC
            pscl | RockTheVote | Voter turnout experiment, using Rock The Vote ads | CSV DOC
            pscl | UKHouseOfCommons | 1992 United Kingdom electoral returns | CSV DOC
            pscl | absentee | Absentee and Machine Ballots in Pennsylvania State Senate Races | CSV DOC
            pscl | admit | Applications to a Political Science PhD Program | CSV DOC
            pscl | bioChemists | article production by graduate students in biochemistry Ph.D. programs | CSV DOC
            pscl | ca2006 | California Congressional Districts in 2006 | CSV DOC
            pscl | iraqVote | U.S. Senate vote on the use of force against Iraq, 2002. | CSV DOC
            pscl | politicalInformation | Interviewer ratings of respondent levels of political information | CSV DOC
            pscl | presidentialElections | elections for U.S. President, 1932-2012, by state | CSV DOC
            pscl | prussian | Prussian army horse kick data | CSV DOC
            pscl | unionDensity | cross national rates of trade union density | CSV DOC
            pscl | vote92 | Reports of voting in the 1992 U.S. Presidential election. | CSV DOC
            reshape2 | french_fries | Sensory data from a french fries experiment. | CSV DOC
            reshape2 | smiths | Demo data describing the Smiths. | CSV DOC
            reshape2 | tips | Tipping data | CSV DOC
            rpart | car.test.frame | Automobile Data from 'Consumer Reports' 1990 | CSV DOC
            rpart | car90 | Automobile Data from 'Consumer Reports' 1990 | CSV DOC
            rpart | cu.summary | Automobile Data from 'Consumer Reports' 1990 | CSV DOC
            rpart | kyphosis | Data on Children who have had Corrective Spinal Surgery | CSV DOC
            rpart | solder | Soldering of Components on Printed-Circuit Boards | CSV DOC
            rpart | stagec | Stage C Prostate Cancer | CSV DOC
            sandwich | PublicSchools | US Expenditures for Public Schools | CSV DOC
            sem | Bollen | Bollen's Data on Industrialization and Political Democracy | CSV DOC
            sem | CNES | Variables from the 1997 Canadian National Election Study | CSV DOC
            sem | Klein | Klein's Data on the U. S. Economy | CSV DOC
            sem | Kmenta | Partly Artificial Data on the U. S. Economy | CSV DOC
            sem | Tests | Six Mental Tests | CSV DOC
            survival | bladder | Bladder Cancer Recurrences | CSV DOC
            survival | cancer | NCCTG Lung Cancer Data | CSV DOC
            survival | cgd | Chronic Granulomatous Disease data | CSV DOC
            survival | colon | Chemotherapy for Stage B/C colon cancer | CSV DOC
            survival | flchain | Assay of serum free light chain for 7874 subjects. | CSV DOC
            survival | genfan | Generator fans | CSV DOC
            survival | heart | Stanford Heart Transplant data | CSV DOC
            survival | kidney | Kidney catheter data | CSV DOC
            survival | leukemia | Acute Myelogenous Leukemia survival data | CSV DOC
            survival | logan | Data from the 1972-78 GSS data used by Logan | CSV DOC
            survival | lung | NCCTG Lung Cancer Data | CSV DOC
            survival | mgus | Monoclonal gammopathy data | CSV DOC
            survival | mgus2 | Monoclonal gammopathy data | CSV DOC
            survival | nwtco | Data from the National Wilms' Tumor Study | CSV DOC
            survival | ovarian | Ovarian Cancer Survival Data | CSV DOC
            survival | pbc | Mayo Clinic Primary Biliary Cirrhosis Data | CSV DOC
            survival | rats | Rat treatment data from Mantel et al | CSV DOC
            survival | retinopathy | Diabetic Retinopathy | CSV DOC
            survival | stanford2 | More Stanford Heart Transplant data | CSV DOC
            survival | tobin | Tobin's Tobit data | CSV DOC
            survival | transplant | Liver transplant waiting list | CSV DOC
            survival | veteran | Veterans' Administration Lung Cancer study | CSV DOC
            vcd | Arthritis | Arthritis Treatment Data | CSV DOC
            vcd | Baseball | Baseball Data | CSV DOC
            vcd | BrokenMarriage | Broken Marriage Data | CSV DOC
            vcd | Bundesliga | Results of the Fussball-Bundesliga (German football league) | CSV DOC
            vcd | Bundestag2005 | Votes in German Bundestag Election 2005 | CSV DOC
            vcd | Butterfly | Butterfly Species in Malaya | CSV DOC
            vcd | CoalMiners | Breathlessness and Wheeze in Coal Miners | CSV DOC
            vcd | DanishWelfare | Danish Welfare Study Data | CSV DOC
            vcd | Employment | Employment Status | CSV DOC
            vcd | Federalist | 'May' in Federalist Papers | CSV DOC
            vcd | Hitters | Hitters Data | CSV DOC
            vcd | HorseKicks | Death by Horse Kicks | CSV DOC
            vcd | Hospital | Hospital data | CSV DOC
            vcd | JobSatisfaction | Job Satisfaction Data | CSV DOC
            vcd | JointSports | Opinions About Joint Sports | CSV DOC
            vcd | Lifeboats | Lifeboats on the Titanic | CSV DOC
            vcd | NonResponse | Non-Response Survey Data | CSV DOC
            vcd | OvaryCancer | Ovary Cancer Data | CSV DOC
            vcd | PreSex | Pre-marital Sex and Divorce | CSV DOC
            vcd | Punishment | Corporal Punishment Data | CSV DOC
            vcd | RepVict | Repeat Victimization Data | CSV DOC
            vcd | Saxony | Families in Saxony | CSV DOC
            vcd | SexualFun | Sex is Fun | CSV DOC
            vcd | SpaceShuttle | Space Shuttle O-ring Failures | CSV DOC
            vcd | Suicide | Suicide Rates in Germany | CSV DOC
            vcd | Trucks | Truck Accidents Data | CSV DOC
            vcd | UKSoccer | UK Soccer Scores | CSV DOC
            vcd | VisualAcuity | Visual Acuity in Left and Right Eyes | CSV DOC
            vcd | VonBort | Von Bortkiewicz Horse Kicks Data | CSV DOC
            vcd | WeldonDice | Weldon's Dice Data | CSV DOC
            vcd | WomenQueue | Women in Queues | CSV DOC
            Zelig | MatchIt.url | Table of links for Zelig | CSV DOC
            Zelig | PErisk | Political Economic Risk Data from 62 Countries in 1987 | CSV DOC
            Zelig | SupremeCourt | U.S. Supreme Court Vote Matrix | CSV DOC
            Zelig | Weimar | 1932 Weimar election data | CSV DOC
            Zelig | Zelig.url | Table of links for Zelig | CSV DOC
            Zelig | approval | U.S. Presidential Approval Data | CSV DOC
            Zelig | bivariate | Sample data for bivariate probit regression | CSV DOC
            Zelig | coalition | Coalition Dissolution in Parliamentary Democracies | CSV DOC
            Zelig | coalition2 | Coalition Dissolution in Parliamentary Democracies, Modified Version | CSV DOC
            Zelig | eidat | Simulation Data for Ecological Inference | CSV DOC
            Zelig | free1 | Freedom of Speech Data | CSV DOC
            Zelig | free2 | Freedom of Speech Data | CSV DOC
            Zelig | friendship | Simulated Example of Schoolchildren Friendship Network | CSV DOC
            Zelig | grunfeld | Simulation Data for model Seemingly Unrelated Regression (sur) that corresponds to method SUR of systemfit | CSV DOC
            Zelig | hoff | Social Security Expenditure Data | CSV DOC
            Zelig | homerun | Sample Data on Home Runs Hit By Mark McGwire and Sammy Sosa in 1998. | CSV DOC
            Zelig | immi1 | Individual Preferences Over Immigration Policy | CSV DOC
            Zelig | immi2 | Individual Preferences Over Immigration Policy | CSV DOC
            Zelig | immi3 | Individual Preferences Over Immigration Policy | CSV DOC
            Zelig | immi4 | Individual Preferences Over Immigration Policy | CSV DOC
            Zelig | immi5 | Individual Preferences Over Immigration Policy | CSV DOC
            Zelig | immigration | Individual Preferences Over Immigration Policy | CSV DOC
            Zelig | klein | Simulation Data for model Two-Stage Least Square (twosls) that corresponds to method 2SLS of systemfit | CSV DOC
            Zelig | kmenta | Simulation Data for model Three-Stage Least Square (threesls) that corresponds to method 3SLS of systemfit | CSV DOC
            Zelig | macro | Macroeconomic Data | CSV DOC
            Zelig | mexico | Voting Data from the 1988 Mexican Presidential Election | CSV DOC
            Zelig | mid | Militarized Interstate Disputes | CSV DOC
            Zelig | newpainters | The Discretized Painter's Data of de Piles | CSV DOC
            Zelig | sanction | Multilateral Economic Sanctions | CSV DOC
            Zelig | seatshare | Left Party Seat Share in 11 OECD Countries | CSV DOC
            Zelig | sna.ex | Simulated Example of Social Network Data | CSV DOC
            Zelig | swiss | Swiss Fertility and Socioeconomic Indicators (1888) Data | CSV DOC
            Zelig | tobin | Tobin's Tobit Data | CSV DOC
            Zelig | turnout | Turnout Data Set from the National Election Survey | CSV DOC
            Zelig | voteincome | Sample Turnout and Demographic Data from the 2000 Current Population Survey | CSV DOC
            HSAUR | BCG | BCG Vaccine Data | CSV DOC
            HSAUR | BtheB | Beat the Blues Data | CSV DOC
            HSAUR | CYGOB1 | CYG OB1 Star Cluster Data | CSV DOC
            HSAUR | Forbes2000 | The Forbes 2000 Ranking of the World's Biggest Companies (Year 2004) | CSV DOC
            HSAUR | GHQ | General Health Questionnaire | CSV DOC
            HSAUR | Lanza | Prevention of Gastrointestinal Damages | CSV DOC
            HSAUR | agefat | Total Body Composition Data | CSV DOC
            HSAUR | aspirin | Aspirin Data | CSV DOC
            HSAUR | birthdeathrates | Birth and Death Rates Data | CSV DOC
            HSAUR | bladdercancer | Bladder Cancer Data | CSV DOC
            HSAUR | clouds | Cloud Seeding Data | CSV DOC
            HSAUR | epilepsy | Epilepsy Data | CSV DOC
            HSAUR | foster | Foster Feeding Experiment | CSV DOC
            HSAUR | heptathlon | Olympic Heptathlon Seoul 1988 | CSV DOC
            HSAUR | mastectomy | Survival Times after Mastectomy of Breast Cancer Patients | CSV DOC
            HSAUR | meteo | Meteorological Measurements for 11 Years | CSV DOC
            HSAUR | orallesions | Oral Lesions in Rural India | CSV DOC
            HSAUR | phosphate | Phosphate Level Data | CSV DOC
            HSAUR | pistonrings | Piston Rings Failures | CSV DOC
            HSAUR | planets | Exoplanets Data | CSV DOC
            HSAUR | plasma | Blood Screening Data | CSV DOC
            HSAUR | polyps | Familial Adenomatous Polyposis | CSV DOC
            HSAUR | polyps3 | Familial Adenomatous Polyposis | CSV DOC
            HSAUR | pottery | Romano-British Pottery Data | CSV DOC
            HSAUR | rearrests | Rearrests of Juvenile Felons | CSV DOC
            HSAUR | respiratory | Respiratory Illness Data | CSV DOC
            HSAUR | roomwidth | Students' Estimates of Lecture Room Width | CSV DOC
            HSAUR | schizophrenia | Age of Onset of Schizophrenia Data | CSV DOC
            HSAUR | schizophrenia2 | Schizophrenia Data | CSV DOC
            HSAUR | schooldays | Days not Spent at School | CSV DOC
            HSAUR | skulls | Egyptian Skulls | CSV DOC
            HSAUR | smoking | Nicotine Gum and Smoking Cessation | CSV DOC
            HSAUR | students | Student Risk Taking | CSV DOC
            HSAUR | suicides | Crowd Baiting Behaviour and Suicides | CSV DOC
            HSAUR | toothpaste | Toothpaste Data | CSV DOC
            HSAUR | voting | House of Representatives Voting Data | CSV DOC
            HSAUR | water | Mortality and Water Hardness | CSV DOC
            HSAUR | watervoles | Water Voles Data | CSV DOC
            HSAUR | waves | Electricity from Wave Power at Sea | CSV DOC
            HSAUR | weightgain | Gain in Weight of Rats | CSV DOC
            HSAUR | womensrole | Womens Role in Society | CSV DOC
            psych | Bechtoldt | Seven data sets showing a bifactor solution. | CSV DOC
            psych | Bechtoldt.1 | Seven data sets showing a bifactor solution. | CSV DOC
            psych | Bechtoldt.2 | Seven data sets showing a bifactor solution. | CSV DOC
            psych | Dwyer | 8 cognitive variables used by Dwyer for an example. | CSV DOC
            psych | Gleser | Example data from Gleser, Cronbach and Rajaratnam (1965) to show basic principles of generalizability theory. | CSV DOC
            psych | Gorsuch | Example data set from Gorsuch (1997) for an example factor extension. | CSV DOC
            psych | Harman.5 | 5 socio-economic variables from Harman (1967) | CSV DOC
            psych | Harman.8 | Correlations of eight physical variables (from Harman, 1966) | CSV DOC
            psych | Harman.political | Eight political variables used by Harman (1967) as example 8.17 | CSV DOC
            psych | Holzinger | Seven data sets showing a bifactor solution. | CSV DOC
            psych | Holzinger.9 | Seven data sets showing a bifactor solution. | CSV DOC
            psych | Reise | Seven data sets showing a bifactor solution. | CSV DOC
            psych | Schmid | 12 variables created by Schmid and Leiman to show the Schmid-Leiman Transformation | CSV DOC
            psych | Thurstone | Seven data sets showing a bifactor solution. | CSV DOC
            psych | Thurstone.33 | Seven data sets showing a bifactor solution. | CSV DOC
            psych | Tucker | 9 Cognitive variables discussed by Tucker and Lewis (1973) | CSV DOC
            psych | ability | 16 ability items scored as correct or incorrect. | CSV DOC
            psych | affect | Two data sets of affect and arousal scores as a function of personality and movie conditions | CSV DOC
            psych | bfi | 25 Personality items representing 5 factors | CSV DOC
            psych | bfi.dictionary | 25 Personality items representing 5 factors | CSV DOC
            psych | blot | Bond's Logical Operations Test - BLOT | CSV DOC
            psych | burt | 11 emotional variables from Burt (1915) | CSV DOC
            psych | cities | Distances between 11 US cities | CSV DOC
            psych | cubits | Galton's example of the relationship between height and 'cubit' or forearm length | CSV DOC
            psych | cushny | A data set from Cushny and Peebles (1905) on the effect of three drugs on hours of sleep, used by Student (1908) | CSV DOC
            psych | epi | Eysenck Personality Inventory (EPI) data for 3570 participants | CSV DOC
            psych | epi.bfi | 13 personality scales from the Eysenck Personality Inventory and Big 5 inventory | CSV DOC
            psych | epi.dictionary | Eysenck Personality Inventory (EPI) data for 3570 participants | CSV DOC
            psych | galton | Galton's Mid parent child height data | CSV DOC
            psych | heights | A data.frame of the Galton (1888) height and cubit data set. | CSV DOC
            psych | income | US family income from US census 2008 | CSV DOC
            psych | iqitems | 16 multiple choice IQ items | CSV DOC
            psych | msq | 75 mood items from the Motivational State Questionnaire for 3896 participants | CSV DOC
            psych | neo | NEO correlation matrix from the NEO_PI_R manual | CSV DOC
            psych | peas | Galton's Peas | CSV DOC
            psych | sat.act | 3 Measures of ability: SATV, SATQ, ACT | CSV DOC
            psych | withinBetween | An example of the distinction between within group and between group correlations | CSV DOC
            quantreg | Bosco | Boscovich Data | CSV DOC
            quantreg | CobarOre | Cobar Ore data | CSV DOC
            quantreg | Mammals | Garland (1983) Data on Running Speed of Mammals | CSV DOC
            quantreg | barro | Barro Data | CSV DOC
            quantreg | engel | Engel Data | CSV DOC
            quantreg | gasprice | Time Series of US Gasoline Prices | CSV DOC
            quantreg | uis | UIS Drug Treatment study data | CSV DOC
            geepack | dietox | Growth curves of pigs in a 3x3 factorial experiment | CSV DOC
            geepack | koch | Ordinal Data from Koch | CSV DOC
            geepack | ohio | Ohio Children Wheeze Status | CSV DOC
            geepack | respdis | Clustered Ordinal Respiratory Disorder | CSV DOC
            geepack | respiratory | Data from a clinical trial comparing two treatments for a respiratory illness | CSV DOC
            geepack | seizure | Epileptic Seizures | CSV DOC
            geepack | sitka89 | Growth of Sitka Spruce Trees | CSV DOC
            geepack | spruce | Log-size of 79 Sitka spruce trees | CSV DOC
            texmex | liver | Liver related laboratory data | CSV DOC
            texmex | portpirie | Rain, wavesurge and portpirie datasets. | CSV DOC
            texmex | rain | Rain, wavesurge and portpirie datasets. | CSV DOC
            texmex | summer | Air pollution data, separately for summer and winter months | CSV DOC
            texmex | wavesurge | Rain, wavesurge and portpirie datasets. | CSV DOC
            texmex | winter | Air pollution data, separately for summer and winter months | CSV DOC
            multgee | arthritis | Rheumatoid Arthritis Clinical Trial | CSV DOC
            multgee | housing | Homeless Data | CSV DOC
            evir | bmw | Daily Log Returns on BMW Share Price | CSV DOC
            evir | danish | Danish Fire Insurance Claims | CSV DOC
            evir | nidd.annual | The River Nidd Data | CSV DOC
            evir | nidd.thresh | The River Nidd Data | CSV DOC
            evir | siemens | Daily Log Returns on Siemens Share Price | CSV DOC
            evir | sp.raw | SP Data to June 1993 | CSV DOC
            evir | spto87 | SP Return Data to October 1987 | CSV DOC
            lme4 | Arabidopsis | Arabidopsis clipping/fertilization data | CSV DOC
            lme4 | Dyestuff | Yield of dyestuff by batch | CSV DOC
            lme4 | Dyestuff2 | Yield of dyestuff by batch | CSV DOC
            lme4 | InstEval | University Lecture/Instructor Evaluations by Students at ETH | CSV DOC
            lme4 | Pastes | Paste strength by batch and cask | CSV DOC
            lme4 | Penicillin | Variation in penicillin testing | CSV DOC
            lme4 | VerbAgg | Verbal Aggression item responses | CSV DOC
            lme4 | cake | Breakage Angle of Chocolate Cakes | CSV DOC
            lme4 | cbpp | Contagious bovine pleuropneumonia | CSV DOC
            lme4 | grouseticks | Data on red grouse ticks from Elston et al. 2001 | CSV DOC
            lme4 | sleepstudy | Reaction times in a sleep deprivation study | CSV DOC
            mosaicData | Alcohol | Alcohol Consumption per Capita | CSV DOC
            mosaicData | Birthdays | US Births in 1969 - 1988 | CSV DOC
            mosaicData | Births | US Births | CSV DOC
            mosaicData | Births78 | US Births in 1978 | CSV DOC
            mosaicData | CPS85 | Data from the 1985 Current Population Survey (CPS85) | CSV DOC
            mosaicData | CoolingWater | CoolingWater | CSV DOC
            mosaicData | Countries | Countries | CSV DOC
            mosaicData | Dimes | Weight of dimes | CSV DOC
            mosaicData | Galton | Galton's dataset of parent and child heights | CSV DOC
            mosaicData | Gestation | Data from the Child Health and Development Studies | CSV DOC
            mosaicData | GoosePermits | Goose Permit Study | CSV DOC
            mosaicData | HELPfull | Health Evaluation and Linkage to Primary Care | CSV DOC
            mosaicData | HELPmiss | Health Evaluation and Linkage to Primary Care | CSV DOC
            mosaicData | HELPrct | Health Evaluation and Linkage to Primary Care | CSV DOC
            mosaicData | HeatX | Data from a heat exchanger laboratory | CSV DOC
            mosaicData | KidsFeet | Foot measurements in children | CSV DOC
            mosaicData | Marriage | Marriage records | CSV DOC
            mosaicData | Mites | Mites and Wilt Disease | CSV DOC
            mosaicData | RailTrail | Volume of Users of a Rail Trail | CSV DOC
            mosaicData | Riders | Volume of Users of a Massachusetts Rail Trail | CSV DOC
            mosaicData | SAT | State by State SAT data | CSV DOC
            mosaicData | SaratogaHouses | Houses in Saratoga County (2006) | CSV DOC
            mosaicData | SnowGR | Snowfall data for Grand Rapids, MI | CSV DOC
            mosaicData | SwimRecords | 100 m Swimming World Records | CSV DOC
            mosaicData | TenMileRace | Cherry Blossom Race | CSV DOC
            mosaicData | Utilities | Utility bills | CSV DOC
            mosaicData | Utilities2 | Utility bills | CSV DOC
            mosaicData | Whickham | Data from the Whickham survey | CSV DOC
            ISLR | Auto | Auto Data Set | CSV DOC
            ISLR | Caravan | The Insurance Company (TIC) Benchmark | CSV DOC
            ISLR | Carseats | Sales of Child Car Seats | CSV DOC
            ISLR | College | U.S. News and World Report's College Data | CSV DOC
            ISLR | Default | Credit Card Default Data | CSV DOC
            ISLR | Hitters | Baseball Data | CSV DOC
            ISLR | OJ | Orange Juice Data | CSV DOC
            ISLR | Portfolio | Portfolio Data | CSV DOC
            ISLR | Smarket | S&P Stock Market Data | CSV DOC
            ISLR | Wage | Mid-Atlantic Wage Data | CSV DOC
            ISLR | Weekly | Weekly S&P Stock Market Data | CSV DOC
            Stat2DataAlfalfaAlfalfaCSV DOC
            Stat2DataArcheryDataArcheryDataCSV DOC
            Stat2DataAutoPollutionAutoPollutionCSV DOC
            Stat2DataBackpackBackpackCSV DOC
            Stat2DataBaseballTimesBaseballTimesCSV DOC
            Stat2DataBeeStingsBeeStingsCSV DOC
            Stat2DataBirdNestBirdNestCSV DOC
            Stat2DataBlood1Blood1CSV DOC
            Stat2DataBlueJaysBlue JaysCSV DOC
            Stat2DataBritishUnionsBritishUnionsCSV DOC
            Stat2DataCAFECAFECSV DOC
            Stat2DataCO2CO2CSV DOC
            Stat2DataCalciumBPCalciumBPCSV DOC
            Stat2DataCancerSurvivalCancerSurvivalCSV DOC
            Stat2DataCaterpillarsCaterpillarsCSV DOC
            Stat2DataCerealCerealCSV DOC
            Stat2DataChemoTHCChemoTHCCSV DOC
            Stat2DataChildSpeaksChildSpeaksCSV DOC
            Stat2DataClothingClothingCSV DOC
            Stat2DataCloudSeedingCloud SeedingCSV DOC
            Stat2DataCloudSeeding2Cloud Seeding 2CSV DOC
            Stat2DataCrackerFiberCracker Fiber in DietsCSV DOC
            Stat2DataCuckooCuckooCSV DOC
            Stat2DataDay1SurveyDay1SurveyCSV DOC
            Stat2DataDiamondsDiamondsCSV DOC
            Stat2DataDiamonds2Diamonds2CSV DOC
            Stat2DataElection08Election08CSV DOC
            Stat2DataEthanolEthanolCSV DOC
            Stat2DataFGByDistanceFGByDistanceCSV DOC
            Stat2DataFantasyBaseballFantasyBaseballCSV DOC
            Stat2DataFertilityFertilityCSV DOC
            Stat2DataFilmFilmCSV DOC
            Stat2DataFinalFourIzzoFinalFourIzzoCSV DOC
            Stat2DataFinalFourLongFinalFourLongCSV DOC
            Stat2DataFinalFourShortFinalFourShortCSV DOC
            Stat2DataFingersFingersCSV DOC
            Stat2DataFirstYearGPAFirstYearGPACSV DOC
            Stat2DataFishEggsFishEggsCSV DOC
            Stat2DataFlightResponseFlightResponseCSV DOC
            Stat2DataFluorescenceFluorescenceCSV DOC
            Stat2DataFruitFliesFruitFliesCSV DOC
            Stat2DataGoldenrodGoldenrod GallsCSV DOC
            Stat2DataGroceryGroceryCSV DOC
            Stat2DataGunnelsGunnelsCSV DOC
            Stat2DataHawkTailHawkTailCSV DOC
            Stat2DataHawkTail2HawkTail2CSV DOC
            Stat2DataHawksHawksCSV DOC
            Stat2DataHearingTestHearingTestCSV DOC
            Stat2DataHighPeaksHighPeaksCSV DOC
            Stat2DataHoopsHoopsCSV DOC
            Stat2DataHorsePricesHorsePricesCSV DOC
            Stat2DataHousesHousesCSV DOC
            Stat2DataICUICUCSV DOC
            Stat2DataInfantMortalityInfantMortalityCSV DOC
            Stat2Data  InsuranceVote  InsuranceVote  CSV  DOC
            Stat2Data  Jurors  Jurors  CSV  DOC
            Stat2Data  Kids198  Kids198  CSV  DOC
            Stat2Data  LeafHoppers  LeafHoppers  CSV  DOC
            Stat2Data  Leukemia  Leukemia  CSV  DOC
            Stat2Data  LongJumpOlympics  LongJumpOlympics  CSV  DOC
            Stat2Data  LostLetter  LostLetter  CSV  DOC
            Stat2Data  MLB2007Standings  MLB2007Standings  CSV  DOC
            Stat2Data  Marathon  Marathon  CSV  DOC
            Stat2Data  Markets  Markets  CSV  DOC
            Stat2Data  MathEnrollment  Math Enrollments  CSV  DOC
            Stat2Data  MathPlacement  Math Placement  CSV  DOC
            Stat2Data  MedGPA  MedGPA  CSV  DOC
            Stat2Data  MentalHealth  Mental Health Admissions  CSV  DOC
            Stat2Data  MetabolicRate  Metabolic Rate of Caterpillars  CSV  DOC
            Stat2Data  MetroHealth83  MetroHealth83  CSV  DOC
            Stat2Data  Milgram  Milgram  CSV  DOC
            Stat2Data  MothEggs  Moth Eggs  CSV  DOC
            Stat2Data  NCbirths  NCbirths  CSV  DOC
            Stat2Data  NFL2007Standings  NFL2007Standings  CSV  DOC
            Stat2Data  Nursing  Nursing  CSV  DOC
            Stat2Data  Olives  Olives  CSV  DOC
            Stat2Data  Orings  Orings  CSV  DOC
            Stat2Data  Overdrawn  Overdrawn  CSV  DOC
            Stat2Data  PalmBeach  PalmBeach  CSV  DOC
            Stat2Data  Pedometer  Pedometer  CSV  DOC
            Stat2Data  Perch  Perch  CSV  DOC
            Stat2Data  PigFeed  PigFeed  CSV  DOC
            Stat2Data  Pines  Pines  CSV  DOC
            Stat2Data  Political  Political  CSV  DOC
            Stat2Data  Pollster08  Pollster08  CSV  DOC
            Stat2Data  Popcorn  Popcorn  CSV  DOC
            Stat2Data  PorscheJaguar  PorscheJaguar  CSV  DOC
            Stat2Data  PorschePrice  PorschePrice  CSV  DOC
            Stat2Data  Pulse  Pulse  CSV  DOC
            Stat2Data  Putts1  Putts1  CSV  DOC
            Stat2Data  Putts2  Putts2  CSV  DOC
            Stat2Data  ReligionGDP  ReligionGDP  CSV  DOC
            Stat2Data  Retirement  Retirement  CSV  DOC
            Stat2Data  RiverElements  RiverElements  CSV  DOC
            Stat2Data  RiverIron  River Iron  CSV  DOC
            Stat2Data  SATGPA  SAT scores and GPA  CSV  DOC
            Stat2Data  SampleFG  SampleFG  CSV  DOC
            Stat2Data  SandwichAnts  Sandwich Ants  CSV  DOC
            Stat2Data  SeaSlugs  Sea Slugs  CSV  DOC
            Stat2Data  Sparrows  Sparrows  CSV  DOC
            Stat2Data  SpeciesArea  Species Area  CSV  DOC
            Stat2Data  Speed  Speed  CSV  DOC
            Stat2Data  Swahili  Swahili  CSV  DOC
            Stat2Data  TMS  TMS  CSV  DOC
            Stat2Data  TextPrices  Text Prices  CSV  DOC
            Stat2Data  ThreeCars  Three Cars  CSV  DOC
            Stat2Data  TipJoke  Tip Joke  CSV  DOC
            Stat2Data  Titanic  Titanic  CSV  DOC
            Stat2Data  TomlinsonRush  LaDainian Tomlinson Rushing Yards  CSV  DOC
            Stat2Data  TwinsLungs  TwinsLungs  CSV  DOC
            Stat2Data  USstamps  USstamps  CSV  DOC
            Stat2Data  Volts  Volts  CSV  DOC
            Stat2Data  WalkingBabies  WalkingBabies  CSV  DOC
            Stat2Data  WeightLossIncentive  WeightLossIncentive  CSV  DOC
            Stat2Data  WeightLossIncentive4  WeightLossIncentive4  CSV  DOC
            Stat2Data  WeightLossIncentive7  WeightLossIncentive7  CSV  DOC
            Stat2Data  WordMemory  WordMemory  CSV  DOC
            Stat2Data  YouthRisk2007  YouthRisk2007  CSV  DOC
            Stat2Data  YouthRisk2009  YouthRisk2009  CSV  DOC
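
Each row above names a package (Stat2Data), an item, and a title, with CSV and DOC links. Need a pattern once you find a row you like? Here is a minimal Python sketch for pulling one CSV. It assumes the files follow the Rdatasets URL layout; `BASE_URL`, `csv_url`, `parse_csv` and `load_dataset` are names made up for this example, and the CSV link in the table remains the authoritative address.

```python
import csv
import io
from urllib.request import urlopen

# Assumption: Rdatasets-style layout. Not stated in the list above;
# if it does not match, use the CSV link from the table instead.
BASE_URL = "https://vincentarelbundock.github.io/Rdatasets/csv"


def csv_url(package, item):
    """Build the assumed CSV URL for a package/item pair from the table."""
    return f"{BASE_URL}/{package}/{item}.csv"


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_dataset(package, item):
    """Download and parse one dataset (needs network access)."""
    with urlopen(csv_url(package, item)) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

For example, `load_dataset("Stat2Data", "Titanic")` would fetch the Titanic row, provided the assumed URL layout holds. All values come back as strings, so convert numeric columns yourself.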


            Source:  r-dir (r-directory)



            World Bank Data - Literally hundreds of datasets spanning many decades, sortable by topic or country. Data is downloadable in Excel or XML formats, or you can make API calls. This is an outstanding resource.

            Gapminder - Hundreds of datasets on world health, economics, population, etc. All of it is viewable online within Google Docs, and downloadable as spreadsheets.

            The Data Hub - Hosted by CKAN. Most of these datasets come from the government.

            Datamob - List of public datasets.

            Numbrary - Lists of datasets.

            Kaggle - Kaggle is a site that hosts data mining competitions. Each competition provides a data set that's free for download.

            SNAP - Stanford's Large Network Dataset Collection. This list has several datasets related to social networking. Lots of fun in here!

            KONECT - The Koblenz Network Collection. Several datasets related to social networking & Wikipedia.

            Million Song Dataset - This is a collection of audio features and metadata for a million contemporary popular music tracks.

            Energy Information Administration - This site offers a number of datasets on energy production, consumption, sources, etc.

            GeoDa Center - This is a collection of geospatial datasets offered by Arizona State University's Center for Geospatial Analysis & Computation.

            Reddit Datasets - This last one isn't a dataset itself, but rather a social news site devoted to datasets. It's updated regularly with news about newly available datasets.

            Quandl - This is a web-based front end to a number of public data sets. What's nice about this website is that it allows for the combination of data from a number of sources, and can export the data in a number of formats.

            1,001 Datasets - This is a list of lists of datasets. There's not much organization here, but there really are a LOT of datasets. Dive in and have fun.

            Yahoo! Webscope - A reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists.

            Time Series Data Library - Curated by Professor Rob Hyndman of Monash University in Australia, this is a collection of over 500 datasets containing time-series data, organized by category.

            Awesome Public Datasets - Curated list of hundreds of public datasets, organized by topic.

            Common Crawl - Massive dataset of billions of pages scraped from the web. The data itself is on Amazon Public Datasets, so it's easy to load it into an EC2 instance there. The dataset is updated with a new scrape about once per month.
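
The World Bank entry in the list above mentions API calls. A rough Python sketch of what such a call might look like follows. The endpoint layout (`/v2/country/.../indicator/...`) and the two-element response shape are assumptions based on the public World Bank API, not something this list states, and the function names are invented for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: World Bank API v2 root; check their docs before relying on it.
API_ROOT = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, per_page=100):
    """Build a request URL for one country and one indicator code."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{API_ROOT}/country/{country}/indicator/{indicator}?{query}"


def parse_series(payload):
    """Turn a decoded response into {year: value}, skipping missing values.

    Assumed response shape: [metadata, [{"date": ..., "value": ...}, ...]].
    """
    if len(payload) < 2 or payload[1] is None:
        return {}
    return {
        row["date"]: row["value"]
        for row in payload[1]
        if row["value"] is not None
    }


def fetch_series(country, indicator):
    """Download and parse one indicator series (needs network access)."""
    with urlopen(indicator_url(country, indicator)) as resp:
        return parse_series(json.load(resp))
```

A call such as `fetch_series("BR", "SP.POP.TOTL")` would then return a year-to-value mapping, assuming that indicator code and the response shape are as described.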


            SOURCE: Amazon Public Datasets - Collection of datasets that are ready to be loaded into an EC2 instance.

            A Multi-wavelength Infrared Atlas of the Galactic Plane Open source tools were used to combine images from five major infrared surveys of the Galactic Plane, archived at the NASA/IPAC Infrared Science Archive (IRSA). The result is a 16-wavelength infrared Atlas of the Galactic Plane that covers the wavelength range 1 μm to 24 μm.

            CCAFS-Climate Data High resolution climate data to help assess the impacts of climate change primarily on agriculture. These open access datasets of climate projections will help researchers make climate change impact assessments.

            NASA NEX Three NASA NEX datasets are now available, including climate projections and satellite images of Earth.

            Human Microbiome Project Human Microbiome Project Data Set

            Enron Email Data Enron email data publicly released as part of FERC's Western Energy Markets investigation converted to industry standard formats by EDRM. The data set consists of 1,227,255 emails with 493,384 attachments covering 151 custodians. The email is provided in Microsoft PST, IETF MIME, and EDRM XML formats.

            Japan Census Data Multiple data sets including: (1) Population Census of Japan (1995, 2000, 2005, 2010), (2) Establishment and Enterprise Census of Japan (1999, 2001, 2004, 2006), and (3) Economic Census of Japan (2009).

            Apache Software Foundation Public Mail Archives  A collection of all publicly available Apache Software Foundation mail archives as of July 11, 2011

            Freebase Simple Topic Dump A data dump of the basic identifying facts about every topic in Freebase

            Freebase Quad Dump A data dump of all the current facts and assertions in Freebase

            Wikipedia Page Traffic Statistic V3 This dataset contains a 150 GB sample of the data used to power trendingtopics.org. It includes a full 3 months of hourly page traffic statistics from Wikipedia (1/1/2011-3/31/2011).

            Material Safety Data Sheets 230,000 Material Safety Data Sheets.

            Million Song Dataset The Million Songs Collection is a collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.

            Million Song Sample Dataset This is a 10,000 song subset of audio features and metadata from the Million Songs collection - a collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.

            Marvel Universe Social Graph This dataset is an example of a social collaboration network based on the characters in The Marvel Universe, that is, the artificial world that takes place in the universe of the Marvel comic books.

            Google Books Ngrams A data set containing Google Books n-gram corpora. This data set is freely available on Amazon S3 in a Hadoop friendly file format and is licensed under a Creative Commons Attribution 3.0 Unported License. The original dataset is available from http://books.google.com/ngrams/.

            The WestburyLab USENET corpus The WestburyLab USENET corpus is an anonymized compilation of postings from 47,860 English-language newsgroups from 2005-2010.

            1000 Genomes Project The 1000 Genomes Project, initiated in 2008, is an international public-private consortium that aims to build the most detailed map of human genetic variation available.

            Wikipedia Traffic Statistics V2 Contains 16 months of hourly pageview statistics for all articles in Wikipedia

            M-Lab dataset: Network Diagnostic Tool (NDT) NDT test results created through Measurement Lab (M-Lab) between February 2009 and September 2009

            M-Lab dataset: Network Path and Application Diagnosis tool (NPAD) NPAD test results created through Measurement Lab (M-Lab) between February 2009 and September 2009

            Petroleum Public Data Set (working Title) Public-domain data for the oil & gas industry, assembled from the contributions of participating agencies in the United States, Canada and around the world. This data provides industry stakeholders with an opportunity to focus their efforts on the analysis and interpretation of this data without concern for the trivial and time-consuming tasks of locating, downloading, reformatting and integrating the data prior to value-added work being performed.

            Sloan Digital Sky Survey DR6 Subset The Sloan Digital Sky Survey is the most ambitious astronomical survey ever undertaken.

            Wikipedia Page Traffic Statistics Contains 7 months of hourly pageview statistics for all articles in Wikipedia

            Wikipedia XML Data A complete copy of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML.

            Federal Reserve Economic Data - Fred Database of 20,059 U.S. economic time series.

            Twilio/Wigle.net Street Vector Data Set Twilio/Wigle.net database of mapped US street names and address ranges.

            Federal Contracts from the Federal Procurement Data Center (USASpending.gov) A data dump of all federal contracts from the Federal Procurement Data Center found at USASpending.gov.

            University of Florida Sparse Matrix Collection The University of Florida Sparse Matrix Collection is a large, widely available, and actively growing set of sparse matrices that arise in real applications.

            2008 TIGER/Line Shapefiles Census 2000 and Current United States shapefiles

            Wikipedia Extraction (WEX) A processed dump of the English language Wikipedia

            Business and Industry Summary Data US Business and Industry Summary Data

            2003-2006 US Economic Data US Economic Data for years 2003 to 2006

            Freebase Data Dump Freebase is an open database of the world's information, covering millions of topics in hundreds of categories

            DBpedia 3.5.1 DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web

            1980 US Census Data from the 1980 US Census

            1990 US Census Data from the 1990 US Census

            2000 US Census Data from the 2000 US Census

            Transportation Databases Various transportation statistics

            Labor Statistics Databases Various Labor Statistics
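
The Enron Email Data entry above says the messages are provided in IETF MIME format, among others. Python's standard `email` package can read that format, so a minimal sketch of pulling the basics out of one message might look like this. `summarize_message` is a name invented for this example, and how the messages are laid out inside the actual data set is not covered here.

```python
from email import message_from_string
from email.policy import default


def summarize_message(raw):
    """Extract sender, subject, plain-text body and attachment names."""
    # policy=default gives the modern EmailMessage API.
    msg = message_from_string(raw, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "body": body.get_content().strip() if body is not None else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


# A made-up message for illustration only, not taken from the data set.
SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Test\n"
    "\n"
    "Hello Bob\n"
)

summary = summarize_message(SAMPLE)
```

For a file on disk, read its text and pass it to `summarize_message`; messages with unusual encodings may need `email.message_from_bytes` instead.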












            Enjoy! As mentioned above, 100% of this data is reposted; the original sources are in the links. If I've missed any citations, please let me know and I will fix them.

            About this document

            1001 Datasets and Data repositories ( List of lists of lists ) - a rough list of lists, still being compiled

            Created: December 28, 2013
