
1001 Datasets and Data repositories ( List of lists of lists )


This is a LIST of... "lists of lists". Messy presentation (for my own use) to pull together raw datasets for my hacks. Suggestions to add? Message me or post a comment.

100% of the links below are from external sources (not mine).


Need code or pattern once you find the data?  Try here:


NLP Datasets

Source: Niderhoff GitHub nlp-datasets. Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP). Most stuff here is just raw unstructured text data; if you are looking for annotated corpora or Treebanks, refer to the sources at the bottom.

  • Apache Software Foundation Public Mail Archives: all publicly available Apache Software Foundation mail archives as of July 11, 2011 (200 GB)

  • Blog Authorship Corpus: consists of the collected posts of 19,320 bloggers gathered in August 2004. 681,288 posts and over 140 million words. (298 MB)

  • Amazon Fine Food Reviews [Kaggle]: consists of 568,454 food reviews Amazon users left up to October 2012. Paper. (240 MB)

  • Amazon Reviews: Stanford collection of 35 million amazon reviews. (11 GB)

  • ArXiv: All the papers on arXiv as fulltext (270 GB) + sourcefiles (190 GB).

  • ASAP Automated Essay Scoring [Kaggle]: For this competition, there are eight essay sets. Each of the sets of essays was generated from a single prompt. Selected essays range from an average length of 150 to 550 words per response. Some of the essays are dependent upon source information and others are not. All responses were written by students ranging in grade levels from Grade 7 to Grade 10. All essays were hand graded and were double-scored. (100 MB)

  • ASAP Short Answer Scoring [Kaggle]: Each of the data sets was generated from a single prompt. Selected responses have an average length of 50 words per response. Some of the essays are dependent upon source information and others are not. All responses were written by students primarily in Grade 10. All responses were hand graded and were double-scored. (35 MB)

  • Classification of political social media: Social media messages from politicians classified by content. (4 MB)

  • CLiPS Stylometry Investigation (CSI) Corpus: a yearly expanded corpus of student texts in two genres: essays and reviews. The purpose of this corpus lies primarily in stylometric research, but other applications are possible. (on request)

  • ClueWeb09 FACC: ClueWeb09 with Freebase annotations (72 GB)

  • ClueWeb11 FACC: ClueWeb11 with Freebase annotations (92 GB)

  • Common Crawl Corpus: web crawl data composed of over 5 billion web pages (541 TB)

  • Cornell Movie Dialog Corpus: contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts: 220,579 conversational exchanges between 10,292 pairs of movie characters, 617 movies (9.5 MB)

  • Corporate messaging: A data categorization job concerning what corporations actually talk about on social media. Contributors were asked to classify statements as information (objective statements about the company or its activities), dialog (replies to users, etc.), or action (messages that ask for votes or ask users to click on links, etc.). (600 KB)

  • Crosswikis: English-phrase-to-associated-Wikipedia-article database. Paper. (11 GB)

  • DBpedia: a community effort to extract structured information from Wikipedia and to make this information available on the Web (17 GB)

  • Death Row: last words of every inmate executed since 1984 online (HTML table)

  • 1.25 million bookmarks on

  • Disasters on social media: 10,000 tweets annotated with whether the tweet referred to a disaster event (2 MB).

  • Economic News Article Tone and Relevance: News articles judged if relevant to the US economy and, if so, what the tone of the article was. Dates range from 1951 to 2014. (12 MB)

  • Enron Email Data: consists of 1,227,255 emails with 493,384 attachments covering 151 custodians (210 GB)

  • Event Registry: Free tool that gives real time access to news articles by 100,000 news publishers worldwide. Has API. (query tool)

  • Spam Clickbait News Headlines [Kaggle]: 3 Million crowdsourced News headlines published by the now-defunct clickbait website The Examiner from 2010 to 2015. (200 MB)

  • Federal Contracts from the Federal Procurement Data Center: data dump of all federal contracts from the Federal Procurement Data Center (180 GB)

  • Flickr Personal Taxonomies: Tree dataset of personal tags (40 MB)

  • Freebase Data Dump: data dump of all the current facts and assertions in Freebase (26 GB)

  • Freebase Simple Topic Dump: data dump of the basic identifying facts about every topic in Freebase (5 GB)

  • Freebase Quad Dump: data dump of all the current facts and assertions in Freebase (35 GB)

  • GigaOM Wordpress Challenge [Kaggle]: blog posts, meta data, user likes (1.5 GB)

  • Google Books Ngrams: available also in hadoop format on amazon s3 (2.2 TB)

  • Google Web 5gram: contains English word n-grams and their observed frequency counts (24 GB)

  • Gutenberg Ebook List: annotated list of ebooks (2 MB)

  • Hansards text chunks of Canadian Parliament: 1.3 million pairs of aligned text chunks (sentences or smaller fragments) from the official records (Hansards) of the 36th Canadian Parliament. (82 MB)

  • Harvard Library: over 12 million bibliographic records for materials held by the Harvard Library, including books, journals, electronic resources, manuscripts, archival materials, scores, audio, video and other materials. (4 GB)

  • Hate speech identification: Contributors viewed short text and identified if it a) contained hate speech, b) was offensive but without hate speech, or c) was not offensive at all. Contains nearly 15K rows with three contributor judgments per text string. (3 MB)

  • Hillary Clinton Emails [Kaggle]: nearly 7,000 pages of Clinton's heavily redacted emails (12 MB)

  • Home Depot Product Search Relevance [Kaggle]: contains a number of products and real customer search terms from Home Depot's website. The challenge is to predict a relevance score for the provided combinations of search terms and products. To create the ground truth labels, Home Depot has crowdsourced the search/product pairs to multiple human raters. (65 MB)

  • Identifying key phrases in text: Question/Answer pairs + context; context was judged if relevant to question/answer. (8 MB)

  • Jeopardy: archive of 216,930 past Jeopardy questions (53 MB)

  • 200k English plaintext jokes: archive of 208,000 plaintext jokes from various sources.

  • Machine Translation of European Languages: (612 MB)

  • Material Safety Datasheets: 230,000 Material Safety Data Sheets. (3 GB)

  • Million News Headlines - ABC Australia [Kaggle]: 1.3 Million News headlines published by ABC News Australia from 2003 to 2017. (56 MB)

  • MCTest: a freely available set of 660 stories and associated questions intended for research on the machine comprehension of text; for question answering (1 MB)

  • NEGRA: A Syntactically Annotated Corpus of German Newspaper Texts. Available for free for all Universities and non-profit organizations. Need to sign and send form to obtain. (on request)

  • News Headlines of India - Times of India [Kaggle]: 2.7 Million News Headlines with category published by Times of India from 2001 to 2017. (185 MB)

  • News article / Wikipedia page pairings: Contributors read a short article and were asked which of two Wikipedia articles it matched most closely. (6 MB)

  • NIPS2015 Papers (version 2) [Kaggle]: full text of all NIPS2015 papers (335 MB)

  • NYTimes Facebook Data: all the NYTimes facebook posts (5 MB)

  • One Week of Global News Feeds [Kaggle]: News Event Dataset of 1.4 Million Articles published globally in 20 languages over one week of August 2017. (115 MB)

  • Objective truths of sentences/concept pairs: Contributors read a sentence with two concepts. For example “a dog is a kind of animal” or “captain can have the same meaning as master.” They were then asked if the sentence could be true and ranked it on a 1-5 scale. (700 KB)

  • Open Library Data Dumps: dump of all revisions of all the records in Open Library. (16 GB)

  • Personae Corpus: collected for experiments in Authorship Attribution and Personality Prediction. It consists of 145 Dutch-language essays by 145 different students. (on request)

  • Reddit Comments: every publicly available Reddit comment as of July 2015. 1.7 billion comments (250 GB)

  • Reddit Comments (May ‘15) [Kaggle]: subset of above dataset (8 GB)

  • Reddit Submission Corpus: all publicly available Reddit submissions from January 2006 to August 31, 2015. (42 GB)

  • Reuters Corpus: a large collection of Reuters News stories for use in research and development of natural language processing, information retrieval, and machine learning systems. This corpus, known as "Reuters Corpus, Volume 1" or RCV1, is significantly larger than the older, well-known Reuters-21578 collection heavily used in the text classification community. You need to sign an agreement and send it by post to obtain the corpus. (2.5 GB)

  • SaudiNewsNet: 31,030 Arabic newspaper articles along with metadata, extracted from various online Saudi newspapers. (2 MB)

  • SMS Spam Collection: 5,574 English, real and non-encoded SMS messages, tagged according to being legitimate (ham) or spam. (200 KB)

  • SouthparkData: .csv files containing script information including: season, episode, character, & line. (3.6 MB)

  • Stackoverflow: 7.3 million stackoverflow questions + other stackexchanges (query tool)

  • Twitter Cheng-Caverlee-Lee Scrape: Tweets from September 2009 - January 2010, geolocated. (400 MB)

  • Twitter New England Patriots Deflategate sentiment: Before the 2015 Super Bowl, there was a great deal of chatter around deflated footballs and whether the Patriots cheated. This data set looks at Twitter sentiment on important days during the scandal to gauge public sentiment about the whole ordeal. (2 MB)

  • Twitter Progressive issues sentiment analysis: tweets regarding a variety of left-leaning issues like legalization of abortion, feminism, Hillary Clinton, etc. classified if the tweets in question were for, against, or neutral on the issue (with an option for none of the above). (600 KB)

  • Twitter Sentiment140: Tweets related to brands/keywords. Website includes papers and research ideas. (77 MB)

  • Twitter sentiment analysis: Self-driving cars: contributors read tweets and classified them as very positive, slightly positive, neutral, slightly negative, or very negative. They were also asked to mark if the tweet was not relevant to self-driving cars. (1 MB)

  • Twitter Tokyo Geolocated Tweets: 200K tweets from Tokyo. (47 MB)

  • Twitter UK Geolocated Tweets: 170K tweets from UK. (47 MB)

  • Twitter USA Geolocated Tweets: 200K tweets from the US. (45 MB)

  • Twitter US Airline Sentiment [Kaggle]: A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service"). (2.5 MB)

  • U.S. economic performance based on news articles: News article headlines and excerpts judged on whether they are relevant to the U.S. economy. (5 MB)

  • Urban Dictionary Words and Definitions [Kaggle]: Cleaned CSV corpus of 2.6 Million of all Urban Dictionary words, definitions, authors, votes as of May 2016. (238 MB)

  • Westbury Lab Usenet Corpus: anonymized compilation of postings from 47,860 English-language newsgroups from 2005-2010 (40 GB)

  • Westbury Lab Wikipedia Corpus: Snapshot of all the articles in the English part of the Wikipedia that was taken in April 2010. It was processed to remove all links and irrelevant material (navigation text, etc.). The corpus is untagged, raw text. Used by Stanford NLP (1.8 GB).

  • Wikipedia Extraction (WEX): a processed dump of English-language Wikipedia (66 GB)

  • Wikipedia XML Data: complete copy of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML. (500 GB)

  • Yahoo! Answers Comprehensive Questions and Answers: Yahoo! Answers corpus as of 10/25/2007. Contains 4,483,032 questions and their answers. (3.6 GB)

  • Yahoo! Answers consisting of questions asked in French: Subset of the Yahoo! Answers corpus from 2006 to 2015 consisting of 1.7 million questions posed in French, and their corresponding answers. (3.8 GB)

  • Yahoo! Answers Manner Questions: subset of the Yahoo! Answers corpus from a 10/25/2007 dump, selected for their linguistic properties. Contains 142,627 questions and their answers. (104 MB)

  • Yahoo! HTML Forms Extracted from Publicly Available Webpages: a small sample of pages that contain complex HTML forms; 2.67 million complex forms in total. (50+ GB)

  • Yahoo! Metadata Extracted from Publicly Available Web Pages: 100 million triples of RDF data (2 GB)

  • Yahoo N-Gram Representations: This dataset contains n-gram representations. The data may serve as a testbed for query rewriting task, a common problem in IR research as well as to word and sentence similarity task, which is common in NLP research. (2.6 GB)

  • Yahoo! N-Grams, version 2.0: n-grams (n = 1 to 5), extracted from a corpus of 14.6 million documents (126 million unique sentences, 3.4 billion running words) crawled from over 12000 news-oriented sites (12 GB)

  • Yahoo! Search Logs with Relevance Judgments: Anonymized Yahoo! Search Logs with Relevance Judgments (1.3 GB)

  • Yahoo! Semantically Annotated Snapshot of the English Wikipedia: English Wikipedia dated from 2006-11-04 processed with a number of publicly-available NLP tools. 1,490,688 entries. (6 GB)

  • Yelp: including restaurant rankings and 2.2M reviews (on request)

  • YouTube: 1.7 million YouTube video descriptions (torrent)
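
Several of the smaller entries above ship as plain .csv files. As a rough sketch of a first pass over one of them, the snippet below counts lines per character in a file shaped like the SouthparkData entry (season, episode, character, line). The exact header names (`Season`, `Episode`, `Character`, `Line`) and the in-memory sample rows are assumptions made for illustration, not taken from the real download, so check the header of the actual file before reusing this.

```python
import csv
import io
from collections import Counter


def lines_per_character(csv_file):
    """Count how many script lines each character has.

    Assumes a header row with a "Character" column (hypothetical name;
    adjust it to match the real file).
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        counts[row["Character"].strip()] += 1
    return counts


# Tiny in-memory stand-in for a downloaded .csv file (made-up rows).
sample = io.StringIO(
    "Season,Episode,Character,Line\n"
    "1,1,Stan,Hello there.\n"
    "1,1,Kyle,Hi Stan.\n"
    "1,2,Stan,Let's go.\n"
)

counts = lines_per_character(sample)
for name, n in counts.most_common():
    print(name, n)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`.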




Audio Content Analysis

Source: Alexander Lerch / Audio Content Analysis


AWS Public Data Sets

Source: AWS Public Data Sets

Learn more about working with geospatial data on AWS at Earth on AWS.

  • Landsat on AWS: An ongoing collection of satellite imagery of all land on Earth produced by the Landsat 8 satellite.
  • Sentinel-2 on AWS: An ongoing collection of satellite imagery of all land on Earth produced by the Sentinel-2 satellite.
  • GOES on AWS: GOES provides continuous weather imagery and monitoring of meteorological and space environment data across North America.
  • SpaceNet on AWS: A corpus of commercial satellite imagery and labeled training data to foster innovation in the development of computer vision algorithms.
  • OpenStreetMap on AWS: OSM is a free, editable map of the world, created and maintained by volunteers. Regular OSM data archives are made available in Amazon S3.
  • MODIS on AWS: Select products from the Moderate Resolution Imaging Spectroradiometer (MODIS) managed by the U.S. Geological Survey and NASA.
  • Terrain Tiles: A global dataset providing bare-earth terrain heights, tiled for easy usage and provided on S3.
  • NAIP: 1 meter aerial imagery captured during the agricultural growing seasons in the continental U.S.
  • NEXRAD on AWS: Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.
  • NASA NEX: A collection of Earth science datasets maintained by NASA, including climate change projections and satellite images of the Earth's surface.
  • District of Columbia LiDAR: LiDAR point cloud data for Washington, DC.
  • EPA Risk-Screening Environmental Indicators: detailed air model results from EPA’s Risk-Screening Environmental Indicators (RSEI) model.
  • HIRLAM Weather Model: HIRLAM (High Resolution Limited Area Model) is an operational synoptic and mesoscale weather prediction model managed by the Finnish Meteorological Institute.

      Learn more about genomics in the cloud.

      • 1000 Genomes Project: A detailed map of human genetic variation.
      • TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer Genome Atlas (TCGA) available to qualified researchers via the Cancer Genomics Cloud.
      • ICGC on AWS: Whole genome sequence data available to qualified researchers via The International Cancer Genome Consortium (ICGC).
      • 3000 Rice Genome on AWS: Genome sequence of 3,024 rice varieties.
      • Genome in a Bottle (GIAB): Several reference genomes to enable translation of whole human genome sequencing to clinical practice.

          Learn more about artificial intelligence and machine learning on AWS.

          • Common Crawl: A corpus of web crawl data composed of over 5 billion web pages.
          • Amazon Bin Image Dataset: Over 500,000 bin JPEG images and corresponding JSON metadata files describing products in an operating Amazon Fulfillment Center.
          • GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from nearly every corner of every country, updated daily.
          • Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features and annotations.
          • Google Books Ngrams: A dataset containing Google Books n-gram corpuses.
          • SpaceNet on AWS: A corpus of commercial satellite imagery and labeled training data to foster innovation in the development of computer vision algorithms.
          • IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present.
          • ACS PUMS on AWS: U.S. Census American Community Survey (ACS) Public Use Microdata Sample (PUMS) is available in a linked data format using the Resource Description Framework (RDF) data model.
          • on AWS: database, which includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more.  


          Source:  CRAN

          Provides functions to download data from UK Parliament (Text/speeches)


          Source:  Quora:

          Cross-disciplinary data repositories, data collections and data search engines:

          12. alias
          14. Social Network Analysis Interactive Dataset Library (Social Network Datasets)
          15. Datasets for Data Mining
          16. Enigma Public
          18. - The First Interactive Network Data Repository
          20. Open Data Inception - A Comprehensive List of 2500+ Open Data Portals in the World
          21. OpenDataSoft catalog

          Single datasets and data repositories

          19. http://googleresearch.blogspot.c...
          60. (interaction data in learning environments)
          61. - Collaborative Psychiatric Epidemiology Surveys: (A collection of three national surveys focused on each of the major ethnic groups to study psychiatric illnesses and health services use)
          65. - Network/ML data repository w/ visual interactive analytics
          66. Home (United Nations Environment Programme GRID-Geneva, a lot of GIS datasets)

          Source: Google Search

          r-directory > Reference Links > Free Data Sets
          Big Data Made Simple - 70 WebSites -
          18 places to find data sets for data science projects

          Source:  IBM -




          SOURCE -



          1. The US Government pledged last year to make all government data available freely online. This site is the first stage and acts as a portal to all sorts of amazing information on everything from climate to crime.
          2. US Census Bureau A wealth of information on the lives of US citizens covering population data, geographic data and education.
          3. Socrata is another interesting place to explore government-related data, with some visualisation tools built-in.
          4. European Union Open Data Portal As the above, but based on data from European Union institutions.
          5. Data from the UK Government, including the British National Bibliography – metadata on all UK books and publications since 1950.
          6. Canada Open Data is a pilot project with many government and geospatial datasets.
          7. offers open government data from US, EU, Canada, CKAN, and more.
          8. The CIA World Factbook Information on history, population, economy, government, infrastructure and military of 267 countries.
             125 years of US healthcare data including claim-level Medicare data, epidemiology and population statistics.
          9. NHS Health and Social Care Information Centre Health data sets from the UK National Health Service.
          10. UNICEF offers statistics on the situation of women and children worldwide.
          11. World Health Organization offers world hunger, health, and disease statistics.
          12. Amazon Web Services public datasets Huge resource of public data, including the 1000 Genome Project, an attempt to build the most comprehensive database of human genetic information, and NASA’s database of satellite imagery of Earth.
          13. Facebook Graph Although much of the information on users’ Facebook profiles is private, a lot isn’t – Facebook provides the Graph API as a way of querying the huge amount of information that its users are happy to share with the world (or can’t hide because they haven’t worked out how the privacy settings work).
          14. A fascinating tool for facial recognition data.
          15. UCLA makes some of the data from its courses public.
          16. Data Market is a place to check out data related to economics, healthcare, food and agriculture, and the automotive industry.
          17. Google Public data explorer includes data from world development indicators, OECD, and human development indicators, mostly related to economics data and the world.
          18. Junar is a data scraping service that also includes data feeds.
          19. Buzzdata is a social data sharing service that allows you to upload your own data and connect with others who are uploading their data.
          20. Gapminder Compilation of data from sources including the World Health Organization and World Bank covering economic, medical and social statistics from around the world.
          21. Google Trends Statistics on search volume (as a proportion of total search) for any given term, since 2004.
          22. Google Finance 40 years’ worth of stock market data, updated in real time.
          23. Google Books Ngrams Search and analyze the full text of any of the millions of books digitised as part of the Google Books project.
          24. National Climatic Data Center Huge collection of environmental, meteorological and climate data sets from the US National Climatic Data Center. The world’s largest archive of weather data.
          25. DBPedia Wikipedia is comprised of millions of pieces of data, structured and unstructured on every subject under the sun. DBPedia is an ambitious project to catalogue and create a public, freely distributable database allowing anyone to analyze this data.
          26. New York Times Searchable, indexed archive of news articles going back to 1851.
          27. Freebase A community-compiled database of structured data about people, places and things, with over 45 million entries.
          28. Million Song Data Set Metadata on over a million songs and pieces of music. Part of Amazon Web Services.
          29. UCI Machine Learning Repository is a collection of datasets specifically pre-processed for machine learning.
          30. Financial Data Finder at OSU offers a large catalog of financial data sets.
          31. Pew Research Center offers its raw data from its fascinating research into American life.
          32. The BROAD Institute offers a number of cancer-related datasets.



          Source: Caesar0301 Awesome Data Sets








          Complex Networks


          Computer Networks


          Data Challenges


          Earth Science
















          Image Processing


          Machine Learning




          Natural Language








          Public Domains


          Search Engines


          Social Networks


          Social Sciences






          Time Series




          Source: United Nations


          1. AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
          2. BigML big list of public data sources.
          3. Bioassay data, described in Virtual screening of bioassay data, by Amanda Schierz, J. of Cheminformatics, with 21 Bioassay datasets (Active / Inactive compounds) available for download.
          4. Bitly data, anonymized clicks on gov links.
          5. Canada Open Data, pilot project with many government and geospatial datasets.
          6. Causality Workbench data repository.
          7. Corral Big Data repository at Texas Advanced Computing Center, supporting data-centric science.
          8. Data Source Handbook, A Guide to Public Data, by Pete Warden, O'Reilly (Jan 2011).
          9. Open government data from US, EU, Canada, CKAN, and more.
          10. Publicly available data from UK (also London datastore).
          11. Central guide for education data resources including high-value data sets, data visualization tools, resources for the classroom, applications created from open data and more.
          12. DataMarket, visualize the world's economy, societies, nature, and industries, with 100 million time series from UN, World Bank, Eurostat and other important data providers.
          13. Datamob, public data put to good use.
          14. A clearinghouse of datasets available from the City & County of San Francisco, CA.
          15. DataFerrett, a data mining tool that accesses and manipulates TheDataWeb, a collection of many on-line US Government datasets.
          16. Delve, Data for Evaluating Learning in Valid Experiments.
          17. EconData, thousands of economic time series, produced by a number of US Government agencies.
          18. Enron Email Dataset, data from about 150 users, mostly senior management of Enron.
          19. Europeana Data, contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage content.
          20. FEDSTATS, a comprehensive source of US statistics and more.
          21. FIMI repository for frequent itemset mining, implementations and datasets.
          22. Financial Data Finder at OSU, a large catalog of financial data sets.
          23. GDELT: The Global Data on Events, Location and Tone, described by the Guardian as "a big data history of life, the universe and everything."
          24. GEO (Gene Expression Omnibus), a gene expression/molecular abundance repository supporting MIAME compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval.
          25. GeoDa Center, geographical and spatial data.
          26. Google ngrams datasets, text from millions of books scanned by Google.
          27. Grain Market Research, financial data including stocks, futures, etc.
          28. Hilary Mason research-quality Big Data sets collection - many text and image datasets.
          29. HitCompanies Datasets, comprehensive data on random 10,000 UK companies sampled from HitCompanies, updated automatically using AI/Machine Learning.
          30. ICWSM-2009 dataset contains 44 million blog posts made between August 1st and October 1st, 2008.
          31. Infochimps, an open catalog and marketplace for data. You can share, sell, curate, and download data about anything and everything.
          32. Investor Links, includes financial data.
          33. KDD Cup center, with all data, tasks, and results.
          34. Kevin Chai list of datasets, for text, SNA, and other fields.
          35. KONECT, the Koblenz Network Collection, with large network datasets of all types in order to perform research in the area of network mining.
          36. Linking Open Data project, aimed at making data freely available to everyone.
          37. Million Song Dataset.
          38. MIT Cancer Genomics gene expression datasets and publications, from MIT Whitehead Center for Genome Research.
          39. ML Data, the data repository of the EU Pascal2 networks.
          40. NASDAQ Data Store, provides access to market data.
          41. National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
          42. National Space Science Data Center (NSSDC), NASA data sets from planetary exploration, space and solar physics, life sciences, astrophysics, and more.
          43. Open Data Census, assesses the state of open data around the world.
          44. OpenData from Socrata, access to over 10,000 datasets including business, education, government, and fun.
          45. Open Source Sports, many sports databases, including Baseball, Football, Basketball, and Hockey.
          46. Peter Skomoroch dataset Bookmarks.
          47. PubGene(TM) Gene Database and Tools, genomic-related publications database.
          48. Quandl, a collaboratively curated portal to millions of financial and economic time-series datasets.
          49. qunb, a platform to find and visualize quantitative data.
          50. Robert Shiller data on housing, stock market, and more from his book Irrational Exuberance.
          51. SMD: Stanford Microarray Database, stores raw and normalized data from microarray experiments.
          52. Jerry Smith dataset collection, with Finance, Government, Machine Learning, Science, and other data.
          53. Research Data, includes historic and status statistics on approximately 100,000 projects and over 1 million registered users' activities at the project management web site.
          54. StatLib, CMU Datasets Archive.
          55. STATOO Datasets part 1 and STATOO Datasets part 2.
          56. Time Series Data Library.
          57. Visual Analytics Benchmark Repository.
          58. UCI KDD Database Repository for large datasets used in machine learning and knowledge discovery research.
          59. UCI Machine Learning Repository.
          60. UCR Time Series Data Archive, offering datasets, papers, links, and code.
          61. United States Census Bureau.
          62. Wikiposit, a (virtual) amalgamation of (mostly financial) data from many different sites, allowing users to merge data from different sources.
          63. Wolfram Alpha disease and patient level data.
          64. Yahoo Sandbox datasets, Language, Graph, Ratings, Advertising and Marketing, Competition.
          65. Yelp Academic Dataset, all the data and reviews of the 250 closest businesses for 30 universities for students and academics to explore and research.


          Public data catalogs, portals, and services

          • AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
          • Open government data from US, EU, Canada, CKAN, and more.
          • DataMarket, visualize the world's economy, societies, nature, and industries, with 100 million time series from UN, World Bank, Eurostat and other important data providers.
          • datamob, Public data put to good use.
          • Enigma, "Google for public data", provides easy access to government, NGO, and other public domain datasets.
          • Freebase, a community-curated database of well-known people, places, and things.
          • Google Public Data, with dynamic visualization and exploration tools.
          • Knoema World Data Atlas, over 1000 indicators on all countries
          • National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
          • Open Data Census, assesses the state of open data around the world.
          • Open Data Institute, catalysing the evolution of open data culture to create economic, environmental, and social value.
          • Socrata OpenData, provides social data discovery services for opening government, healthcare, energy, education, or environment data.
          • Visualising Data, a big collection of sites and services for accessing data.

          Global, International, UN

          • The World Bank, a comprehensive set of data about development in countries around the globe.
          • UN data, a data access system to UN databases
          • UNICEF statistics, data analysis and other data about UNICEF work.


          USA: Federal


          USA: State, City, and Local



          • Europeana Data, contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage content.
          • Eurostat, the leading provider of high quality statistics on Europe.
          • OECD Data Lab, data visualisations and European data downloads.
          •, access to open, freely reusable datasets from local, regional and national public bodies across Europe.
          • Data Publica, l'annuaire des donnees en France, public data about France.
          • Paris data.







          • Census India, data on population, economic activity, literacy, education, housing, urbanisation, fertility, mortality, and more.

          Australia, NZ, and Pacific

          • provides an easy way to find, access and reuse public datasets from the Australian Government.
          • Australian Bureau of Statistics, access to the full range of ABS statistical and reference information.
          • Wiki New Zealand, a collaborative website making data about New Zealand accessible for everyone.


          • Open Data for Africa, supporting statistical development in Africa as a sound basis for designing and managing effective development policies for reducing poverty on the continent.






          Available Public Data Sets on AWS

          Click here for the detailed list of available data sets. Here are some examples of popular Public Data Sets:

          • NASA NEX: A collection of Earth science data sets maintained by NASA, including climate change projections and satellite images of the Earth's surface
          • Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages
          • 1000 Genomes Project: A detailed map of human genetic variation
          • Google Books Ngrams: A data set containing Google Books n-gram corpuses
          • US Census Data: US demographic data from 1980, 1990, and 2000 US Censuses
          • Freebase Data Dump: A data dump of all the current facts and assertions in the Freebase system, an open database covering millions of topics




          Blog articles which provide dataset directories

 – excellent article listing available data sets in the area of machine learning and inference

 – has blog, tag cloud, wiki dataset categories

 – Article containing a list of available dataset websites

          Dataset directories

 – Public datasets listed on a Quora Q&A thread.
 – Content Analysis for the Web 2.0 (CAW 2.0) Workshop – part of 18th International Conference of the World Wide Web. Contains training and test datasets from Twitter, MySpace, Slashdot, Ciao and Kongregate.
 – has a machine learning repository
 – listing of links to various datasets
 – Linguistic data consortium catalog

 – Google Research has stated that it will soon host open-source scientific datasets – watch this space.

 – 800 datasets in ARFF format for different problems and application domains

 – The Global Social Change Research Project – social, political and economic datasets

          Data sets for a specific field

 – machine learning competitions with data provided by organisations with prize money
 – good list here – pay attention to web/news/blogs and Text/Language categories as well as trust network data
 – look under data sets
 – look under corpora
 – Reuters Corpora – contains large collection of news stories for use in Natural Language Processing, Information Retrieval and Machine Learning Systems (need to order CDs)

 – Text retrieval. Has spam, web, question answering, blog and ad hoc (e.g. relevance judgement) tracks
 (300MB) – Spam Corpus 2005
 (75MB – english, 60MB chinese) – Spam Corpus 2006
 – Relevance Judgement
 (25GB – costs 400 GBP) – Blog 06 data
 – Question Answering (many tracks)
 – Novelty (some relevance) -

 – languages
 – lexicon
 – lexical

 – Lexical database that is handy for computational linguistics and natural language processing
 – Machine learning datasets
 – Machine learning datasets – benchmark data for comparing different algorithms of your classifier is recommended from

 – Trust datasets – includes Epinions
 – Metafilter – contains posts, comments, tags, favourites, contact and user data
 – YouTube dataset
 – social network dataset
 – newsgroup dataset
 – Webspam datasets

          Link Analysis

          Recommender systems

 – MovieLens
 – Jester
 – Netflix
 – Book Crossing


 – + user ratings of posts


 – Spam blogs (splogs)
 – 14 million posts, 3 million weblogs – apparently no longer available since Dec 8, 2006
 – but costs 400 GBP!


 – Wikipedia 3, providing Wikipedia datasets
 – official Wikipedia database dumps (very large)
 – English Wikipedia articles that have been transformed into XML – all files ~ 55GB
 – structured information from Wikipedia – a dataset of this is available


 – 85 billion webpages archived since 1996


 – Stock data
 – miscellaneous datasets
 – datasets from Journal of the American Statistical Association
 – music dataset
 – directory of company & business professional dataset
 – library catalogue
 – media library
 – article talking about integrating Wordnet and Wikipedia with YAGO (an extensible and light-weight ontology)
 – country maps
 – open directory project dataset





          Cross-disciplinary data repositories, data collections and data search engines:


          Single datasets and data repositories




          Some others:




          datasets-package: The R Datasets Package

          -- A --

          ability.cov: Ability and Intelligence Tests
          airmiles: Passenger Miles on Commercial US Airlines, 1937-1960
          AirPassengers: Monthly Airline Passenger Numbers 1949-1960
          airquality: New York Air Quality Measurements
          anscombe: Anscombe's Quartet of 'Identical' Simple Linear Regressions
          attenu: The Joyner-Boore Attenuation Data
          attitude: The Chatterjee-Price Attitude Data
          austres: Quarterly Time Series of the Number of Australian Residents

          -- B --

          beaver1: Body Temperature Series of Two Beavers
          beaver2: Body Temperature Series of Two Beavers
          beavers: Body Temperature Series of Two Beavers
          BJsales: Sales Data with Leading Indicator
          BJsales.lead: Sales Data with Leading Indicator
          BOD: Biochemical Oxygen Demand

          -- C --

          cars: Speed and Stopping Distances of Cars
          ChickWeight: Weight versus age of chicks on different diets
          chickwts: Chicken Weights by Feed Type
          CO2: Carbon Dioxide Uptake in Grass Plants
          co2: Mauna Loa Atmospheric CO2 Concentration
          crimtab: Student's 3000 Criminals Data

          -- D --

          datasets: The R Datasets Package
          discoveries: Yearly Numbers of Important Discoveries
          DNase: Elisa assay of DNase

          -- E --

          esoph: Smoking, Alcohol and (O)esophageal Cancer
          euro: Conversion Rates of Euro Currencies
          euro.cross: Conversion Rates of Euro Currencies
          eurodist: Distances Between European Cities
          EuStockMarkets: Daily Closing Prices of Major European Stock Indices, 1991-1998

          -- F --

          faithful: Old Faithful Geyser Data
          fdeaths: Monthly Deaths from Lung Diseases in the UK
          Formaldehyde: Determination of Formaldehyde
          freeny: Freeny's Revenue Data
          freeny.x: Freeny's Revenue Data
          freeny.y: Freeny's Revenue Data

          -- H --

          HairEyeColor: Hair and Eye Color of Statistics Students
          Harman23.cor: Harman Example 2.3
          Harman74.cor: Harman Example 7.4

          -- I --

          Indometh: Pharmacokinetics of Indomethacin
          infert: Infertility after Spontaneous and Induced Abortion
          InsectSprays: Effectiveness of Insect Sprays
          iris: Edgar Anderson's Iris Data
          iris3: Edgar Anderson's Iris Data
          islands: Areas of the World's Major Landmasses

          -- J --

          JohnsonJohnson: Quarterly Earnings per Johnson & Johnson Share

          -- L --

          LakeHuron: Level of Lake Huron 1875-1972
          ldeaths: Monthly Deaths from Lung Diseases in the UK
          lh: Luteinizing Hormone in Blood Samples
          LifeCycleSavings: Intercountry Life-Cycle Savings Data
          Loblolly: Growth of Loblolly pine trees
          longley: Longley's Economic Regression Data
          lynx: Annual Canadian Lynx trappings 1821-1934

          -- M --

          mdeaths: Monthly Deaths from Lung Diseases in the UK
          morley: Michelson Speed of Light Data
          mtcars: Motor Trend Car Road Tests

          -- N --

          nhtemp: Average Yearly Temperatures in New Haven
          Nile: Flow of the River Nile
          nottem: Average Monthly Temperatures at Nottingham, 1920-1939
          npk: Classical N, P, K Factorial Experiment

          -- O --

          occupationalStatus: Occupational Status of Fathers and their Sons
          Orange: Growth of Orange Trees
          OrchardSprays: Potency of Orchard Sprays

          -- P --

          PlantGrowth: Results from an Experiment on Plant Growth
          precip: Annual Precipitation in US Cities
          presidents: Quarterly Approval Ratings of US Presidents
          pressure: Vapor Pressure of Mercury as a Function of Temperature
          Puromycin: Reaction Velocity of an Enzymatic Reaction

          -- Q --

          quakes: Locations of Earthquakes off Fiji

          -- R --

          randu: Random Numbers from Congruential Generator RANDU
          rivers: Lengths of Major North American Rivers
          rock: Measurements on Petroleum Rock Samples

          -- S --

          Seatbelts: Road Casualties in Great Britain 1969-84
          sleep: Student's Sleep Data
          stack.loss: Brownlee's Stack Loss Plant Data
          stack.x: Brownlee's Stack Loss Plant Data
          stackloss: Brownlee's Stack Loss Plant Data
          state: US State Facts and Figures
          state.abb: US State Facts and Figures
          state.area: US State Facts and Figures
          state.center: US State Facts and Figures
          state.division: US State Facts and Figures
          state.name: US State Facts and Figures
          state.region: US State Facts and Figures
          state.x77: US State Facts and Figures
          sunspot.month: Monthly Sunspot Data, from 1749 to "Present"
          sunspot.year: Yearly Sunspot Data, 1700-1988
          sunspots: Monthly Sunspot Numbers, 1749-1983
          swiss: Swiss Fertility and Socioeconomic Indicators (1888) Data

          -- T --

          Theoph: Pharmacokinetics of Theophylline
          Titanic: Survival of passengers on the Titanic
          ToothGrowth: The Effect of Vitamin C on Tooth Growth in Guinea Pigs
          treering: Yearly Treering Data, -6000-1979
          trees: Girth, Height and Volume for Black Cherry Trees

          -- U --

          UCBAdmissions: Student Admissions at UC Berkeley
          UKDriverDeaths: Road Casualties in Great Britain 1969-84
          UKgas: UK Quarterly Gas Consumption
          UKLungDeaths: Monthly Deaths from Lung Diseases in the UK
          USAccDeaths: Accidental Deaths in the US 1973-1978
          USArrests: Violent Crime Rates by US State
          USJudgeRatings: Lawyers' Ratings of State Judges in the US Superior Court
          USPersonalExpenditure: Personal Expenditure Data
          uspop: Populations Recorded by the US Census

          -- V --

          VADeaths: Death Rates in Virginia (1940)
          volcano: Topographic Information on Auckland's Maunga Whau Volcano

          -- W --

          warpbreaks: The Number of Breaks in Yarn during Weaving
          women: Average Heights and Weights for American Women
          WorldPhones: The World's Telephones
          WWWusage: Internet Usage per Minute






          Official Business Cycle Dates

          "The American Business Cycle: Continuity and Change"   Historic Data Tables

          Experimental Coincident, Leading and Recession Indexes
          Stock, Watson

          Index of African Governance
          Rotberg, Gisselquist

          Penn-World Tables
          Feenstra, Inklaar, Timmer

          Barro, Lee

          Cross-country Historical Adoption of Technology (CHAT) data
          Comin, Hobijn

          Economic Policy Uncertainty
          Baker, Bloom, Davis

          A History of U.S. Foreign-Exchange-Market Interventions
          Bordo, Humpage, Schwartz

          Occupational Wages around the World
          Freeman, Oostendorp

          Macro History Database

          Savings, Investment, and Gold in 13 countries (1850-1945)
          Jones, Obstfeld

          Social Security Pension Reform in Europe
          Feldstein, Siebert

          Historical Cross-Country Technological Adoption: Dataset
          Comin, Hobijn

          Facts and Fantasies about Commodity Futures
          Gorton, Rouwenhorst

          US Industrial Production Index 1790 - 1915

          Industry, Productivity, and Digitization Data

          Job Creation and Destruction Data
          Haltiwanger et al

          Management Practices Data
          Bloom, Van Reenen

          Manufacturing Industry Productivity Database
          Becker, Gray, Marvakov

          Internet and Economy Digitization Report

          Public Sector Collective Bargaining Law Data
          Valletta, Freeman

          Form 990 data on tax exempt organizations

          International Trade Data

          Price Quantity Indexes and Values for U.S. Exports and Imports, 1879-1923

          SITC Rev 2 and NAICS (1997)

          U.S. Trade by 1972-SIC category, 1958-1994

          U.S. Trade by 1987-SIC, 1972-2005; NAICS 1989-2005; HS 1989-2008
          Concordance between HS and SIC/NAICS; Concordance of HS codes over time
          Pierce and Schott

          U.S. Imports by TSUSA, HS, SITC, 1972-2001

          U.S. Imports by SAS and Stata, 1972-2001

          U.S. Exports by TSUSA, HS, SITC, 1972-2001

          U.S. Exports by SAS and Stata, 1972-2001

          U.S. Tariffs, 1989-2001

          U.S. Antidumping Database and Links

          World Trade Data ( choose World Import and Export Data )
          Feenstra, Lipsey

          Individual Data

          Angrist Archive
          Joshua Angrist

          Boston Youth Labor (Market) Survey, 1980, 1989
          Freeman, Katz

          Collaborative Perinatal (CPP)

          Consumer Expenditure Survey Extracts
          Harris, Sabelhaus (CBO)

          Current Population Survey

          Fatality Analysis Reporting System (FARS) Data

          Gould Sample

          National Health and Nutrition Examination Survey (NHANES)

          Reading National Health Interview Survey (NHIS) Data with SAS, SPSS, or Stata

          Survey of Economic Expectations
          Dominitz, Manski

          Survey of Income and Program Participation

          Survey of Program Dynamics

          Thorndike, Hagen

          Union Army Data Set

          Worker Representation survey
          Freeman, Rogers

          Hospital/Provider Data

          CMS' Prospective Payment System (PPS)

          Reading CMS'  Healthcare Cost Report Information System (HCRIS) datasets using SAS

          CMS's National Plan and Provider Enumeration System (NPPES) Files

          CMS' National Provider Identifier (NPI) to Unique Physician Identification Number (UPIN) Crosswalk

          CMS' National Provider Identifier (NPI) to State License Crosswalk

          CMS' Provider of Service (POS) files

          CMS' Medicare Provider Charge Data

          CMS' ICD-9-CM to and from ICD-10-CM and ICD-10-PCS Crosswalk or General Equivalence Mappings

          CMS's CBSA, MSA, and State Wage Index Files

          CMS' SSA to FIPS CBSA and MSA County Crosswalks

          CMS' SSA to FIPS State and County Crosswalks

          Demographic and Vital Statistics

          Vital Statistics Books ( Historical )

          Vital Statistics Births

          Interactive index to Vital Statistics Births 1931-1968

          Reading SEER U.S. County Population Data with SAS, SPSS, or Stata 1969-on

          Vital Statistics Births and Infant Mortality 1920-1945
          Cutler, Norberg, Norton

          Vital Statistics Births 1940-1968
          Finkelstein, Heidi Williams

          Vital Statistics Mortality Data

          Vital Statistics Deaths - Historical 1900 - 1936
          Grant Miller

          Vital Statistics Marriage and Divorce

          US Decennial Population by County and State 1900-1990

          US Intercensal Population by County and State 1970-2009
          Roth, James Wang

          US Intercensal Population by State, Age and Sex 1970-1999

          Work-Family Policies and Other Data
          Waldfogel, Han, Ruhm


          Patent and Scientific Papers Data


          U.S. Patents
          Hall, Jaffe, Trajtenberg

          NBER-Rensselaer Polytechnic Institute Scientific Papers Database
          Adams, Clemmons

          Nobel Laureate Data
          Jones, Weinberg

          Other Data


          • NBER
          • NCES
          • Feenberg
          • Cutler, Glaeser, Vigdor
          • Wallis
          • Lichtenberg
          • Borenstein
          • Roth
          • Roth
          • Roth
          • Olken
          • Lahey
          • NBER
          • Norberg






          • World Economic Outlook Databases (WEO)
          • International Financial Statistics (IFS)
          • Principal Global Indicators (PGI)
          • Balance of Payments Statistics (BOPS)
          • Coordinated Direct Investment Survey (CDIS)
          • Coordinated Portfolio Investment Survey (CPIS)
          • Currency Composition of Official Foreign Exchange Reserves (COFER)
          • Data Template on International Reserves and Foreign Currency Liquidity
          • Financial Access Survey (FAS)
          • Financial Soundness Indicators (FSIs)
          • G-20 Surveillance Notes
          • Joint External Debt Hub
          • Monitoring of Fund Arrangements Database (MONA)
          • Primary Commodity Prices
          • Public Sector Debt Statistics Online Centralized Database
          • Quarterly External Debt Statistics (QEDS)



          1. Government and political data

          • This is the go-to resource for government-related data. It claims to have up to 400,000 data sets, both raw and geospatial, in a variety of formats.
          • The only caveat in using the data sets is you have to make sure you clean them, since many have missing values and characters.
          • Socrata is another good place to explore government-related data. One great thing about Socrata is they have some visualization tools that make exploring the data easier.
          • City-specific government data: Some cities have set up their own data portals for browsing city-related data. For example, at San Francisco Data you can browse through everything from crime statistics to parking spots available in the city.
          • The UN and UN-related sites like UNICEF and the World Health Organization are rich with all kinds of data, from mortality rates to world hunger statistics.
          • The Census Bureau houses a ton of information about our lives around income, race, education, population and business.
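
The cleaning caveat in the list above can be sketched in Python. This is only a minimal illustration, not a recipe for any particular portal: the column names (`city`, `population`), the sample rows, and the set of missing-value markers are all made-up assumptions.

```python
import csv
import io

# Hypothetical markers for missing values; real portals vary.
MISSING_MARKERS = {"", "na", "n/a", "-", "null"}


def clean_rows(csv_text):
    """Parse CSV text, strip stray whitespace, and map missing markers to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        clean = {}
        for key, value in row.items():
            if key is None:
                # Extra unnamed fields beyond the header are ignored.
                continue
            value = (value or "").strip()
            clean[key.strip()] = None if value.lower() in MISSING_MARKERS else value
        cleaned.append(clean)
    return cleaned


def drop_incomplete(rows):
    """Keep only rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Made-up sample data, for illustration only.
SAMPLE = "city, population\nSpringfield, 30000\nShelbyville, N/A\n  Ogdenville ,\n"

rows = clean_rows(SAMPLE)
complete = drop_incomplete(rows)
print(len(rows), "rows parsed;", len(complete), "complete")
```

Whether to drop incomplete rows or keep them with explicit `None` values depends on the analysis; the sketch keeps both lists so the choice stays visible.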

          2. Data aggregators

          These are the places that house data from all kinds of sources. Sometimes it’s easier to find something here related to a specific category.

          • Programmable Web: A really useful resource to explore APIs and also mashups of different APIs.
          • Infochimps has a data marketplace that offers thousands of public and proprietary data sets for download and API access, in a wide range of categories, from historical Twitter and OK Cupid data to geolocation data, in different formats. You can even upload your own data if you like.
          • Data Market is a good place to explore data related to economics, healthcare, food and agriculture, and the automotive industry.
          • Google Public data explorer houses a lot of data from world development indicators, OECD and human development indicators, mostly related to economics data and the world.
          • Junar is a great data scraping service that also houses data feeds.
          • Buzzdata is a social data sharing service that allows you to upload your own data and connect and follow others who are uploading their own data.

          3. Social data

          Usually, the best place to get social data for an API is the site itself: Instagram, GetGlue, Foursquare, pretty much all social media sites have their own APIs. Here are more details on the most popular ones.

          • Twitter: Historical access through the Twitter API is fairly limited, to 3200 tweets. For more, check out PeopleBrowsr, Gnip (which also offers historical access to the WP Automattic data feed), DataSift, Infochimps, and Topsy.
          • Foursquare: They have their own API and you can get it through Infochimps, as well.
          • Facebook: The Facebook Graph API is the best resource for Facebook.
          • A great tool for facial recognition data.

          4. Weather data

          • Wunderground has detailed weather information and also lets you search historical data by zip code or city. It gives temperature, wind, precipitation and hourly observations for that day.
          • Weatherbase has detailed weather stats on temperature, rain and humidity of nearly 27,000 cities.

          5. Sports data

          These three sites have comprehensive information on teams, players, coaches and leaders by season.

          ESPN recently came up with its own API, too. You have to be a partner to get access to their data. 

          6. Universities and research

          Searching the work of academics who specialize in a particular area is always a great place to find some interesting data.

          If you come across specific data that you would like to use, say, in a research paper, the best way to go is to contact the professor directly. (That is how we got the data for our What are the Odds piece, which is one of the most-viewed infographics on the web.)

          One university that makes some of the datasets used in its courses publicly available is UCLA.

          7. News data

          The New York Times has a great API and a really good explorer to access any article in the publication. The data is returned in JSON format.
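
As a rough sketch of what handling a JSON response looks like in Python, the snippet below decodes a payload and pulls out title and URL pairs. The payload layout and field names (`results`, `title`, `url`, `published`) are assumptions made up for illustration; they are not the documented New York Times response schema, so check the API documentation for the real field names.

```python
import json

# Hypothetical payload: field names are assumptions for illustration,
# not the documented New York Times response schema.
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "Example headline one", "url": "https://example.com/1", "published": "2014-05-01"},
    {"title": "Example headline two", "url": "https://example.com/2", "published": "2014-05-02"}
  ]
}
"""


def extract_articles(raw_json):
    """Decode a JSON string and return (title, url) pairs, skipping incomplete items."""
    payload = json.loads(raw_json)
    articles = []
    for item in payload.get("results", []):
        title, url = item.get("title"), item.get("url")
        if title and url:
            articles.append((title, url))
    return articles


articles = extract_articles(SAMPLE_RESPONSE)
for title, url in articles:
    print(title, "->", url)
```

Using `.get()` rather than direct indexing keeps the loop from failing when an item lacks a field, which is common in real feeds.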

          The Guardian Data Blog regularly posts visualizations and makes data available in Google Docs format. The great thing about this is that the data has already been cleaned.

          CDC Data - Source:

          Behavioral Risk Factor Surveillance System (BRFSS)
          The BRFSS is a telephone survey that tracks national and state-specific health risk behaviors of adults, 18 years of age or older, residing in the United States. The BRFSS is conducted by the 50 states, the District of Columbia, and three territories (Guam, Puerto Rico, and the U.S. Virgin Islands) and is administered and supported by the Division of Adult and Community Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention (CDC).

          National Health Interview Survey (NHIS)
          The NHIS is a multi-purpose, nationwide household health survey of the U.S. civilian noninstitutionalized population conducted annually by the National Center for Health Statistics (NCHS), CDC, to produce national estimates for a variety of health indicators. In 1994 and 1995, the NHIS included a special supplement on disability.

          National Health and Nutrition Examination Survey (NHANES) 
          NHANES is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines information from interviews and physical examinations.

          National Survey of Family Growth (NSFG)
          The NSFG gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men's and women's health. The survey results are used by the U.S. Department of Health and Human Services and others to plan health services and health education programs, and to do statistical studies of families, fertility, and health.

          American Community Survey (ACS)
          The ACS is a mail survey that provides demographic, socioeconomic, and housing information about communities in between the 10-year census. The ACS is conducted by the U.S. Census Bureau. The survey is sent to a sample of households in the United States. The ACS identifies serious difficulty in four basic areas of functioning: vision, hearing, ambulation, and cognition. The ACS also includes two questions to identify people with difficulties that might affect their ability to live independently.

          Medical Expenditure Panel Survey (MEPS) 
          The MEPS comprises a set of large-scale surveys of families and individuals, their medical providers, and employers across the United States. The MEPS is the most complete source of data on the cost and use of health care and health insurance coverage.

          Survey of Income and Program Participation (SIPP)
          The SIPP is a multipanel, longitudinal survey conducted by the U.S. Census Bureau. The SIPP covers the civilian, noninstitutionalized population of residents of the United States, and collects data on the sources and amount of individual income, labor force information, program participation and eligibility data, and general demographic characteristics. The SIPP also includes disability supplements that ask questions to determine individual disability status.

          Current Population Survey (CPS) 
          The CPS is a monthly survey of about 50,000 households conducted by the U.S. Bureau of the Census for the Bureau of Labor Statistics. The survey has been conducted for more than 50 years. In June 2008, questions were added to the CPS to identify people with a disability among the civilian noninstitutional population 16 years of age or older. Monthly labor force data are released from the CPS for people with a disability. The collection of these data is sponsored by the Department of Labor’s Office of Disability Employment.

          Personality Testing - Source:

          5/14/2014 | Answers to Cattell's 16 Personality Factors Test with items from the IPIP. | 163 Likert rated items, gender, age, country and accuracy. | 49159 | 16PF
          9/6/2012 | Answers to the Narcissistic Personality Inventory, constructed with the version from Raskin and Terry (1988). | 40 multiple choice, gender, age, time elapsed | 11243 | NPI
          6/18/2012 | Answers to the Machiavellianism Test, a version of the MACH-IV from Christie and Geis (1970). | 20 Likert rated items, gender, age, time elapsed | 13156 | MACH2
          5/18/2014 | Answers to the Big Five Personality Test, constructed with items from the International Personality Item Pool. | 50 Likert rated statements, gender, age, race, native language, country | 19719 | BIG5
          7/22/2012 | Answers to the Taylor Manifest Anxiety Scale, from Taylor (1953). | 50 true false statements, gender, age | 5410 | TMA
          9/6/2012 | Answers to the Humor Styles Questionnaire, from Martin et al. (2003). | 32 Likert rated items, gender, age, self-rated accuracy | 1071 | HSQ
          7/16/2012 | Answers to the Empathizing-Systemizing Test, a combined version of Simon Baron-Cohen's empathizing and systemizing quotients. | 120 Likert rated items, gender, age, self-rated accuracy | 13256 | EQSQ
          8/5/2013 | Answers to the Holland Code (RIASEC) Test, constructed with public domain items from the Interest Item Pool. | 48 Likert rated statements, gender, age, country, time elapsed and self-rated accuracy. | 8855 | RIASEC
          7/16/2012 | Answers to the Sexual Compulsivity Scale from Kalichman and Rompa (1995). | 10 Likert rated statements, gender, age | 3376 | SCS
          7/18/2012 | Answers to the IPIP Assertiveness, Social confidence, Adventurousness, and Dominance scales used as part of an experimental personality test. | 40 Likert rated items, gender, age | 1005 | AS+SC+AD+DO
          2/15/2014 | Answers to the Rosenberg Self-Esteem Scale. | 10 scale rated items, gender, age, country | 47974 | RSE
          5/25/2012 | Answers to an experimental IQ Test previously offered on this website. | 25 questions/answers, age, gender. | 400 | IQ1
          5/25/2012 | Answers to a sentence completion survey appended to the Holland Code and Big Five personality tests; at completion of either test, takers were solicited to participate (most did). | 6 incomplete sentence responses, gender, age, and Big Five or RIASEC traits. | 1425 | SENTANCES1
          8/6/2013 | Answers to the Experiences in Close Relationships Scale. | 36 Likert rated items, gender, age, country. | 17386 | ECR
          9/26/2012 | Answers to the Consideration of Future Consequences Scale. | 12 Likert rated items, gender, age, self-rated accuracy. | 614 | CFCS
          8/7/2012 | Answers to the Kentucky Inventory of Mindfulness Skills from Baer, Smith and Allen (2004). | 39 Likert rated items, gender, age. | 601 | KIMS
          9/6/2012 | Answers to the Multidimensional Sexual Self-Concept Questionnaire. | 100 Likert rated items, gender, age and context. | 289 | MSSCQ
          8/8/2013 | Answers to the Woodworth Psychoneurotic Inventory. | 116 yes/no questions, gender, age and country. | 6019 | WPI
          12/8/2013 | Answers to the Hypersensitive Narcissism Scale and The Dirty Dozen. | 22 scale rated items, gender, age, accuracy and country. | 53981 | HSNS+DD
          3/8/2014 | Answers to the Short Dark Triad by Paulhus and Jones (2011). | 27 scale rated items and country. | 18192 | SD3
          4/21/2014 | Answers to the Feminist Perspectives Scale, from Henley, N.; Meng, K.; O'Brien, D.; McCarthy, W.; Sockloskie, R. (1998). "Developing a Scale to Measure the Diversity of Feminist Attitudes". Psychology of Women Quarterly, 22(2), 317-348. | 60 scale rated items, gender, age, country. | 13477 | FPS
          5/21/2014 | Answers to the Wagner Preference Inventory, from Wagner, Rudolph F., and Kelly A. Wells. "A refined neurobehavioral inventory of hemispheric preference." Journal of clinical psychology 41.5 (1985): 671-676. | 12 multiple choice questions, country | 13502 | Wagner
          5/23/2014 | A user generated corpus of personality test items from a short survey where users were prompted to generate descriptions of what was unique about their personality. | 3 free response, age, gender, native language, country | 2722 | itemsgen
          6/21/2014 | Answers to the IPIP HEXACO equivalent scales. | 240 scale rated items, country | 22786 | HEXACO



          Source: Awesome Public Datasets








          Complex Networks


          Computer Networks


          Contextual Data


          Data Challenges


















          Image Processing


          Machine Learning




          Natural Language






          Public Domains


          Search Engines


          Social Networks


          Social Sciences






          Time Series




          Complementary Collections


          Source: Neo4J

          Source:  Vincent Arel-Bundock



          Package | Item | Title | CSV | DOC
          datasets | AirPassengers | Monthly Airline Passenger Numbers 1949-1960 | CSV | DOC
          datasets | BJsales | Sales Data with Leading Indicator | CSV | DOC
          datasets | BOD | Biochemical Oxygen Demand | CSV | DOC
          datasets | CO2 | Carbon Dioxide Uptake in Grass Plants | CSV | DOC
          datasets | Formaldehyde | Determination of Formaldehyde | CSV | DOC
          datasets | HairEyeColor | Hair and Eye Color of Statistics Students | CSV | DOC
          datasets | InsectSprays | Effectiveness of Insect Sprays | CSV | DOC
          datasets | JohnsonJohnson | Quarterly Earnings per Johnson & Johnson Share | CSV | DOC
          datasets | LakeHuron | Level of Lake Huron 1875-1972 | CSV | DOC
          datasets | LifeCycleSavings | Intercountry Life-Cycle Savings Data | CSV | DOC
          datasets | Nile | Flow of the River Nile | CSV | DOC
          datasets | OrchardSprays | Potency of Orchard Sprays | CSV | DOC
          datasets | PlantGrowth | Results from an Experiment on Plant Growth | CSV | DOC
          datasets | Puromycin | Reaction Velocity of an Enzymatic Reaction | CSV | DOC
          datasets | Titanic | Survival of passengers on the Titanic | CSV | DOC
          datasets | ToothGrowth | The Effect of Vitamin C on Tooth Growth in Guinea Pigs | CSV | DOC
          datasets | UCBAdmissions | Student Admissions at UC Berkeley | CSV | DOC
          datasets | UKDriverDeaths | Road Casualties in Great Britain 1969-84 | CSV | DOC
          datasets | UKgas | UK Quarterly Gas Consumption | CSV | DOC
          datasets | USAccDeaths | Accidental Deaths in the US 1973-1978 | CSV | DOC
          datasets | USArrests | Violent Crime Rates by US State | CSV | DOC
          datasets | USJudgeRatings | Lawyers' Ratings of State Judges in the US Superior Court | CSV | DOC
          datasets | USPersonalExpenditure | Personal Expenditure Data | CSV | DOC
          datasets | VADeaths | Death Rates in Virginia (1940) | CSV | DOC
          datasets | WWWusage | Internet Usage per Minute | CSV | DOC
          datasets | WorldPhones | The World's Telephones | CSV | DOC
          datasets | airmiles | Passenger Miles on Commercial US Airlines, 1937-1960 | CSV | DOC
          datasets | airquality | New York Air Quality Measurements | CSV | DOC
          datasets | anscombe | Anscombe's Quartet of 'Identical' Simple Linear Regressions | CSV | DOC
          datasets | attenu | The Joyner-Boore Attenuation Data | CSV | DOC
          datasets | attitude | The Chatterjee-Price Attitude Data | CSV | DOC
          datasets | austres | Quarterly Time Series of the Number of Australian Residents | CSV | DOC
          datasets | cars | Speed and Stopping Distances of Cars | CSV | DOC
          datasets | chickwts | Chicken Weights by Feed Type | CSV | DOC
          datasets | co2 | Mauna Loa Atmospheric CO2 Concentration | CSV | DOC
          datasets | crimtab | Student's 3000 Criminals Data | CSV | DOC
          datasets | discoveries | Yearly Numbers of Important Discoveries | CSV | DOC
          datasets | esoph | Smoking, Alcohol and (O)esophageal Cancer | CSV | DOC
          datasets | euro | Conversion Rates of Euro Currencies | CSV | DOC
          datasets | faithful | Old Faithful Geyser Data | CSV | DOC
          datasets | freeny | Freeny's Revenue Data | CSV | DOC
          datasets | infert | Infertility after Spontaneous and Induced Abortion | CSV | DOC
          datasets | iris | Edgar Anderson's Iris Data | CSV | DOC
          datasets | islands | Areas of the World's Major Landmasses | CSV | DOC
          datasets | lh | Luteinizing Hormone in Blood Samples | CSV | DOC
          datasets | longley | Longley's Economic Regression Data | CSV | DOC
          datasets | lynx | Annual Canadian Lynx trappings 1821-1934 | CSV | DOC
          datasets | morley | Michelson Speed of Light Data | CSV | DOC
          datasets | mtcars | Motor Trend Car Road Tests | CSV | DOC
          datasets | nhtemp | Average Yearly Temperatures in New Haven | CSV | DOC
          datasets | nottem | Average Monthly Temperatures at Nottingham, 1920-1939 | CSV | DOC
          datasets | npk | Classical N, P, K Factorial Experiment | CSV | DOC
          datasets | occupationalStatus | Occupational Status of Fathers and their Sons | CSV | DOC
          datasets | precip | Annual Precipitation in US Cities | CSV | DOC
          datasets | presidents | Quarterly Approval Ratings of US Presidents | CSV | DOC
          datasets | pressure | Vapor Pressure of Mercury as a Function of Temperature | CSV | DOC
          datasets | quakes | Locations of Earthquakes off Fiji | CSV | DOC
          datasets | randu | Random Numbers from Congruential Generator RANDU | CSV | DOC
          datasets | rivers | Lengths of Major North American Rivers | CSV | DOC
          datasets | rock | Measurements on Petroleum Rock Samples | CSV | DOC
          datasets | sleep | Student's Sleep Data | CSV | DOC
          datasets | stackloss | Brownlee's Stack Loss Plant Data | CSV | DOC
          datasets | sunspot.month | Monthly Sunspot Data, from 1749 to "Present" | CSV | DOC
          datasets | sunspot.year | Yearly Sunspot Data, 1700-1988 | CSV | DOC
          datasets | sunspots | Monthly Sunspot Numbers, 1749-1983 | CSV | DOC
          datasets | swiss | Swiss Fertility and Socioeconomic Indicators (1888) Data | CSV | DOC
          datasets | treering | Yearly Treering Data, -6000-1979 | CSV | DOC
          datasets | trees | Girth, Height and Volume for Black Cherry Trees | CSV | DOC
          datasets | uspop | Populations Recorded by the US Census | CSV | DOC
          datasets | volcano | Topographic Information on Auckland's Maunga Whau Volcano | CSV | DOC
          datasets | warpbreaks | The Number of Breaks in Yarn during Weaving | CSV | DOC
          datasets | women | Average Heights and Weights for American Women | CSV | DOC
          boot | acme | Monthly Excess Returns | CSV | DOC
          boot | aids | Delay in AIDS Reporting in England and Wales | CSV | DOC
          boot | aircondit | Failures of Air-conditioning Equipment | CSV | DOC
          boot | aircondit7 | Failures of Air-conditioning Equipment | CSV | DOC
          boot | amis | Car Speeding and Warning Signs | CSV | DOC
          boot | aml | Remission Times for Acute Myelogenous Leukaemia | CSV | DOC
          boot | bigcity | Population of U.S. Cities | CSV | DOC
          boot | brambles | Spatial Location of Bramble Canes | CSV | DOC
          boot | breslow | Smoking Deaths Among Doctors | CSV | DOC
          boot | calcium | Calcium Uptake Data | CSV | DOC
          boot | cane | Sugar-cane Disease Data | CSV | DOC
          boot | capability | Simulated Manufacturing Process Data | CSV | DOC
          boot | catsM | Weight Data for Domestic Cats | CSV | DOC
          boot | cav | Position of Muscle Caveolae | CSV | DOC
          boot | cd4 | CD4 Counts for HIV-Positive Patients | CSV | DOC
          boot | channing | Channing House Data | CSV | DOC
          boot | city | Population of U.S. Cities | CSV | DOC
          boot | claridge | Genetic Links to Left-handedness | CSV | DOC
          boot | cloth | Number of Flaws in Cloth | CSV | DOC
          boot | co.transfer | Carbon Monoxide Transfer | CSV | DOC
          boot | coal | Dates of Coal Mining Disasters | CSV | DOC
          boot | darwin | Darwin's Plant Height Differences | CSV | DOC
          boot | dogs | Cardiac Data for Domestic Dogs | CSV | DOC
          boot | downs.bc | Incidence of Down's Syndrome in British Columbia | CSV | DOC
          boot | ducks | Behavioral and Plumage Characteristics of Hybrid Ducks | CSV | DOC
          boot | fir | Counts of Balsam-fir Seedlings | CSV | DOC
          boot | frets | Head Dimensions in Brothers | CSV | DOC
          boot | grav | Acceleration Due to Gravity | CSV | DOC
          boot | gravity | Acceleration Due to Gravity | CSV | DOC
          boot | hirose | Failure Time of PET Film | CSV | DOC
          boot | islay | Jura Quartzite Azimuths on Islay | CSV | DOC
          boot | manaus | Average Heights of the Rio Negro river at Manaus | CSV | DOC
          boot | melanoma | Survival from Malignant Melanoma | CSV | DOC
          boot | motor | Data from a Simulated Motorcycle Accident | CSV | DOC
          boot | neuro | Neurophysiological Point Process Data | CSV | DOC
          boot | nitrofen | Toxicity of Nitrofen in Aquatic Systems | CSV | DOC
          boot | nodal | Nodal Involvement in Prostate Cancer | CSV | DOC
          boot | nuclear | Nuclear Power Station Construction Data | CSV | DOC
          boot | paulsen | Neurotransmission in Guinea Pig Brains | CSV | DOC
          boot | poisons | Animal Survival Times | CSV | DOC
          boot | polar | Pole Positions of New Caledonian Laterites | CSV | DOC
          boot | remission | Cancer Remission and Cell Activity | CSV | DOC
          boot | salinity | Water Salinity and River Discharge | CSV | DOC
          boot | survival | Survival of Rats after Radiation Doses | CSV | DOC
          boot | tau | Tau Particle Decay Modes | CSV | DOC
          boot | tuna | Tuna Sighting Data | CSV | DOC
          boot | urine | Urine Analysis Data | CSV | DOC
          boot | wool | Australian Relative Wool Prices | CSV | DOC
          KMsurv | aids | data from Section 1.19 | CSV | DOC
          KMsurv | alloauto | data from Section 1.9 | CSV | DOC
          KMsurv | allograft | data from Exercise 13.1, p418 | CSV | DOC
          KMsurv | azt | data from Exercise 4.7, p122 | CSV | DOC
          KMsurv | baboon | data from Exercise 5.8, p147 | CSV | DOC
          KMsurv | bcdeter | data from Section 1.18 | CSV | DOC
          KMsurv | bfeed | data from Section 1.14 | CSV | DOC
          KMsurv | bmt | data from Section 1.3 | CSV | DOC
          KMsurv | bnct | data from Exercise 7.7, p223 | CSV | DOC
          KMsurv | btrial | data from Section 1.5 | CSV | DOC
          KMsurv | burn | data from Section 1.6 | CSV | DOC
          KMsurv | channing | data from Section 1.16 | CSV | DOC
          KMsurv | drug6mp | data from Section 1.2 | CSV | DOC
          KMsurv | drughiv | data from Exercise 7.6, p222 | CSV | DOC
          KMsurv | hodg | data from Section 1.10 | CSV | DOC
          KMsurv | kidney | data from Section 1.4 | CSV | DOC
          KMsurv | kidrecurr | Data on 38 individuals using a kidney dialysis machine | CSV | DOC
          KMsurv | kidtran | data from Section 1.7 | CSV | DOC
          KMsurv | larynx | data from Section 1.8 | CSV | DOC
          KMsurv | lung | data from Exercise 4.4, p120 | CSV | DOC
          KMsurv | pneumon | data from Section 1.13 | CSV | DOC
          KMsurv | psych | data from Section 1.15 | CSV | DOC
          KMsurv | rats | data from Exercise 7.13, p225 | CSV | DOC
          KMsurv | std | data from Section 1.12 | CSV | DOC
          KMsurv | stddiag | data from Exercise 5.6, p146 | CSV | DOC
          KMsurv | tongue | data from Section 1.11 | CSV | DOC
          KMsurv | twins | data from Exercise 7.14, p225 | CSV | DOC
          robustbase | Animals2 | Brain and Body Weights for 65 Species of Land Animals | CSV | DOC
          robustbase | CrohnD | Crohn's Disease Adverse Events Data | CSV | DOC
          robustbase | NOxEmissions | NOx Air Pollution Data | CSV | DOC
          robustbase | SiegelsEx | Siegel's Exact Fit Example Data | CSV | DOC
          robustbase | aircraft | Aircraft Data | CSV | DOC
          robustbase | airmay | Air Quality Data | CSV | DOC
          robustbase | alcohol | Alcohol Solubility in Water Data | CSV | DOC
          robustbase | ambientNOxCH | Daily Means of NOx (mono-nitrogen oxides) in air | CSV | DOC
          robustbase | biomassTill | Biomass Tillage Data | CSV | DOC
          robustbase | bushfire | Campbell Bushfire Data | CSV | DOC
          robustbase | carrots | Insect Damages on Carrots | CSV | DOC
          robustbase | cloud | Cloud point of a Liquid | CSV | DOC
          robustbase | coleman | Coleman Data Set | CSV | DOC
          robustbase | condroz | Condroz Data | CSV | DOC
          robustbase | cushny | Cushny and Peebles Prolongation of Sleep Data | CSV | DOC
          robustbase | delivery | Delivery Time Data | CSV | DOC
          robustbase | education | Education Expenditure Data | CSV | DOC
          robustbase | epilepsy | Epilepsy Attacks Data Set | CSV | DOC
          robustbase | exAM | Example Data of Antille and May - for Simple Regression | CSV | DOC
          robustbase | foodstamp | Food Stamp Program Participation | CSV | DOC
          robustbase | hbk | Hawkins, Bradu, Kass's Artificial Data | CSV | DOC
          robustbase | heart | Heart Catheterization Data | CSV | DOC
          robustbase | kootenay | Waterflow Measurements of Kootenay River in Libby and Newgate | CSV | DOC
          robustbase | lactic | Lactic Acid Concentration Measurement Data | CSV | DOC
          robustbase | milk | Daudin's Milk Composition Data | CSV | DOC
          robustbase | pension | Pension Funds Data | CSV | DOC
          robustbase | phosphor | Phosphorus Content Data | CSV | DOC
          robustbase | pilot | Pilot-Plant Data | CSV | DOC
          robustbase | possumDiv | Possum Diversity Data | CSV | DOC
          robustbase | pulpfiber | Pulp Fiber and Paper Data | CSV | DOC
          robustbase | radarImage | Satellite Radar Image Data from near Munich | CSV | DOC
          robustbase | salinity | Salinity Data | CSV | DOC
          robustbase | starsCYG | Hertzsprung-Russell Diagram Data of Star Cluster CYG OB1 | CSV | DOC
          robustbase | telef | Number of International Calls from Belgium | CSV | DOC
          robustbase | toxicity | Toxicity of Carboxylic Acids Data | CSV | DOC
          robustbase | vaso | Vaso Constriction Skin Data Set | CSV | DOC
          robustbase | wagnerGrowth | Wagner's Hannover Employment Growth Data | CSV | DOC
          robustbase | wood | Modified Data on Wood Specific Gravity | CSV | DOC
          car | AMSsurvey | American Math Society Survey Data | CSV | DOC
          car | Adler | Experimenter Expectations | CSV | DOC
          car | Angell | Moral Integration of American Cities | CSV | DOC
          car | Anscombe | U. S. State Public-School Expenditures | CSV | DOC
          car | Baumann | Methods of Teaching Reading Comprehension | CSV | DOC
          car | Bfox | Canadian Women's Labour-Force Participation | CSV | DOC
          car | Blackmore | Exercise Histories of Eating-Disordered and Control Subjects | CSV | DOC
          car | Burt | Fraudulent Data on IQs of Twins Raised Apart | CSV | DOC
          car | CanPop | Canadian Population Data | CSV | DOC
          car | Chile | Voting Intentions in the 1988 Chilean Plebiscite | CSV | DOC
          car | Chirot | The 1907 Romanian Peasant Rebellion | CSV | DOC
          car | Cowles | Cowles and Davis's Data on Volunteering | CSV | DOC
          car | Davis | Self-Reports of Height and Weight | CSV | DOC
          car | DavisThin | Davis's Data on Drive for Thinness | CSV | DOC
          car | Depredations | Minnesota Wolf Depredation Data | CSV | DOC
          car | Duncan | Duncan's Occupational Prestige Data | CSV | DOC
          car | Ericksen | The 1980 U.S. Census Undercount | CSV | DOC
          car | Florida | Florida County Voting | CSV | DOC
          car | Freedman | Crowding and Crime in U. S. Metropolitan Areas | CSV | DOC
          car | Friendly | Format Effects on Recall | CSV | DOC
          car | Ginzberg | Data on Depression | CSV | DOC
          car | Greene | Refugee Appeals | CSV | DOC
          car | Guyer | Anonymity and Cooperation | CSV | DOC
          car | Hartnagel | Canadian Crime-Rates Time Series | CSV | DOC
          car | Highway1 | Highway Accidents | CSV | DOC
          car | KosteckiDillon | Treatment of Migraine Headaches | CSV | DOC
          car | Leinhardt | Data on Infant-Mortality | CSV | DOC
          car | LoBD | Cancer drug data used to provide an example of the use of the skew power distributions. | CSV | DOC
          car | Mandel | Contrived Collinear Data | CSV | DOC
          car | Migration | Canadian Interprovincial Migration Data | CSV | DOC
          car | Moore | Status, Authoritarianism, and Conformity | CSV | DOC
          car | Mroz | U.S. Women's Labor-Force Participation | CSV | DOC
          car | OBrienKaiser | O'Brien and Kaiser's Repeated-Measures Data | CSV | DOC
          car | Ornstein | Interlocking Directorates Among Major Canadian Firms | CSV | DOC
          car | Pottery | Chemical Composition of Pottery | CSV | DOC
          car | Prestige | Prestige of Canadian Occupations | CSV | DOC
          car | Quartet | Four Regression Datasets | CSV | DOC
          car | Robey | Fertility and Contraception | CSV | DOC
          car | SLID | Survey of Labour and Income Dynamics | CSV | DOC
          car | Sahlins | Agricultural Production in Mazulu Village | CSV | DOC
          car | Salaries | Salaries for Professors | CSV | DOC
          car | Soils | Soil Compositions of Physical and Chemical Characteristics | CSV | DOC
          car | States | Education and Related Statistics for the U.S. States | CSV | DOC
          car | Transact | Transaction data | CSV | DOC
          car | UN | GDP and Infant Mortality | CSV | DOC
          car | USPop | Population of the United States | CSV | DOC
          car | Vocab | Vocabulary and Education | CSV | DOC
          car | WeightLoss | Weight Loss Data | CSV | DOC
          car | Womenlf | Canadian Women's Labour-Force Participation | CSV | DOC
          car | Wong | Post-Coma Recovery of IQ | CSV | DOC
          car | Wool | Wool data | CSV | DOC
          cluster | agriculture | European Union Agricultural Workforces | CSV | DOC
          cluster | animals | Attributes of Animals | CSV | DOC
          cluster | chorSub | Subset of C-horizon of Kola Data | CSV | DOC
          cluster | flower | Flower Characteristics | CSV | DOC
          cluster | plantTraits | Plant Species Traits Data | CSV | DOC
          cluster | pluton | Isotopic Composition Plutonium Batches | CSV | DOC
          cluster | ruspini | Ruspini Data | CSV | DOC
          cluster | votes.repub | Votes for Republican Candidate in Presidential Elections | CSV | DOC
          cluster | xclara | Bivariate Data Set with 3 Clusters | CSV | DOC
          COUNT | affairs | affairs | CSV | DOC
          COUNT | azcabgptca | azcabgptca | CSV | DOC
          COUNT | azdrg112 | azdrg112 | CSV | DOC
          COUNT | azpro | azpro | CSV | DOC
          COUNT | azprocedure | azprocedure | CSV | DOC
          COUNT | badhealth | badhealth | CSV | DOC
          COUNT | fasttrakg | fasttrakg | CSV | DOC
          COUNT | fishing | fishing | CSV | DOC
          COUNT | lbw | lbw | CSV | DOC
          COUNT | lbwgrp | lbwgrp | CSV | DOC
          COUNT | loomis | loomis | CSV | DOC
          COUNT | mdvis | mdvis | CSV | DOC
          COUNT | medpar | medpar | CSV | DOC
          COUNT | nuts | nuts | CSV | DOC
          COUNT | rwm | rwm | CSV | DOC
          COUNT | rwm1984 | rwm1984 | CSV | DOC
          COUNT | rwm5yr | rwm5yr | CSV | DOC
          COUNT | ships | ships | CSV | DOC
          COUNT | smoking | smoking | CSV | DOC
          COUNT | titanic | titanic | CSV | DOC
          COUNT | titanicgrp | titanicgrp | CSV | DOC
          Ecdat | Accident | Ship Accidents | CSV | DOC
          Ecdat | Airline | Cost for U.S. Airlines | CSV | DOC
          Ecdat | Airq | Air Quality for Californian Metropolitan Areas | CSV | DOC
          Ecdat | Benefits | Unemployment of Blue Collar Workers | CSV | DOC
          Ecdat | Bids | Bids Received By U.S. Firms | CSV | DOC
          Ecdat | BudgetFood | Budget Share of Food for Spanish Households | CSV | DOC
          Ecdat | BudgetItaly | Budget Shares for Italian Households | CSV | DOC
          Ecdat | BudgetUK | Budget Shares of British Households | CSV | DOC
          Ecdat | Bwages | Wages in Belgium | CSV | DOC
          Ecdat | CPSch3 | Earnings from the Current Population Survey | CSV | DOC
          Ecdat | CRANpackages | Growth of CRAN | CSV | DOC
          Ecdat | Capm | Stock Market Data | CSV | DOC
          Ecdat | Car | Stated Preferences for Car Choice | CSV | DOC
          Ecdat | Caschool | The California Test Score Data Set | CSV | DOC
          Ecdat | Catsup | Choice of Brand for Catsup | CSV | DOC
          Ecdat | Cigar | Cigarette Consumption | CSV | DOC
          Ecdat | Cigarette | The Cigarette Consumption Panel Data Set | CSV | DOC
          Ecdat | Clothing | Sales Data of Men's Fashion Stores | CSV | DOC
          Ecdat | Computers | Prices of Personal Computers | CSV | DOC
          Ecdat | Cracker | Choice of Brand for Crackers | CSV | DOC
          Ecdat | Crime | Crime in North Carolina | CSV | DOC
          Ecdat | DM | DM Dollar Exchange Rate | CSV | DOC
          Ecdat | Diamond | Pricing the C's of Diamond Stones | CSV | DOC
          Ecdat | Doctor | Number of Doctor Visits | CSV | DOC
          Ecdat | DoctorAUS | Doctor Visits in Australia | CSV | DOC
          Ecdat | DoctorContacts | Contacts With Medical Doctor | CSV | DOC
          Ecdat | Earnings | Earnings for Three Age Groups | CSV | DOC
          Ecdat | Electricity | Cost Function for Electricity Producers | CSV | DOC
          Ecdat | Fair | Extramarital Affairs Data | CSV | DOC
          Ecdat | Fatality | Drunk Driving Laws and Traffic Deaths | CSV | DOC
          Ecdat | Fishing | Choice of Fishing Mode | CSV | DOC
          Ecdat | Forward | Exchange Rates of US Dollar Against Other Currencies | CSV | DOC
          Ecdat | FriendFoe | Data from the Television Game Show Friend or Foe? | CSV | DOC
          Ecdat | Garch | Daily Observations on Exchange Rates of the US Dollar Against Other Currencies | CSV | DOC
          Ecdat | Gasoline | Gasoline Consumption | CSV | DOC
          Ecdat | Griliches | Wage Data | CSV | DOC
          Ecdat | Grunfeld | Grunfeld Investment Data | CSV | DOC
          Ecdat | HC | Heating and Cooling System Choice in Newly Built Houses in California | CSV | DOC
          Ecdat | HHSCyberSecurityBreaches | Cybersecurity breaches reported to the US Department of Health and Human Services | CSV | DOC
          Ecdat | HI | Health Insurance and Hours Worked By Wives | CSV | DOC
          Ecdat | Hdma | The Boston HMDA Data Set | CSV | DOC
          Ecdat | Heating | Heating System Choice in California Houses | CSV | DOC
          Ecdat | Hedonic | Hedonic Prices of Census Tracts in Boston | CSV | DOC
          Ecdat | Housing | Sales Prices of Houses in the City of Windsor | CSV | DOC
          Ecdat | Icecream | Ice Cream Consumption | CSV | DOC
          Ecdat | Journals | Economic Journals Data Set | CSV | DOC
          Ecdat | Kakadu | Willingness to Pay for the Preservation of the Kakadu National Park | CSV | DOC
          Ecdat | Ketchup | Choice of Brand for Ketchup | CSV | DOC
          Ecdat | Klein | Klein's Model I | CSV | DOC
          Ecdat | LaborSupply | Wages and Hours Worked | CSV | DOC
          Ecdat | Labour | Belgian Firms | CSV | DOC
          Ecdat | MCAS | The Massachusetts Test Score Data Set | CSV | DOC
          Ecdat | Males | Wages and Education of Young Males | CSV | DOC
          Ecdat | Mathlevel | Level of Calculus Attained for Students Taking Advanced Micro-economics | CSV | DOC
          Ecdat | MedExp | Structure of Demand for Medical Care | CSV | DOC
          Ecdat | Metal | Production for SIC 33 | CSV | DOC
          Ecdat | Mode | Mode Choice | CSV | DOC
          Ecdat | ModeChoice | Data to Study Travel Mode Choice | CSV | DOC
          Ecdat | Mofa | International Expansion of U.S. MOFAs (Majority-Owned Foreign Affiliates) in FIRE (Finance, Insurance and Real Estate) | CSV | DOC
          Ecdat | Mroz | Labor Supply Data | CSV | DOC
          Ecdat | MunExp | Municipal Expenditure Data | CSV | DOC
          Ecdat | NaturalPark | Willingness to Pay for the Preservation of the Alentejo Natural Park | CSV | DOC
          Ecdat | Nerlove | Cost Function for Electricity Producers, 1955 | CSV | DOC
          Ecdat | OFP | Visits to Physician Office | CSV | DOC
          Ecdat | Oil | Oil Investment | CSV | DOC
          Ecdat | PSID | Panel Survey of Income Dynamics | CSV | DOC
          Ecdat | Participation | Labor Force Participation | CSV | DOC
          Ecdat | PatentsHGH | Dynamic Relation Between Patents and R&D | CSV | DOC
          Ecdat | PatentsRD | Patents, R&D and Technological Spillovers for a Panel of Firms | CSV | DOC
          Ecdat | Pound | Pound-dollar Exchange Rate | CSV | DOC
          Ecdat | Produc | US States Production | CSV | DOC
          Ecdat | RetSchool | Return to Schooling | CSV | DOC
          Ecdat | SP500 | Returns on Standard & Poor's 500 Index | CSV | DOC
          Ecdat | Schooling | Wages and Schooling | CSV | DOC
          Ecdat | Somerville | Visits to Lake Somerville | CSV | DOC
          Ecdat | Star | Effects on Learning of Small Class Sizes | CSV | DOC
          Ecdat | Strike | Strike Duration Data | CSV | DOC
          Ecdat | StrikeDur | Strikes Duration | CSV | DOC
          Ecdat | StrikeNb | Number of Strikes in US Manufacturing | CSV | DOC
          Ecdat | SumHes | The Penn Table | CSV | DOC
          Ecdat | Tobacco | Households Tobacco Budget Share | CSV | DOC
          Ecdat | Train | Stated Preferences for Train Traveling | CSV | DOC
          Ecdat | TranspEq | Statewide Data on Transportation Equipment Manufacturing | CSV | DOC
          Ecdat | Treatment | Evaluating Treatment Effect of Training on Earnings | CSV | DOC
          Ecdat | Tuna | Choice of Brand for Tuna | CSV | DOC
          Ecdat | USFinanceIndustry | US Finance Industry Profits | CSV | DOC
          Ecdat | USclassifiedDocuments | Official Secrecy of the United States Government | CSV | DOC
          Ecdat | USstateAbbreviations | Standard abbreviations for states of the United States | CSV | DOC
          Ecdat | UStaxWords | Number of Words in US Tax Law | CSV | DOC
          Ecdat | UnempDur | Unemployment Duration | CSV | DOC
          Ecdat | Unemployment | Unemployment Duration | CSV | DOC
          Ecdat | University | Provision of University Teaching and Research | CSV | DOC
          Ecdat | VietNamH | Medical Expenses in Vietnam (Household Level) | CSV | DOC
          Ecdat | VietNamI | Medical Expenses in Vietnam (Individual Level) | CSV | DOC
          Ecdat | Wages | Panel Data of Individual Wages | CSV | DOC
          Ecdat | Wages1 | Wages, Experience and Schooling | CSV | DOC
          Ecdat | Workinghours | Wife Working Hours | CSV | DOC
          Ecdat | Yen | Yen-dollar Exchange Rate | CSV | DOC
          Ecdat | Yogurt | Choice of Brand for Yogurts | CSV | DOC
          Ecdat | bankingCrises | Countries in Banking Crises | CSV | DOC
          Ecdat | breaches | Cyber Security Breaches | CSV | DOC
          Ecdat | incomeInequality | Income Inequality in the US | CSV | DOC
          Ecdat | nonEnglishNames | Names with Character Set Problems | CSV | DOC
          Ecdat | politicalKnowledge | Political knowledge in the US and Europe | CSV | DOC
          gap | PD | A study of Parkinson's disease and APOE, LRRK2, SNCA markers | CSV | DOC
          gap | aldh2 | ALDH2 markers and Alcoholism | CSV | DOC
          gap | apoeapoc | APOE/APOC1 markers and Alzheimer's | CSV | DOC
          gap | cf | Cystic fibrosis data | CSV | DOC
          gap | crohn | Crohn's disease data | CSV | DOC
          gap | fa | Friedreich Ataxia data | CSV | DOC
          gap | fsnps | A case-control data involving four SNPs with missing genotype | CSV | DOC
          gap | hla | The HLA data | CSV | DOC
          gap | hr1420 | An example data for Manhattan plot with annotation | CSV | DOC
          gap | l51 | An example pedigree data | CSV | DOC
          gap | lukas | An example pedigree | CSV | DOC
          gap | mao | A study of Parkinson's disease and MAO gene | CSV | DOC
          gap | meyer | A pedigree data on 282 animals deriving from two generations | CSV | DOC
          gap | mfblong | Example data for ACEnucfam | CSV | DOC
          gap | mhtdata | An example data for Manhattan plot | CSV | DOC
          gap | nep499 | A study of Alzheimer's disease with eight SNPs and APOE | CSV | DOC
          ggplot2 | luv_colours | 'colors()' in Luv space. | CSV | DOC
          HistData | Arbuthnot | Arbuthnot's data on male and female birth ratios in London from 1629-1710. | CSV | DOC
          HistData | Armada | La Felicisima Armada | CSV | DOC
          HistData | Bowley | Bowley's data on values of British and Irish trade, 1855-1899 | CSV | DOC
          HistData | Cavendish | Cavendish's Determinations of the Density of the Earth | CSV | DOC
          HistData | ChestSizes | Chest measurements of 5738 Scottish Militiamen | CSV | DOC
          HistData | CushnyPeebles | Cushny-Peebles Data: Soporific Effects of Scopolamine Derivatives | CSV | DOC
          HistData | CushnyPeeblesN | Cushny-Peebles Data: Soporific Effects of Scopolamine Derivatives | CSV | DOC
          HistData | Dactyl | Edgeworth's counts of dactyls in Virgil's Aeneid | CSV | DOC
          HistData | DrinksWages | Elderton and Pearson's (1910) data on drinking and wages | CSV | DOC
          HistData | Fingerprints | Waite's data on Patterns in Fingerprints | CSV | DOC
          HistData | Galton | Galton's data on the heights of parents and their children | CSV | DOC
          HistData | GaltonFamilies | Galton's data on the heights of parents and their children, by child | CSV | DOC
          HistData | Guerry | Data from A.-M. Guerry, "Essay on the Moral Statistics of France" | CSV | DOC
          HistData | Jevons | W. Stanley Jevons' data on numerical discrimination | CSV | DOC
          HistData | Langren.all | van Langren's Data on Longitude Distance between Toledo and Rome | CSV | DOC
          HistData | Langren1644 | van Langren's Data on Longitude Distance between Toledo and Rome | CSV | DOC
          HistData | Macdonell | Macdonell's Data on Height and Finger Length of Criminals, used by Gosset (1908) | CSV | DOC
          HistData | MacdonellDF | Macdonell's Data on Height and Finger Length of Criminals, used by Gosset (1908) | CSV | DOC
          HistData | Michelson | Michelson's Determinations of the Velocity of Light | CSV | DOC
          HistData | MichelsonSets | Michelson's Determinations of the Velocity of Light | CSV | DOC
          HistData | Minard.cities | Data from Minard's famous graphic map of Napoleon's march on Moscow | CSV | DOC
          HistData | Minard.temp | Data from Minard's famous graphic map of Napoleon's march on Moscow | CSV | DOC
          HistData | Minard.troops | Data from Minard's famous graphic map of Napoleon's march on Moscow | CSV | DOC
          HistData | Nightingale | Florence Nightingale's data on deaths from various causes in the Crimean War | CSV | DOC
          HistData | OldMaps | Latitudes and Longitudes of 39 Points in 11 Old Maps | CSV | DOC
          HistData | PearsonLee | Pearson and Lee's data on the heights of parents and children classified by gender | CSV | DOC
          HistData | PolioTrials | Polio Field Trials Data | CSV | DOC
          HistData | Prostitutes | Parent-Duchatelet's time-series data on the number of prostitutes in Paris | CSV | DOC
          HistData | Pyx | Trial of the Pyx | CSV | DOC
          HistData | Quarrels | Statistics of Deadly Quarrels | CSV | DOC
          HistData | Snow.deaths | John Snow's map and data on the 1854 London Cholera outbreak | CSV | DOC
          HistData | Snow.deaths2 | John Snow's map and data on the 1854 London Cholera outbreak | CSV | DOC
          HistData | Snow.polygons | John Snow's map and data on the 1854 London Cholera outbreak | CSV | DOC
          HistData | Snow.pumps | John Snow's map and data on the 1854 London Cholera outbreak | CSV | DOC
          HistData | Snow.streets | John Snow's map and data on the 1854 London Cholera outbreak | CSV | DOC
          HistData | Wheat | Playfair's Data on Wages and the Price of Wheat | CSV | DOC
          HistData | Wheat.monarchs | Playfair's Data on Wages and the Price of Wheat | CSV | DOC
          HistData | Yeast | Student's (1906) Yeast Cell Counts | CSV | DOC
          HistData | YeastD.mat | Student's (1906) Yeast Cell Counts | CSV | DOC
          HistData | ZeaMays | Darwin's Heights of Cross- and Self-fertilized Zea mays Pairs | CSV | DOC
          lattice | barley | Yield data from a Minnesota barley trial | CSV | DOC
          lattice | environmental | Atmospheric environmental conditions in New York City | CSV | DOC
          lattice | ethanol | Engine exhaust fumes from burning ethanol | CSV | DOC
          lattice | melanoma | Melanoma skin cancer incidence | CSV | DOC
          lattice | singer | Heights of New York Choral Society singers | CSV | DOC
          MASS | Aids2 | Australian AIDS Survival Data | CSV | DOC
          MASS | Animals | Brain and Body Weights for 28 Species | CSV | DOC
          MASS | Boston | Housing Values in Suburbs of Boston | CSV | DOC
          MASS | Cars93 | Data from 93 Cars on Sale in the USA in 1993 | CSV | DOC
          MASS | Cushings | Diagnostic Tests on Patients with Cushing's Syndrome | CSV | DOC
          MASS | DDT | DDT in Kale | CSV | DOC
          MASS | GAGurine | Level of GAG in Urine of Children | CSV | DOC
          MASS | Insurance | Numbers of Car Insurance claims | CSV | DOC
          MASS | Melanoma | Survival from Malignant Melanoma | CSV | DOC
          MASS | OME | Tests of Auditory Perception in Children with OME | CSV | DOC
          MASS | Pima.te | Diabetes in Pima Indian Women | CSV | DOC
          MASS | Pima.tr | Diabetes in Pima Indian Women | CSV | DOC
          MASS | Pima.tr2 | Diabetes in Pima Indian Women | CSV | DOC
          MASS | Rabbit | Blood Pressure in Rabbits | CSV | DOC
          MASS | Rubber | Accelerated Testing of Tyre Rubber | CSV | DOC
          MASS | SP500 | Returns of the Standard and Poors 500 | CSV | DOC
          MASS | Sitka | Growth Curves for Sitka Spruce Trees in 1988 | CSV | DOC
          MASS | Sitka89 | Growth Curves for Sitka Spruce Trees in 1989 | CSV | DOC
          MASS | Skye | AFM Compositions of Aphyric Skye Lavas | CSV | DOC
          MASS | Traffic | Effect of Swedish Speed Limits on Accidents | CSV | DOC
          MASS | UScereal | Nutritional and Marketing Information on US Cereals | CSV | DOC
          MASS | UScrime | The Effect of Punishment Regimes on Crime Rates | CSV | DOC
          MASS | VA | Veterans Administration Lung Cancer Trial | CSV | DOC
          MASS | abbey | Determinations of Nickel Content | CSV | DOC
          MASS | accdeaths | Accidental Deaths in the US 1973-1978 | CSV | DOC
          MASS | anorexia | Anorexia Data on Weight Change | CSV | DOC
          MASS | bacteria | Presence of Bacteria after Drug Treatments | CSV | DOC
          MASS | beav1 | Body Temperature Series of Beaver 1 | CSV | DOC
          MASS | beav2 | Body Temperature Series of Beaver 2 | CSV | DOC
          MASS | biopsy | Biopsy Data on Breast Cancer Patients | CSV | DOC
          MASS | birthwt | Risk Factors Associated with Low Infant Birth Weight | CSV | DOC
          MASS | cabbages | Data from a cabbage field trial | CSV | DOC
          MASS | caith | Colours of Eyes and Hair of People in Caithness | CSV | DOC
          MASS | cats | Anatomical Data from Domestic Cats | CSV | DOC
          MASS | cement | Heat Evolved by Setting Cements | CSV | DOC
          MASS | chem | Copper in Wholemeal Flour | CSV | DOC
          MASS | coop | Co-operative Trial in Analytical Chemistry | CSV | DOC
          MASS | cpus | Performance of Computer CPUs | CSV | DOC
          MASS | crabs | Morphological Measurements on Leptograpsus Crabs | CSV | DOC
          MASS | deaths | Monthly Deaths from Lung Diseases in the UK | CSV | DOC
          MASS | drivers | Deaths of Car Drivers in Great Britain 1969-84 | CSV | DOC
          MASS | eagles | Foraging Ecology of Bald Eagles | CSV | DOC
          MASS | epil | Seizure Counts for Epileptics | CSV | DOC
          MASS | farms | Ecological Factors in Farm Management | CSV | DOC
          MASS | fgl | Measurements of Forensic Glass Fragments | CSV | DOC
          MASS | forbes | Forbes' Data on Boiling Points in the Alps | CSV | DOC
          MASS | galaxies | Velocities for 82 Galaxies | CSV | DOC
          MASS | gehan | Remission Times of Leukaemia Patients | CSV | DOC
          MASS | genotype | Rat Genotype Data | CSV | DOC
          MASS | geyser | Old Faithful Geyser Data | CSV | DOC
          MASS | gilgais | Line Transect of Soil in Gilgai Territory | CSV | DOC
          MASS | hills | Record Times in Scottish Hill Races | CSV | DOC
          MASS | housing | Frequency Table from a Copenhagen Housing Conditions Survey | CSV | DOC
          MASS | immer | Yields from a Barley Field Trial | CSV | DOC
          MASS | leuk | Survival Times and White Blood Counts for Leukaemia Patients | CSV | DOC
          MASS | mammals | Brain and Body Weights for 62 Species of Land Mammals | CSV | DOC
          MASS | mcycle | Data from a Simulated Motorcycle Accident | CSV | DOC
          MASS | menarche | Age of Menarche in Warsaw | CSV | DOC
          MASS | michelson | Michelson's Speed of Light Data | CSV | DOC
          MASS | minn38 | Minnesota High School Graduates of 1938 | CSV | DOC
          MASS | motors | Accelerated Life Testing of Motorettes | CSV | DOC
          MASS | muscle | Effect of Calcium Chloride on Muscle Contraction in Rat Hearts | CSV | DOC
          MASS | newcomb | Newcomb's Measurements of the Passage Time of Light | CSV | DOC
          MASS | nlschools | Eighth-Grade Pupils in the Netherlands | CSV | DOC
          MASS | npk | Classical N, P, K Factorial Experiment | CSV | DOC
          MASS | npr1 | US Naval Petroleum Reserve No. 1 data | CSV | DOC
          MASS | oats | Data from an Oats Field Trial | CSV | DOC
          MASS | painters | The Painter's Data of de Piles | CSV | DOC
          MASS | petrol | N. L. Prater's Petrol Refinery Data | CSV | DOC
          MASS | quine | Absenteeism from School in Rural New South Wales | CSV | DOC
          MASS | road | Road Accident Deaths in US States | CSV | DOC
          MASS | rotifer | Numbers of Rotifers by Fluid Density | CSV | DOC
          MASS | ships | Ships Damage Data | CSV | DOC
          MASS | shrimp | Percentage of Shrimp in Shrimp Cocktail | CSV | DOC
          MASS | shuttle | Space Shuttle Autolander Problem | CSV | DOC
          MASS | snails | Snail Mortality Data | CSV | DOC
          MASS | steam | The Saturated Steam Pressure Data | CSV | DOC
          MASS | stormer | The Stormer Viscometer Data | CSV | DOC
          MASS | survey | Student Survey Data | CSV | DOC
          MASS | synth.te | Synthetic Classification Problem | CSV | DOC
          MASS | synth.tr | Synthetic Classification Problem | CSV | DOC
          MASS | topo | Spatial Topographic Data | CSV | DOC
          MASS | waders | Counts of Waders at 15 Sites in South Africa | CSV | DOC
          MASS | whiteside | House Insulation: Whiteside's Data | CSV | DOC
          MASS | wtloss | Weight Loss Data from an Obese Patient | CSV | DOC
          plmCigarCigarette ConsumptionCSV DOC
          plmCrimeCrime in North CarolinaCSV DOC
          plmEmplUKEmployment and Wages in the United KingdomCSV DOC
          plmGasolineGasoline ConsumptionCSV DOC
          plmGrunfeldGrunfeld's Investment DataCSV DOC
          plmHedonicHedonic Prices of Census Tracts in the Boston AreaCSV DOC
          plmLaborSupplyWages and Hours WorkedCSV DOC
          plmMalesWages and Education of Young MalesCSV DOC
          plmParityPurchasing Power Parity and other parity relationshipsCSV DOC
          plmProducUS States ProductionCSV DOC
          plmRiceFarmsProduction of Rice in IndiaCSV DOC
          plmSnmespEmployment and Wages in SpainCSV DOC
          plmSumHesThe Penn World Table, v. 5CSV DOC
          plmWagesPanel Data of Individual WagesCSV DOC
          plyrbaseballYearly batting records for all major league baseball playersCSV DOC
          psclAustralianElectionPollingPolitical opinion polls in Australia, 2004-07CSV DOC
          psclAustralianElectionselections to Australian House of Representatives, 1949-2007CSV DOC
          psclEfronMorrisBatting Averages for 18 major league baseball players, 1970CSV DOC
          psclRockTheVoteVoter turnout experiment, using Rock The Vote adsCSV DOC
          psclUKHouseOfCommons1992 United Kingdom electoral returnsCSV DOC
          psclabsenteeAbsentee and Machine Ballots in Pennsylvania State Senate RacesCSV DOC
          pscladmitApplications to a Political Science PhD ProgramCSV DOC
          psclbioChemistsarticle production by graduate students in biochemistry Ph.D. programsCSV DOC
          psclca2006California Congressional Districts in 2006CSV DOC
          pscliraqVoteU.S. Senate vote on the use of force against Iraq, 2002.CSV DOC
          psclpoliticalInformationInterviewer ratings of respondent levels of political informationCSV DOC
          psclpresidentialElectionselections for U.S. President, 1932-2012, by stateCSV DOC
          psclprussianPrussian army horse kick dataCSV DOC
          psclunionDensitycross national rates of trade union densityCSV DOC
          psclvote92Reports of voting in the 1992 U.S. Presidential election.CSV DOC
          reshape2french_friesSensory data from a french fries experiment.CSV DOC
          reshape2smithsDemo data describing the Smiths.CSV DOC
          reshape2tipsTipping dataCSV DOC
          rpartcar.test.frameAutomobile Data from 'Consumer Reports' 1990CSV DOC
          rpartcar90Automobile Data from 'Consumer Reports' 1990CSV DOC
          rpartcu.summaryAutomobile Data from 'Consumer Reports' 1990CSV DOC
          rpartkyphosisData on Children who have had Corrective Spinal SurgeryCSV DOC
          rpartsolderSoldering of Components on Printed-Circuit BoardsCSV DOC
          rpartstagecStage C Prostate CancerCSV DOC
          sandwichPublicSchoolsUS Expenditures for Public SchoolsCSV DOC
          semBollenBollen's Data on Industrialization and Political DemocracyCSV DOC
          semCNESVariables from the 1997 Canadian National Election StudyCSV DOC
          semKleinKlein's Data on the U. S. EconomyCSV DOC
          semKmentaPartly Artificial Data on the U. S. EconomyCSV DOC
          semTestsSix Mental TestsCSV DOC
          survivalbladderBladder Cancer RecurrencesCSV DOC
          survivalcancerNCCTG Lung Cancer DataCSV DOC
          survivalcgdChronic Granulomatous Disease dataCSV DOC
          survivalcolonChemotherapy for Stage B/C colon cancerCSV DOC
          survivalflchainAssay of serum free light chain for 7874 subjects.CSV DOC
          survivalgenfanGenerator fansCSV DOC
          survivalheartStanford Heart Transplant dataCSV DOC
          survivalkidneyKidney catheter dataCSV DOC
          survivalleukemiaAcute Myelogenous Leukemia survival dataCSV DOC
          survivalloganData from the 1972-78 GSS data used by LoganCSV DOC
          survivallungNCCTG Lung Cancer DataCSV DOC
          survivalmgusMonoclonal gammopathy dataCSV DOC
          survivalmgus2Monoclonal gammopathy dataCSV DOC
          survivalnwtcoData from the National Wilms' Tumor StudyCSV DOC
          survivalovarianOvarian Cancer Survival DataCSV DOC
          survivalpbcMayo Clinic Primary Biliary Cirrhosis DataCSV DOC
          survivalratsRat treatment data from Mantel et alCSV DOC
          survivalretinopathyDiabetic RetinopathyCSV DOC
          survivalstanford2More Stanford Heart Transplant dataCSV DOC
          survivaltobinTobin's Tobit dataCSV DOC
          survivaltransplantLiver transplant waiting listCSV DOC
          survivalveteranVeterans' Administration Lung Cancer studyCSV DOC
          vcdArthritisArthritis Treatment DataCSV DOC
          vcdBaseballBaseball DataCSV DOC
          vcdBrokenMarriageBroken Marriage DataCSV DOC
          vcdBundesligaResults of the Fussball-BundesligaCSV DOC
          vcdBundestag2005Votes in German Bundestag Election 2005CSV DOC
          vcdButterflyButterfly Species in MalayaCSV DOC
          vcdCoalMinersBreathlessness and Wheeze in Coal MinersCSV DOC
          vcdDanishWelfareDanish Welfare Study DataCSV DOC
          vcdEmploymentEmployment StatusCSV DOC
          vcdFederalist'May' in Federalist PapersCSV DOC
          vcdHittersHitters DataCSV DOC
          vcdHorseKicksDeath by Horse KicksCSV DOC
          vcdHospitalHospital dataCSV DOC
          vcdJobSatisfactionJob Satisfaction DataCSV DOC
          vcdJointSportsOpinions About Joint SportsCSV DOC
          vcdLifeboatsLifeboats on the TitanicCSV DOC
          vcdNonResponseNon-Response Survey DataCSV DOC
          vcdOvaryCancerOvary Cancer DataCSV DOC
          vcdPreSexPre-marital Sex and DivorceCSV DOC
          vcdPunishmentCorporal Punishment DataCSV DOC
          vcdRepVictRepeat Victimization DataCSV DOC
          vcdSaxonyFamilies in SaxonyCSV DOC
          vcdSexualFunSex is FunCSV DOC
          vcdSpaceShuttleSpace Shuttle O-ring FailuresCSV DOC
          vcdSuicideSuicide Rates in GermanyCSV DOC
          vcdTrucksTruck Accidents DataCSV DOC
          vcdUKSoccerUK Soccer ScoresCSV DOC
          vcdVisualAcuityVisual Acuity in Left and Right EyesCSV DOC
          vcdVonBortVon Bortkiewicz Horse Kicks DataCSV DOC
          vcdWeldonDiceWeldon's Dice DataCSV DOC
          vcdWomenQueueWomen in QueuesCSV DOC
          ZeligMatchIt.urlTable of links for ZeligCSV DOC
          ZeligPEriskPolitical Economic Risk Data from 62 Countries in 1987CSV DOC
          ZeligSupremeCourtU.S. Supreme Court Vote MatrixCSV DOC
          ZeligWeimar1932 Weimar election dataCSV DOC
          ZeligZelig.urlTable of links for ZeligCSV DOC
          ZeligapprovalU.S. Presidential Approval DataCSV DOC
          ZeligbivariateSample data for bivariate probit regressionCSV DOC
          ZeligcoalitionCoalition Dissolution in Parliamentary DemocraciesCSV DOC
          Zeligcoalition2Coalition Dissolution in Parliamentary Democracies, Modified VersionCSV DOC
          ZeligeidatSimulation Data for Ecological InferenceCSV DOC
          Zeligfree1Freedom of Speech DataCSV DOC
          Zeligfree2Freedom of Speech DataCSV DOC
          ZeligfriendshipSimulated Example of Schoolchildren Friendship NetworkCSV DOC
          ZeliggrunfeldSimulation Data for model Seemingly Unrelated Regression (sur) that corresponds to method SUR of systemfitCSV DOC
          ZelighoffSocial Security Expenditure DataCSV DOC
          ZelighomerunSample Data on Home Runs Hit By Mark McGwire and Sammy Sosa in 1998.CSV DOC
          Zeligimmi1Individual Preferences Over Immigration PolicyCSV DOC
          Zeligimmi2Individual Preferences Over Immigration PolicyCSV DOC
          Zeligimmi3Individual Preferences Over Immigration PolicyCSV DOC
          Zeligimmi4Individual Preferences Over Immigration PolicyCSV DOC
          Zeligimmi5Individual Preferences Over Immigration PolicyCSV DOC
          ZeligimmigrationIndividual Preferences Over Immigration PolicyCSV DOC
          ZeligkleinSimulation Data for model Two-Stage Least Square (twosls) that corresponds to method 2SLS of systemfitCSV DOC
          ZeligkmentaSimulation Data for model Three-Stage Least Square (threesls) that corresponds to method 3SLS of systemfitCSV DOC
          ZeligmacroMacroeconomic DataCSV DOC
          ZeligmexicoVoting Data from the 1988 Mexican Presidential ElectionCSV DOC
          ZeligmidMilitarized Interstate DisputesCSV DOC
          ZelignewpaintersThe Discretized Painter's Data of de PilesCSV DOC
          ZeligsanctionMultilateral Economic SanctionsCSV DOC
          ZeligseatshareLeft Party Seat Share in 11 OECD CountriesCSV DOC
          Zeligsna.exSimulated Example of Social Network DataCSV DOC
          ZeligswissSwiss Fertility and Socioeconomic Indicators (1888) DataCSV DOC
          ZeligtobinTobin's Tobit DataCSV DOC
          ZeligturnoutTurnout Data Set from the National Election SurveyCSV DOC
          ZeligvoteincomeSample Turnout and Demographic Data from the 2000 Current Population SurveyCSV DOC
          HSAURBCGBCG Vaccine DataCSV DOC
          HSAURBtheBBeat the Blues DataCSV DOC
          HSAURCYGOB1CYG OB1 Star Cluster DataCSV DOC
          HSAURForbes2000The Forbes 2000 Ranking of the World's Biggest Companies (Year 2004)CSV DOC
          HSAURGHQGeneral Health QuestionnaireCSV DOC
          HSAURLanzaPrevention of Gastrointestinal DamagesCSV DOC
          HSAURagefatTotal Body Composition DataCSV DOC
          HSAURaspirinAspirin DataCSV DOC
          HSAURbirthdeathratesBirth and Death Rates DataCSV DOC
          HSAURbladdercancerBladder Cancer DataCSV DOC
          HSAURcloudsCloud Seeding DataCSV DOC
          HSAURepilepsyEpilepsy DataCSV DOC
          HSAURfosterFoster Feeding ExperimentCSV DOC
          HSAURheptathlonOlympic Heptathlon Seoul 1988CSV DOC
          HSAURmastectomySurvival Times after Mastectomy of Breast Cancer PatientsCSV DOC
          HSAURmeteoMeteorological Measurements for 11 YearsCSV DOC
          HSAURorallesionsOral Lesions in Rural IndiaCSV DOC
          HSAURphosphatePhosphate Level DataCSV DOC
          HSAURpistonringsPiston Rings FailuresCSV DOC
          HSAURplanetsExoplanets DataCSV DOC
          HSAURplasmaBlood Screening DataCSV DOC
          HSAURpolypsFamilial Adenomatous PolyposisCSV DOC
          HSAURpolyps3Familial Adenomatous PolyposisCSV DOC
          HSAURpotteryRomano-British Pottery DataCSV DOC
          HSAURrearrestsRearrests of Juvenile FelonsCSV DOC
          HSAURrespiratoryRespiratory Illness DataCSV DOC
          HSAURroomwidthStudents' Estimates of Lecture Room WidthCSV DOC
          HSAURschizophreniaAge of Onset of Schizophrenia DataCSV DOC
          HSAURschizophrenia2Schizophrenia DataCSV DOC
          HSAURschooldaysDays not Spent at SchoolCSV DOC
          HSAURskullsEgyptian SkullsCSV DOC
          HSAURsmokingNicotine Gum and Smoking CessationCSV DOC
          HSAURstudentsStudent Risk TakingCSV DOC
          HSAURsuicidesCrowd Baiting Behaviour and SuicidesCSV DOC
          HSAURtoothpasteToothpaste DataCSV DOC
          HSAURvotingHouse of Representatives Voting DataCSV DOC
          HSAURwaterMortality and Water HardnessCSV DOC
          HSAURwatervolesWater Voles DataCSV DOC
          HSAURwavesElectricity from Wave Power at SeaCSV DOC
          HSAURweightgainGain in Weight of RatsCSV DOC
          HSAURwomensroleWomen's Role in SocietyCSV DOC
          psychBechtoldtSeven data sets showing a bifactor solution.CSV DOC
          psychBechtoldt.1Seven data sets showing a bifactor solution.CSV DOC
          psychBechtoldt.2Seven data sets showing a bifactor solution.CSV DOC
          psychDwyer8 cognitive variables used by Dwyer for an example.CSV DOC
          psychGleserExample data from Gleser, Cronbach and Rajaratnam (1965) to show basic principles of generalizability theory.CSV DOC
          psychGorsuchExample data set from Gorsuch (1997) for an example factor extension.CSV DOC
          psychHarman.55 socio-economic variables from Harman (1967)CSV DOC
          psychHarman.8Correlations of eight physical variables (from Harman, 1966)CSV DOC
          psychHarman.politicalEight political variables used by Harman (1967) as example 8.17CSV DOC
          psychHolzingerSeven data sets showing a bifactor solution.CSV DOC
          psychHolzinger.9Seven data sets showing a bifactor solution.CSV DOC
          psychReiseSeven data sets showing a bifactor solution.CSV DOC
          psychSchmid12 variables created by Schmid and Leiman to show the Schmid-Leiman TransformationCSV DOC
          psychThurstoneSeven data sets showing a bifactor solution.CSV DOC
          psychThurstone.33Seven data sets showing a bifactor solution.CSV DOC
          psychTucker9 Cognitive variables discussed by Tucker and Lewis (1973)CSV DOC
          psychability16 ability items scored as correct or incorrect.CSV DOC
          psychaffectTwo data sets of affect and arousal scores as a function of personality and movie conditionsCSV DOC
          psychbfi25 Personality items representing 5 factorsCSV DOC
          psychbfi.dictionary25 Personality items representing 5 factorsCSV DOC
          psychblotBond's Logical Operations Test - BLOTCSV DOC
          psychburt11 emotional variables from Burt (1915)CSV DOC
          psychcitiesDistances between 11 US citiesCSV DOC
          psychcubitsGalton's example of the relationship between height and 'cubit' or forearm lengthCSV DOC
          psychcushnyA data set from Cushny and Peebles (1905) on the effect of three drugs on hours of sleep, used by Student (1908)CSV DOC
          psychepiEysenck Personality Inventory (EPI) data for 3570 participantsCSV DOC
          psychepi.bfi13 personality scales from the Eysenck Personality Inventory and Big 5 inventoryCSV DOC
          psychepi.dictionaryEysenck Personality Inventory (EPI) data for 3570 participantsCSV DOC
          psychgaltonGalton's Mid parent child height dataCSV DOC
          psychheightsA data.frame of the Galton (1888) height and cubit data set.CSV DOC
          psychincomeUS family income from US census 2008CSV DOC
          psychiqitems16 multiple choice IQ itemsCSV DOC
          psychmsq75 mood items from the Motivational State Questionnaire for 3896 participantsCSV DOC
          psychneoNEO correlation matrix from the NEO_PI_R manualCSV DOC
          psychpeasGalton's PeasCSV DOC
          psychsat.act3 Measures of ability: SATV, SATQ, ACTCSV DOC
          psychwithinBetweenAn example of the distinction between within group and between group correlationsCSV DOC
          quantregBoscoBoscovich DataCSV DOC
          quantregCobarOreCobar Ore dataCSV DOC
          quantregMammalsGarland(1983) Data on Running Speed of MammalsCSV DOC
          quantregbarroBarro DataCSV DOC
          quantregengelEngel DataCSV DOC
          quantreggaspriceTime Series of US Gasoline PricesCSV DOC
          quantreguisUIS Drug Treatment study dataCSV DOC
          geepackdietoxGrowth curves of pigs in a 3x3 factorial experimentCSV DOC
          geepackkochOrdinal Data from KochCSV DOC
          geepackohioOhio Children Wheeze StatusCSV DOC
          geepackrespdisClustered Ordinal Respiratory DisorderCSV DOC
          geepackrespiratoryData from a clinical trial comparing two treatments for a respiratory illnessCSV DOC
          geepackseizureEpileptic SeizuresCSV DOC
          geepacksitka89Growth of Sitka Spruce TreesCSV DOC
          geepackspruceLog-size of 79 Sitka spruce treesCSV DOC
          texmexliverLiver related laboratory dataCSV DOC
          texmexportpirieRain, wavesurge and portpirie datasets.CSV DOC
          texmexrainRain, wavesurge and portpirie datasets.CSV DOC
          texmexsummerAir pollution data, separately for summer and winter monthsCSV DOC
          texmexwavesurgeRain, wavesurge and portpirie datasets.CSV DOC
          texmexwinterAir pollution data, separately for summer and winter monthsCSV DOC
          multgeearthritisRheumatoid Arthritis Clinical TrialCSV DOC
          multgeehousingHomeless DataCSV DOC
          evirbmwDaily Log Returns on BMW Share PriceCSV DOC
          evirdanishDanish Fire Insurance ClaimsCSV DOC
          evirnidd.annualThe River Nidd DataCSV DOC
          evirnidd.threshThe River Nidd DataCSV DOC
          evirsiemensDaily Log Returns on Siemens Share PriceCSV DOC
          evirsp.rawSP Data to June 1993CSV DOC
          evirspto87SP Return Data to October 1987CSV DOC
          lme4ArabidopsisArabidopsis clipping/fertilization dataCSV DOC
          lme4DyestuffYield of dyestuff by batchCSV DOC
          lme4Dyestuff2Yield of dyestuff by batchCSV DOC
          lme4InstEvalUniversity Lecture/Instructor Evaluations by Students at ETHCSV DOC
          lme4PastesPaste strength by batch and caskCSV DOC
          lme4PenicillinVariation in penicillin testingCSV DOC
          lme4VerbAggVerbal Aggression item responsesCSV DOC
          lme4cakeBreakage Angle of Chocolate CakesCSV DOC
          lme4cbppContagious bovine pleuropneumoniaCSV DOC
          lme4grouseticksData on red grouse ticks from Elston et al. 2001CSV DOC
          lme4sleepstudyReaction times in a sleep deprivation studyCSV DOC
          mosaicDataAlcoholAlcohol Consumption per CapitaCSV DOC
          mosaicDataBirthdaysUS Births in 1969 - 1988CSV DOC
          mosaicDataBirthsUS BirthsCSV DOC
          mosaicDataBirths78US Births in 1978CSV DOC
          mosaicDataCPS85Data from the 1985 Current Population Survey (CPS85)CSV DOC
          mosaicDataCoolingWaterCoolingWaterCSV DOC
          mosaicDataCountriesCountriesCSV DOC
          mosaicDataDimesWeight of dimesCSV DOC
          mosaicDataGaltonGalton's dataset of parent and child heightsCSV DOC
          mosaicDataGestationData from the Child Health and Development StudiesCSV DOC
          mosaicDataGoosePermitsGoose Permit StudyCSV DOC
          mosaicDataHELPfullHealth Evaluation and Linkage to Primary CareCSV DOC
          mosaicDataHELPmissHealth Evaluation and Linkage to Primary CareCSV DOC
          mosaicDataHELPrctHealth Evaluation and Linkage to Primary CareCSV DOC
          mosaicDataHeatXData from a heat exchanger laboratoryCSV DOC
          mosaicDataKidsFeetFoot measurements in childrenCSV DOC
          mosaicDataMarriageMarriage recordsCSV DOC
          mosaicDataMitesMites and Wilt DiseaseCSV DOC
          mosaicDataRailTrailVolume of Users of a Rail TrailCSV DOC
          mosaicDataRidersVolume of Users of a Massachusetts Rail TrailCSV DOC
          mosaicDataSATState by State SAT dataCSV DOC
          mosaicDataSaratogaHousesHouses in Saratoga County (2006)CSV DOC
          mosaicDataSnowGRSnowfall data for Grand Rapids, MICSV DOC
          mosaicDataSwimRecords100 m Swimming World RecordsCSV DOC
          mosaicDataTenMileRaceCherry Blossom RaceCSV DOC
          mosaicDataUtilitiesUtility billsCSV DOC
          mosaicDataUtilities2Utility billsCSV DOC
          mosaicDataWhickhamData from the Whickham surveyCSV DOC
          ISLRAutoAuto Data SetCSV DOC
          ISLRCaravanThe Insurance Company (TIC) BenchmarkCSV DOC
          ISLRCarseatsSales of Child Car SeatsCSV DOC
          ISLRCollegeU.S. News and World Report's College DataCSV DOC
          ISLRDefaultCredit Card Default DataCSV DOC
          ISLRHittersBaseball DataCSV DOC
          ISLROJOrange Juice DataCSV DOC
          ISLRPortfolioPortfolio DataCSV DOC
          ISLRSmarketS&P Stock Market DataCSV DOC
          ISLRWageMid-Atlantic Wage DataCSV DOC
          ISLRWeeklyWeekly S&P Stock Market DataCSV DOC
          Stat2DataAlfalfaAlfalfaCSV DOC
          Stat2DataArcheryDataArcheryDataCSV DOC
          Stat2DataAutoPollutionAutoPollutionCSV DOC
          Stat2DataBackpackBackpackCSV DOC
          Stat2DataBaseballTimesBaseballTimesCSV DOC
          Stat2DataBeeStingsBeeStingsCSV DOC
          Stat2DataBirdNestBirdNestCSV DOC
          Stat2DataBlood1Blood1CSV DOC
          Stat2DataBlueJaysBlue JaysCSV DOC
          Stat2DataBritishUnionsBritishUnionsCSV DOC
          Stat2DataCAFECAFECSV DOC
          Stat2DataCO2CO2CSV DOC
          Stat2DataCalciumBPCalciumBPCSV DOC
          Stat2DataCancerSurvivalCancerSurvivalCSV DOC
          Stat2DataCaterpillarsCaterpillarsCSV DOC
          Stat2DataCerealCerealCSV DOC
          Stat2DataChemoTHCChemoTHCCSV DOC
          Stat2DataChildSpeaksChildSpeaksCSV DOC
          Stat2DataClothingClothingCSV DOC
          Stat2DataCloudSeedingCloud SeedingCSV DOC
          Stat2DataCloudSeeding2Cloud Seeding 2CSV DOC
          Stat2DataCrackerFiberCracker Fiber in DietsCSV DOC
          Stat2DataCuckooCuckooCSV DOC
          Stat2DataDay1SurveyDay1SurveyCSV DOC
          Stat2DataDiamondsDiamondsCSV DOC
          Stat2DataDiamonds2Diamonds2CSV DOC
          Stat2DataElection08Election08CSV DOC
          Stat2DataEthanolEthanolCSV DOC
          Stat2DataFGByDistanceFGByDistanceCSV DOC
          Stat2DataFantasyBaseballFantasyBaseballCSV DOC
          Stat2DataFertilityFertilityCSV DOC
          Stat2DataFilmFilmCSV DOC
          Stat2DataFinalFourIzzoFinalFourIzzoCSV DOC
          Stat2DataFinalFourLongFinalFourLongCSV DOC
          Stat2DataFinalFourShortFinalFourShortCSV DOC
          Stat2DataFingersFingersCSV DOC
          Stat2DataFirstYearGPAFirstYearGPACSV DOC
          Stat2DataFishEggsFishEggsCSV DOC
          Stat2DataFlightResponseFlightResponseCSV DOC
          Stat2DataFluorescenceFluorescenceCSV DOC
          Stat2DataFruitFliesFruitFliesCSV DOC
          Stat2DataGoldenrodGoldenrod GallsCSV DOC
          Stat2DataGroceryGroceryCSV DOC
          Stat2DataGunnelsGunnelsCSV DOC
          Stat2DataHawkTailHawkTailCSV DOC
          Stat2DataHawkTail2HawkTail2CSV DOC
          Stat2DataHawksHawksCSV DOC
          Stat2DataHearingTestHearingTestCSV DOC
          Stat2DataHighPeaksHighPeaksCSV DOC
          Stat2DataHoopsHoopsCSV DOC
          Stat2DataHorsePricesHorsePricesCSV DOC
          Stat2DataHousesHousesCSV DOC
          Stat2DataICUICUCSV DOC
          Stat2DataInfantMortalityInfantMortalityCSV DOC
          Stat2DataInsuranceVoteInsuranceVoteCSV DOC
          Stat2DataJurorsJurorsCSV DOC
          Stat2DataKids198Kids198CSV DOC
          Stat2DataLeafHoppersLeafHoppersCSV DOC
          Stat2DataLeukemiaLeukemiaCSV DOC
          Stat2DataLongJumpOlympicsLongJumpOlympicsCSV DOC
          Stat2DataLostLetterLostLetterCSV DOC
          Stat2DataMLB2007StandingsMLB2007StandingsCSV DOC
          Stat2DataMarathonMarathonCSV DOC
          Stat2DataMarketsMarketsCSV DOC
          Stat2DataMathEnrollmentMath EnrollmentsCSV DOC
          Stat2DataMathPlacementMath PlacementCSV DOC
          Stat2DataMedGPAMedGPACSV DOC
          Stat2DataMentalHealthMental Health AdmissionsCSV DOC
          Stat2DataMetabolicRateMetabolic Rate of CaterpillarsCSV DOC
          Stat2DataMetroHealth83MetroHealth83CSV DOC
          Stat2DataMilgramMilgramCSV DOC
          Stat2DataMothEggsMoth EggsCSV DOC
          Stat2DataNCbirthsNCbirthsCSV DOC
          Stat2DataNFL2007StandingsNFL2007StandingsCSV DOC
          Stat2DataNursingNursingCSV DOC
          Stat2DataOlivesOlivesCSV DOC
          Stat2DataOringsOringsCSV DOC
          Stat2DataOverdrawnOverdrawnCSV DOC
          Stat2DataPalmBeachPalmBeachCSV DOC
          Stat2DataPedometerPedometerCSV DOC
          Stat2DataPerchPerchCSV DOC
          Stat2DataPigFeedPigFeedCSV DOC
          Stat2DataPinesPinesCSV DOC
          Stat2DataPoliticalPoliticalCSV DOC
          Stat2DataPollster08Pollster08CSV DOC
          Stat2DataPopcornPopcornCSV DOC
          Stat2DataPorscheJaguarPorscheJaguarCSV DOC
          Stat2DataPorschePricePorschePriceCSV DOC
          Stat2DataPulsePulseCSV DOC
          Stat2DataPutts1Putts1CSV DOC
          Stat2DataPutts2Putts2CSV DOC
          Stat2DataReligionGDPReligionGDPCSV DOC
          Stat2DataRetirementRetirementCSV DOC
          Stat2DataRiverElementsRiverElementsCSV DOC
          Stat2DataRiverIronRiver IronCSV DOC
          Stat2DataSATGPASAT scores and GPACSV DOC
          Stat2DataSampleFGSampleFGCSV DOC
          Stat2DataSandwichAntsSandwich AntsCSV DOC
          Stat2DataSeaSlugsSea SlugsCSV DOC
          Stat2DataSparrowsSparrowsCSV DOC
          Stat2DataSpeciesAreaSpecies AreaCSV DOC
          Stat2DataSpeedSpeedCSV DOC
          Stat2DataSwahiliSwahiliCSV DOC
          Stat2DataTMSTMSCSV DOC
          Stat2DataTextPricesText PricesCSV DOC
          Stat2DataThreeCarsThree CarsCSV DOC
          Stat2DataTipJokeTip JokeCSV DOC
          Stat2DataTitanicTitanicCSV DOC
          Stat2DataTomlinsonRushLaDainian Tomlinson Rushing YardsCSV DOC
          Stat2DataTwinsLungsTwinsLungsCSV DOC
          Stat2DataUSstampsUSstampsCSV DOC
          Stat2DataVoltsVoltsCSV DOC
          Stat2DataWalkingBabiesWalkingBabiesCSV DOC
          Stat2DataWeightLossIncentiveWeightLossIncentiveCSV DOC
          Stat2DataWeightLossIncentive4WeightLossIncentive4CSV DOC
          Stat2DataWeightLossIncentive7WeightLossIncentive7CSV DOC
          Stat2DataWordMemoryWordMemoryCSV DOC
          Stat2DataYouthRisk2007YouthRisk2007CSV DOC
          Stat2DataYouthRisk2009YouthRisk2009CSV DOC
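
Each row in the table above names an R package, a dataset inside it, and a CSV link plus a DOC link. The link targets themselves are not shown here, so the Python sketch below rebuilds them under an assumption: that the rows come from the Rdatasets collection and that its files follow the URL layout in BASE_URL. Neither the base URL nor the helper name rdatasets_urls appears in this list; check both before relying on them.

```python
from urllib.parse import quote

# ASSUMPTION: base URL of the Rdatasets collection. It is not given in the
# list above, so verify it before use.
BASE_URL = "https://vincentarelbundock.github.io/Rdatasets"


def rdatasets_urls(package: str, item: str) -> dict:
    """Build the assumed CSV and DOC links for one (package, item) row.

    rdatasets_urls is a hypothetical helper written for this sketch.
    """
    pkg = quote(package)
    name = quote(item)
    return {
        "csv": f"{BASE_URL}/csv/{pkg}/{name}.csv",
        "doc": f"{BASE_URL}/doc/{pkg}/{name}.html",
    }
```

If pandas is installed, passing the "csv" value to pandas.read_csv should load the dataset into a DataFrame, since read_csv accepts URLs as well as local paths.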


          Source:  r-dir (r-directory)


          World Bank Data - Literally hundreds of datasets spanning many decades, sortable by topic or country. Data is downloadable in Excel or XML formats, or you can make API calls. This is an outstanding resource.

          Gapminder - Hundreds of datasets on world health, economics, population, etc. All of it is viewable online within Google Docs, and downloadable as spreadsheets.

          The Data Hub - Hosted by CKAN. Most of these datasets come from the government.

          Datamob - List of public datasets.

          Numbrary - Lists of datasets.

          Kaggle - Kaggle is a site that hosts data mining competitions. Each competition provides a data set that's free for download.

          SNAP - Stanford's Large Network Dataset Collection. This list has several datasets related to social networking. Lots of fun in here!

          KONECT - The Koblenz Network Collection. Several datasets related to social networking & Wikipedia.

          Million Song Dataset - This is a collection of audio features and metadata for a million contemporary popular music tracks.

          Energy Information Administration - This site offers a number of datasets on energy production, consumption, sources, etc.

          GeoDa Center - This is a collection of geospatial datasets offered by Arizona State University's Center for Geospatial Analysis & Computation.

          Reddit Datasets - Not a dataset itself, but rather a social news site devoted to datasets. It's updated regularly with news about newly available datasets.

          Quandl - This is a web-based front end to a number of public data sets. What's nice about this website is that it allows for the combination of data from a number of sources, and can export the data in a number of formats.

          1,001 Datasets - This is a list of lists of datasets. There's not much organization here, but there really are a LOT of datasets. Dive in and have fun.

          Yahoo! Webscope - A reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists.

          Time Series Data Library - Curated by Professor Rob Hyndman of Monash University in Australia, this is a collection of over 500 datasets containing time-series data, organized by category.

          Awesome Public Datasets - Curated list of hundreds of public datasets, organized by topic.

          Common Crawl - Massive dataset of billions of pages scraped from the web. The data itself is on Amazon Public Datasets, so it's easy to load it into an EC2 instance there. The dataset is updated with a new scrape about once per month.
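
The World Bank Data entry above mentions API calls as an alternative to Excel or XML downloads. As a rough illustration, the Python sketch below builds a request URL and reads the JSON reply. It assumes the v2 Indicators API layout (country/<code>/indicator/<id>) and a two-element reply of metadata followed by rows; the helper names indicator_url, parse_indicator and fetch_indicator are made up for this sketch, so check the current World Bank API documentation before using it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: root of the World Bank Indicators API (v2); not stated in the list.
API_ROOT = "https://api.worldbank.org/v2"


def indicator_url(country: str, indicator: str, start: int, end: int,
                  per_page: int = 100) -> str:
    """Build a JSON request URL for one indicator and one country."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": per_page},
        safe=":",  # keep the start:end range readable in the URL
    )
    return f"{API_ROOT}/country/{country}/indicator/{indicator}?{query}"


def parse_indicator(payload) -> list:
    """Turn an assumed [metadata, rows] reply into (year, value) pairs.

    Anything that does not have that shape (for example an error reply)
    yields an empty list.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return []
    return [(row["date"], row["value"]) for row in payload[1]]


def fetch_indicator(country: str, indicator: str, start: int, end: int) -> list:
    """Download and parse one indicator series (needs network access)."""
    with urlopen(indicator_url(country, indicator, start, end), timeout=30) as resp:
        return parse_indicator(json.load(resp))
```

A call such as fetch_indicator("BR", "SP.POP.TOTL", 2000, 2010) would then request total population for Brazil; values can be None for years with no observation.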


          SOURCE: Amazon Public Datasets - Collection of datasets that are ready to be loaded into an EC2 instance.

          A Multi-wavelength Infrared Atlas of the Galactic Plane Open Source tools were used to combine images from five major infrared surveys of the Galactic Plane, archived at the NASA/IPAC Infrared Science Archive (IRSA). The result is a 16-wavelength infrared Atlas of the Galactic Plane that coves the wavelength range 1 μm to 24 μm.

          CCAFS-Climate Data High resolution climate data to help assess the impacts of climate change primarily on agriculture. These open access datasets of climate projections will help researchers make climate change impact assessments.

          NASA NEX Three NASA NEX datasets are now available, including climate projections and satellite images of Earth.

          Human Microbiome Project Human Microbiome Project Data Set

          Enron Email Data Enron email data publicly released as part of FERC's Western Energy Markets investigation converted to industry standard formats by EDRM. The data set consists of 1,227,255 emails with 493,384 attachments covering 151 custodians. The email is provided in Microsoft PST, IETF MIME, and EDRM XML formats.

          Japan Census Data Multiple data sets including: (1) Population Census of Japan (1995, 2000, 2005, 2010), (2) Establishment and Enterprise Census of Japan (1999, 2001, 2004, 2006), and (3) Economic Census of Japan (2009).

          Apache Software Foundation Public Mail Archives  A collection of all publicly available Apache Software Foundation mail archives as of July 11, 2011

          Freebase Simple Topic Dump A data dump of the basic identifying facts about every topic in Freebase

          Freebase Quad Dump A data dump of all the current facts and assertions in Freebase

          Wikipedia Page Traffic Statistic V3 This dataset contains a 150 GB sample of the data used to power It includes a full 3 months of hourly page traffic statistics from Wikipedia (1/1/2011-3/31/2011).

          Material Safety Data Sheets 230,000 Material Safety Data Sheets.

          Million Song Dataset The Million Songs Collection is a collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.

          Million Song Sample Dataset This is a 10,000 song subset of audio features and metadata from the Million Songs collection - a collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.

          Marvel Universe Social Graph This dataset is an example of a social collaboration network based on the characters in The Marvel Universe, that is, the artificial world that takes place in the universe of the Marvel comic books.

          Google Books Ngrams A data set containing Google Books n-gram corpora. This data set is freely available on Amazon S3 in a Hadoop-friendly file format and is licensed under a Creative Commons Attribution 3.0 Unported License. The original dataset is available from
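
For readers new to the term, an n-gram is a run of n consecutive tokens, and an n-gram corpus stores how often each run occurs. The sketch below counts n-grams in a short string using naive whitespace tokenization; it illustrates the concept only and does not reproduce how the Google Books corpora were built or stored.

```python
from collections import Counter


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count n-grams (tuples of n consecutive lowercase tokens) in text."""
    tokens = text.lower().split()  # naive tokenizer, an illustrative simplification
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    counts = ngram_counts("the cat saw the cat", n=2)
    for gram, freq in counts.most_common():
        print(" ".join(gram), freq)
```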

          The WestburyLab USENET corpus The WestburyLab USENET corpus is an anonymized compilation of postings from 47,860 English-language newsgroups from 2005-2010.

          1000 Genomes Project The 1000 Genomes Project, initiated in 2008, is an international public-private consortium that aims to build the most detailed map of human genetic variation available.

          Wikipedia Traffic Statistics V2 Contains 16 months of hourly pageview statistics for all articles in Wikipedia

          M-Lab dataset: Network Diagnostic Tool (NDT) NDT test results created through Measurement Lab (M-Lab) between February 2009 and September 2009

          M-Lab dataset: Network Path and Application Diagnosis tool (NPAD) NPAD test results created through Measurement Lab (M-Lab) between February 2009 and September 2009

          Petroleum Public Data Set (working Title) Public-domain data for the oil & gas industry, assembled from the contributions of participating agencies in the United States, Canada and around the world. This data provides industry stakeholders with an opportunity to focus their efforts on the analysis and interpretation of this data without concern for the trivial and time-consuming tasks of locating, downloading, reformatting and integrating the data prior to value-added work being performed.

          Sloan Digital Sky Survey DR6 Subset The Sloan Digital Sky Survey is the most ambitious astronomical survey ever undertaken.

          Wikipedia Page Traffic Statistics Contains 7 months of hourly pageview statistics for all articles in Wikipedia

          Wikipedia XML Data A complete copy of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML.

          Federal Reserve Economic Data - FRED Database of 20,059 U.S. economic time series.

          Twilio/ Street Vector Data Set Twilio/ database of mapped US street names and address ranges.

          Federal Contracts from the Federal Procurement Data Center ( A data dump of all federal contracts from the Federal Procurement Data Center found at

          University of Florida Sparse Matrix Collection The University of Florida Sparse Matrix Collection is a large, widely available, and actively growing set of sparse matrices that arise in real applications.

          2008 TIGER/Line Shapefiles Census 2000 and Current United States shapefiles

          Wikipedia Extraction (WEX) A processed dump of the English language Wikipedia

          Business and Industry Summary Data US Business and Industry Summary Data

          2003-2006 US Economic Data US Economic Data for years 2003 to 2006

          Freebase Data Dump Freebase is an open database of the world's information, covering millions of topics in hundreds of categories

          DBpedia 3.5.1 DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web

          1980 US Census Data from the 1980 US Census

          1990 US Census Data from the 1990 US Census

          2000 US Census Data from the 2000 US Census

          Transportation Databases Various transportation statistics

          Labor Statistics Databases Various Labor Statistics












          Enjoy! As mentioned above, 100% of this data is reposted and the original sources are in the links. If I've missed any citations, please let me know and I will fix them.


          About this document

          1001 Datasets and Data repositories ( List of lists of lists ) - a rough list of lists to compile

          Created: December 28, 2013
