1001 Datasets and Data repositories ( List of lists of lists )

This is a LIST of "lists of lists".

The presentation is messy (it is mainly for my own use): it pulls together raw datasets for when I'm in the mood to get creative, so that I can search the text of a single page as a starting point for exploration. I will look at a better format later. If you have a suggestion for a list of lists to add to this list :) please message me or post a comment.

100% of the links below are from external sources (not mine)
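
Because this page is meant to be searched as plain text, that workflow can be sketched in a few lines of Python. This is only an illustration: the `find_entries` helper and the `sample` text are hypothetical (the sample reuses three links from the lists on this page), and in practice the text would be a saved copy of the page.

```python
def find_entries(text, keyword):
    """Return the non-empty lines of `text` that mention `keyword`, ignoring case."""
    needle = keyword.lower()
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and needle in line.lower()
    ]


# Hypothetical sample: three entries copied from the lists on this page.
sample = """
http://data.seattle.gov
http://www.bls.gov/
http://dbpedia.org
"""

# Print every entry that mentions "gov".
for entry in find_entries(sample, "gov"):
    print(entry)
```

A plain substring match is used here because the entries are a mix of bare URLs and free-text descriptions; anything smarter would need the page in a more structured format.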

________________________________________________________________________

If you're looking for APIs/services to hit with unstructured data, some of these are good, especially Alchemy (full disclosure: I'm an IBMer ;)

https://www.ibm.com/watson/developercloud/services-catalog.html

 

If you are a total newbie, this may be interesting: tool suite, notebooks, demos, and code from my IBM colleagues

https://datascience.ibm.com/docs/content/getting-started/welcome-main.html

_______________________________________________________________________

DATA ---  DATA --- DATA --- DATA --- DATA

Source:  Quora:

https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

Cross-disciplinary data repositories, data collections and data search engines:

  1. http://datasource.kapsarc.org
  2. https://www.kaggle.com/datasets
  3. http://www.assetmacro.com
  4. http://usgovxml.com
  5. http://aws.amazon.com/datasets
  6. http://databib.org
  7. http://datacite.org
  8. http://figshare.com
  9. http://linkeddata.org
  10. http://reddit.com/r/datasets
  11. http://thewebminer.com/
  12. http://thedatahub.org alias http://ckan.net
  13. http://quandl.com
  14. Social Network Analysis Interactive Dataset Library (Social Network Datasets)
  15. Datasets for Data Mining
  16. Enigma Public
  17. http://www.ufindthem.com/
  18. http://NetworkRepository.com - The First Interactive Network Data Repository
  19. http://MLvis.com
  20. Open Data Inception - A Comprehensive List of 2500+ Open Data Portals in the World
  21. http://data.opendatasoft.com OpenDataSoft catalog

Single datasets and data repositories

  1. http://archive.ics.uci.edu/ml/
  2. http://crawdad.org/
  3. http://data.austintexas.gov
  4. http://data.cityofchicago.org
  5. http://data.govloop.com
  6. http://data.gov.uk/
  7. data.gov.in
  8. http://data.medicare.gov
  9. http://data.seattle.gov
  10. http://data.sfgov.org
  11. http://data.sunlightlabs.com
  12. https://datamarket.azure.com/
  13. http://developer.yahoo.com/geo/g...
  14. http://econ.worldbank.org/datasets
  15. http://en.wikipedia.org/wiki/Wik...
  16. http://factfinder.census.gov/ser...
  17. http://ftp.ncbi.nih.gov/
  18. http://gettingpastgo.socrata.com
  19. http://googleresearch.blogspot.c...
  20. http://books.google.com/ngrams/
  21. http://medihal.archives-ouvertes.fr
  22. http://public.resource.org/
  23. http://rechercheisidore.fr
  24. http://snap.stanford.edu/data/in...
  25. http://timetric.com/public-data/
  26. https://wist.echo.nasa.gov/~wist...
  27. http://www2.jpl.nasa.gov/srtm
  28. http://www.archives.gov/research...
  29. http://www.bls.gov/
  30. http://www.crunchbase.com/
  31. http://www.dartmouthatlas.org/
  32. http://www.data.gov/
  33. http://www.datakc.org
  34. http://dbpedia.org
  35. http://www.delicious.com/jbaldwi...
  36. http://www.faa.gov/data_research/
  37. http://www.factual.com/
  38. http://research.stlouisfed.org/f...
  39. http://www.freebase.com/
  40. http://www.google.com/publicdata...
  41. http://www.guardian.co.uk/news/d...
  42. http://www.infochimps.com
  43. http://www.kaggle.com/
  44. http://build.kiva.org/
  45. http://www.nationalarchives.gov....
  46. http://www.nyc.gov/html/datamine...
  47. http://www.ordnancesurvey.co.uk/...
  48. http://www.philwhln.com/how-to-g...
  49. http://www.imdb.com/interfaces
  50. http://imat-relpred.yandex.ru/en...
  51. http://www.dados.gov.pt/pt/catal...
  52. http://knoema.com
  53. http://daten.berlin.de/
  54. http://www.qunb.com
  55. http://databib.org/
  56. http://datacite.org/
  57. http://data.reegle.info/
  58. http://data.wien.gv.at/
  59. http://data.gov.bc.ca
  60. https://pslcdatashop.web.cmu.edu/ (interaction data in learning environments)
  61. http://www.icpsr.umich.edu/icpsrweb/CPES/ - Collaborative Psychiatric Epidemiology Surveys: (A collection of three national surveys focused on each of the major ethnic groups to study psychiatric illnesses and health services use)
  62. http://www.dati.gov.it
  63. http://dati.trentino.it
  64. http://www.databagg.com/
  65. http://networkrepository.com - Network/ML data repository w/ visual interactive analytics
  66. United Nations Environment Programme GRID-Geneva (a lot of GIS datasets)

Source: Google Search

r-directory > Reference Links > Free Data Sets  https://r-dir.com/reference/datasets.html
Big Data Made Simple - 70 WebSites - http://bigdata-madesimple.com/70-websites-to-get-large-data-repositories-for-free/
18 places to find data sets for data science projects https://www.dataquest.io/blog/free-datasets-for-projects/

Source:  IBM -  https://apsportal.ibm.com/community

 
 

Source: http://www.data.gov/

Check out Data.gov’s new Metrics Page - July 31, 2017, by Data.gov

====

Source: http://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#70f9792d6796

 

 

  1. Data.gov http://data.gov The US Government pledged last year to make all government data available freely online. This site is the first stage and acts as a portal to all sorts of amazing information on everything from climate to crime.
  2. US Census Bureau http://www.census.gov/data.html A wealth of information on the lives of US citizens covering population data, geographic data and education.
  3. Socrata is another interesting place to explore government-related data, with some visualisation tools built-in.
  4. European Union Open Data Portal http://open-data.europa.eu/en/data/ As the above, but based on data from European Union institutions.
  5. Data.gov.uk http://data.gov.uk/ Data from the UK Government, including the British National Bibliography – metadata on all UK books and publications since 1950.
  6. Canada Open Data is a pilot project with many government and geospatial datasets.
  7. Datacatalogs.org offers open government data from US, EU, Canada, CKAN, and more.
  8. The CIA World Factbook https://www.cia.gov/library/publications/the-world-factbook/ Information on history, population, economy, government, infrastructure and military of 267 countries.
  9. Healthdata.gov https://www.healthdata.gov/ 125 years of US healthcare data including claim-level Medicare data, epidemiology and population statistics.
  10. NHS Health and Social Care Information Centre http://www.hscic.gov.uk/home Health data sets from the UK National Health Service.
  11. UNICEF offers statistics on the situation of women and children worldwide.
  12. World Health Organization offers world hunger, health, and disease statistics.
  13. Amazon Web Services public datasets http://aws.amazon.com/datasets Huge resource of public data, including the 1000 Genomes Project, an attempt to build the most comprehensive database of human genetic information, and NASA’s database of satellite imagery of Earth.
  14. Facebook Graph https://developers.facebook.com/docs/graph-api Although much of the information on users’ Facebook profiles is private, a lot isn’t. Facebook provides the Graph API as a way of querying the huge amount of information that its users are happy to share with the world (or can’t hide because they haven’t worked out how the privacy settings work).
  15. Face.com: A fascinating tool for facial recognition data.
  16. UCLA makes some of the data from its courses public.
  17. Data Market is a place to check out data related to economics, healthcare, food and agriculture, and the automotive industry.
  18. Google Public data explorer includes data from world development indicators, OECD, and human development indicators, mostly related to economics data and the world.
  19. Junar is a data scraping service that also includes data feeds.
  20. Buzzdata is a social data sharing service that allows you to upload your own data and connect with others who are uploading their data.
  21. Gapminder http://www.gapminder.org/data/ Compilation of data from sources including the World Health Organization and World Bank covering economic, medical and social statistics from around the world.
  22. Google Trends http://www.google.com/trends/explore Statistics on search volume (as a proportion of total search) for any given term, since 2004.
  23. Google Finance https://www.google.com/finance 40 years’ worth of stock market data, updated in real time.
  24. Google Books Ngrams http://storage.googleapis.com/books/ngrams/books/datasetsv2.html Search and analyze the full text of any of the millions of books digitised as part of the Google Books project.
  25. National Climatic Data Center http://www.ncdc.noaa.gov/data-access/quick-links#loc-clim Huge collection of environmental, meteorological and climate data sets from the US National Climatic Data Center. The world’s largest archive of weather data.
  26. DBPedia http://wiki.dbpedia.org Wikipedia is composed of millions of pieces of data, structured and unstructured, on every subject under the sun. DBPedia is an ambitious project to catalogue and create a public, freely distributable database allowing anyone to analyze this data.
  27. New York Times http://developer.nytimes.com/docs Searchable, indexed archive of news articles going back to 1851.
  28. Freebase http://www.freebase.com/ A community-compiled database of structured data about people, places and things, with over 45 million entries.
  29. Million Song Data Set http://aws.amazon.com/datasets/6468931156960467 Metadata on over a million songs and pieces of music. Part of Amazon Web Services.
  30. UCI Machine Learning Repository is a collection of datasets specifically pre-processed for machine learning.
  31. Financial Data Finder at OSU offers a large catalog of financial data sets.
  32. Pew Research Center offers its raw data from its fascinating research into American life.
  33. The BROAD Institute offers a number of cancer-related datasets.
 

====

Source: Caesar0301 Awesome Data Sets

https://github.com/caesar0301/awesome-public-datasets

 

Categories in the list:

  • Agriculture
  • Biology
  • Climate/Weather
  • Complex Networks
  • Computer Networks
  • Data Challenges
  • Earth Science
  • Economics
  • Education
  • Energy
  • Finance
  • GIS
  • Government
  • Healthcare
  • Image Processing
  • Machine Learning
  • Museums
  • Natural Language
  • Neuroscience
  • Physics
  • Psychology/Cognition
  • Public Domains
  • Search Engines
  • Social Networks
  • Social Sciences
  • Software
  • Sports
  • Time Series
  • Transportation

 

Source: United Nations http://data.un.org/DataMartInfo.aspx

Source: http://www.kdnuggets.com/datasets/index.html

  1. AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
  2. BigML big list of public data sources.
  3. Bioassay data, described in Virtual screening of bioassay data, by Amanda Schierz, J. of Cheminformatics, with 21 Bioassay datasets (Active / Inactive compounds) available for download.
  4. Bitly 1.usa.gov data, anonymized clicks on gov links.
  5. Canada Open Data, pilot project with many government and geospatial datasets.
  6. Causality Workbench data repository.
  7. Corral Big Data repository at Texas Advanced Computing Center, supporting data-centric science.
  8. Data Source Handbook, A Guide to Public Data, by Pete Warden, O'Reilly (Jan 2011).
  9. Datacatalogs.org, open government data from US, EU, Canada, CKAN, and more.
  10. Data.gov.uk, publicly available data from UK (also London datastore.)
  11. Data.gov/Education, central guide for education data resources including high-value data sets, data visualization tools, resources for the classroom, applications created from open data and more.
  12. DataMarket, visualize the world's economy, societies, nature, and industries, with 100 million time series from UN, World Bank, Eurostat and other important data providers.
  13. Datamob, public data put to good use.
  14. DataSF.org, a clearinghouse of datasets available from the City & County of San Francisco, CA.
  15. DataFerrett, a data mining tool that accesses and manipulates TheDataWeb, a collection of many on-line US Government datasets.
  16. Delve, Data for Evaluating Learning in Valid Experiments
  17. EconData, thousands of economic time series, produced by a number of US Government agencies.
  18. Enron Email Dataset, data from about 150 users, mostly senior management of Enron.
  19. Europeana Data, contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage content.
  20. FEDSTATS, a comprehensive source of US statistics and more
  21. FIMI repository for frequent itemset mining, implementations and datasets.
  22. Financial Data Finder at OSU, a large catalog of financial data sets.
  23. GDELT: The Global Data on Events, Location and Tone, described by Guardian as "a big data history of life, the universe and everything."
  24. GEO (Gene Expression Omnibus), a gene expression/molecular abundance repository supporting MIAME compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval.
  25. GeoDa Center, geographical and spatial data.
  26. Google ngrams datasets, text from millions of books scanned by Google.
  27. Grain Market Research, financial data including stocks, futures, etc.
  28. Hilary Mason research-quality Big Data sets collection - many text and image datasets.
  29. HitCompanies Datasets, comprehensive data on random 10,000 UK companies sampled from HitCompanies, updated automatically using AI/Machine Learning.
  30. ICWSM-2009 dataset contains 44 million blog posts made between August 1st and October 1st, 2008.
  31. Infochimps, an open catalog and marketplace for data. You can share, sell, curate, and download data about anything and everything.
  32. Investor Links, includes financial data
  33. KDD Cup center, with all data, tasks, and results.
  34. Kevin Chai list of datasets, for text, SNA, and other fields.
  35. KONECT, the Koblenz Network Collection, with large network datasets of all types in order to perform research in the area of network mining.
  36. Linking Open Data project, aimed at making data freely available to everyone.
  37. Million Song Dataset
  38. MIT Cancer Genomics gene expression datasets and publications, from MIT Whitehead Center for Genome Research.
  39. ML Data, the data repository of the EU Pascal2 networks.
  40. NASDAQ Data Store, provides access to market data.
  41. National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
  42. National Space Science Data Center (NSSDC), NASA data sets from planetary exploration, space and solar physics, life sciences, astrophysics, and more.
  43. Open Data Census, assesses the state of open data around the world.
  44. OpenData from Socrata, access to over 10,000 datasets including business, education, government, and fun.
  45. Open Source Sports, many sports databases, including Baseball, Football, Basketball, and Hockey.
  46. Peter Skomoroch dataset Bookmarks
  47. PubGene(TM) Gene Database and Tools, genomic-related publications database
  48. Quandl, a collaboratively curated portal to millions of financial and economic time-series datasets.
  49. qunb, a platform to find and visualize quantitative data.
  50. Robert Shiller data on housing, stock market, and more from his book Irrational Exuberance.
  51. SMD: Stanford Microarray Database, stores raw and normalized data from microarray experiments.
  52. Jerry Smith dataset collection, with Finance, Government, Machine Learning, Science, and other data.
  53. SourceForge.net Research Data, includes historic and status statistics on approximately 100,000 projects and over 1 million registered users' activities at the project management web site.
  54. StatLib, CMU Datasets Archive.
  55. STATOO Datasets part 1 and STATOO Datasets part 2
  56. Time Series Data Library
  57. Visual Analytics Benchmark Repository.
  58. UCI KDD Database Repository for large datasets used in machine learning and knowledge discovery research.
  59. UCI Machine Learning Repository.
  60. UCR Time Series Data Archive, offering datasets, papers, links, and code.
  61. United States Census Bureau.
  62. Wikiposit, a (virtual) amalgamation of (mostly financial) data from many different sites, allowing users to merge data from different sources
  63. Wolfram Alpha disease and patient level data.
  64. Yahoo Sandbox datasets, Language, Graph, Ratings, Advertising and Marketing, Competition
  65. Yelp Academic Dataset, all the data and reviews of the 250 closest businesses for 30 universities for students and academics to explore and research.

Source: http://www.kdnuggets.com/datasets/government-local-public.html

Public data catalogs, portals, and services

  • AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
  • Datacatalogs.org, open government data from US, EU, Canada, CKAN, and more.
  • DataMarket, visualize the world's economy, societies, nature, and industries, with 100 million time series from UN, World Bank, Eurostat and other important data providers.
  • datamob, Public data put to good use.
  • Enigma, "Google for public data", provides easy access to government, NGO, and other public domain datasets.
  • Freebase, a community-curated database of well-known people, places, and things.
  • Google Public Data, with dynamic visualization and exploration tools.
  • Knoema World Data Atlas, over 1000 indicators on all countries
  • National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
  • Open Data Census, assesses the state of open data around the world.
  • Open Data Institute, catalysing the evolution of open data culture to create economic, environmental, and social value.
  • Socrata OpenData, provides social data discovery services for opening government, healthcare, energy, education, or environment data.
  • Visualising Data, a big collection of sites and services for accessing data.

Global, International, UN

  • The World Bank, a comprehensive set of data about development in countries around the globe.
  • UN data, a data access system to UN databases
  • UNICEF statistics, data analysis and other data about UNICEF work.

 

USA: Federal

 

USA: State, City, and Local

Canada

Europe

  • Europeana Data, contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage content.
  • Eurostat, the leading provider of high quality statistics on Europe.
  • OECD Data Lab, data visualisations and European data downloads.
  • PublicData.eu, access to open, freely reusable datasets from local, regional and national public bodies across Europe.
  • Data Publica, l'annuaire des donnees en France, public data about France.
  • Paris data.

Germany

Ireland

Russia

UK

Asia

India

  • Census India, data on population, economic activity, literacy, education, housing, urbanisation, fertility, mortality, and more.

Australia, NZ, and Pacific

  • Data.gov.au provides an easy way to find, access and reuse public datasets from the Australian Government.
  • Australian Bureau of Statistics, access to the full range of ABS statistical and reference information.
  • Wiki New Zealand, a collaborative website making data about New Zealand accessible for everyone.

Africa

  • Open Data for Africa, supporting statistical development in Africa as a sound basis for designing and managing effective development policies for reducing poverty on the continent.

Source: http://aws.amazon.com/publicdatasets/

Available Public Data Sets on AWS

Here are some examples of popular Public Data Sets:

  • NASA NEX: A collection of Earth science data sets maintained by NASA, including climate change projections and satellite images of the Earth's surface
  • Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages
  • 1000 Genomes Project: A detailed map of human genetic variation
  • Google Books Ngrams: A data set containing Google Books n-gram corpuses
  • US Census Data: US demographic data from 1980, 1990, and 2000 US Censuses
  • Freebase Data Dump: A data dump of all the current facts and assertions in the Freebase system, an open database covering millions of topics

 

Source: http://kevinchai.net/datasets

Blog articles which provide dataset directories

http://conflate.net/inductio/2008/02/a-meta-index-of-data-sets/ – excellent article listing available data sets in the area of machine learning and inference
http://www.datawrangling.com/some-datasets-available-on-the-web.html
http://www.daniel-lemire.com/blog/data-for-data-mining/ – has blog, tag cloud, wiki dataset categories
http://www.kirix.com/blog/category/data-tagssearch/
http://mobblog.cs.ucl.ac.uk/datasets/
http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php – Article containing a list of available dataset websites

Dataset directories

http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public – Public datasets listed on a Quora Q&A thread.
http://caw2.barcelonamedia.org/node/7 – Content Analysis for the Web 2.0 (CAW 2.0) Workshop – part of 18th International Conference of the World Wide Web. Contains training and test datasets from Twitter, MySpace, Slashdot, Ciao and Kongregate.
http://kdd.ics.uci.edu/ – has a machine learning repository
http://archive.ics.uci.edu/ml/datasets.html
http://ckan.net/ – listing of links to various datasets
http://www.ldc.upenn.edu/Obtaining/ – Linguistic data consortium catalog
http://www.swivel.com/data_sets
http://datamob.org/datasets
http://infochimps.org/
http://www.freebase.com/
http://numbrary.com/
http://theinfo.org/
http://www.trustlet.org/wiki/Repositories_of_datasets
http://del.icio.us/kirixstrata/publicdata
http://services.alphaworks.ibm.com/manyeyes/browse/data?q=null
http://googleresearch.blogspot.com/ – Google Research has stated that http://research.google.com will soon host open-source scientific datasets – http://blog.wired.com/wiredscience/2008/01/google-to-provi.html – watch this space.
http://data.un.org/
http://www.data360.org/index.aspx
http://tunedit.org/search?q=arff – 800 datasets in ARFF format for different problems and application domains
http://wikiposit.com
http://gsociology.icaap.org/dataupload.html – The Global Social Change Research Project – social, political and economic datasets

Data sets for a specific field

http://kaggle.com/ – machine learning competitions with data provided by organisations with prize money
http://theinfo.org/get/data – good list here – pay attention to web/news/blogs and Text/Language categories as well as trust network data
http://research.microsoft.com/nlp/ – look under data sets
http://nlp.stanford.edu/links/statnlp.html – look under corpora
http://trec.nist.gov/data/reuters/reuters.html – Reuters Corpora – contains large collection of news stories for use in Natural Language Processing, Information Retrieval and Machine Learning Systems (need to order CDs)

http://trec.nist.gov/data.html – Text retrieval. Has spam, web, question answering, blog and ad hoc (e.g. relevance judgement) tracks
http://plg.uwaterloo.ca/~gvcormac/treccorpus/ (300MB) – Spam Corpus 2005
http://plg.uwaterloo.ca/~gvcormac/treccorpus06/ (75MB – english, 60MB chinese) – Spam Corpus 2006
http://trec.nist.gov/data/reljudge_eng.html – Relevance Judgement
http://ir.dcs.gla.ac.uk/test_collections/blog06info.html (25GB – costs 400 GBP) – Blog 06 data
http://trec.nist.gov/data/qamain.html – Question Answering (many tracks)
http://trec.nist.gov/data/novelty.html – Novelty (some relevance)

http://infochimps.org/tag/language/datasets – languages
http://infochimps.org/tag/lexicon/datasets – lexicon
http://infochimps.org/tag/lexical/datasets – lexical

http://wordnet.princeton.edu/ – Lexical database that is handy for computational linguistics and natural language processing
http://www.dmoz.org/Computers/Artificial_Intelligence/Machine_Learning/Datasets/ – Machine learning datasets
http://cervisia.org/machine_learning_data.php – Machine learning datasets – benchmark data for comparing different classifier algorithms is recommended from http://www.ci.tuwien.ac.at/~meyer/benchdata/
http://mill.ucsd.edu/index.php?page=Datasets&subpage=Overview
http://www.trustlet.org/wiki/Trust_network_datasets#Released_datasets – Trust datasets – includes Epinions
http://stuff.metafilter.com/infodump/ – Metafilter – contains posts, comments, tags, favourites, contact and user data
http://an.kaist.ac.kr/traces/IMC2007.html – YouTube dataset
http://socialnetworks.mpi-sws.mpg.de/ – social network dataset
http://people.csail.mit.edu/jrennie/20Newsgroups/ – newsgroup dataset
http://www.yr-bcn.es/webspam/datasets/ – Webspam datasets

Link Analysis

http://www.cs.toronto.edu/~tsap/experiments/datasets/index.html
http://www.cs.toronto.edu/~tsap/experiments/download/download.html

Recommender systems

http://www.grouplens.org/ – MovieLens
http://www.ieor.berkeley.edu/~goldberg/jester-data/ – Jester
http://www.netflixprize.com/ – Netflix
http://www.informatik.uni-freiburg.de/~cziegler/BX/ – Book Crossing

Forums

http://weimo.de/node/642 – Nabble.com + user ratings of posts

Blogs

http://ebiquity.umbc.edu/resource/html/id/212/Splog-Blog-Dataset – Spam blogs (splogs)
http://www.icwsm.org/data.html – 14 million posts, 3 million weblogs – apparently no longer available since Dec 8, 2006
http://ir.dcs.gla.ac.uk/test_collections/blog06info.html – but costs 400 GBP!

Wikis

http://labs.systemone.at/wikipedia3 – wikipedia 3 providing wikipedia datasets
http://download.wikipedia.org/ – official wikipedia database dumps (very large)
http://download.freebase.com/wex/ – English wikipedia articles that have been transformed into XML – all files ~ 55GB
http://dbpedia.org/About – structured information from wikipedia – dataset of this is available

Webpages

http://www.archive.org/web/web.php – 85 billion webpages archived since 1996

Misc

http://opentick.com/ – Stock data
http://lib.stat.cmu.edu/datasets/ – miscellaneous datasets
http://lib.stat.cmu.edu/jasadata/ – datasets from Journal of the American Statistical Association
http://musicbrainz.org/ – music dataset
http://www.jigsaw.com/ – directory of company & business professional dataset
http://www.librarything.com/ – library catalogue
http://www.imeem.com/developers – media library
http://www.scribd.com/doc/9582/integrating-wikipediawordnet – article talking about integrating Wordnet and Wikipedia with YAGO (an extensible and light-weight ontology)
http://wiki.openstreetmap.org/index.php/Potential_Datasources – country maps
http://rdf.dmoz.org/ – open directory project dataset

 

 

Source: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html

 

datasets-packageThe R Datasets Package

-- A --

ability.covAbility and Intelligence Tests
airmilesPassenger Miles on Commercial US Airlines, 1937-1960
AirPassengersMonthly Airline Passenger Numbers 1949-1960
airqualityNew York Air Quality Measurements
anscombeAnscombe's Quartet of 'Identical' Simple Linear Regressions
attenuThe Joyner-Boore Attenuation Data
attitudeThe Chatterjee-Price Attitude Data
austresQuarterly Time Series of the Number of Australian Residents

-- B --

beaver1Body Temperature Series of Two Beavers
beaver2Body Temperature Series of Two Beavers
beaversBody Temperature Series of Two Beavers
BJsalesSales Data with Leading Indicator
BJsales.leadSales Data with Leading Indicator
BODBiochemical Oxygen Demand

-- C --

carsSpeed and Stopping Distances of Cars
ChickWeightWeight versus age of chicks on different diets
chickwtsChicken Weights by Feed Type
CO2Carbon Dioxide Uptake in Grass Plants
co2Mauna Loa Atmospheric CO2 Concentration
crimtabStudent's 3000 Criminals Data

-- D --

datasetsThe R Datasets Package
discoveriesYearly Numbers of Important Discoveries
DNaseElisa assay of DNase

-- E --

esophSmoking, Alcohol and (O)esophageal Cancer
euroConversion Rates of Euro Currencies
euro.crossConversion Rates of Euro Currencies
eurodistDistances Between European Cities
EuStockMarketsDaily Closing Prices of Major European Stock Indices, 1991-1998

-- F --

faithfulOld Faithful Geyser Data
fdeathsMonthly Deaths from Lung Diseases in the UK
FormaldehydeDetermination of Formaldehyde
freenyFreeny's Revenue Data
freeny.xFreeny's Revenue Data
freeny.yFreeny's Revenue Data

-- H --

HairEyeColor - Hair and Eye Color of Statistics Students
Harman23.cor - Harman Example 2.3
Harman74.cor - Harman Example 7.4

-- I --

Indometh - Pharmacokinetics of Indomethacin
infert - Infertility after Spontaneous and Induced Abortion
InsectSprays - Effectiveness of Insect Sprays
iris - Edgar Anderson's Iris Data
iris3 - Edgar Anderson's Iris Data
islands - Areas of the World's Major Landmasses

-- J --

JohnsonJohnson - Quarterly Earnings per Johnson & Johnson Share

-- L --

LakeHuron - Level of Lake Huron 1875-1972
ldeaths - Monthly Deaths from Lung Diseases in the UK
lh - Luteinizing Hormone in Blood Samples
LifeCycleSavings - Intercountry Life-Cycle Savings Data
Loblolly - Growth of Loblolly pine trees
longley - Longley's Economic Regression Data
lynx - Annual Canadian Lynx trappings 1821-1934

-- M --

mdeaths - Monthly Deaths from Lung Diseases in the UK
morley - Michelson Speed of Light Data
mtcars - Motor Trend Car Road Tests

-- N --

nhtemp - Average Yearly Temperatures in New Haven
Nile - Flow of the River Nile
nottem - Average Monthly Temperatures at Nottingham, 1920-1939
npk - Classical N, P, K Factorial Experiment

-- O --

occupationalStatus - Occupational Status of Fathers and their Sons
Orange - Growth of Orange Trees
OrchardSprays - Potency of Orchard Sprays

-- P --

PlantGrowth - Results from an Experiment on Plant Growth
precip - Annual Precipitation in US Cities
presidents - Quarterly Approval Ratings of US Presidents
pressure - Vapor Pressure of Mercury as a Function of Temperature
Puromycin - Reaction Velocity of an Enzymatic Reaction

-- Q --

quakes - Locations of Earthquakes off Fiji

-- R --

randu - Random Numbers from Congruential Generator RANDU
rivers - Lengths of Major North American Rivers
rock - Measurements on Petroleum Rock Samples

-- S --

Seatbelts - Road Casualties in Great Britain 1969-84
sleep - Student's Sleep Data
stack.loss - Brownlee's Stack Loss Plant Data
stack.x - Brownlee's Stack Loss Plant Data
stackloss - Brownlee's Stack Loss Plant Data
state - US State Facts and Figures
state.abb - US State Facts and Figures
state.area - US State Facts and Figures
state.center - US State Facts and Figures
state.division - US State Facts and Figures
state.name - US State Facts and Figures
state.region - US State Facts and Figures
state.x77 - US State Facts and Figures
sunspot.month - Monthly Sunspot Data, from 1749 to "Present"
sunspot.year - Yearly Sunspot Data, 1700-1988
sunspots - Monthly Sunspot Numbers, 1749-1983
swiss - Swiss Fertility and Socioeconomic Indicators (1888) Data

-- T --

Theoph - Pharmacokinetics of Theophylline
Titanic - Survival of passengers on the Titanic
ToothGrowth - The Effect of Vitamin C on Tooth Growth in Guinea Pigs
treering - Yearly Treering Data, -6000-1979
trees - Girth, Height and Volume for Black Cherry Trees

-- U --

UCBAdmissions - Student Admissions at UC Berkeley
UKDriverDeaths - Road Casualties in Great Britain 1969-84
UKgas - UK Quarterly Gas Consumption
UKLungDeaths - Monthly Deaths from Lung Diseases in the UK
USAccDeaths - Accidental Deaths in the US 1973-1978
USArrests - Violent Crime Rates by US State
USJudgeRatings - Lawyers' Ratings of State Judges in the US Superior Court
USPersonalExpenditure - Personal Expenditure Data
uspop - Populations Recorded by the US Census

-- V --

VADeaths - Death Rates in Virginia (1940)
volcano - Topographic Information on Auckland's Maunga Whau Volcano

-- W --

warpbreaks - The Number of Breaks in Yarn during Weaving
women - Average Heights and Weights for American Women
WorldPhones - The World's Telephones
WWWusage - Internet Usage per Minute

 

Source: http://www.reddit.com/r/datasets/

List

Source: http://www.nber.org/data/

 


Official Business Cycle Dates
NBER

"The American Business Cycle: Continuity and Change"   Historic Data Tables
Gordon

Experimental Coincident, Leading and Recession Indexes
Stock, Watson

Index of African Governance
Rotberg, Gisselquist

Penn-World Tables
Feenstra, Inklaar, Timmer

Barro-Lee
Barro, Lee

Cross-country Historical Adoption of Technology (CHAT) data
Comin, Hobijn

Economic Policy Uncertainty
Baker, Bloom, Davis

A History of U.S. Foreign-Exchange-Market Interventions
Bordo, Humpage, Schwartz

Occupational Wages around the World
Freeman, Oostendorp

Macro History Database
NBER

Savings, Investment, and Gold in 13 countries (1850-1945)
Jones, Obstfeld

Social Security Pension Reform in Europe
Feldstein, Siebert

Historical Cross-Country Technological Adoption: Dataset
Comin, Hobijn

Facts and Fantasies about Commodity Futures
Gorton, Rouwenhorst

US Industrial Production Index 1790 - 1915
Davis

Industry, Productivity, and Digitization Data


Job Creation and Destruction Data
Haltiwanger et al

Management Practices Data
Bloom, Van Reenen

Manufacturing Industry Productivity Database
Becker, Gray, Marvakov

Internet and Economy Digitization Report
Shiller

Public Sector Collective Bargaining Law Data
Valletta, Freeman

Form 990 data on tax exempt organizations
IRS

International Trade Data


Price Quantity Indexes and Values for U.S. Exports and Imports, 1879-1923
Lipsey

SITC Rev 2 and NAICS (1997)
Feenstra, Lipsey

U.S. Trade by 1972-SIC category, 1958-1994
Feenstra

U.S. Trade by 1987-SIC, 1972-2005; NAICS 1989-2005; HS 1989-2008
Concordance between HS and SIC/NAICS; Concordance of HS codes over time
Schott
Pierce and Schott

U.S. Imports by TSUSA, HS, SITC, 1972-2001
Feenstra

U.S. Imports by SAS and Stata, 1972-2001
Feenstra

U.S. Exports by TSUSA, HS, SITC, 1972-2001
Feenstra

U.S. Exports by SAS and Stata, 1972-2001
Feenstra

U.S. Tariffs, 1989-2001
Romalis

U.S. Antidumping Database and Links
Blonigen

World Trade Data ( choose World Import and Export Data )
Feenstra, Lipsey

Individual Data


Angrist Archive
Joshua Angrist

Boston Youth Labor (Market) Survey, 1980, 1989
Freeman, Katz

Collaborative Perinatal (CPP)
NINCDS

Consumer Expenditure Survey Extracts
Harris, Sabelhaus (CBO)

Current Population Survey
BLS

Fatality Analysis Reporting System (FARS) Data
NHTSA

Gould Sample
Costa

National Health and Nutrition Examination Survey (NHANES)
NCHS

Reading National Health Interview Survey (NHIS) Data with SAS, SPSS, or Stata
Roth

Survey of Economic Expectations
Dominitz, Manski

Survey of Income and Program Participation
Census

Survey of Program Dynamics
Census

Thorndike-Hagen
Thorndike, Hagen

Union Army Data Set
Fogel

Worker Representation survey
Freeman, Rogers

Hospital/Provider Data


CMS' Prospective Payment System (PPS)
CMS

Reading CMS'  Healthcare Cost Report Information System (HCRIS) datasets using SAS
CMS

CMS's National Plan and Provider Enumeration System (NPPES) Files
CMS

CMS' National Provider Identifier (NPI) to Unique Physician Identification Number (UPIN) Crosswalk
CMS

CMS' National Provider Identifier (NPI) to State License Crosswalk
CMS

CMS' Provider of Service (POS) files
CMS

CMS' Medicare Provider Charge Data
CMS

CMS' ICD-9-CM to and from ICD-10-CM and ICD-10-PCS Crosswalk or General Equivalence Mappings
CMS

CMS's CBSA, MSA, and State Wage Index Files
CMS

CMS' SSA to FIPS CBSA and MSA County Crosswalks
CMS

CMS' SSA to FIPS State and County Crosswalks
CMS

Demographic and Vital Statistics


Vital Statistics Books ( Historical )
NCHS

Vital Statistics Births
NCHS

Interactive index to Vital Statistics Births 1931-1968
NCHS

Reading SEER U.S. County Population Data with SAS, SPSS, or Stata 1969-on
Roth

Vital Statistics Births and Infant Mortality 1920-1945
Cutler, Norberg, Norton

Vital Statistics Births 1940-1968
Finkelstein, Heidi Williams

Vital Statistics Mortality Data
NCHS

Vital Statistics Deaths - Historical 1900 - 1936
Grant Miller

Vital Statistics Marriage and Divorce
NCHS

US Decennial Population by County and State 1900-1990
Roth

US Intercensal Population by County and State 1970-2009
Roth, James Wang

US Intercensal Population by State, Age and Sex 1970-1999
Census

Work-Family Policies and Other Data
Waldfogel, Han, Ruhm

 

Patent and Scientific Papers Data

 


U.S. Patents
Hall, Jaffe, Trajtenberg

NBER-Rensselaer Polytechnic Institute Scientific Papers Database
Adams, Clemmons

Nobel Laureate Data
Jones, Weinberg

Other Data

 

  • NBER
  • NCES
  • Feenberg
  • Cutler, Glaeser, Vigdor
  • Wallis
  • Lichtenberg
  • Borenstein
  • Roth
  • Roth
  • Roth
  • Olken
  • Lahey
  • NBER
  • Norberg

 

Source: http://www.wto.org/english/res_e/statis_e/data_pub_e.htm

List

 

Source: http://www.imf.org/external/data.htm

  • World Economic Outlook Databases (WEO)
  • International Financial Statistics (IFS)
  • Principal Global Indicators (PGI)
  • Balance of Payments Statistics (BOPS)
  • Coordinated Direct Investment Survey (CDIS)
  • Coordinated Portfolio Investment Survey (CPIS)
  • Currency Composition of Official Foreign Exchange Reserves (COFER)
  • Data Template on International Reserves and Foreign Currency Liquidity
  • Financial Access Survey (FAS)
  • Financial Soundness Indicators (FSIs)
  • G-20 Surveillance Notes
  • Joint External Debt Hub
  • Monitoring of Fund Arrangements Database (MONA)
  • Primary Commodity Prices
  • Public Sector Debt Statistics Online Centralized Database
  • Quarterly External Debt Statistics (QEDS)

 

Source: http://blog.visual.ly/data-sources/

1. Government and political data

  • Data.gov: This is the go-to resource for government-related data. It claims to have up to 400,000 data sets, both raw and geospatial, in a variety of formats.
  • The only caveat in using the data sets is that you have to clean them first, since many have missing values and characters.
  • Socrata is another good place to explore government-related data. One great thing about Socrata is that it has some visualization tools that make exploring the data easier.
  • City-specific government data: Some cities have their own data portals set up for browsing city-related data. For example, at San Francisco Data you can browse through everything from crime statistics to parking spots available in the city.
  • The UN and UN-related sites like UNICEF and the World Health Organization are rich with all kinds of data, from mortality rates to world hunger statistics.
  • The Census Bureau houses a ton of information about our lives around income, race, education, population and business.
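The cleaning caveat above can be sketched in a few lines of Python. This is only a rough illustration using the standard library: the list of "missing" markers and the sample rows are assumptions made up for the example, not taken from any real Data.gov file.

```python
import csv
import io

# Placeholder strings that often stand in for missing values (an assumed list).
MISSING_MARKERS = {"", "NA", "N/A", "null", "-", "?"}


def clean_rows(csv_text):
    """Parse CSV text, trim whitespace, and map missing markers to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        cleaned_row = {}
        for key, value in row.items():
            if key is None:
                # Extra unnamed fields on a malformed line are skipped.
                continue
            value = (value or "").strip()
            cleaned_row[key.strip()] = None if value in MISSING_MARKERS else value
        cleaned.append(cleaned_row)
    return cleaned


def drop_incomplete(rows):
    """Keep only rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical sample standing in for a downloaded government CSV file.
SAMPLE = "city, population\nSpringfield, 30000\nShelbyville, N/A\n , 1200\n"

rows = clean_rows(SAMPLE)
complete = drop_incomplete(rows)
print(len(rows), "rows parsed,", len(complete), "complete")
```

Whether to drop incomplete rows or keep them with explicit gaps depends on the analysis; the sketch keeps both lists so either choice stays open.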

2. Data aggregators

These are the places that house data from all kinds of sources. Sometimes it’s easier to find something here related to a specific category.

  • Programmable Web: A really useful resource to explore APIs and also mashups of different APIs.
  • Infochimps has a data marketplace that offers thousands of public and proprietary data sets for download and API access, in a wide range of categories, from historical Twitter and OK Cupid data to geolocation data, in different formats. You can even upload your own data if you like.
  • Data Market is a good place to explore data related to economics, healthcare, food and agriculture, and the automotive industry.
  • Google Public Data Explorer houses a lot of data from world development indicators, OECD and human development indicators, mostly related to economics and the world.
  • Junar is a great data scraping service that also houses data feeds.
  • Buzzdata is a social data sharing service that allows you to upload your own data and connect and follow others who are uploading their own data.

3. Social data

Usually, the best place to get social data for an API is the site itself: Instagram, GetGlue, Foursquare; pretty much all social media sites have their own APIs. Here are more details on the most popular ones.

  • Twitter: Access to the Twitter API for historical uses is fairly limited, to 3200 tweets. For more, check out PeopleBrowsr, Gnip (also offers historical access to the WP Automattic data feed), DataSift, Infochimps, Topsy.
  • Foursquare: They have their own API and you can get it through Infochimps, as well.
  • Facebook: The Facebook Graph API is the best resource for Facebook.
  • Face.com: A great tool for facial recognition data.

4. Weather data

  • Wunderground has detailed weather information and also lets you search historical data by zip code or city. It gives temperature, wind, precipitation and hourly observations for that day.
  • Weatherbase has detailed weather stats on temperature, rain and humidity of nearly 27,000 cities.

5. Sports data

These three sites have comprehensive information on teams, players, coaches and leaders by season.

ESPN recently came up with its own API, too. You have to be a partner to get access to their data. 

6. Universities and research

Searching the work of academics who specialize in a particular area is always a great place to find some interesting data.

If you come across specific data that you would like to use, say, in a research paper, the best way to go is to contact the professor directly. (That is how we got the data for our What are the Odds piece, which is one of the most-viewed infographics on the web.)

One university that makes some of the datasets used in its courses publicly available is UCLA.

7. News data

The New York Times has a great API and a really good explorer to access any article in the publication. The data is returned in JSON format.
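As a rough idea of what working with a JSON news API looks like, here is a minimal Python sketch. The endpoint, the `q` and `api-key` parameter names, and the response field names (`response`, `docs`, `headline`, `main`, `web_url`) are assumptions based on how the article search has commonly been described, so check them against the current developer documentation. The sample response is invented, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the article search; verify against the developer docs.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key):
    """Return a request URL for a keyword search (parameter names are assumptions)."""
    return BASE_URL + "?" + urlencode({"q": query, "api-key": api_key})


def extract_headlines(payload_text):
    """Pull (headline, url) pairs out of a JSON response body."""
    payload = json.loads(payload_text)
    docs = payload.get("response", {}).get("docs", [])
    return [(d.get("headline", {}).get("main"), d.get("web_url")) for d in docs]


# Hypothetical response body, shaped like the assumed API output.
SAMPLE_RESPONSE = json.dumps({
    "response": {
        "docs": [
            {"headline": {"main": "Cities Open Their Data"},
             "web_url": "https://example.com/a"},
        ]
    }
})

url = build_search_url("open data", "YOUR_KEY")
headlines = extract_headlines(SAMPLE_RESPONSE)
print(url)
print(headlines)
```

To fetch real results, the URL from `build_search_url` would be passed to an HTTP client together with a valid API key.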

The Guardian Data Blog regularly posts visualizations and makes data available through a Google docs format. The great thing about this is that the data has already been cleaned.

CDC Data - Source: http://www.cdc.gov/ncbddd/disabilityandhealth/datasets.html

Behavioral Risk Factor Surveillance System (BRFSS)
The BRFSS is a telephone survey that tracks national and state-specific health risk behaviors of adults, 18 years of age or older, residing in the United States. The BRFSS is conducted by the 50 states, the District of Columbia, and three territories (Guam, Puerto Rico, and the U.S. Virgin Islands) and is administered and supported by the Division of Adult and Community Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention (CDC).

National Health Interview Survey (NHIS)
The NHIS is a multi-purpose, nationwide household health survey of the U.S. civilian noninstitutionalized population conducted annually by the National Center for Health Statistics (NCHS), CDC, to produce national estimates for a variety of health indicators. In 1994 and 1995, the NHIS included a special supplement on disability.

National Health and Nutrition Examination Survey (NHANES) 
NHANES is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines information from interviews and physical examinations.

National Survey of Family Growth (NSFG)
The NSFG gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men's and women's health. The survey results are used by the U.S. Department of Health and Human Services and others to plan health services and health education programs, and to do statistical studies of families, fertility, and health.

American Community Survey (ACS)
The ACS is a mail survey that provides demographic, socioeconomic, and housing information about communities in between the 10-year census. The ACS is conducted by the U.S. Census Bureau. The survey is sent to a sample of households in the United States. The ACS identifies serious difficulty in four basic areas of functioning: vision, hearing, ambulation, and cognition. The ACS also includes two questions to identify people with difficulties that might affect their ability to live independently.

Medical Expenditure Panel Survey (MEPS) 
The MEPS comprise a set of large-scale surveys of families and individuals, their medical providers, and employers across the United States. The MEPS is the most complete source of data on the cost and use of health care and health insurance coverage.

Survey of Income and Program Participation (SIPP)
The SIPP is a multipanel, longitudinal survey conducted by the U.S. Census Bureau. The SIPP covers the civilian, noninstitutionalized population of residents of the United States, and collects data on the sources and amount of individual income, labor force information, program participation and eligibility data, and general demographic characteristics. The SIPP also includes disability supplements that ask questions to determine individual disability status.

Current Population Survey (CPS) 
The CPS is a monthly survey of about 50,000 households conducted by the U.S. Bureau of the Census for the Bureau of Labor Statistics. The survey has been conducted for more than 50 years. In June 2008, questions were added to the CPS to identify people with a disability among the civilian noninstitutional population 16 years of age or older. Monthly labor force data are released from the CPS for people with a disability. The collection of these data is sponsored by the Department of Labor’s Office of Disability Employment.

Personality Testing - Source: http://personality-testing.info/_rawdata/

Updated | Description | Variables | n | Download
5/14/2014 | Answers to Cattell's 16 Personality Factors Test with items from the IPIP. | 163 likert rated items, gender, age, country and accuracy. | 49159 | 16PF
9/6/2012 | Answers to the Narcissistic Personality Inventory, constructed with the version from Raskin and Terry (1988). | 40 multiple choice, gender, age, time elapsed | 11243 | NPI
6/18/2012 | Answers to the Machiavellianism Test, a version of the MACH-IV from Christie and Geis (1970). | 20 likert rated items, gender, age, time elapsed | 13156 | MACH2
5/18/2014 | Answers to the Big Five Personality Test, constructed with items from the International Personality Item Pool. | 50 likert rated statements, gender, age, race, native language, country | 19719 | BIG5
7/22/2012 | Answers to the Taylor Manifest Anxiety Scale, from Taylor (1953). | 50 true false statements, gender, age | 5410 | TMA
9/6/2012 | Answers to the Humor Styles Questionnaire, from Martin et al. (2003). | 32 likert rated items, gender, age, self-rated accuracy | 1071 | HSQ
7/16/2012 | Answers to the Empathizing-Systemizing Test, a combined version of Simon Baron-Cohen's empathizing and systemizing quotients. | 120 likert rated items, gender, age, self-rated accuracy | 13256 | EQSQ
8/5/2013 | Answers to the Holland Code (RIASEC) Test, constructed with public domain items from the Interest Item Pool. | 48 likert rated statements, gender, age, country, time elapsed and self-rated accuracy. | 8855 | RIASEC
7/16/2012 | Answers to the Sexual Compulsivity Scale from Kalichman and Rompa (1995). | 10 likert rated statements, gender, age | 3376 | SCS
7/18/2012 | Answers to the IPIP Assertiveness, Social confidence, Adventurousness, and Dominance scales used as part of an experimental personality test. | 40 likert rated items, gender, age | 1005 | AS+SC+AD+DO
2/15/2014 | Answers to the Rosenberg Self-Esteem Scale. | 10 scale rated items, gender, age, country | 47974 | RSE
5/25/2012 | Answers to an experimental IQ Test previously offered on this website. | 25 questions/answers, age, gender. | 400 | IQ1
5/25/2012 | Answers to a sentence completion survey appended to the Holland Code and big five personality tests; at completion of either test takers were solicited to participate (most did). | 6 incomplete sentence responses, gender, age, and big five or RIASEC traits. | 1425 | SENTANCES1
8/6/2013 | Answers to the Experiences in Close Relationships Scale. | 36 likert rated items, gender, age, county. | 17386 | ECR
9/26/2012 | Answers to the Consideration of Future Consequences Scale. | 12 likert rated items, gender, age, self-rated accuracy. | 614 | CFCS
8/7/2012 | Answers to the Kentucky Inventory of Mindfulness Skills from Baer, Smith and Allen (2004). | 39 likert rated items, gender, age. | 601 | KIMS
9/6/2012 | Answers to the Multidimensional Sexual Self-Concept Questionnaire. | 100 likert rated items, gender, age and context. | 289 | MSSCQ
8/8/2013 | Answers to the Woodworth Psychoneurotic Inventory. | 116 yes/no questions, gender, age and country. | 6019 | WPI
12/8/2013 | Answers to the Hypersensitive Narcissism Scale and The Dirty Dozen. | 22 scale rated items, gender, age, accuracy and country. | 53981 | HSNS+DD
3/8/2014 | Answers to the Short Dark Triad by Paulhus and Jones (2011). | 27 scale rated items and country. | 18192 | SD3
4/21/2014 | Answers to the Feminist Perspectives Scale, from Henley, N.; Meng, K.; O'Brien, D.; McCarthy, W.; Sockloskie, R. (1998). "Developing a Scale to Measure the Diversity of Feminist Attitudes". Psychology of Women Quarterly, 22(2), 317-348. | 60 scale rated items, gender, age, country. | 13477 | FPS
5/21/2014 | Answers to the Wagner Preference Inventory, from Wagner, Rudolph F., and Kelly A. Wells. "A refined neurobehavioral inventory of hemispheric preference." Journal of clinical psychology 41.5 (1985): 671-676. | 12 multiple choice questions, country | 13502 | Wagner
5/23/2014 | A user generated corpus of personality test items from a short survey where users were prompted to generate descriptions of what was unique about their personality. | 3 free response, age, gender, native language, country | 2722 | itemsgen
6/21/2014 | Answers to the IPIP HEXACO equivalent scales. | 240 scale rated items, country | 22786 | HEXACO
 

 

 

Source: Awesome Public Datasets

https://github.com/caesar0301/awesome-public-datasets

 

  • Agriculture
  • Biology
  • Climate/Weather
  • Complex Networks
  • Computer Networks
  • Contextual Data
  • Data Challenges
  • Economics
  • Education
  • Energy
  • Finance
  • Geology
  • GIS/Environment
  • Government
  • Healthcare
  • Image Processing
  • Machine Learning
  • Museums
  • Natural Language
  • Physics
  • Psychology/Cognition
  • Public Domains
  • Search Engines
  • Social Networks
  • Social Sciences
  • Software
  • Sports
  • Time Series
  • Transportation
  • Complementary Collections

 

Source: Neo4J

https://neo4j.com/developer/example-data/

Source:  Vincent Arel-Bundock

https://vincentarelbundock.github.io/Rdatasets/datasets.html

List
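Each entry in this index has a CSV file behind it. A minimal Python sketch for building the CSV address from a package and item name is shown here. The URL layout (`/csv/<package>/<item>.csv`) is an assumption about how the site is organized and may change. The function names and sample rows are made up for illustration, and the download helper needs network access.

```python
import csv
import io
from urllib.request import urlopen

# Assumed URL layout of the CSV files behind the index page.
CSV_URL_TEMPLATE = (
    "https://vincentarelbundock.github.io/Rdatasets/csv/{package}/{item}.csv"
)


def rdatasets_csv_url(package, item):
    """Build the CSV address for one dataset, e.g. ("datasets", "iris")."""
    return CSV_URL_TEMPLATE.format(package=package, item=item)


def parse_csv(text):
    """Turn CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_dataset(package, item):
    """Download and parse one dataset (requires network access)."""
    with urlopen(rdatasets_csv_url(package, item)) as response:
        return parse_csv(response.read().decode("utf-8"))


# Offline demonstration with a made-up two-row sample.
SAMPLE = "speed,dist\n4,2\n7,22\n"
sample_rows = parse_csv(SAMPLE)
iris_url = rdatasets_csv_url("datasets", "iris")
print(iris_url)
print(sample_rows)
```

Calling `fetch_dataset("datasets", "iris")` would then return the rows of the iris table, provided the assumed URL layout holds.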

 

Package | Item | Title
datasets | AirPassengers | Monthly Airline Passenger Numbers 1949-1960
datasets | BJsales | Sales Data with Leading Indicator
datasets | BOD | Biochemical Oxygen Demand
datasets | CO2 | Carbon Dioxide Uptake in Grass Plants
datasets | Formaldehyde | Determination of Formaldehyde
datasets | HairEyeColor | Hair and Eye Color of Statistics Students
datasets | InsectSprays | Effectiveness of Insect Sprays
datasets | JohnsonJohnson | Quarterly Earnings per Johnson & Johnson Share
datasets | LakeHuron | Level of Lake Huron 1875-1972
datasets | LifeCycleSavings | Intercountry Life-Cycle Savings Data
datasets | Nile | Flow of the River Nile
datasets | OrchardSprays | Potency of Orchard Sprays
datasets | PlantGrowth | Results from an Experiment on Plant Growth
datasets | Puromycin | Reaction Velocity of an Enzymatic Reaction
datasets | Titanic | Survival of passengers on the Titanic
datasets | ToothGrowth | The Effect of Vitamin C on Tooth Growth in Guinea Pigs
datasets | UCBAdmissions | Student Admissions at UC Berkeley
datasets | UKDriverDeaths | Road Casualties in Great Britain 1969-84
datasets | UKgas | UK Quarterly Gas Consumption
datasets | USAccDeaths | Accidental Deaths in the US 1973-1978
datasets | USArrests | Violent Crime Rates by US State
datasets | USJudgeRatings | Lawyers' Ratings of State Judges in the US Superior Court
datasets | USPersonalExpenditure | Personal Expenditure Data
datasets | VADeaths | Death Rates in Virginia (1940)
datasets | WWWusage | Internet Usage per Minute
datasets | WorldPhones | The World's Telephones
datasets | airmiles | Passenger Miles on Commercial US Airlines, 1937-1960
datasets | airquality | New York Air Quality Measurements
datasets | anscombe | Anscombe's Quartet of 'Identical' Simple Linear Regressions
datasets | attenu | The Joyner-Boore Attenuation Data
datasets | attitude | The Chatterjee-Price Attitude Data
datasets | austres | Quarterly Time Series of the Number of Australian Residents
datasets | cars | Speed and Stopping Distances of Cars
datasets | chickwts | Chicken Weights by Feed Type
datasets | co2 | Mauna Loa Atmospheric CO2 Concentration
datasets | crimtab | Student's 3000 Criminals Data
datasets | discoveries | Yearly Numbers of Important Discoveries
datasets | esoph | Smoking, Alcohol and (O)esophageal Cancer
datasets | euro | Conversion Rates of Euro Currencies
datasets | faithful | Old Faithful Geyser Data
datasets | freeny | Freeny's Revenue Data
datasets | infert | Infertility after Spontaneous and Induced Abortion
datasets | iris | Edgar Anderson's Iris Data
datasets | islands | Areas of the World's Major Landmasses
datasets | lh | Luteinizing Hormone in Blood Samples
datasets | longley | Longley's Economic Regression Data
datasets | lynx | Annual Canadian Lynx trappings 1821-1934
datasets | morley | Michelson Speed of Light Data
datasets | mtcars | Motor Trend Car Road Tests
datasets | nhtemp | Average Yearly Temperatures in New Haven
datasets | nottem | Average Monthly Temperatures at Nottingham, 1920-1939
datasets | npk | Classical N, P, K Factorial Experiment
datasets | occupationalStatus | Occupational Status of Fathers and their Sons
datasets | precip | Annual Precipitation in US Cities
datasets | presidents | Quarterly Approval Ratings of US Presidents
datasets | pressure | Vapor Pressure of Mercury as a Function of Temperature
datasets | quakes | Locations of Earthquakes off Fiji
datasets | randu | Random Numbers from Congruential Generator RANDU
datasets | rivers | Lengths of Major North American Rivers
datasets | rock | Measurements on Petroleum Rock Samples
datasets | sleep | Student's Sleep Data
datasets | stackloss | Brownlee's Stack Loss Plant Data
datasets | sunspot.month | Monthly Sunspot Data, from 1749 to "Present"
datasets | sunspot.year | Yearly Sunspot Data, 1700-1988
datasets | sunspots | Monthly Sunspot Numbers, 1749-1983
datasets | swiss | Swiss Fertility and Socioeconomic Indicators (1888) Data
datasets | treering | Yearly Treering Data, -6000-1979
datasets | trees | Girth, Height and Volume for Black Cherry Trees
datasets | uspop | Populations Recorded by the US Census
datasets | volcano | Topographic Information on Auckland's Maunga Whau Volcano
datasets | warpbreaks | The Number of Breaks in Yarn during Weaving
datasets | women | Average Heights and Weights for American Women
boot | acme | Monthly Excess Returns
boot | aids | Delay in AIDS Reporting in England and Wales
boot | aircondit | Failures of Air-conditioning Equipment
boot | aircondit7 | Failures of Air-conditioning Equipment
boot | amis | Car Speeding and Warning Signs
boot | aml | Remission Times for Acute Myelogenous Leukaemia
boot | bigcity | Population of U.S. Cities
boot | brambles | Spatial Location of Bramble Canes
boot | breslow | Smoking Deaths Among Doctors
boot | calcium | Calcium Uptake Data
boot | cane | Sugar-cane Disease Data
boot | capability | Simulated Manufacturing Process Data
boot | catsM | Weight Data for Domestic Cats
boot | cav | Position of Muscle Caveolae
boot | cd4 | CD4 Counts for HIV-Positive Patients
boot | channing | Channing House Data
boot | city | Population of U.S. Cities
boot | claridge | Genetic Links to Left-handedness
boot | cloth | Number of Flaws in Cloth
boot | co.transfer | Carbon Monoxide Transfer
boot | coal | Dates of Coal Mining Disasters
boot | darwin | Darwin's Plant Height Differences
boot | dogs | Cardiac Data for Domestic Dogs
boot | downs.bc | Incidence of Down's Syndrome in British Columbia
boot | ducks | Behavioral and Plumage Characteristics of Hybrid Ducks
boot | fir | Counts of Balsam-fir Seedlings
boot | frets | Head Dimensions in Brothers
boot | grav | Acceleration Due to Gravity
boot | gravity | Acceleration Due to Gravity
boot | hirose | Failure Time of PET Film
boot | islay | Jura Quartzite Azimuths on Islay
boot | manaus | Average Heights of the Rio Negro river at Manaus
boot | melanoma | Survival from Malignant Melanoma
boot | motor | Data from a Simulated Motorcycle Accident
boot | neuro | Neurophysiological Point Process Data
boot | nitrofen | Toxicity of Nitrofen in Aquatic Systems
boot | nodal | Nodal Involvement in Prostate Cancer
boot | nuclear | Nuclear Power Station Construction Data
boot | paulsen | Neurotransmission in Guinea Pig Brains
boot | poisons | Animal Survival Times
boot | polar | Pole Positions of New Caledonian Laterites
boot | remission | Cancer Remission and Cell Activity
boot | salinity | Water Salinity and River Discharge
boot | survival | Survival of Rats after Radiation Doses
boot | tau | Tau Particle Decay Modes
boot | tuna | Tuna Sighting Data
boot | urine | Urine Analysis Data
boot | wool | Australian Relative Wool Prices
KMsurv | aids | data from Section 1.19
KMsurv | alloauto | data from Section 1.9
KMsurv | allograft | data from Exercise 13.1, p418
KMsurv | azt | data from Exercise 4.7, p122
KMsurv | baboon | data from Exercise 5.8, p147
KMsurv | bcdeter | data from Section 1.18
KMsurv | bfeed | data from Section 1.14
KMsurv | bmt | data from Section 1.3
KMsurv | bnct | data from Exercise 7.7, p223
KMsurv | btrial | data from Section 1.5
KMsurv | burn | data from Section 1.6
KMsurv | channing | data from Section 1.16
KMsurv | drug6mp | data from Section 1.2
KMsurv | drughiv | data from Exercise 7.6, p222
KMsurv | hodg | data from Section 1.10
KMsurv | kidney | data from Section 1.4
KMsurv | kidrecurr | Data on 38 individuals using a kidney dialysis machine
KMsurv | kidtran | data from Section 1.7
KMsurv | larynx | data from Section 1.8
KMsurv | lung | data from Exercise 4.4, p120
KMsurv | pneumon | data from Section 1.13
KMsurv | psych | data from Section 1.15
KMsurv | rats | data from Exercise 7.13, p225
KMsurv | std | data from Section 1.12
KMsurv | stddiag | data from Exercise 5.6, p146
KMsurv | tongue | data from Section 1.11
KMsurv | twins | data from Exercise 7.14, p225
robustbase | Animals2 | Brain and Body Weights for 65 Species of Land Animals
robustbase | CrohnD | Crohn's Disease Adverse Events Data
robustbase | NOxEmissions | NOx Air Pollution Data
robustbase | SiegelsEx | Siegel's Exact Fit Example Data
robustbase | aircraft | Aircraft Data
robustbase | airmay | Air Quality Data
robustbase | alcohol | Alcohol Solubility in Water Data
robustbase | ambientNOxCH | Daily Means of NOx (mono-nitrogen oxides) in air
robustbase | biomassTill | Biomass Tillage Data
robustbase | bushfire | Campbell Bushfire Data
robustbase | carrots | Insect Damages on Carrots
robustbase | cloud | Cloud point of a Liquid
robustbase | coleman | Coleman Data Set
robustbase | condroz | Condroz Data
robustbase | cushny | Cushny and Peebles Prolongation of Sleep Data
robustbase | delivery | Delivery Time Data
robustbase | education | Education Expenditure Data
robustbase | epilepsy | Epilepsy Attacks Data Set
robustbase | exAM | Example Data of Antille and May - for Simple Regression
robustbase | foodstamp | Food Stamp Program Participation
robustbase | hbk | Hawkins, Bradu, Kass's Artificial Data
robustbase | heart | Heart Catherization Data
robustbase | kootenay | Waterflow Measurements of Kootenay River in Libby and Newgate
robustbase | lactic | Lactic Acid Concentration Measurement Data
robustbase | milk | Daudin's Milk Composition Data
robustbase | pension | Pension Funds Data
robustbase | phosphor | Phosphorus Content Data
robustbase | pilot | Pilot-Plant Data
robustbase | possumDiv | Possum Diversity Data
robustbase | pulpfiber | Pulp Fiber and Paper Data
robustbase | radarImage | Satellite Radar Image Data from near Munich
robustbase | salinity | Salinity Data
robustbase | starsCYG | Hertzsprung-Russell Diagram Data of Star Cluster CYG OB1
robustbase | telef | Number of International Calls from Belgium
robustbase | toxicity | Toxicity of Carboxylic Acids Data
robustbase | vaso | Vaso Constriction Skin Data Set
robustbase | wagnerGrowth | Wagner's Hannover Employment Growth Data
robustbase | wood | Modified Data on Wood Specific Gravity
carAMSsurveyAmerican Math Society Survey DataCSV DOC
carAdlerExperimenter ExpectationsCSV DOC
carAngellMoral Integration of American CitiesCSV DOC
carAnscombeU. S. State Public-School ExpendituresCSV DOC
carBaumannMethods of Teaching Reading ComprehensionCSV DOC
carBfoxCanadian Women's Labour-Force ParticipationCSV DOC
carBlackmoreExercise Histories of Eating-Disordered and Control SubjectsCSV DOC
carBurtFraudulent Data on IQs of Twins Raised ApartCSV DOC
carCanPopCanadian Population DataCSV DOC
carChileVoting Intentions in the 1988 Chilean PlebisciteCSV DOC
carChirotThe 1907 Romanian Peasant RebellionCSV DOC
carCowlesCowles and Davis's Data on VolunteeringCSV DOC
carDavisSelf-Reports of Height and WeightCSV DOC
carDavisThinDavis's Data on Drive for ThinnessCSV DOC
carDepredationsMinnesota Wolf Depredation DataCSV DOC
carDuncanDuncan's Occupational Prestige DataCSV DOC
carEricksenThe 1980 U.S. Census UndercountCSV DOC
carFloridaFlorida County VotingCSV DOC
carFreedmanCrowding and Crime in U. S. Metropolitan AreasCSV DOC
carFriendlyFormat Effects on RecallCSV DOC
carGinzbergData on DepressionCSV DOC
carGreeneRefugee AppealsCSV DOC
carGuyerAnonymity and CooperationCSV DOC
carHartnagelCanadian Crime-Rates Time SeriesCSV DOC
carHighway1Highway AccidentsCSV DOC
carKosteckiDillonTreatment of Migraine HeadachesCSV DOC
carLeinhardtData on Infant-MortalityCSV DOC
carLoBDCancer drug data used to provide an example of the use of the skew power distributions.CSV DOC
carMandelContrived Collinear DataCSV DOC
carMigrationCanadian Interprovincial Migration DataCSV DOC
carMooreStatus, Authoritarianism, and ConformityCSV DOC
carMrozU.S. Women's Labor-Force ParticipationCSV DOC
carOBrienKaiserO'Brien and Kaiser's Repeated-Measures DataCSV DOC
carOrnsteinInterlocking Directorates Among Major Canadian FirmsCSV DOC
carPotteryChemical Composition of PotteryCSV DOC
carPrestigePrestige of Canadian OccupationsCSV DOC
carQuartetFour Regression DatasetsCSV DOC
carRobeyFertility and ContraceptionCSV DOC
carSLIDSurvey of Labour and Income DynamicsCSV DOC
carSahlinsAgricultural Production in Mazulu VillageCSV DOC
carSalariesSalaries for ProfessorsCSV DOC
carSoilsSoil Compositions of Physical and Chemical CharacteristicsCSV DOC
carStatesEducation and Related Statistics for the U.S. StatesCSV DOC
carTransactTransaction dataCSV DOC
carUNGDP and Infant MortalityCSV DOC
carUSPopPopulation of the United StatesCSV DOC
carVocabVocabulary and EducationCSV DOC
carWeightLossWeight Loss DataCSV DOC
carWomenlfCanadian Women's Labour-Force ParticipationCSV DOC
carWongPost-Coma Recovery of IQCSV DOC
carWoolWool dataCSV DOC
clusteragricultureEuropean Union Agricultural WorkforcesCSV DOC
clusteranimalsAttributes of AnimalsCSV DOC
clusterchorSubSubset of C-horizon of Kola DataCSV DOC
clusterflowerFlower CharacteristicsCSV DOC
clusterplantTraitsPlant Species Traits DataCSV DOC
clusterplutonIsotopic Composition Plutonium BatchesCSV DOC
clusterruspiniRuspini DataCSV DOC
clustervotes.repubVotes for Republican Candidate in Presidential ElectionsCSV DOC
clusterxclaraBivariate Data Set with 3 ClustersCSV DOC
COUNTaffairsaffairsCSV DOC
COUNTazcabgptcaazcabgptcaCSV DOC
COUNTazdrg112azdrg112CSV DOC
COUNTazproazproCSV DOC
COUNTazprocedureazprocedureCSV DOC
COUNTbadhealthbadhealthCSV DOC
COUNTfasttrakgfasttrakgCSV DOC
COUNTfishingfishingCSV DOC
COUNTlbwlbwCSV DOC
COUNTlbwgrplbwgrpCSV DOC
COUNTloomisloomisCSV DOC
COUNTmdvismdvisCSV DOC
COUNTmedparmedparCSV DOC
COUNTnutsnutsCSV DOC
COUNTrwmrwmCSV DOC
COUNTrwm1984rwm1984CSV DOC
COUNTrwm5yrrwm5yrCSV DOC
COUNTshipsshipsCSV DOC
COUNTsmokingsmokingCSV DOC
COUNTtitanictitanicCSV DOC
COUNTtitanicgrptitanicgrpCSV DOC
EcdatAccidentShip AccidentsCSV DOC
EcdatAirlineCost for U.S. AirlinesCSV DOC
EcdatAirqAir Quality for Californian Metropolitan AreasCSV DOC
EcdatBenefitsUnemployment of Blue Collar WorkersCSV DOC
EcdatBidsBids Received By U.S. FirmsCSV DOC
EcdatBudgetFoodBudget Share of Food for Spanish HouseholdsCSV DOC
EcdatBudgetItalyBudget Shares for Italian HouseholdsCSV DOC
EcdatBudgetUKBudget Shares of British HouseholdsCSV DOC
EcdatBwagesWages in BelgiumCSV DOC
EcdatCPSch3Earnings from the Current Population SurveyCSV DOC
EcdatCRANpackagesGrowth of CRANCSV DOC
EcdatCapmStock Market DataCSV DOC
EcdatCarStated Preferences for Car ChoiceCSV DOC
EcdatCaschoolThe California Test Score Data SetCSV DOC
EcdatCatsupChoice of Brand for CatsupCSV DOC
EcdatCigarCigarette ConsumptionCSV DOC
EcdatCigaretteThe Cigarette Consumption Panel Data SetCSV DOC
EcdatClothingSales Data of Men's Fashion StoresCSV DOC
EcdatComputersPrices of Personal ComputersCSV DOC
EcdatCrackerChoice of Brand for CrackersCSV DOC
EcdatCrimeCrime in North CarolinaCSV DOC
EcdatDMDM Dollar Exchange RateCSV DOC
EcdatDiamondPricing the C's of Diamond StonesCSV DOC
EcdatDoctorNumber of Doctor VisitsCSV DOC
EcdatDoctorAUSDoctor Visits in AustraliaCSV DOC
EcdatDoctorContactsContacts With Medical DoctorCSV DOC
EcdatEarningsEarnings for Three Age GroupsCSV DOC
EcdatElectricityCost Function for Electricity ProducersCSV DOC
EcdatFairExtramarital Affairs DataCSV DOC
EcdatFatalityDrunk Driving Laws and Traffic DeathsCSV DOC
EcdatFishingChoice of Fishing ModeCSV DOC
EcdatForwardExchange Rates of US Dollar Against Other CurrenciesCSV DOC
EcdatFriendFoeData from the Television Game Show Friend or Foe?CSV DOC
EcdatGarchDaily Observations on Exchange Rates of the US Dollar Against Other CurrenciesCSV DOC
EcdatGasolineGasoline ConsumptionCSV DOC
EcdatGrilichesWage DataCSV DOC
EcdatGrunfeldGrunfeld Investment DataCSV DOC
EcdatHCHeating and Cooling System Choice in Newly Built Houses in CaliforniaCSV DOC
EcdatHHSCyberSecurityBreachesCybersecurity breaches reported to the US Department of Health and Human ServicesCSV DOC
EcdatHIHealth Insurance and Hours Worked By WivesCSV DOC
EcdatHdmaThe Boston HDMA Data SetCSV DOC
EcdatHeatingHeating System Choice in California HousesCSV DOC
EcdatHedonicHedonic Prices of Census Tracts in BostonCSV DOC
EcdatHousingSales Prices of Houses in the City of WindsorCSV DOC
EcdatIcecreamIce Cream ConsumptionCSV DOC
EcdatJournalsEconomic Journals Data SetCSV DOC
EcdatKakaduWillingness to Pay for the Preservation of the Kakadu National ParkCSV DOC
EcdatKetchupChoice of Brand for KetchupCSV DOC
EcdatKleinKlein's Model ICSV DOC
EcdatLaborSupplyWages and Hours WorkedCSV DOC
EcdatLabourBelgian FirmsCSV DOC
EcdatMCASThe Massachusetts Test Score Data SetCSV DOC
EcdatMalesWages and Education of Young MalesCSV DOC
EcdatMathlevelLevel of Calculus Attained for Students Taking Advanced Micro-economicsCSV DOC
EcdatMedExpStructure of Demand for Medical CareCSV DOC
EcdatMetalProduction for SIC 33CSV DOC
EcdatModeMode ChoiceCSV DOC
EcdatModeChoiceData to Study Travel Mode ChoiceCSV DOC
EcdatMofaInternational Expansion of U.S. MOFAs (Majority-Owned Foreign Affiliates) in FIRE (Finance, Insurance and Real Estate)CSV DOC
EcdatMrozLabor Supply DataCSV DOC
EcdatMunExpMunicipal Expenditure DataCSV DOC
EcdatNaturalParkWillingness to Pay for the Preservation of the Alentejo Natural ParkCSV DOC
EcdatNerloveCost Function for Electricity Producers, 1955CSV DOC
EcdatOFPVisits to Physician OfficeCSV DOC
EcdatOilOil InvestmentCSV DOC
EcdatPSIDPanel Survey of Income DynamicsCSV DOC
EcdatParticipationLabor Force ParticipationCSV DOC
EcdatPatentsHGHDynamic Relation Between Patents and R&DCSV DOC
EcdatPatentsRDPatents, R&D and Technological Spillovers for a Panel of FirmsCSV DOC
EcdatPoundPound-dollar Exchange RateCSV DOC
EcdatProducUS States ProductionCSV DOC
EcdatRetSchoolReturn to SchoolingCSV DOC
EcdatSP500Returns on Standard & Poor's 500 IndexCSV DOC
EcdatSchoolingWages and SchoolingCSV DOC
EcdatSomervilleVisits to Lake SomervilleCSV DOC
EcdatStarEffects on Learning of Small Class SizesCSV DOC
EcdatStrikeStrike Duration DataCSV DOC
EcdatStrikeDurStrikes DurationCSV DOC
EcdatStrikeNbNumber of Strikes in US ManufacturingCSV DOC
EcdatSumHesThe Penn TableCSV DOC
EcdatTobaccoHouseholds Tobacco Budget ShareCSV DOC
EcdatTrainStated Preferences for Train TravelingCSV DOC
EcdatTranspEqStatewide Data on Transportation Equipment ManufacturingCSV DOC
EcdatTreatmentEvaluating Treatment Effect of Training on EarningsCSV DOC
EcdatTunaChoice of Brand for TunaCSV DOC
EcdatUSFinanceIndustryUS Finance Industry ProfitsCSV DOC
EcdatUSclassifiedDocumentsOfficial Secrecy of the United States GovernmentCSV DOC
EcdatUSstateAbbreviationsStandard abbreviations for states of the United StatesCSV DOC
EcdatUStaxWordsNumber of Words in US Tax LawCSV DOC
EcdatUnempDurUnemployment DurationCSV DOC
EcdatUnemploymentUnemployment DurationCSV DOC
EcdatUniversityProvision of University Teaching and ResearchCSV DOC
EcdatVietNamHMedical Expenses in Viet-nam (household Level)CSV DOC
EcdatVietNamIMedical Expenses in Viet-nam (individual Level)CSV DOC
EcdatWagesPanel Data of Individual WagesCSV DOC
EcdatWages1Wages, Experience and SchoolingCSV DOC
EcdatWorkinghoursWife Working HoursCSV DOC
EcdatYenYen-dollar Exchange RateCSV DOC
EcdatYogurtChoice of Brand for YogurtsCSV DOC
EcdatbankingCrisesCountries in Banking CrisesCSV DOC
EcdatbreachesCyber Security BreachesCSV DOC
EcdatincomeInequalityIncome Inequality in the USCSV DOC
EcdatnonEnglishNamesNames with Character Set ProblemsCSV DOC
EcdatpoliticalKnowledgePolitical knowledge in the US and EuropeCSV DOC
gapPDA study of Parkinson's disease and APOE, LRRK2, SNCA markersCSV DOC
gapaldh2ALDH2 markers and AlcoholismCSV DOC
gapapoeapocAPOE/APOC1 markers and Alzheimer'sCSV DOC
gapcfCystic fibrosis dataCSV DOC
gapcrohnCrohn's disease dataCSV DOC
gapfaFriedreich Ataxia dataCSV DOC
gapfsnpsA case-control data involving four SNPs with missing genotypeCSV DOC
gaphlaThe HLA dataCSV DOC
gaphr1420An example data for Manhattan plot with annotationCSV DOC
gapl51An example pedigree dataCSV DOC
gaplukasAn example pedigreeCSV DOC
gapmaoA study of Parkinson's disease and MAO geneCSV DOC
gapmeyerA pedigree data on 282 animals deriving from two generationsCSV DOC
gapmfblongExample data for ACEnucfamCSV DOC
gapmhtdataAn example data for Manhattan plotCSV DOC
gapnep499A study of Alzheimer's disease with eight SNPs and APOECSV DOC
ggplot2luv_colours'colors()' in Luv space.CSV DOC
HistDataArbuthnotArbuthnot's data on male and female birth ratios in London from 1629-1710.CSV DOC
HistDataArmadaLa Felicisima ArmadaCSV DOC
HistDataBowleyBowley's data on values of British and Irish trade, 1855-1899CSV DOC
HistDataCavendishCavendish's Determinations of the Density of the EarthCSV DOC
HistDataChestSizesChest measurements of 5738 Scottish MilitiamenCSV DOC
HistDataCushnyPeeblesCushny-Peebles Data: Soporific Effects of Scopolamine DerivativesCSV DOC
HistDataCushnyPeeblesNCushny-Peebles Data: Soporific Effects of Scopolamine DerivativesCSV DOC
HistDataDactylEdgeworth's counts of dactyls in Virgil's AeneidCSV DOC
HistDataDrinksWagesElderton and Pearson's (1910) data on drinking and wagesCSV DOC
HistDataFingerprintsWaite's data on Patterns in FingerprintsCSV DOC
HistDataGaltonGalton's data on the heights of parents and their childrenCSV DOC
HistDataGaltonFamiliesGalton's data on the heights of parents and their children, by childCSV DOC
HistDataGuerryData from A.-M. Guerry, "Essay on the Moral Statistics of France"CSV DOC
HistDataJevonsW. Stanley Jevons' data on numerical discriminationCSV DOC
HistDataLangren.allvan Langren's Data on Longitude Distance between Toledo and RomeCSV DOC
HistDataLangren1644van Langren's Data on Longitude Distance between Toledo and RomeCSV DOC
HistDataMacdonellMacdonell's Data on Height and Finger Length of Criminals, used by Gosset (1908)CSV DOC
HistDataMacdonellDFMacdonell's Data on Height and Finger Length of Criminals, used by Gosset (1908)CSV DOC
HistDataMichelsonMichelson's Determinations of the Velocity of LightCSV DOC
HistDataMichelsonSetsMichelson's Determinations of the Velocity of LightCSV DOC
HistDataMinard.citiesData from Minard's famous graphic map of Napoleon's march on MoscowCSV DOC
HistDataMinard.tempData from Minard's famous graphic map of Napoleon's march on MoscowCSV DOC
HistDataMinard.troopsData from Minard's famous graphic map of Napoleon's march on MoscowCSV DOC
HistDataNightingaleFlorence Nightingale's data on deaths from various causes in the Crimean WarCSV DOC
HistDataOldMapsLatitudes and Longitudes of 39 Points in 11 Old MapsCSV DOC
HistDataPearsonLeePearson and Lee's data on the heights of parents and children classified by genderCSV DOC
HistDataPolioTrialsPolio Field Trials DataCSV DOC
HistDataProstitutesParent-Duchatelet's time-series data on the number of prostitutes in ParisCSV DOC
HistDataPyxTrial of the PyxCSV DOC
HistDataQuarrelsStatistics of Deadly QuarrelsCSV DOC
HistDataSnow.deathsJohn Snow's map and data on the 1854 London Cholera outbreakCSV DOC
HistDataSnow.deaths2John Snow's map and data on the 1854 London Cholera outbreakCSV DOC
HistDataSnow.polygonsJohn Snow's map and data on the 1854 London Cholera outbreakCSV DOC
HistDataSnow.pumpsJohn Snow's map and data on the 1854 London Cholera outbreakCSV DOC
HistDataSnow.streetsJohn Snow's map and data on the 1854 London Cholera outbreakCSV DOC
HistDataWheatPlayfair's Data on Wages and the Price of WheatCSV DOC
HistDataWheat.monarchsPlayfair's Data on Wages and the Price of WheatCSV DOC
HistDataYeastStudent's (1906) Yeast Cell CountsCSV DOC
HistDataYeastD.matStudent's (1906) Yeast Cell CountsCSV DOC
HistDataZeaMaysDarwin's Heights of Cross- and Self-fertilized Zea Mays PairsCSV DOC
latticebarleyYield data from a Minnesota barley trialCSV DOC
latticeenvironmentalAtmospheric environmental conditions in New York CityCSV DOC
latticeethanolEngine exhaust fumes from burning ethanolCSV DOC
latticemelanomaMelanoma skin cancer incidenceCSV DOC
latticesingerHeights of New York Choral Society singersCSV DOC
MASSAids2Australian AIDS Survival DataCSV DOC
MASSAnimalsBrain and Body Weights for 28 SpeciesCSV DOC
MASSBostonHousing Values in Suburbs of BostonCSV DOC
MASSCars93Data from 93 Cars on Sale in the USA in 1993CSV DOC
MASSCushingsDiagnostic Tests on Patients with Cushing's SyndromeCSV DOC
MASSDDTDDT in KaleCSV DOC
MASSGAGurineLevel of GAG in Urine of ChildrenCSV DOC
MASSInsuranceNumbers of Car Insurance claimsCSV DOC
MASSMelanomaSurvival from Malignant MelanomaCSV DOC
MASSOMETests of Auditory Perception in Children with OMECSV DOC
MASSPima.teDiabetes in Pima Indian WomenCSV DOC
MASSPima.trDiabetes in Pima Indian WomenCSV DOC
MASSPima.tr2Diabetes in Pima Indian WomenCSV DOC
MASSRabbitBlood Pressure in RabbitsCSV DOC
MASSRubberAccelerated Testing of Tyre RubberCSV DOC
MASSSP500Returns of the Standard and Poor's 500CSV DOC
MASSSitkaGrowth Curves for Sitka Spruce Trees in 1988CSV DOC
MASSSitka89Growth Curves for Sitka Spruce Trees in 1989CSV DOC
MASSSkyeAFM Compositions of Aphyric Skye LavasCSV DOC
MASSTrafficEffect of Swedish Speed Limits on AccidentsCSV DOC
MASSUScerealNutritional and Marketing Information on US CerealsCSV DOC
MASSUScrimeThe Effect of Punishment Regimes on Crime RatesCSV DOC
MASSVAVeterans' Administration Lung Cancer TrialCSV DOC
MASSabbeyDeterminations of Nickel ContentCSV DOC
MASSaccdeathsAccidental Deaths in the US 1973-1978CSV DOC
MASSanorexiaAnorexia Data on Weight ChangeCSV DOC
MASSbacteriaPresence of Bacteria after Drug TreatmentsCSV DOC
MASSbeav1Body Temperature Series of Beaver 1CSV DOC
MASSbeav2Body Temperature Series of Beaver 2CSV DOC
MASSbiopsyBiopsy Data on Breast Cancer PatientsCSV DOC
MASSbirthwtRisk Factors Associated with Low Infant Birth WeightCSV DOC
MASScabbagesData from a cabbage field trialCSV DOC
MASScaithColours of Eyes and Hair of People in CaithnessCSV DOC
MASScatsAnatomical Data from Domestic CatsCSV DOC
MASScementHeat Evolved by Setting CementsCSV DOC
MASSchemCopper in Wholemeal FlourCSV DOC
MASScoopCo-operative Trial in Analytical ChemistryCSV DOC
MASScpusPerformance of Computer CPUsCSV DOC
MASScrabsMorphological Measurements on Leptograpsus CrabsCSV DOC
MASSdeathsMonthly Deaths from Lung Diseases in the UKCSV DOC
MASSdriversDeaths of Car Drivers in Great Britain 1969-84CSV DOC
MASSeaglesForaging Ecology of Bald EaglesCSV DOC
MASSepilSeizure Counts for EpilepticsCSV DOC
MASSfarmsEcological Factors in Farm ManagementCSV DOC
MASSfglMeasurements of Forensic Glass FragmentsCSV DOC
MASSforbesForbes' Data on Boiling Points in the AlpsCSV DOC
MASSgalaxiesVelocities for 82 GalaxiesCSV DOC
MASSgehanRemission Times of Leukaemia PatientsCSV DOC
MASSgenotypeRat Genotype DataCSV DOC
MASSgeyserOld Faithful Geyser DataCSV DOC
MASSgilgaisLine Transect of Soil in Gilgai TerritoryCSV DOC
MASShillsRecord Times in Scottish Hill RacesCSV DOC
MASShousingFrequency Table from a Copenhagen Housing Conditions SurveyCSV DOC
MASSimmerYields from a Barley Field TrialCSV DOC
MASSleukSurvival Times and White Blood Counts for Leukaemia PatientsCSV DOC
MASSmammalsBrain and Body Weights for 62 Species of Land MammalsCSV DOC
MASSmcycleData from a Simulated Motorcycle AccidentCSV DOC
MASSmenarcheAge of Menarche in WarsawCSV DOC
MASSmichelsonMichelson's Speed of Light DataCSV DOC
MASSminn38Minnesota High School Graduates of 1938CSV DOC
MASSmotorsAccelerated Life Testing of MotorettesCSV DOC
MASSmuscleEffect of Calcium Chloride on Muscle Contraction in Rat HeartsCSV DOC
MASSnewcombNewcomb's Measurements of the Passage Time of LightCSV DOC
MASSnlschoolsEighth-Grade Pupils in the NetherlandsCSV DOC
MASSnpkClassical N, P, K Factorial ExperimentCSV DOC
MASSnpr1US Naval Petroleum Reserve No. 1 dataCSV DOC
MASSoatsData from an Oats Field TrialCSV DOC
MASSpaintersThe Painter's Data of de PilesCSV DOC
MASSpetrolN. L. Prater's Petrol Refinery DataCSV DOC
MASSquineAbsenteeism from School in Rural New South WalesCSV DOC
MASSroadRoad Accident Deaths in US StatesCSV DOC
MASSrotiferNumbers of Rotifers by Fluid DensityCSV DOC
MASSshipsShips Damage DataCSV DOC
MASSshrimpPercentage of Shrimp in Shrimp CocktailCSV DOC
MASSshuttleSpace Shuttle Autolander ProblemCSV DOC
MASSsnailsSnail Mortality DataCSV DOC
MASSsteamThe Saturated Steam Pressure DataCSV DOC
MASSstormerThe Stormer Viscometer DataCSV DOC
MASSsurveyStudent Survey DataCSV DOC
MASSsynth.teSynthetic Classification ProblemCSV DOC
MASSsynth.trSynthetic Classification ProblemCSV DOC
MASStopoSpatial Topographic DataCSV DOC
MASSwadersCounts of Waders at 15 Sites in South AfricaCSV DOC
MASSwhitesideHouse Insulation: Whiteside's DataCSV DOC
MASSwtlossWeight Loss Data from an Obese PatientCSV DOC
plmCigarCigarette ConsumptionCSV DOC
plmCrimeCrime in North CarolinaCSV DOC
plmEmplUKEmployment and Wages in the United KingdomCSV DOC
plmGasolineGasoline ConsumptionCSV DOC
plmGrunfeldGrunfeld's Investment DataCSV DOC
plmHedonicHedonic Prices of Census Tracts in the Boston AreaCSV DOC
plmLaborSupplyWages and Hours WorkedCSV DOC
plmMalesWages and Education of Young MalesCSV DOC
plmParityPurchasing Power Parity and other parity relationshipsCSV DOC
plmProducUS States ProductionCSV DOC
plmRiceFarmsProduction of Rice in IndiaCSV DOC
plmSnmespEmployment and Wages in SpainCSV DOC
plmSumHesThe Penn World Table, v. 5CSV DOC
plmWagesPanel Data of Individual WagesCSV DOC
plyrbaseballYearly batting records for all major league baseball playersCSV DOC
psclAustralianElectionPollingPolitical opinion polls in Australia, 2004-07CSV DOC
psclAustralianElectionselections to Australian House of Representatives, 1949-2007CSV DOC
psclEfronMorrisBatting Averages for 18 major league baseball players, 1970CSV DOC
psclRockTheVoteVoter turnout experiment, using Rock The Vote adsCSV DOC
psclUKHouseOfCommons1992 United Kingdom electoral returnsCSV DOC
psclabsenteeAbsentee and Machine Ballots in Pennsylvania State Senate RacesCSV DOC
pscladmitApplications to a Political Science PhD ProgramCSV DOC
psclbioChemistsarticle production by graduate students in biochemistry Ph.D. programsCSV DOC
psclca2006California Congressional Districts in 2006CSV DOC
pscliraqVoteU.S. Senate vote on the use of force against Iraq, 2002.CSV DOC
psclpoliticalInformationInterviewer ratings of respondent levels of political informationCSV DOC
psclpresidentialElectionselections for U.S. President, 1932-2012, by stateCSV DOC
psclprussianPrussian army horse kick dataCSV DOC
psclunionDensitycross national rates of trade union densityCSV DOC
psclvote92Reports of voting in the 1992 U.S. Presidential election.CSV DOC
reshape2french_friesSensory data from a french fries experiment.CSV DOC
reshape2smithsDemo data describing the Smiths.CSV DOC
reshape2tipsTipping dataCSV DOC
rpartcar.test.frameAutomobile Data from 'Consumer Reports' 1990CSV DOC
rpartcar90Automobile Data from 'Consumer Reports' 1990CSV DOC
rpartcu.summaryAutomobile Data from 'Consumer Reports' 1990CSV DOC
rpartkyphosisData on Children who have had Corrective Spinal SurgeryCSV DOC
rpartsolderSoldering of Components on Printed-Circuit BoardsCSV DOC
rpartstagecStage C Prostate CancerCSV DOC
sandwichPublicSchoolsUS Expenditures for Public SchoolsCSV DOC
semBollenBollen's Data on Industrialization and Political DemocracyCSV DOC
semCNESVariables from the 1997 Canadian National Election StudyCSV DOC
semKleinKlein's Data on the U. S. EconomyCSV DOC
semKmentaPartly Artificial Data on the U. S. EconomyCSV DOC
semTestsSix Mental TestsCSV DOC
survivalbladderBladder Cancer RecurrencesCSV DOC
survivalcancerNCCTG Lung Cancer DataCSV DOC
survivalcgdChronic Granulomatous Disease dataCSV DOC
survivalcolonChemotherapy for Stage B/C colon cancerCSV DOC
survivalflchainAssay of serum free light chain for 7874 subjects.CSV DOC
survivalgenfanGenerator fansCSV DOC
survivalheartStanford Heart Transplant dataCSV DOC
survivalkidneyKidney catheter dataCSV DOC
survivalleukemiaAcute Myelogenous Leukemia survival dataCSV DOC
survivalloganData from the 1972-78 GSS data used by LoganCSV DOC
survivallungNCCTG Lung Cancer DataCSV DOC
survivalmgusMonoclonal gammopathy dataCSV DOC
survivalmgus2Monoclonal gammopathy dataCSV DOC
survivalnwtcoData from the National Wilms' Tumor StudyCSV DOC
survivalovarianOvarian Cancer Survival DataCSV DOC
survivalpbcMayo Clinic Primary Biliary Cirrhosis DataCSV DOC
survivalratsRat treatment data from Mantel et alCSV DOC
survivalretinopathyDiabetic RetinopathyCSV DOC
survivalstanford2More Stanford Heart Transplant dataCSV DOC
survivaltobinTobin's Tobit dataCSV DOC
survivaltransplantLiver transplant waiting listCSV DOC
survivalveteranVeterans' Administration Lung Cancer studyCSV DOC
vcdArthritisArthritis Treatment DataCSV DOC
vcdBaseballBaseball DataCSV DOC
vcdBrokenMarriageBroken Marriage DataCSV DOC
vcdBundesligaResults of the Fussball-BundesligaCSV DOC
vcdBundestag2005Votes in German Bundestag Election 2005CSV DOC
vcdButterflyButterfly Species in MalayaCSV DOC
vcdCoalMinersBreathlessness and Wheeze in Coal MinersCSV DOC
vcdDanishWelfareDanish Welfare Study DataCSV DOC
vcdEmploymentEmployment StatusCSV DOC
vcdFederalist'May' in Federalist PapersCSV DOC
vcdHittersHitters DataCSV DOC
vcdHorseKicksDeath by Horse KicksCSV DOC
vcdHospitalHospital dataCSV DOC
vcdJobSatisfactionJob Satisfaction DataCSV DOC
vcdJointSportsOpinions About Joint SportsCSV DOC
vcdLifeboatsLifeboats on the TitanicCSV DOC
vcdNonResponseNon-Response Survey DataCSV DOC
vcdOvaryCancerOvary Cancer DataCSV DOC
vcdPreSexPre-marital Sex and DivorceCSV DOC
vcdPunishmentCorporal Punishment DataCSV DOC
vcdRepVictRepeat Victimization DataCSV DOC
vcdSaxonyFamilies in SaxonyCSV DOC
vcdSexualFunSex is FunCSV DOC
vcdSpaceShuttleSpace Shuttle O-ring FailuresCSV DOC
vcdSuicideSuicide Rates in GermanyCSV DOC
vcdTrucksTruck Accidents DataCSV DOC
vcdUKSoccerUK Soccer ScoresCSV DOC
vcdVisualAcuityVisual Acuity in Left and Right EyesCSV DOC
vcdVonBortVon Bortkiewicz Horse Kicks DataCSV DOC
vcdWeldonDiceWeldon's Dice DataCSV DOC
vcdWomenQueueWomen in QueuesCSV DOC
ZeligMatchIt.urlTable of links for ZeligCSV DOC
ZeligPEriskPolitical Economic Risk Data from 62 Countries in 1987CSV DOC
ZeligSupremeCourtU.S. Supreme Court Vote MatrixCSV DOC
ZeligWeimar1932 Weimar election dataCSV DOC
ZeligZelig.urlTable of links for ZeligCSV DOC
ZeligapprovalU.S. Presidential Approval DataCSV DOC
ZeligbivariateSample data for bivariate probit regressionCSV DOC
ZeligcoalitionCoalition Dissolution in Parliamentary DemocraciesCSV DOC
Zeligcoalition2Coalition Dissolution in Parliamentary Democracies, Modified VersionCSV DOC
ZeligeidatSimulation Data for Ecological InferenceCSV DOC
Zeligfree1Freedom of Speech DataCSV DOC
Zeligfree2Freedom of Speech DataCSV DOC
ZeligfriendshipSimulated Example of Schoolchildren Friendship NetworkCSV DOC
ZeliggrunfeldSimulation Data for model Seemingly Unrelated Regression (sur) that corresponds to method SUR of systemfitCSV DOC
ZelighoffSocial Security Expenditure DataCSV DOC
ZelighomerunSample Data on Home Runs Hit By Mark McGwire and Sammy Sosa in 1998.CSV DOC
Zeligimmi1Individual Preferences Over Immigration PolicyCSV DOC
Zeligimmi2Individual Preferences Over Immigration PolicyCSV DOC
Zeligimmi3Individual Preferences Over Immigration PolicyCSV DOC
Zeligimmi4Individual Preferences Over Immigration PolicyCSV DOC
Zeligimmi5Individual Preferences Over Immigration PolicyCSV DOC
ZeligimmigrationIndividual Preferences Over Immigration PolicyCSV DOC
ZeligkleinSimulation Data for model Two-Stage Least Square (twosls) that corresponds to method 2SLS of systemfitCSV DOC
ZeligkmentaSimulation Data for model Three-Stage Least Square (threesls) that corresponds to method 3SLS of systemfitCSV DOC
ZeligmacroMacroeconomic DataCSV DOC
ZeligmexicoVoting Data from the 1988 Mexican Presidential ElectionCSV DOC
ZeligmidMilitarized Interstate DisputesCSV DOC
ZelignewpaintersThe Discretized Painter's Data of de PilesCSV DOC
ZeligsanctionMultilateral Economic SanctionsCSV DOC
ZeligseatshareLeft Party Seat Share in 11 OECD CountriesCSV DOC
Zeligsna.exSimulated Example of Social Network DataCSV DOC
ZeligswissSwiss Fertility and Socioeconomic Indicators (1888) DataCSV DOC
ZeligtobinTobin's Tobit DataCSV DOC
ZeligturnoutTurnout Data Set from the National Election SurveyCSV DOC
ZeligvoteincomeSample Turnout and Demographic Data from the 2000 Current Population SurveyCSV DOC
HSAURBCGBCG Vaccine DataCSV DOC
HSAURBtheBBeat the Blues DataCSV DOC
HSAURCYGOB1CYG OB1 Star Cluster DataCSV DOC
HSAURForbes2000The Forbes 2000 Ranking of the World's Biggest Companies (Year 2004)CSV DOC
HSAURGHQGeneral Health QuestionnaireCSV DOC
HSAURLanzaPrevention of Gastrointestinal DamagesCSV DOC
HSAURagefatTotal Body Composition DataCSV DOC
HSAURaspirinAspirin DataCSV DOC
HSAURbirthdeathratesBirth and Death Rates DataCSV DOC
HSAURbladdercancerBladder Cancer DataCSV DOC
HSAURcloudsCloud Seeding DataCSV DOC
HSAURepilepsyEpilepsy DataCSV DOC
HSAURfosterFoster Feeding ExperimentCSV DOC
HSAURheptathlonOlympic Heptathlon Seoul 1988CSV DOC
HSAURmastectomySurvival Times after Mastectomy of Breast Cancer PatientsCSV DOC
HSAURmeteoMeteorological Measurements for 11 YearsCSV DOC
HSAURorallesionsOral Lesions in Rural IndiaCSV DOC
HSAURphosphatePhosphate Level DataCSV DOC
HSAURpistonringsPiston Rings FailuresCSV DOC
HSAURplanetsExoplanets DataCSV DOC
HSAURplasmaBlood Screening DataCSV DOC
HSAURpolypsFamilial Adenomatous PolyposisCSV DOC
HSAURpolyps3Familial Adenomatous PolyposisCSV DOC
HSAURpotteryRomano-British Pottery DataCSV DOC
HSAURrearrestsRearrests of Juvenile FelonsCSV DOC
HSAURrespiratoryRespiratory Illness DataCSV DOC
HSAURroomwidthStudents' Estimates of Lecture Room WidthCSV DOC
HSAURschizophreniaAge of Onset of Schizophrenia DataCSV DOC
HSAURschizophrenia2Schizophrenia DataCSV DOC
HSAURschooldaysDays not Spent at SchoolCSV DOC
HSAURskullsEgyptian SkullsCSV DOC
HSAURsmokingNicotine Gum and Smoking CessationCSV DOC
HSAURstudentsStudent Risk TakingCSV DOC
HSAURsuicidesCrowd Baiting Behaviour and SuicidesCSV DOC
HSAURtoothpasteToothpaste DataCSV DOC
HSAURvotingHouse of Representatives Voting DataCSV DOC
HSAURwaterMortality and Water HardnessCSV DOC
HSAURwatervolesWater Voles DataCSV DOC
HSAURwavesElectricity from Wave Power at SeaCSV DOC
HSAURweightgainGain in Weight of RatsCSV DOC
HSAURwomensroleWomen's Role in SocietyCSV DOC
psychBechtoldtSeven data sets showing a bifactor solution.CSV DOC
psychBechtoldt.1Seven data sets showing a bifactor solution.CSV DOC
psychBechtoldt.2Seven data sets showing a bifactor solution.CSV DOC
psychDwyer8 cognitive variables used by Dwyer for an example.CSV DOC
psychGleserExample data from Gleser, Cronbach and Rajaratnam (1965) to show basic principles of generalizability theory.CSV DOC
psychGorsuchExample data set from Gorsuch (1997) for an example factor extension.CSV DOC
psychHarman.55 socio-economic variables from Harman (1967)CSV DOC
psychHarman.8Correlations of eight physical variables (from Harman, 1966)CSV DOC
psychHarman.politicalEight political variables used by Harman (1967) as example 8.17CSV DOC
psychHolzingerSeven data sets showing a bifactor solution.CSV DOC
psychHolzinger.9Seven data sets showing a bifactor solution.CSV DOC
psychReiseSeven data sets showing a bifactor solution.CSV DOC
psychSchmid12 variables created by Schmid and Leiman to show the Schmid-Leiman TransformationCSV DOC
psychThurstoneSeven data sets showing a bifactor solution.CSV DOC
psychThurstone.33Seven data sets showing a bifactor solution.CSV DOC
psychTucker9 Cognitive variables discussed by Tucker and Lewis (1973)CSV DOC
psychability16 ability items scored as correct or incorrect.CSV DOC
psychaffectTwo data sets of affect and arousal scores as a function of personality and movie conditionsCSV DOC
psychbfi25 Personality items representing 5 factorsCSV DOC
psychbfi.dictionary25 Personality items representing 5 factorsCSV DOC
psychblotBond's Logical Operations Test - BLOTCSV DOC
psychburt11 emotional variables from Burt (1915)CSV DOC
psychcitiesDistances between 11 US citiesCSV DOC
psychcubitsGalton's example of the relationship between height and 'cubit' or forearm lengthCSV DOC
psychcushnyA data set from Cushny and Peebles (1905) on the effect of three drugs on hours of sleep, used by Student (1908)CSV DOC
psychepiEysenck Personality Inventory (EPI) data for 3570 participantsCSV DOC
psychepi.bfi13 personality scales from the Eysenck Personality Inventory and Big 5 inventoryCSV DOC
psychepi.dictionaryEysenck Personality Inventory (EPI) data for 3570 participantsCSV DOC
psychgaltonGalton's Mid parent child height dataCSV DOC
psychheightsA data.frame of the Galton (1888) height and cubit data set.CSV DOC
psychincomeUS family income from US census 2008CSV DOC
psychiqitems16 multiple choice IQ itemsCSV DOC
psychmsq75 mood items from the Motivational State Questionnaire for 3896 participantsCSV DOC
psychneoNEO correlation matrix from the NEO_PI_R manualCSV DOC
psychpeasGalton's PeasCSV DOC
psychsat.act3 Measures of ability: SATV, SATQ, ACTCSV DOC
psychwithinBetweenAn example of the distinction between within group and between group correlationsCSV DOC
quantregBoscoBoscovich DataCSV DOC
quantregCobarOreCobar Ore dataCSV DOC
quantregMammalsGarland (1983) Data on Running Speed of MammalsCSV DOC
quantregbarroBarro DataCSV DOC
quantregengelEngel DataCSV DOC
quantreggaspriceTime Series of US Gasoline PricesCSV DOC
quantreguisUIS Drug Treatment study dataCSV DOC
geepackdietoxGrowth curves of pigs in a 3x3 factorial experimentCSV DOC
geepackkochOrdinal Data from KochCSV DOC
geepackohioOhio Children Wheeze StatusCSV DOC
geepackrespdisClustered Ordinal Respiratory DisorderCSV DOC
geepackrespiratoryData from a clinical trial comparing two treatments for a respiratory illnessCSV DOC
geepackseizureEpileptic SeizuresCSV DOC
geepacksitka89Growth of Sitka Spruce TreesCSV DOC
geepackspruceLog-size of 79 Sitka spruce treesCSV DOC
texmexliverLiver related laboratory dataCSV DOC
texmexportpirieRain, wavesurge and portpirie datasets.CSV DOC
texmexrainRain, wavesurge and portpirie datasets.CSV DOC
texmexsummerAir pollution data, separately for summer and winter monthsCSV DOC
texmexwavesurgeRain, wavesurge and portpirie datasets.CSV DOC
texmexwinterAir pollution data, separately for summer and winter monthsCSV DOC
multgeearthritisRheumatoid Arthritis Clinical TrialCSV DOC
multgeehousingHomeless DataCSV DOC
evirbmwDaily Log Returns on BMW Share PriceCSV DOC
evirdanishDanish Fire Insurance ClaimsCSV DOC
evirnidd.annualThe River Nidd DataCSV DOC
evirnidd.threshThe River Nidd DataCSV DOC
evirsiemensDaily Log Returns on Siemens Share PriceCSV DOC
evirsp.rawSP Data to June 1993CSV DOC
evirspto87SP Return Data to October 1987CSV DOC
lme4ArabidopsisArabidopsis clipping/fertilization dataCSV DOC
lme4DyestuffYield of dyestuff by batchCSV DOC
lme4Dyestuff2Yield of dyestuff by batchCSV DOC
lme4InstEvalUniversity Lecture/Instructor Evaluations by Students at ETHCSV DOC
lme4PastesPaste strength by batch and caskCSV DOC
lme4PenicillinVariation in penicillin testingCSV DOC
lme4VerbAggVerbal Aggression item responsesCSV DOC
lme4cakeBreakage Angle of Chocolate CakesCSV DOC
lme4cbppContagious bovine pleuropneumoniaCSV DOC
lme4grouseticksData on red grouse ticks from Elston et al. 2001CSV DOC
lme4sleepstudyReaction times in a sleep deprivation studyCSV DOC
mosaicData | Alcohol | Alcohol Consumption per Capita | CSV | DOC
mosaicData | Birthdays | US Births in 1969 - 1988 | CSV | DOC
mosaicData | Births | US Births | CSV | DOC
mosaicData | Births78 | US Births in 1978 | CSV | DOC
mosaicData | CPS85 | Data from the 1985 Current Population Survey (CPS85) | CSV | DOC
mosaicData | CoolingWater | CoolingWater | CSV | DOC
mosaicData | Countries | Countries | CSV | DOC
mosaicData | Dimes | Weight of dimes | CSV | DOC
mosaicData | Galton | Galton's dataset of parent and child heights | CSV | DOC
mosaicData | Gestation | Data from the Child Health and Development Studies | CSV | DOC
mosaicData | GoosePermits | Goose Permit Study | CSV | DOC
mosaicData | HELPfull | Health Evaluation and Linkage to Primary Care | CSV | DOC
mosaicData | HELPmiss | Health Evaluation and Linkage to Primary Care | CSV | DOC
mosaicData | HELPrct | Health Evaluation and Linkage to Primary Care | CSV | DOC
mosaicData | HeatX | Data from a heat exchanger laboratory | CSV | DOC
mosaicData | KidsFeet | Foot measurements in children | CSV | DOC
mosaicData | Marriage | Marriage records | CSV | DOC
mosaicData | Mites | Mites and Wilt Disease | CSV | DOC
mosaicData | RailTrail | Volume of Users of a Rail Trail | CSV | DOC
mosaicData | Riders | Volume of Users of a Massachusetts Rail Trail | CSV | DOC
mosaicData | SAT | State by State SAT data | CSV | DOC
mosaicData | SaratogaHouses | Houses in Saratoga County (2006) | CSV | DOC
mosaicData | SnowGR | Snowfall data for Grand Rapids, MI | CSV | DOC
mosaicData | SwimRecords | 100 m Swimming World Records | CSV | DOC
mosaicData | TenMileRace | Cherry Blossom Race | CSV | DOC
mosaicData | Utilities | Utility bills | CSV | DOC
mosaicData | Utilities2 | Utility bills | CSV | DOC
mosaicData | Whickham | Data from the Whickham survey | CSV | DOC
ISLR | Auto | Auto Data Set | CSV | DOC
ISLR | Caravan | The Insurance Company (TIC) Benchmark | CSV | DOC
ISLR | Carseats | Sales of Child Car Seats | CSV | DOC
ISLR | College | U.S. News and World Report's College Data | CSV | DOC
ISLR | Default | Credit Card Default Data | CSV | DOC
ISLR | Hitters | Baseball Data | CSV | DOC
ISLR | OJ | Orange Juice Data | CSV | DOC
ISLR | Portfolio | Portfolio Data | CSV | DOC
ISLR | Smarket | S&P Stock Market Data | CSV | DOC
ISLR | Wage | Mid-Atlantic Wage Data | CSV | DOC
ISLR | Weekly | Weekly S&P Stock Market Data | CSV | DOC
Stat2Data | Alfalfa | Alfalfa | CSV | DOC
Stat2Data | ArcheryData | ArcheryData | CSV | DOC
Stat2Data | AutoPollution | AutoPollution | CSV | DOC
Stat2Data | Backpack | Backpack | CSV | DOC
Stat2Data | BaseballTimes | BaseballTimes | CSV | DOC
Stat2Data | BeeStings | BeeStings | CSV | DOC
Stat2Data | BirdNest | BirdNest | CSV | DOC
Stat2Data | Blood1 | Blood1 | CSV | DOC
Stat2Data | BlueJays | Blue Jays | CSV | DOC
Stat2Data | BritishUnions | BritishUnions | CSV | DOC
Stat2Data | CAFE | CAFE | CSV | DOC
Stat2Data | CO2 | CO2 | CSV | DOC
Stat2Data | CalciumBP | CalciumBP | CSV | DOC
Stat2Data | CancerSurvival | CancerSurvival | CSV | DOC
Stat2Data | Caterpillars | Caterpillars | CSV | DOC
Stat2Data | Cereal | Cereal | CSV | DOC
Stat2Data | ChemoTHC | ChemoTHC | CSV | DOC
Stat2Data | ChildSpeaks | ChildSpeaks | CSV | DOC
Stat2Data | Clothing | Clothing | CSV | DOC
Stat2Data | CloudSeeding | Cloud Seeding | CSV | DOC
Stat2Data | CloudSeeding2 | Cloud Seeding 2 | CSV | DOC
Stat2Data | CrackerFiber | Cracker Fiber in Diets | CSV | DOC
Stat2Data | Cuckoo | Cuckoo | CSV | DOC
Stat2Data | Day1Survey | Day1Survey | CSV | DOC
Stat2Data | Diamonds | Diamonds | CSV | DOC
Stat2Data | Diamonds2 | Diamonds2 | CSV | DOC
Stat2Data | Election08 | Election08 | CSV | DOC
Stat2Data | Ethanol | Ethanol | CSV | DOC
Stat2Data | FGByDistance | FGByDistance | CSV | DOC
Stat2Data | FantasyBaseball | FantasyBaseball | CSV | DOC
Stat2Data | Fertility | Fertility | CSV | DOC
Stat2Data | Film | Film | CSV | DOC
Stat2Data | FinalFourIzzo | FinalFourIzzo | CSV | DOC
Stat2Data | FinalFourLong | FinalFourLong | CSV | DOC
Stat2Data | FinalFourShort | FinalFourShort | CSV | DOC
Stat2Data | Fingers | Fingers | CSV | DOC
Stat2Data | FirstYearGPA | FirstYearGPA | CSV | DOC
Stat2Data | FishEggs | FishEggs | CSV | DOC
Stat2Data | FlightResponse | FlightResponse | CSV | DOC
Stat2Data | Fluorescence | Fluorescence | CSV | DOC
Stat2Data | FruitFlies | FruitFlies | CSV | DOC
Stat2Data | Goldenrod | Goldenrod Galls | CSV | DOC
Stat2Data | Grocery | Grocery | CSV | DOC
Stat2Data | Gunnels | Gunnels | CSV | DOC
Stat2Data | HawkTail | HawkTail | CSV | DOC
Stat2Data | HawkTail2 | HawkTail2 | CSV | DOC
Stat2Data | Hawks | Hawks | CSV | DOC
Stat2Data | HearingTest | HearingTest | CSV | DOC
Stat2Data | HighPeaks | HighPeaks | CSV | DOC
Stat2Data | Hoops | Hoops | CSV | DOC
Stat2Data | HorsePrices | HorsePrices | CSV | DOC
Stat2Data | Houses | Houses | CSV | DOC
Stat2Data | ICU | ICU | CSV | DOC
Stat2Data | InfantMortality | InfantMortality | CSV | DOC
Stat2Data | InsuranceVote | InsuranceVote | CSV | DOC
Stat2Data | Jurors | Jurors | CSV | DOC
Stat2Data | Kids198 | Kids198 | CSV | DOC
Stat2Data | LeafHoppers | LeafHoppers | CSV | DOC
Stat2Data | Leukemia | Leukemia | CSV | DOC
Stat2Data | LongJumpOlympics | LongJumpOlympics | CSV | DOC
Stat2Data | LostLetter | LostLetter | CSV | DOC
Stat2Data | MLB2007Standings | MLB2007Standings | CSV | DOC
Stat2Data | Marathon | Marathon | CSV | DOC
Stat2Data | Markets | Markets | CSV | DOC
Stat2Data | MathEnrollment | Math Enrollments | CSV | DOC
Stat2Data | MathPlacement | Math Placement | CSV | DOC
Stat2Data | MedGPA | MedGPA | CSV | DOC
Stat2Data | MentalHealth | Mental Health Admissions | CSV | DOC
Stat2Data | MetabolicRate | Metabolic Rate of Caterpillars | CSV | DOC
Stat2Data | MetroHealth83 | MetroHealth83 | CSV | DOC
Stat2Data | Milgram | Milgram | CSV | DOC
Stat2Data | MothEggs | Moth Eggs | CSV | DOC
Stat2Data | NCbirths | NCbirths | CSV | DOC
Stat2Data | NFL2007Standings | NFL2007Standings | CSV | DOC
Stat2Data | Nursing | Nursing | CSV | DOC
Stat2Data | Olives | Olives | CSV | DOC
Stat2Data | Orings | Orings | CSV | DOC
Stat2Data | Overdrawn | Overdrawn | CSV | DOC
Stat2Data | PalmBeach | PalmBeach | CSV | DOC
Stat2Data | Pedometer | Pedometer | CSV | DOC
Stat2Data | Perch | Perch | CSV | DOC
Stat2Data | PigFeed | PigFeed | CSV | DOC
Stat2Data | Pines | Pines | CSV | DOC
Stat2Data | Political | Political | CSV | DOC
Stat2Data | Pollster08 | Pollster08 | CSV | DOC
Stat2Data | Popcorn | Popcorn | CSV | DOC
Stat2Data | PorscheJaguar | PorscheJaguar | CSV | DOC
Stat2Data | PorschePrice | PorschePrice | CSV | DOC
Stat2Data | Pulse | Pulse | CSV | DOC
Stat2Data | Putts1 | Putts1 | CSV | DOC
Stat2Data | Putts2 | Putts2 | CSV | DOC
Stat2Data | ReligionGDP | ReligionGDP | CSV | DOC
Stat2Data | Retirement | Retirement | CSV | DOC
Stat2Data | RiverElements | RiverElements | CSV | DOC
Stat2Data | RiverIron | River Iron | CSV | DOC
Stat2Data | SATGPA | SAT scores and GPA | CSV | DOC
Stat2Data | SampleFG | SampleFG | CSV | DOC
Stat2Data | SandwichAnts | Sandwich Ants | CSV | DOC
Stat2Data | SeaSlugs | Sea Slugs | CSV | DOC
Stat2Data | Sparrows | Sparrows | CSV | DOC
Stat2Data | SpeciesArea | Species Area | CSV | DOC
Stat2Data | Speed | Speed | CSV | DOC
Stat2Data | Swahili | Swahili | CSV | DOC
Stat2Data | TMS | TMS | CSV | DOC
Stat2Data | TextPrices | Text Prices | CSV | DOC
Stat2Data | ThreeCars | Three Cars | CSV | DOC
Stat2Data | TipJoke | Tip Joke | CSV | DOC
Stat2Data | Titanic | Titanic | CSV | DOC
Stat2Data | TomlinsonRush | LaDainian Tomlinson Rushing Yards | CSV | DOC
Stat2Data | TwinsLungs | TwinsLungs | CSV | DOC
Stat2Data | USstamps | USstamps | CSV | DOC
Stat2Data | Volts | Volts | CSV | DOC
Stat2Data | WalkingBabies | WalkingBabies | CSV | DOC
Stat2Data | WeightLossIncentive | WeightLossIncentive | CSV | DOC
Stat2Data | WeightLossIncentive4 | WeightLossIncentive4 | CSV | DOC
Stat2Data | WeightLossIncentive7 | WeightLossIncentive7 | CSV | DOC
Stat2Data | WordMemory | WordMemory | CSV | DOC
Stat2Data | YouthRisk2007 | YouthRisk2007 | CSV | DOC
Stat2Data | YouthRisk2009 | YouthRisk2009 | CSV | DOC
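
Each row in the table above pairs an R package with a dataset item and links to a CSV file. As a rough sketch only, assuming the CSV files follow the Rdatasets layout `.../csv/<package>/<item>.csv` (an assumption, this page does not state the URLs), a dataset could be pulled into Python with the standard library. The base URL and the helper names `rdatasets_csv_url`, `parse_csv_text` and `fetch_dataset` are illustrative, and the sample rows are made up for the offline demo.

```python
import csv
import io
from urllib.request import urlopen

# Assumed base URL for the Rdatasets collection; verify before relying on it.
BASE_URL = "https://vincentarelbundock.github.io/Rdatasets/csv"


def rdatasets_csv_url(package: str, item: str) -> str:
    """Build the (assumed) CSV URL for one package/item pair from the table."""
    return f"{BASE_URL}/{package}/{item}.csv"


def parse_csv_text(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_dataset(package: str, item: str) -> list[dict[str, str]]:
    """Download and parse one dataset (needs network access)."""
    with urlopen(rdatasets_csv_url(package, item)) as response:
        return parse_csv_text(response.read().decode("utf-8"))


# Offline demonstration with made-up rows shaped like a small CSV file.
sample = "Reaction,Days,Subject\n250.0,0,1\n260.5,1,1\n"
rows = parse_csv_text(sample)
print(rdatasets_csv_url("lme4", "sleepstudy"))
print(rows[0]["Reaction"], len(rows))
```

All values come back as strings, so numeric columns still need converting (for example with `float`) before any analysis.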

 

Source:  r-dir (r-directory)

https://r-dir.com/reference/datasets.html

List:

World Bank Data - Literally hundreds of datasets spanning many decades, sortable by topic or country. Data is downloadable in Excel or XML formats, or you can make API calls. This is an outstanding resource.

Gapminder - Hundreds of datasets on world health, economics, population, etc. All of it is viewable online within Google Docs, and downloadable as spreadsheets.

The Data Hub - Hosted by CKAN. Most of these datasets come from the government.

Datamob - List of public datasets.

Numbrary - Lists of datasets.

Kaggle - Kaggle is a site that hosts data mining competitions. Each competition provides a data set that's free for download.

SNAP - Stanford's Large Network Dataset Collection. This list has several datasets related to social networking. Lots of fun in here!

KONECT - The Koblenz Network Collection. Several datasets related to social networking & Wikipedia.

Million Song Dataset - This is a collection of audio features and metadata for a million contemporary popular music tracks.

Energy Information Administration - This site offers a number of datasets on energy production, consumption, sources, etc.

GeoDa Center - This is a collection of geospatial datasets offered by Arizona State University's Center for Geospatial Analysis & Computation.

Reddit Datasets - This last one isn't a dataset itself, but rather a social news site devoted to datasets. It's updated regularly with news about newly available datasets.

Quandl - This is a web-based front end to a number of public data sets. What's nice about this website is that it allows for the combination of data from a number of sources, and can export the data in a number of formats.

1,001 Datasets - This is a list of lists of datasets. There's not much organization here, but there really are a LOT of datasets. Dive in and have fun.

Yahoo! Webscope - A reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists.

Time Series Data Library - Curated by Professor Rob Hyndman of Monash University in Australia, this is a collection of over 500 datasets containing time-series data, organized by category.

Awesome Public Datasets - Curated list of hundreds of public datasets, organized by topic.

Common Crawl - Massive dataset of billions of pages scraped from the web. The data itself is on Amazon Public Datasets, so it's easy to load it into an EC2 instance there. The dataset is updated with a new scrape about once per month.
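
The World Bank Data entry in the list above mentions that you can make API calls. A minimal sketch of what that might look like, assuming the v2 Indicators API URL pattern and its `[metadata, records]` JSON response shape (both assumptions to check against the World Bank's own documentation). The helper names and the sample payload are invented for illustration and no request is actually sent here.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the World Bank Indicators API (v2); verify before use.
API_BASE = "https://api.worldbank.org/v2"


def indicator_url(country: str, indicator: str, per_page: int = 100) -> str:
    """Build a request URL for one country and one indicator code."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{API_BASE}/country/{country}/indicator/{indicator}?{query}"


def extract_series(payload_text: str) -> dict[str, float]:
    """Return {year: value} from a response shaped like [metadata, records]."""
    _metadata, records = json.loads(payload_text)
    return {
        record["date"]: record["value"]
        for record in records
        if record["value"] is not None  # skip years with no reported value
    }


# Made-up payload that mimics the assumed response shape (not real data).
sample_payload = json.dumps([
    {"page": 1, "pages": 1, "total": 3},
    [
        {"date": "2012", "value": 3.0},
        {"date": "2011", "value": None},
        {"date": "2010", "value": 1.5},
    ],
])
series = extract_series(sample_payload)
print(indicator_url("BR", "SP.POP.TOTL"))
print(series)
```

Fetching the URL with any HTTP client and passing the response text to `extract_series` would be the next step, after confirming the response shape.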

 

SOURCE: Amazon Public Datasets - Collection of datasets that are ready to be loaded into an EC2 instance.

A Multi-wavelength Infrared Atlas of the Galactic Plane - Open Source tools were used to combine images from five major infrared surveys of the Galactic Plane, archived at the NASA/IPAC Infrared Science Archive (IRSA). The result is a 16-wavelength infrared Atlas of the Galactic Plane that covers the wavelength range 1 μm to 24 μm.

CCAFS-Climate Data - High resolution climate data to help assess the impacts of climate change primarily on agriculture. These open access datasets of climate projections will help researchers make climate change impact assessments.

NASA NEX - Three NASA NEX datasets are now available, including climate projections and satellite images of Earth.

Human Microbiome Project - Human Microbiome Project Data Set

Enron Email Data - Enron email data publicly released as part of FERC's Western Energy Markets investigation converted to industry standard formats by EDRM. The data set consists of 1,227,255 emails with 493,384 attachments covering 151 custodians. The email is provided in Microsoft PST, IETF MIME, and EDRM XML formats.

Japan Census Data - Multiple data sets including: (1) Population Census of Japan (1995, 2000, 2005, 2010), (2) Establishment and Enterprise Census of Japan (1999, 2001, 2004, 2006), and (3) Economic Census of Japan (2009).

Apache Software Foundation Public Mail Archives - A collection of all publicly available Apache Software Foundation mail archives as of July 11, 2011

Freebase Simple Topic Dump - A data dump of the basic identifying facts about every topic in Freebase

Freebase Quad Dump - A data dump of all the current facts and assertions in Freebase

Wikipedia Page Traffic Statistic V3 - This dataset contains a 150 GB sample of the data used to power trendingtopics.org. It includes a full 3 months of hourly page traffic statistics from Wikipedia (1/1/2011-3/31/2011).

Material Safety Data Sheets - 230,000 Material Safety Data Sheets.

Million Song Dataset - The Million Songs Collection is a collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.

Million Song Sample Dataset - This is a 10,000 song subset of audio features and metadata from the Million Songs collection - a collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.

Marvel Universe Social Graph - This dataset is an example of a social collaboration network based on the characters in The Marvel Universe, that is, the artificial world that takes place in the universe of the Marvel comic books.

Google Books Ngrams - A data set containing Google Books n-gram corpora. This data set is freely available on Amazon S3 in a Hadoop friendly file format and is licensed under a Creative Commons Attribution 3.0 Unported License. The original dataset is available from http://books.google.com/ngrams/.

The WestburyLab USENET corpus - The WestburyLab USENET corpus is an anonymized compilation of postings from 47,860 English-language newsgroups from 2005-2010.

1000 Genomes Project - The 1000 Genomes Project, initiated in 2008, is an international public-private consortium that aims to build the most detailed map of human genetic variation available.

Wikipedia Traffic Statistics V2 - Contains 16 months of hourly pageview statistics for all articles in Wikipedia

M-Lab dataset: Network Diagnostic Tool (NDT) - NDT test results created through Measurement Lab (M-Lab) between February 2009 and September 2009

M-Lab dataset: Network Path and Application Diagnosis tool (NPAD) - NPAD test results created through Measurement Lab (M-Lab) between February 2009 and September 2009

Petroleum Public Data Set (working Title) - Public-domain data for the oil & gas industry, assembled from the contributions of participating agencies in the United States, Canada and around the world. This data provides industry stakeholders with an opportunity to focus their efforts on the analysis and interpretation of this data without concern for the trivial and time-consuming tasks of locating, downloading, reformatting and integrating the data prior to value-added work being performed.

Sloan Digital Sky Survey DR6 Subset - The Sloan Digital Sky Survey is the most ambitious astronomical survey ever undertaken.

Wikipedia Page Traffic Statistics - Contains 7 months of hourly pageview statistics for all articles in Wikipedia

Wikipedia XML Data - A complete copy of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML.

Federal Reserve Economic Data (FRED) - Database of 20,059 U.S. economic time series.

Twilio/Wigle.net Street Vector Data Set - Twilio/Wigle.net database of mapped US street names and address ranges.

Federal Contracts from the Federal Procurement Data Center (USASpending.gov) - A data dump of all federal contracts from the Federal Procurement Data Center found at USASpending.gov.

University of Florida Sparse Matrix Collection - The University of Florida Sparse Matrix Collection is a large, widely available, and actively growing set of sparse matrices that arise in real applications.

2008 TIGER/Line Shapefiles - Census 2000 and Current United States shapefiles

Wikipedia Extraction (WEX) - A processed dump of the English language Wikipedia

Business and Industry Summary Data - US Business and Industry Summary Data

2003-2006 US Economic Data - US Economic Data for years 2003 to 2006

Freebase Data Dump - Freebase is an open database of the world's information, covering millions of topics in hundreds of categories

DBpedia 3.5.1 - DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web

1980 US Census - Data from the 1980 US Census

1990 US Census - Data from the 1990 US Census

2000 US Census - Data from the 2000 US Census

Transportation Databases - Various transportation statistics

Labor Statistics Databases - Various Labor Statistics
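
Several entries in the Amazon list above are hourly Wikipedia pageview statistics. As a hedged sketch, assuming those files use the space-separated pagecounts layout `project page_title view_count bytes` (an assumption about the file format, not something this page states), hourly lines could be summed per page like this. The function names and sample lines are made up for illustration.

```python
from collections import Counter


def parse_pagecount_line(line: str) -> tuple[str, str, int, int]:
    """Split one line of the assumed 'project title views bytes' layout."""
    project, title, views, size = line.rstrip("\n").split(" ")
    return project, title, int(views), int(size)


def top_pages(lines, project: str = "en", n: int = 2) -> list[tuple[str, int]]:
    """Sum view counts per page title for one project and return the top n."""
    totals = Counter()
    for line in lines:
        proj, title, views, _size = parse_pagecount_line(line)
        if proj == project:
            totals[title] += views
    return totals.most_common(n)


# Made-up sample lines; real files hold many millions of rows per hour.
sample_lines = [
    "en Main_Page 120 4000",
    "en Data_set 30 900",
    "de Hauptseite 50 1500",
    "en Main_Page 80 2500",
]
print(top_pages(sample_lines))
```

Because the function only iterates over lines, it can be fed an open file handle directly, so a full hourly file never has to be loaded into memory at once.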

 

 


Enjoy! As mentioned above, 100% of this data is reposted and the original sources are in the links. If I've missed any citations, please let me know and I will fix them.


responded May 2014

(last followup May 2014)
Thanks for sharing this, it's a great resource.

responded May 2014

Post Author
thanks Lee!

responded March 2014

Post Author
edited April 2014
Greetings!

If you have a "list of lists" that you think should be on this "list of lists of lists" - please pop it in the comments, or message me with the link and I'll add it!

Cheers!
Ryan

UC Irvine - 284 data sets as a service to the machine learning community. - http://archive.ics.uci.edu/ml/

About this document

1001 Datasets and Data repositories ( List of lists of lists ) - a rough list of lists, still being compiled

Created: December 28, 2013
