Where to find a dataset
by Andrij David

Data is at the heart of every data science project. Here we have curated some of the most common places where you can find datasets for your research or personal projects.

Google Dataset Search

Google Dataset Search is a search engine developed by Google to help researchers locate public data online. The service is free to use. You can access it at https://toolbox.google.com/datasetsearch

Baidu Research

Baidu provides a wide range of datasets at no cost for research and personal use. The data comes from its various business units and covers areas such as medical imaging (annotated retinal fundus images), video highlights, and scene parsing, which offers a set of tools and datasets for advanced autonomous driving research, among many others.

Public lists on GitHub

On GitHub, you can find numerous repositories that list public and open datasets.

Data.world

Data.world describes itself as a platform for modern data teamwork. It offers a catalog of data covering a wide range of domains.

Kaggle

Kaggle is a data science community that hosts machine learning competitions. The site also offers a variety of interesting, externally contributed datasets. You can download both competition data and contributed datasets, but you have to sign up for Kaggle and, for competition data, accept the terms of service for that competition.

AWS public datasets

Amazon makes large data sets available on its Amazon Web Services platform. You can download the data and work with it on your own computer, or analyze the data in the cloud using EC2 and Hadoop via EMR. Amazon's documentation explains how the program works.

Amazon has a page that lists all of the data sets for you to browse. You’ll need an AWS account, although Amazon gives you a free access tier for new accounts that will enable you to explore the data without being charged.

Google public datasets

Google hosts public data sets on its Google Cloud Platform (GCP) and lists all of them on a single page. You'll need to sign up for a GCP account, but the first 1 TB of queries you make is free.

UC Irvine Machine Learning Repository

The UCI Machine Learning Repository is one of the oldest dataset repositories on the web. It maintains hundreds of datasets that are freely available without registration.
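Because no registration is required, a dataset can be read straight from its URL. Here is a minimal sketch using only the Python standard library. The Iris file URL, the column names, and the helper names `parse_rows` and `fetch_text` are assumptions made for illustration; check the repository page of the dataset you want for its current location and format.

```python
import csv
import io
import urllib.request

# Assumed example: the classic Iris file as served by the UCI repository.
# The path may change, so verify it on the dataset's own page.
IRIS_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# The file has no header row, so the column names are supplied by hand (hypothetical names).
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def parse_rows(text, columns):
    """Turn header-less CSV text into a list of dicts keyed by `columns`."""
    reader = csv.reader(io.StringIO(text))
    # csv.reader yields an empty list for blank lines; skip those.
    return [dict(zip(columns, row)) for row in reader if row]


def fetch_text(url, timeout=30):
    """Download a text file over HTTP(S) and return it as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (needs network access):
# rows = parse_rows(fetch_text(IRIS_URL), IRIS_COLUMNS)
# print(len(rows), rows[0])
```

All values come back as strings, so convert the numeric columns with `float()` before doing any analysis.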

Quandl

Quandl provides financial data, economic data, and some alternative data. Some of the datasets are free but many others require purchase.

Government

Some governments offer an official data portal. The data can range from government budgets to school performance scores.

Data.gov makes it possible to download data from multiple US government agencies. Similar portals exist for the United Kingdom (Data.gov.uk), India (Data.gov.in), and Europe (https://open-data.europa.eu/).
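Portals like these can often be searched from code as well as from the browser. The sketch below assumes that the Data.gov catalog is served by CKAN and exposes the standard `package_search` action at `catalog.data.gov`; the helper names are made up for this example, and the endpoint or its access rules may differ from what is shown, so treat it as a starting point rather than a reference.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Data.gov's catalog exposes the CKAN package_search action here.
CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5, base=CKAN_SEARCH):
    """Build a CKAN search URL for a free-text query, limited to `rows` results."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base}?{params}"


def extract_titles(payload):
    """Pull the dataset titles out of a CKAN package_search response dict."""
    return [package["title"] for package in payload["result"]["results"]]


def search_titles(query, rows=5):
    """Run the search over the network and return the matching dataset titles."""
    with urllib.request.urlopen(build_search_url(query, rows), timeout=30) as response:
        return extract_titles(json.load(response))


# Usage (needs network access):
# for title in search_titles("school performance"):
#     print(title)
```

Other CKAN-based portals should accept the same kind of query if you pass their own API address as `base` to `build_search_url`.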

Reddit /r/datasets

At the time of this writing, this subreddit has around 47K subscribers. It has a very active community with whom you can share, find, and discuss datasets.


If you know of another place to find datasets, feel free to suggest it.