Where I Find My Data

October 5, 2025

A curated list of all the data sources that I use for my EDA, ML, and visualization experiments.

Hey there!

Throughout my studies, my professors have always emphasized the importance of choosing the right data source for any project, because it can shape the entire outcome of a data science workflow.

As a rough rule of thumb, data accounts for about 70% of a project’s predictive power, while the model contributes the remaining 30%. In other words, a strong dataset is crucial.

So, for this blog post, I decided to share some of the data sources I use most often in my projects, from classic repositories to a few niche finds.

  1. Kaggle Datasets - https://www.kaggle.com/datasets
  1. UC Irvine Machine Learning Repository - https://archive-beta.ics.uci.edu/
  1. Registry of Open Data on AWS - https://registry.opendata.aws/
  1. Google Dataset Search - https://datasetsearch.research.google.com/
  1. Microsoft Research Open Data - https://www.microsoft.com/en-us/research/tools/
  1. GitHub: awesome-public-datasets - https://github.com/awesomedata/awesome-public-datasets
  1. Open Government Data Platform (OGD) India - https://www.data.gov.in/
  1. U.S. Government’s Open Data - https://data.gov/
  1. OpenDataNI - https://www.opendatani.gov.uk/
  1. The official portal for European data - https://data.europa.eu/en
  1. AirROI Data Portal (Airbnb data) - https://www.airroi.com/data-portal/

Data sources can make or break a project. Exploring them not only sparks new project ideas but also gives you a better sense of the data’s quality and context.
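
For a quick first look at data quality, here is a minimal sketch of the kind of check I mean: count the rows in a CSV and see how many values are missing in each column. It uses only Python's standard library. The function name `profile_csv` and the sample rows are made up for illustration and do not come from any of the sources listed above.

```python
import csv
import io


def profile_csv(text):
    """Return the row count and per-column missing-value counts for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    # Start every column from the header at zero missing values.
    missing = {name: 0 for name in (reader.fieldnames or [])}
    rows = 0
    for row in reader:
        rows += 1
        for name in missing:
            value = row.get(name)
            # Treat absent fields and blank strings as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"rows": rows, "missing": missing}


# Hypothetical sample standing in for a file downloaded from a data portal.
sample = "city,price,rating\nBelfast,120,4.5\nPune,,4.8\nAustin,95,\n"
print(profile_csv(sample))
# {'rows': 3, 'missing': {'city': 0, 'price': 1, 'rating': 1}}
```

For a real file, you could pass in the result of `open(path, newline="").read()`. For larger datasets, a library like pandas is probably the more practical choice.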

If you have a favorite source I missed, I’d love to hear about it!

~Vibhav