5 Data Sources for Data Engineering Projects
Developing your first data engineering portfolio
One of the best ways to improve your chances of landing a data engineering position is to add well-documented portfolio projects to your resume.
Building these projects not only improves your skills but also gives you a tangible final product you can discuss with a recruiter.
We previously shared an article that looked over 5 examples of data engineering projects you could take on. But what if you want to make your own?
Where should you start?
Often, finding the right dataset is the hardest part of putting together an effective project.
To help you get started, this article curates a list of data sources you can use for data engineering projects. I will also outline a few ways other developers have used these datasets, so you can draw inspiration from them.
1. San Francisco Open Data’s API
The first data source comes from San Francisco Open Data’s API. The local San Francisco government has done a tremendous job of publishing data from a wide variety of departments, including the Treasurer-Tax Collector, the Airport (SFO), and the Municipal Transportation Agency, to name a few. Ilya Galperin outlined an apt data engineering application of this data source, tracking eviction trends by district, filing reason, neighborhood, and demographic.
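To give a feel for what the extract and aggregate steps of a project like this might look like, here is a minimal sketch in Python. It assumes the portal is queried through the Socrata SODA API (the `$where` and `$limit` query parameters). The dataset ID `xxxx-xxxx` is a placeholder, and the field names `file_date` and `supervisor_district` are my own assumptions for illustration; check the dataset's page for the real ones. This is not Galperin's implementation, just one way to start.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.sfgov.org/resource"
DATASET_ID = "xxxx-xxxx"  # placeholder: replace with the real dataset ID


def build_query_url(dataset_id, since, limit=1000):
    """Build a SODA query URL for records filed on or after `since`.

    `file_date` is an assumed column name, not confirmed by the article.
    """
    params = {
        "$where": f"file_date >= '{since}'",
        "$limit": limit,
    }
    return f"{BASE_URL}/{dataset_id}.json?{urlencode(params)}"


def fetch_records(url):
    """Download one page of records; the API returns a JSON array of objects."""
    with urlopen(url) as response:
        return json.load(response)


def count_by(records, field):
    """Count records per value of `field`, bucketing missing values as 'unknown'."""
    return Counter(record.get(field, "unknown") for record in records)


if __name__ == "__main__":
    # Made-up sample rows so the aggregation can be tried without network access.
    sample = [
        {"file_date": "2020-04-01", "supervisor_district": "9"},
        {"file_date": "2020-04-15", "supervisor_district": "9"},
        {"file_date": "2020-05-02", "supervisor_district": "6"},
        {"file_date": "2020-05-20"},
    ]
    print(build_query_url(DATASET_ID, "2020-03-01"))
    print(count_by(sample, "supervisor_district").most_common())
```

Once you swap in a real dataset ID, `fetch_records(build_query_url(...))` replaces the sample rows. Keeping the URL builder, the download, and the aggregation as separate functions makes each step easy to test on its own, which is a habit worth showing off in a portfolio project.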
This study was especially interesting because it covered the months following the onset of COVID-19, allowing for exploratory analysis of how the pandemic affected various subsets of San Francisco. It will be years before we fully grasp the pandemic's impact and how it differs across locations and demographics. This project is one small step towards understanding the repercussions of COVID-19 and how unevenly they fell on different groups. Galperin…