Data Collection With API — For Beginners
A simple guide to leveraging APIs to obtain data using Python
The Application Programming Interface (API) has become a core component of many of the products and services we’ve become accustomed to using.
APIs can strengthen the relationship between companies and their clients. For companies, they are a convenient way to promote their business to clients while keeping their backend systems secure. For clients, APIs provide the means to access data that can fuel their research or product development.
Here, I will give a brief overview of APIs and demonstrate how you can use this resource for your own data collection using Python.
Forms of Data Collection
Before discussing APIs, let’s quickly go over the options you have when it comes to procuring data.
1. Collecting your own data
This one seems like a no-brainer; if you want some data, why not collect your own? After all, no one understands your requirements better than you do. So, just get out there and start collecting, right?
Wrong.
In most cases, collecting your own data is an absurd notion. Procuring information of the required quantity and quality demands considerable time, money, manpower, and resources.
That usually makes it an infeasible (if not impossible) undertaking.
2. Using ready-made datasets
Why go through the trouble of collecting and processing data when you can just use someone else’s preprocessed datasets?
Ready-made datasets can be appealing since someone has already done all the hard work for you in making them. You’ve no doubt encountered plenty of them on sites like Kaggle.com and Data.gov.
Unfortunately, the convenience of this approach comes at the cost of flexibility and control. When you use a ready-made dataset, you are restricted by the preprocessing performed on that dataset prior to its upload.
Chances are that some of the records or features that would have been useful to you were discarded by the source.
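One quick way to see this limitation in practice is to compare the columns a ready-made dataset actually ships with against the ones your project needs. The snippet below is only a minimal sketch: the sample data, the column names, and the `missing_columns` helper are all hypothetical, made up for illustration rather than taken from any real dataset.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded ready-made dataset.
SAMPLE_CSV = """city,avg_temp_c
Oslo,6.3
Cairo,22.1
"""


def missing_columns(csv_text, required):
    """Return the required column names that the dataset does not contain."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is read from the header row of the CSV text.
    available = set(reader.fieldnames or [])
    return [name for name in required if name not in available]


# Hypothetical list of features the project needs.
required = ["city", "avg_temp_c", "humidity", "recorded_at"]
print(missing_columns(SAMPLE_CSV, required))  # ['humidity', 'recorded_at']
```

If the list comes back non-empty, the dataset cannot supply those features no matter how you process it, which is exactly the loss of control described above.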