Member-only story
Use Python to Convert Worksheets in an Excel File to Separate CSV Files
ETL improvements
Working with various different clients has exposed me to scenarios where the data files are provided in an Excel .xlsx
format. To me personally, this file format is not very versatile and compatible when it comes to programmatically doing something with it.
There are a few reasons, but one of the main ones is how Excel automatically parses some values into a particular format (numeric, scientific notation, and dates as examples) as it tries to be “smart”.
In some cases, this might be useful but when it comes to programming, this can cause bugs and add complexities.
As a result, I would normally convert Excel files into CSV files before I start processing the data contained in them.
This is so that the data format then stays consistent and in general, CSV files are easily integrated into any program (since it is just a text file where the data in it is separated by commas ,
).

Initially, I used to just export each of the worksheets in an Excel workbook as CSV
and save them as new files.
However, as I kept getting more Excel files to be processed and make part of an ETL pipeline, I decided that I needed to automate it programmatically so I don’t need to spend any more time manually converting them into CSV files.
I believe there are some of you who are experiencing the same and so I decided to share the following simple Python script that does the conversion of an Excel worksheet to a CSV file.