Better Programming

Advice for programmers.

The Pros and Cons of Using Jupyter Notebooks as Your Editor for Data Science Work

Steffen Sjursen
Published in Better Programming
5 min read · Mar 1, 2020
Image by Gerd Altmann from Pixabay

Jupyter notebooks have three particularly strong benefits:

  • They’re great for showcasing your work. You can see both the code and the results. The notebooks at Kaggle are a particularly good example of this.
  • It’s easy to use other people’s work as a starting point. You can run the code cell by cell to get a better understanding of what it does.
  • They’re very easy to host server-side, which is useful for security purposes. A lot of data is sensitive and should be protected, and one step toward that is ensuring no data is stored on local machines. A server-side Jupyter Notebook setup gives you that for free.

When prototyping, the cell-based approach of Jupyter notebooks is great. But you quickly end up programming a linear sequence of steps instead of structuring the code with object-oriented programming.

Downsides of Jupyter notebooks

When you write code in cells instead of functions, classes, and objects, you quickly end up with duplicate code that does the same thing in several places, which is very hard to maintain.
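To make the duplication problem concrete, here is a minimal sketch. The data, the variable names, and the `average_order_value` function are all hypothetical and only serve as illustration. The first part mimics two notebook cells that repeat the same logic; the second part collects that logic in one function that could live in a shared module.

```python
from statistics import mean

# Hypothetical data: order values per region, with missing entries as None.
orders_north = [120.0, 80.0, None, 100.0]
orders_south = [200.0, None, 100.0]

# Notebook style: the same clean-then-average logic pasted into two cells.
# Cell 1
clean_north = [v for v in orders_north if v is not None]
avg_north = mean(clean_north)

# Cell 2 (a copy of cell 1 with the variable names changed)
clean_south = [v for v in orders_south if v is not None]
avg_south = mean(clean_south)


# Refactored: the logic is defined once, so a fix or a change in the
# handling of missing values only has to be made in one place.
def average_order_value(orders):
    """Return the mean of the non-missing order values."""
    present = [v for v in orders if v is not None]
    if not present:
        raise ValueError("no order values to average")
    return mean(present)


north = average_order_value(orders_north)
south = average_order_value(orders_south)
```

If the rule for missing values changes, the cell-based version needs the same edit in every copy, while the function version needs it once.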

You also don’t get the support of a powerful IDE.

Consequences of duplicate code:

  • It’s hard to actually collaborate on code with Jupyter. As we’re copying snippets from each other, it’s very easy to get out of sync.
  • Hard to maintain one version of the truth. Which one of these notebooks has the one true solution to the number of xyz?

There’s also a tricky problem related to plotting. How are you sharing plots outside of the data science team? At first, Jupyter Notebook looks like a great way of sharing plots: just share the notebook! But how do you ensure the data in it is fresh? Easy, just have the recipients run the notebook.

But in large organizations, you might run into a lot of issues, as you don’t want too many users having direct access to the underlying data (for GDPR reasons or otherwise). In practice, in a workplace, we’ve noticed plots from Jupyter typically get shared by copy/pasting into PowerPoint. It’s highly inefficient to have your data scientists do copy/paste…
