Why You Should Invest in Julia Now, as a Data Scientist
Know what Julia has to offer and the resources to get started

Julia is a high-level, dynamic programming language built to be as fast as C or C++ while remaining as easy to use as Python. For data scientists, this is a computational dream come true.
In this post, we will cover the topics below, with the goal of convincing data scientists that the Julia ecosystem is worth investing time in. At a high level, the main reason to switch to Julia (or to use it to supplement existing workflows) is the productivity it enables for developers. Who doesn’t love being able to work more effectively?!
Topics we will cover
- Julia use-cases 🧑‍💻
- Data Science packages 🤖
- Interoperability 🔀
- Speed ⚡️
- Learning Resources 📚
Edit: My co-author and I are thrilled to share that pre-orders for our new book, Julia Crash Course, are now live:
Real-world Julia use-cases 🧑‍💻
Julia is being used to solve a plethora of very interesting and challenging problems.
I will do my best to cover a few that I find particularly interesting. It is worth noting that Julia Computing has a great section of their website dedicated to case studies (https://juliacomputing.com/case-studies/), and the annual Julia Conference (https://juliacon.org) is essentially one long case-study presentation. There is also a nice visualization of JuliaCon talk topics at https://live.juliacon.org/viz, where you will notice some interesting trends. These resources do the topic more justice than I can in this post.
Protecting The Electric Grid
One use case which I find interesting is from a company called Fugro Roames in Australia. They use Julia to detect potential damage to power lines based on aerial images. The company is quoted as saying:
“We were sold when we rewrote an inertial navigation / GNSS sensor fusion code from Matlab into Julia. The result was about 100x faster, and the new structure was more readable. The translation effort was quite minimal.”
You can read more about this case study here.
Adding Machine Learning to Low Power Devices
As many data scientists know, machine learning is a very computationally intensive task. The folks at LakeTide use Julia to create machine learning solutions in low-power environments. An interesting quote from them states:
“Julia has helped us increase our effectiveness as programmers and allows us to become productive quickly in a number of domains that normally require a large up-front investment in study time. A good example is GPU programming where we can begin loading data into CuArrays.jl for performance improvements before learning how to write custom kernels using CUDAdrv.jl.”
This use case really highlights one of the fundamental value adds for data scientists and other users of Julia: the basic building blocks are there, and they work well. When you want to do more computationally than you are doing now, the skills you need are already part of your repertoire. After all, it is just more Julia code! You don’t need to learn how to write Python and then learn how to write performant C code. Read more about the case study.
Simulating a public transportation system with OpenStreetMapX.jl
I could write a whole book on all of the cool things Julia is being used for. The bottom line is that folks using Julia for Data Science are seeing the benefits. The language is making them more productive and allowing them to solve problems faster and hopefully with more joy.
Packages for Data Science in Julia 📦
One interesting characteristic of the Julia community that differs from other open-source ecosystems is the extent to which package developers use GitHub organizations. For example:
- for statistics, there is https://github.com/JuliaStats
- for machine learning, there is https://github.com/JuliaML
- for data operations, there is https://github.com/JuliaData
You can find many of the Julia orgs listed here.

As you can see, the ecosystem is quite robust. Next, let us talk briefly about some of the common data science tools from the Python ecosystem (just because I am most familiar with those, sorry R folks) and their Julia equivalents.
I would posit that most folks doing data science in Python use Numpy and Pandas as core parts of their workflow. Because arrays are built into Julia itself, no special package is needed for Numpy-like operations. You can see an interesting example of this in a tutorial I found a few weeks ago: https://www.matecdev.com/posts/numpy-julia-fortran.html
Since Julia comes with first-class support for multi-dimensional arrays (https://docs.julialang.org/en/v1/manual/arrays/), data scientists can do much of what they would do in Numpy directly in Julia, without any additional overhead.
In the context of Pandas, the Julia ecosystem has DataFrames.jl (https://dataframes.juliadata.org/stable/), which recently hit its critical 1.0 release at last year’s JuliaCon.
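To make the comparison with Numpy concrete, here is a minimal sketch that uses only Base Julia plus the Statistics standard library. The matrix values and variable names are made up for illustration.

```julia
using Statistics  # standard library that ships with Julia, provides `mean`

# A 2x3 matrix literal: spaces separate columns, semicolons separate rows.
A = [1.0 2.0 3.0; 4.0 5.0 6.0]

# Element-wise ("broadcast") arithmetic uses the dot syntax.
B = A .* 2 .+ 1

# Reductions along a dimension, similar to the `axis` argument in Numpy.
col_means = mean(A, dims=1)   # 1x3 matrix of column means
row_sums  = sum(A, dims=2)    # 2x1 matrix of row sums

# Matrix product of A with its transpose gives a 2x2 matrix.
G = A * A'

# Indexing is 1-based, and `:` selects a whole dimension.
first_row = A[1, :]
```

Note that `*` is a true matrix product here, while `.*` is the element-wise version, which is the reverse of the Numpy default.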
This talk does a good job of providing an overview of the ecosystem and current state of the packages. If you are tired of all this talk and want to just dive into using DataFrames, I suggest this workshop from JuliaCon 2021:
DataFrames.jl is an excellent example of the power of Julia and maturity of the ecosystem. It has complete feature parity with Pandas and is extremely performant: https://h2oai.github.io/db-benchmark/. If you just can’t let go of Pandas (and who could blame you, it is really a great tool) or Numpy, you could always just use them right in your Julia code (read more here: https://www.geeksforgeeks.org/how-to-install-numpy-package-in-julia/). This is a great transition into our next section on interoperability.
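For readers coming from Pandas, a short sketch of a filter followed by a group-and-aggregate may help show what working with DataFrames.jl feels like. It assumes the DataFrames package is installed; the data, column names, and variable names are hypothetical.

```julia
using DataFrames   # third-party package: import Pkg; Pkg.add("DataFrames")
using Statistics   # standard library, provides `mean`

# Hypothetical data, made up for this example.
df = DataFrame(
    city = ["Oslo", "Oslo", "Lima", "Lima"],
    temp = [3.0, 5.0, 19.0, 21.0],
)

# Keep rows where temp > 4.0, similar to boolean indexing in Pandas.
warm = filter(:temp => t -> t > 4.0, df)

# Group by city and average temp, similar to df.groupby("city")["temp"].mean().
# `sort=true` makes the order of the groups deterministic.
by_city = combine(groupby(df, :city; sort=true), :temp => mean => :mean_temp)
```

The `source => function => destination` pairs passed to `combine` are the main idiom to get used to; the same pattern appears in `select` and `transform`.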
Interoperability With Other Languages 🔀
Julia is great, but that doesn’t mean you have to do everything in it. Sometimes you write some elegant Matlab code and just can’t let go of it. The Julia Interop organization hosts all of the packages you need to stitch other languages into your Julia code: https://github.com/JuliaInterop
You can write Matlab, Python, R, C++, and more, right inside a Julia file! I actually already wrote up a simple example of this in a recent post which you can find here:
In this article, I showcase how you can work with the Hugging Face API directly in Julia by using PyCall. Since I wrote it, someone has actually wrapped the API in Julia (https://github.com/chengchingwen/HuggingFaceApi.jl). The Julia community moves quickly!
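Python, R, and Matlab go through packages such as PyCall, but calling into C needs no package at all because `ccall` is built into the language. As a minimal sketch of what calling another language looks like, the snippet below calls `strlen` from the C standard library. It assumes a platform where that symbol is available to the running Julia process, which is the usual case on Linux and macOS.

```julia
# Call the C function `strlen`. The arguments to `ccall` are: the C function
# name, its return type, a tuple of argument types, and the actual arguments.
# `Csize_t` is an unsigned integer, so convert it to a plain Int.
n = Int(ccall(:strlen, Csize_t, (Cstring,), "Julia"))

println("strlen says the string has ", n, " bytes")
```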
Julia is fast ⚡️
If you had heard of Julia before reading this, you probably heard that it is fast. The legends are indeed true. A few of the examples above (like the DataFrames benchmark) highlight this. Let’s look at a couple of other examples:

This plot highlights a few basic operations and their speed in each language. This benchmark lives at: https://julialang.org/benchmarks/ so I’ll refer you there for all of the details.
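If you want to get a feel for this on your own machine, here is a small timing sketch. It is not the code behind the benchmarks linked above, just an illustration in the same spirit, and the function name `fib` is my own choice.

```julia
# Naive recursive Fibonacci: deliberately inefficient, so it stresses
# function-call overhead rather than a clever algorithm.
fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

result = fib(20)             # the first call also triggers JIT compilation
elapsed = @elapsed fib(20)   # time a second, already-compiled call

println("fib(20) = ", result, ", computed in ", elapsed, " seconds")
```

Timing the second call matters: the first call to any Julia function includes compilation time, which would otherwise dominate such a small measurement.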
Another place where Julia brings large speed improvements is in reading CSVs (which I think most data scientists will agree they do more than they care to admit). There is another really nice article highlighting how Julia and CSV.jl provide a 10–20x speedup over Python and R:
In an effort to avoid re-inventing the wheel, I want to refer you all to one last article on Julia’s speed. “Yeah, yeah, we get it, Julia is fast.”
In the article below, the author writes a few basic algorithms in Julia and compares their performance to other languages:
Learning Resources 📚
If you made it this far, you are hopefully pretty excited about learning Julia. I will try to give some suggestions for books and other resources that have helped me along my Julia journey. First, to learn the basics of Julia, head to: https://julialang.org/learning/
If you want to take an online course on the basics of Data Science in Julia taught by the wonderful Huda Nassar, head to:
If you want to read a book, check out the recently published Julia Data Science book, which is totally free but has a paper copy available on Amazon if you, like me, need to spend less time looking at screens.
Closing Thoughts 🎁
I want to start by giving a huge shout-out to everyone in the Julia community who made what I just wrote about possible. The work shown above is no small feat, and many people have worked very hard on it. I would also like to give a shout-out to Hamel Husain, who suggested I write this post:
While much of this post has covered the technical reasons why you might want to try Julia, I would be remiss not to mention the community. The people using and building things in the ecosystem are really an incredible bunch. Because Julia is being used to solve so many interesting and difficult problems, the folks in the Julia community tend to be rock stars. I have personally had the chance to meet and connect with so many interesting and kind people. It is really one of the reasons I stick around and am so deeply passionate about helping the community.
I hope you found this article helpful. Thanks for reading.
Want to Connect With the Author?
If you have questions or want to connect, please ping me on Twitter, as I am always happy to help and chat with folks onboarding into the Julia community.