What Is Dask and How Can It Help You as a Data Scientist?

Managing big data no longer means just buying bigger and faster servers

Ben Rogojan
Better Programming
5 min read · Feb 25, 2020

Managing big data now also means understanding parallel computing.

The list of tools and data systems built for parallel processing grows every year. Whether you are querying Redshift on AWS or using custom libraries, knowing how to wrangle data in parallel is a valuable skill.

Python, one of the most popular languages thanks to its ease of use, offers a number of libraries that let programmers run models and data transforms in parallel.
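To make "data transforms in parallel" concrete, here is a minimal sketch that uses only the standard library's concurrent.futures module. It is an illustration of the general split, transform, and recombine idea, not a description of how Dask works internally. The function names transform_chunk and parallel_transform, the chunk size, and the squaring transform are all hypothetical choices made for this example. Threads are used to keep the sketch simple; for CPU-bound pure-Python work, threads will not give a real speedup, and you would more likely reach for processes or a dedicated library.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_chunk(chunk):
    """Hypothetical per-chunk transform: square every value."""
    return [value * value for value in chunk]


def parallel_transform(values, chunk_size=3, max_workers=4):
    """Split values into chunks, transform the chunks concurrently, recombine."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input chunks
        transformed = pool.map(transform_chunk, chunks)
        return [value for chunk in transformed for value in chunk]


result = parallel_transform(list(range(10)))
print(result)
```

The caller sees one list in and one list out; the chunking and scheduling stay hidden behind a single function call, which is the kind of convenience that parallel libraries aim to provide at a much larger scale.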

What if a magical solution appeared that offered parallel computing, sped-up algorithms, and even let you integrate NumPy and pandas with the XGBoost library?

Well, that magic potion exists, and it goes by the name of “Dask.”

In this article, we will discuss what Dask is and why you might consider using it.

What Is Dask?

Dask is an open-source project that lets developers build software that works alongside scikit-learn, pandas, and NumPy. It is a very versatile tool that works…

Written by Ben Rogojan

#Data #Engineer, Strategy Development Consultant and All Around Data Guy #deeplearning #dataengineering #datascience #tech https://linktr.ee/SeattleDataGuy
