Distributed Task Queues With Django, RabbitMQ, and Celery
Learn about distributed task queues for making asynchronous API requests

What happens when a user sends a request, but processing that request takes longer than the HTTP request-response cycle? What if you’re accessing multiple databases or want to return a document too large to process within the time window? What if you want to access an API, but the number of requests is throttled to a maximum of n requests per t time window?
These are some of the questions that arose during the data collection process for my master’s thesis. For my research, microposts from Twitter were scraped via the Twitter API. Two main issues arose that distributed task queues resolve:
- The Twitter API limits requests to a maximum of 900 GET statuses/lookup calls per 15-minute window. Data collection consisted of well over 100k requests, or 30+ hours. Moving this process to a server proved indispensable during planning.
- Database operations, in particular creating instances for annotators in our server-hosted annotation tool, exceeded the request/response time window. Creating these instances required a distributed task queue (a sketch of such tasks follows this list).
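To make the connection between these two problems and task queues concrete, here is a minimal sketch of what such tasks could look like. The task names, arguments, and bodies are hypothetical placeholders rather than the code used in the thesis project; only the `shared_task` decorator and its `rate_limit` option are standard Celery features.

```python
# tasks.py -- a minimal sketch, not the project's actual code.
# Task names and arguments below are hypothetical placeholders.
from celery import shared_task


@shared_task(rate_limit="60/m")
def fetch_statuses(status_ids):
    """Fetch a batch of tweets on a worker, outside the HTTP cycle.

    rate_limit="60/m" caps each worker at 60 executions per minute,
    roughly 900 calls per 15-minute window, matching the Twitter API
    limit mentioned above.
    """
    # ... call the Twitter API here and store the results ...
    return len(status_ids)


@shared_task
def create_annotation_instances(annotator_ids):
    """Create annotation-tool instances without blocking the view."""
    # ... slow bulk database writes happen here ...
    return len(annotator_ids)
```

Note that Celery enforces `rate_limit` per worker instance, not globally, so with several workers the per-worker limit would need to be lowered accordingly.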
This article will cover:
- Celery distributed task queues
- RabbitMQ workers
- Django implementation
- Create the view
- Activate workers
- Updating and troubleshooting
These steps can be followed offline via a localhost Django project or online on a server (for example, via DigitalOcean, Transip, or AWS). The benefit of having a server is that you do not need to keep your own computer on to run these distributed task queues, and for the Twitter API use case, that means data collection requests can run 24/7.
Be aware, the implementation of distributed task queues can be a bit of a pickle and can get quite difficult. Don’t hesitate to reach out for help! This article assumes a basic understanding of the MVC architecture (forms, URL endpoints, and views) in Django.