Alligator Tutorial¶
Alligator is a simple offline task queuing system. It enables you to take expensive operations & move them offline, either to a different process or even a whole different server.
This is extremely useful in the world of web development, where request-response cycles should be kept as quick as possible. Scheduling tasks helps remove expensive operations & keeps end-users happy.
Some example good use-cases for offline tasks include:
- Sending emails
- Resizing images/creating thumbnails
- Notifying social networks
- Fetching data from other data sources
Before going further, follow the Installing Alligator instructions to get it set up.
Alligator is written in pure Python & can work with all frameworks. For this tutorial, we’ll assume integration with a Django-based web application, but it could just as easily be used with Flask, Pyramid, pure WSGI applications, etc.
Philosophy¶
Alligator is a bit different in approach from other offline task systems. Let’s highlight some of the differences & the reasons behind them.
- Tasks Are Any Plain Old Function
  No decorators, no special logic/behavior needed inside, no inheritance. ANY importable Python function can become a task with no modifications required.
  Importantly, it must be importable. So instance methods on a class aren’t processable.
- Plain Old Python
  Nothing specific to any framework or architecture here. Plug it in to whatever code you want.
- Simplicity
  The code for Alligator should be small & fast. No complex gymnastics, no premature optimizations or specialized code to suit a specific backend.
- You’re In Control
  Your code calls the tasks & can set up all the execution options needed. There are hook functions for special processing, or you can use your own Task or Client classes.
  Additionally, you control the consuming of the queue, so it can be processed your way (or fanned out, or prioritized, or whatever).
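To make the “must be importable” requirement concrete, here is a minimal sketch of looking a function up again from nothing but its dotted name, using only the standard library. It illustrates what “importable” means in general; it is an assumption for illustration, not a description of how Alligator loads tasks internally, and resolve is a hypothetical helper.

```python
import importlib


def resolve(dotted_path):
    """Import 'package.module.func' and return the function object."""
    module_path, _, func_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, func_name)


# A module-level function can be found again from just its name, which is
# what lets a separate process run it later.
dumps = resolve("json.dumps")
print(dumps({"ok": True}))
```

A bound method, by contrast, also needs a live instance that a name alone cannot recreate, which is one way to see why instance methods are a poor fit for tasks.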
Figure Out What To Offline¶
The very first thing to do is figure out where the pain points in your application are. How you do this analysis varies widely (though things like django-debug-toolbar, profile or snakeviz can be helpful). Broadly speaking, you should look for things that:
- access the network
- do an expensive operation
- may fail & require retrying
- aren’t immediately required for success
If you have a web application, just navigating around & timing pageloads can be a cheap/easy way of finding pain points.
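If you’d rather measure than guess, a minimal profiling sketch might look like the following. It uses cProfile (the C-accelerated sibling of the profile module mentioned above) from the standard library; slow_view is a hypothetical stand-in for whatever code you suspect is slow.

```python
import cProfile
import io
import pstats


def slow_view():
    """Hypothetical stand-in for a view doing expensive work."""
    return sum(i * i for i in range(200_000))


# Profile a single call to the suspect function.
profiler = cProfile.Profile()
profiler.enable()
result = slow_view()
profiler.disable()

# Render the five most expensive calls, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Functions near the top of the report by cumulative time are the usual candidates for moving offline.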
For the purposes of this tutorial, we’ll assume a user of our hot new Web 3.0 social network made a new post & all their followers need to see it.
So our existing view code might look like:
from django.conf import settings
from django.core.mail import send_mail
from django.http import Http404
from django.shortcuts import redirect

from sosocial.models import Post


def new_post(request):
    if request.method != 'POST':
        raise Http404('Gotta use POST.')

    # Don't write code like this. Sanitize your data, kids.
    post = Post.objects.create(
        message=request.POST['message']
    )

    # Ugh. We're sending an email to everyone who follows the user, which
    # could mean hundreds or thousands of emails. This could timeout!
    subject = "A new post by {}".format(request.user.username)
    to_emails = [follow.email for follow in request.user.followers.all()]
    send_mail(
        subject,
        post.message,
        settings.SERVER_EMAIL,
        recipient_list=to_emails
    )

    # Redirect like a good webapp should.
    return redirect('activity_feed')
Creating a Task¶
The next step won’t involve Alligator at all. We’ll extract that slow code into an importable function, then call it from where the code used to be. So we can convert our existing code into:
from django.conf import settings
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.http import Http404
from django.shortcuts import redirect

from sosocial.models import Post


def send_post_email(user_id, post_id):
    post = Post.objects.get(pk=post_id)
    user = User.objects.get(pk=user_id)

    subject = "A new post by {}".format(user.username)
    to_emails = [follow.email for follow in user.followers.all()]
    send_mail(
        subject,
        post.message,
        settings.SERVER_EMAIL,
        recipient_list=to_emails
    )


def new_post(request):
    if request.method != 'POST':
        raise Http404('Gotta use POST.')

    # Don't write code like this. Sanitize your data, kids.
    post = Post.objects.create(
        message=request.POST['message']
    )

    # The code was here. Now we'll call the function, just to make sure
    # things still work.
    send_post_email(request.user.pk, post.pk)

    # Redirect like a good webapp should.
    return redirect('activity_feed')
Now go run your tests or hand-test things to ensure they still work. This is important because it helps guard against regressions in your code.
You’ll note we’re not passing the User or Post instances directly, but their primary keys instead. As it stands, this costs two extra queries. While that’s sub-optimal for now, it neatly prepares us for offlining the task.
Note
Why not pass the instances themselves?
While it’s possible to create instances that nicely serialize, the problem with this approach is stale data & unnecessarily large payloads.
While the ideal situation is tasks that are processed within seconds of being added to the queue, in the real world, queues can get backed up & users may further change data. By fetching the data fresh when processing the task, you ensure you’re not working with old data.
Further, most queues are optimized for small payloads. The more data you send over the wire, the slower things go. Since speed is the whole reason for adding a task queue, large payloads defeat the purpose.
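As a rough illustration of the payload argument, the sketch below compares the size of a task payload carrying whole objects against one carrying only primary keys. The dictionaries are hypothetical stand-ins for a user & a post, and JSON is used only as an example serialization format.

```python
import json

# Hypothetical data standing in for a user and a post.
user = {
    "id": 42,
    "username": "alice",
    "bio": "x" * 2000,
    "followers": list(range(500)),
}
post = {"id": 7, "message": "Hello, world! " * 50}

# Option 1: serialize the whole objects into the task payload.
full_payload = json.dumps({"user": user, "post": post})

# Option 2: serialize only the primary keys; the task re-fetches fresh rows.
id_payload = json.dumps({"user_id": user["id"], "post_id": post["id"]})

print(len(full_payload), "bytes vs", len(id_payload), "bytes")
```

The ID-only payload stays the same size no matter how large the underlying records grow, & the task always sees current data when it runs.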
Create a Gator Instance¶
While it’s great we got better encapsulation by pulling out the logic into its own function, we’re still doing the sending of email in-process, which means our view is still slow.
This is where Alligator comes in. We’ll start off by importing the Gator class at the top of the file & making an instance.
Note
Unless you’re only using Alligator in one file, a best practice would be to put that import & initialization into its own file, then import that configured gator object into your other files. Configuring it in one place is better than many instantiations (but also allows for setting up a different instance elsewhere).
When creating a Gator instance, you’ll need to choose a queue backend. Alligator ships with support for local-memory, Redis, SQS & SQLite. See the Installing Alligator docs for setup info.
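The backend is chosen through a URL-style connection string. As a reading aid, here is a small sketch that takes apart the Redis string used in this tutorial with the standard library. It only shows what each piece of the string means; how Alligator parses the string internally is not shown here, so treat that part as an assumption.

```python
from urllib.parse import urlsplit

dsn = "redis://localhost:6379/0"
parts = urlsplit(dsn)

backend = parts.scheme              # which backend to use
host = parts.hostname               # where the server lives
port = parts.port                   # TCP port, as an integer
database = parts.path.lstrip("/")   # Redis DB number, as text

print(backend, host, port, database)
```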
Local Memory¶
Intended primarily for development or testing, this backend has no dependencies but keeps everything in-process.
from alligator import Gator
# Creates an in-memory/in-process queue.
# The same process must consume from the queue, or things will be thrown
# away when the process exits.
gator = Gator('locmem://')
Redis¶
Redis is a good option for production, from small to large installations.
from alligator import Gator
# Connect to a locally-running Redis server & use DB 0.
gator = Gator('redis://localhost:6379/0')
SQS¶
Amazon SQS is specifically a queue service & works well in large-scale environments.
from alligator import Gator
# Connect to the globally available SQS service.
gator = Gator('sqs://us-west-2/')
SQLite¶
SQLite excels in small/light loads & simple setups (or development).
from alligator import Gator
# Set up the SQLite database & the `all` queue table.
gator = Gator('sqlite:///var/data/sqlite/my_queue.db')
# This only needs to be run *once* per-queue.
gator.backend.setup_tables("all")

For the duration of the tutorial, we’ll assume you chose Redis.
Put the Task on the Queue¶
After we make a Gator instance, the only other change is to how we call send_post_email. Instead of calling it directly, we’ll need to enqueue a task.
There are two common ways of creating a task in Alligator:
- gator.task()
  A typical function call. You pass in the callable & the *args/**kwargs to provide to the callable. It gets put on the queue with the default task execution options.
- gator.options()
  Creates a context manager that has a .task() method that works like the above. This is useful for controlling the task execution options, such as retries or if the task should be asynchronous. See the “Working Around Failsome Tasks” section below.
Since we’re just starting out with Alligator & looking to replicate the existing behavior, we’ll use gator.task(...) to create & enqueue the task.
# Old code
send_post_email(request.user.pk, post.pk)
# New code
gator.task(send_post_email, request.user.pk, post.pk)
Hardly changed in code, but a world of difference in execution speed. Rather than blasting out hundreds of emails & possibly timing out, a task is placed on the queue & execution continues quickly. The complete code looks like:
from alligator import Gator
from django.conf import settings
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.http import Http404
from django.shortcuts import redirect

from sosocial.models import Post

# Please configure this once & import it elsewhere.
# Bonus points if you use a setting (e.g. ``settings.ALLIGATOR_DSN``)
# instead of a hard-coded string.
gator = Gator('redis://localhost:6379/0')


def send_post_email(user_id, post_id):
    post = Post.objects.get(pk=post_id)
    user = User.objects.get(pk=user_id)

    subject = "A new post by {}".format(user.username)
    to_emails = [follow.email for follow in user.followers.all()]
    send_mail(
        subject,
        post.message,
        settings.SERVER_EMAIL,
        recipient_list=to_emails
    )


def new_post(request):
    if request.method != 'POST':
        raise Http404('Gotta use POST.')

    # Don't write code like this. Sanitize your data, kids.
    post = Post.objects.create(
        message=request.POST['message']
    )

    # The function call was here. Now we'll create a task then carry on.
    gator.task(send_post_email, request.user.pk, post.pk)

    # Redirect like a good webapp should.
    return redirect('activity_feed')
Running a Worker¶
Time to kick back, relax & enjoy your speedy new site, right?
Unfortunately, not quite. Now we’re successfully queuing up tasks for later processing & things are completing quickly, but nothing is processing those tasks. So we need to run a Worker to consume the queued tasks.
We have two options here. We can either use the included latergator.py script or we can create our own. The following are identical in function:
$ latergator.py redis://localhost:6379/0
Or…
# Within something like ``run_tasks.py``...
from alligator import Gator, Worker
# Again, bonus points for an import and/or settings usage.
gator = Gator('redis://localhost:6379/0')
worker = Worker(gator)
worker.run_forever()
Both of these will create a long-running process, which will consume tasks off the queue as fast as they can.
While this is fine to start off, if you have a heavily trafficked site, you’ll likely need many workers. Simply start more processes (using a tool like Supervisor works best).
You can also make things like management commands, build other custom tooling around processing or even launch workers on their own dedicated servers.
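Conceptually, a worker is just a loop that pulls tasks off a queue & calls them. The sketch below shows that general pattern with the standard library’s in-process queue. It is not Alligator’s implementation; enqueue, run_until_empty & record are hypothetical names used only for illustration.

```python
import queue

# Hypothetical in-process queue holding (function, args, kwargs) tuples.
tasks = queue.Queue()
results = []


def record(message):
    """Hypothetical task: a plain function, as in the tutorial."""
    results.append(message.upper())


def enqueue(func, *args, **kwargs):
    """Producer side: store the call for later instead of running it now."""
    tasks.put((func, args, kwargs))


def run_until_empty():
    """Consumer side: run queued tasks in order until none are left."""
    processed = 0
    while True:
        try:
            func, args, kwargs = tasks.get_nowait()
        except queue.Empty:
            return processed
        func(*args, **kwargs)
        processed += 1


enqueue(record, "first")
enqueue(record, "second")
processed = run_until_empty()
print(processed, results)
```

A real worker differs mainly in that the queue lives outside the process (Redis, SQS, etc.) & the loop keeps waiting for new tasks instead of returning when the queue is empty.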
Working Around Failsome Tasks¶
Sometimes tasks don’t succeed on the first try. Maybe the database is down, the mail server isn’t working or a remote resource can’t be loaded. As it stands, our task will try once then fail loudly.
Alligator also supports retrying tasks, as well as having an on_error hook.
To specify we want retries, we’ll have to use the other important bit of Alligator, Gator.options.
Gator.options gives you a context manager & allows you to configure task execution options that then apply to all tasks within the manager. Using that looks like:
# Old code
# gator.task(send_post_email, request.user.pk, post.pk)

# New code
with gator.options(retries=3) as opts:
    # Be careful to use ``opts.task``, not ``gator.task`` here!
    opts.task(send_post_email, request.user.pk, post.pk)
Now that task will get three retries when it’s processed, making network failures much more tolerable.
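To show what retrying buys you, here is a minimal standard-library sketch of retry semantics: one initial attempt plus up to a fixed number of retries. This is an illustration of the general idea, not Alligator’s internals; in particular, treating the first attempt as separate from the retry count is an assumption, & run_with_retries & flaky_send are hypothetical names.

```python
def run_with_retries(func, retries, *args, **kwargs):
    """Call func; on an exception, try again up to `retries` more times.

    Returns a (result, attempts) tuple, or re-raises the last exception
    once the retries are used up.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return func(*args, **kwargs), attempts
        except Exception:
            if attempts > retries:
                raise


calls = {"count": 0}


def flaky_send():
    """Hypothetical task that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("mail server unavailable")
    return "sent"


outcome, attempts = run_with_retries(flaky_send, 3)
print(outcome, attempts)
```

Here the third attempt succeeds, so the temporary outage never surfaces as a failed task.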
Delaying/Scheduling Tasks¶
You can also choose to either delay the execution of a task by a set number of seconds OR schedule when the task can first run.
To delay a task, just add the delay_by parameter:
# Don't execute this task for at least 5 minutes.
with gator.options(delay_by=60 * 5) as opts:
    opts.task(send_post_email, request.user.pk, post.pk)
To schedule a task in the future, you’ll need to provide the delay_until parameter, set to a future Unix timestamp:
import time

# There are lots of ways to compute a future timestamp.
# For our purposes here, let's just do some simple math.
tomorrow = time.time() + (60 * 60 * 24)

with gator.options(delay_until=tomorrow) as opts:
    opts.task(send_post_email, request.user.pk, post.pk)
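If you need a specific wall-clock time rather than “some seconds from now”, one possible way to build the Unix timestamp is with the datetime module. The sketch below is an illustration under stated assumptions: next_occurrence is a hypothetical helper, & it works in UTC to avoid local-timezone surprises.

```python
from datetime import datetime, time as dtime, timedelta, timezone


def next_occurrence(hour, now=None):
    """Return the Unix timestamp of the next time the UTC clock reads hour:00."""
    now = now or datetime.now(timezone.utc)
    candidate = datetime.combine(now.date(), dtime(hour=hour), tzinfo=timezone.utc)
    if candidate <= now:
        # That time has already passed today, so use tomorrow instead.
        candidate += timedelta(days=1)
    return candidate.timestamp()


# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
run_at = next_occurrence(9, now=fixed_now)
print(run_at)
```

The resulting float can be passed as delay_until in the same way as the tomorrow value above.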
Testing Tasks¶
All of this is great, but if you can’t test the task, you might as well not have code.
Alligator supports an is_async=False option, which means that rather than being put on the queue, your task runs right away (acting like you just called the function, but with all the retries & hooks included).
# Bonus points for using ``settings.DEBUG`` (or similar) instead of a
# hard-coded ``False``.
with gator.options(is_async=False) as opts:
    opts.task(send_post_email, request.user.pk, post.pk)
Now your existing integration tests (from before converting to offline tasks) should work as expected.
Warning
Make sure you don’t accidentally commit this & deploy to production. If you do, why have an offline task system at all?
Additionally, your code naturally becomes easier to test, because now your tasks are just plain old functions. This means you can typically just import the function & write tests against it (rather than the whole view), which makes for better unit tests & fewer integration tests to ensure things work right.
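As a small sketch of testing a task as a plain function, the example below calls a task directly with a fake mail sender from unittest.mock. The task & its send parameter are hypothetical & exist only to keep the example self-contained; in a Django project you would more likely patch the mail function with unittest.mock.patch instead of passing it in.

```python
from unittest import mock


def send_post_notice(username, recipients, send):
    """Hypothetical task; `send` is whatever callable delivers the mail."""
    subject = "A new post by {}".format(username)
    send(subject, recipients)
    return len(recipients)


# No queue, no worker, no view: just call the function with a fake sender.
fake_send = mock.Mock()
count = send_post_notice(
    "alice", ["bob@example.com", "carol@example.com"], fake_send
)

# The fake records how it was called, so the test can check the details.
fake_send.assert_called_once_with(
    "A new post by alice", ["bob@example.com", "carol@example.com"]
)
print(count)
```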
Going Beyond¶
This is 90%+ of the day-to-day usage of Alligator, but there’s plenty more you can do with it.
You may wish to peruse the Best Practices docs for ideas on how to keep your Alligator clean & flexible.
If you need more custom functionality, the Extending Alligator docs have examples on:
- Customizing task behavior using the on_start/on_success/on_error hook functions.
- Custom Task classes.
- Multiple queues & Workers for scalability.
- Custom backends.
- Worker subclasses.
Happy queuing!