Best Practices
Moving to offlining tasks requires some shifts in the way you develop your code. There are also some good tricks/ideas for integrating Alligator.
If you have suggestions for other best practices, please submit a pull request at https://github.com/toastdriven/alligator/pulls!
Configure One Gator
This is alluded to in the Alligator Tutorial, but unless you have advanced needs, you’re probably best off configuring a single Gator instance in your code. Then you can import that instance wherever you need it.
Generally speaking, you’ll want to create a new file for just this, though if you have a utils.py or other common file, you can add it there. For example:
# Create a new file, like ``myapp/gator.py``
from alligator import Gator
gator = Gator('redis://localhost:6379/0')
Then your code elsewhere imports it:
# ``myapp/views.py``
from myapp.gator import gator
# ...Later...
def previously_slow_view(request):
    gator.task(expensive_cache_rebuild, user_id=request.user.pk)
This helps DRY up your code. It also helps you avoid having to change many files if you change backends or configuration.
Use Environment Variables or Settings for the Gator DSN
Instead of hard-coding the DSN for each Gator instance, rely on a configuration setting.
If you’re using plain old Python or subscribe to the Twelve-Factor App, you might lean on environment variables set in the shell. For instance, the Alligator test suite does:
import os
from alligator import Gator
# Lean on the ENV variable.
gator = Gator(os.environ['ALLIGATOR_CONN'])
Then when running your app, you could do the following in development, for ease of setup:
$ export ALLIGATOR_CONN=locmem://
$ python myapp.py
But the following on production, for handling large loads:
$ export ALLIGATOR_CONN=redis://some.dns.name.com:6379/0
$ python myapp.py
If you’re using something like Django, you could lean on settings instead, like:
from alligator import Gator
from django.conf import settings
# Lean on the settings variable.
gator = Gator(settings.ALLIGATOR_CONN)
And have differing settings files for development vs. production.
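If the variable may be unset on a developer machine, one option is to fall back to a default rather than raise a `KeyError`. The following is a minimal sketch, not part of Alligator itself: `resolve_dsn` and `DEFAULT_DSN` are hypothetical names, and choosing `locmem://` as the fallback is an assumption based on the development example above.

```python
import os

# Hypothetical default; ``locmem://`` is the in-memory backend used for
# development earlier in this section.
DEFAULT_DSN = 'locmem://'


def resolve_dsn(environ=os.environ, default=DEFAULT_DSN):
    """Return the DSN from ``ALLIGATOR_CONN``, or a dev-friendly default."""
    # Treat an empty string the same as an unset variable.
    return environ.get('ALLIGATOR_CONN') or default


# In ``myapp/gator.py`` you would then do something like:
#     from alligator import Gator
#     gator = Gator(resolve_dsn())
```

In production you may prefer the stricter `os.environ['ALLIGATOR_CONN']` lookup shown earlier, so that a missing setting fails loudly instead of silently using an in-memory queue.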
Use an Alternate Queue for Testing
This is an important one. By default, Alligator doesn’t make any assumptions about what environment (development, testing, production) it is in. So the same queue name will be used.
Especially if you have a shared queue setup for running tests, you can accidentally add testing data to your queue! There are two possible resolutions to this:
Don’t Share
Set your testing environment up such that it has its own queue stack. This will nicely isolate things & not require any code changes.
Prefix your queue_name
If you must share setup (for instance, developing & testing on the same machine), use a similar approach to the “Env/Settings for Gator DSN” tip, providing a prefix for your queue name. For example:
import os
from alligator import Gator
# Lean on the ENV variable for a queue prefix.
prefix = os.environ.get('ALLIGATOR_PREFIX', '')
gator = Gator(
    'redis://localhost:6379/0',
    # If you ``export ALLIGATOR_PREFIX=test``, your queue name
    # becomes 'test_all'. If not set, it's just 'all' (the empty
    # prefix is filtered out so the name doesn't become '_all').
    queue_name='_'.join(filter(None, [prefix, 'all']))
)
Use Environment Variables or Settings for Task.is_async
If you’re just using gator.task & trying to write tests, you may have a hard time verifying behavior in an integration test (though you should be able to just unit test the task function).
On the other hand, if you use the gator.options context manager & supply an is_async=False execution option, integration tests become easy, at the expense of possibly accidentally committing that & causing issues in production.
The best approach is to use the gator.options context manager, but use an environment variable/setting to control if things run asynchronously.
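import os
# Using the above tip of a single import...
from myapp.gator import gator
# Environment variables are always strings, and the string 'False' is
# truthy, so compare explicitly instead of passing the raw value.
# If the variable is unset, this defaults to running asynchronously.
IS_ASYNC = os.environ.get('ALLIGATOR_ASYNC', 'True').lower() != 'false'
def some_view(request):
    with gator.options(is_async=IS_ASYNC) as opts:
        opts.task(expensive_thing)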
This allows you to set export ALLIGATOR_ASYNC=False in development/testing (so the task runs right away in-process) but queues appropriately in production.
Simple Task Parameters
When creating task functions, simplify the arguments passed to them & remove as many assumptions as possible.
You may be tempted to try to save queries by passing full objects or large lists of things as a parameter.
However, you must remember that the task may run at a very different time (perhaps hours in the future if you’re overloaded) or on a completely different machine than the one scheduling the task. Data goes stale easily & few things are as frustrating to debug as stale data being re-written over the top of new data.
Where possible, do the following things:
- Pass primary keys or identifiers instead of rich objects
- Persist large collections in the database or elsewhere, then pass a lookup identifier to the task
- Use simple data types, as they serialize well & result in smaller queue payloads, meaning faster scheduling & consuming of tasks
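To illustrate the first and third points, here is a small sketch of a task that receives only a primary key and looks the record up when it runs. Everything in it is hypothetical: the `USERS` dictionary stands in for a real database, `send_welcome_email` is an invented task, and the JSON payload only mimics the idea of a serialized queue message; it is not Alligator's actual wire format.

```python
import json

# Hypothetical in-memory "database" standing in for your real data store.
USERS = {42: {'name': 'Daniel', 'email': 'old@example.com'}}


def send_welcome_email(user_id):
    """Task function: takes only a primary key and re-fetches fresh data."""
    user = USERS[user_id]  # Looked up at run time, not at scheduling time.
    return 'Welcome sent to {}'.format(user['email'])


# At scheduling time, only the small, serializable identifier is queued.
payload = json.dumps({'args': [42]})

# The record changes before a worker picks the task up...
USERS[42]['email'] = 'new@example.com'

# ...but the task still sees the current data when it finally runs.
result = send_welcome_email(*json.loads(payload)['args'])
```

Had the whole user dictionary been serialized into the payload instead, the task would have mailed the stale address. With Alligator, the equivalent scheduling call would look something like `gator.task(send_welcome_email, 42)`.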
Re-use the Gator.options Context Manager
All the examples in the Alligator docs show creating a single task within a gator.options(...) context manager. So you might be tempted to write code like:
with gator.options(retries=3) as opts:
    opts.task(send_mass_mail, list_id)
with gator.options(retries=3) as opts:
    opts.task(update_follow_counts, request.user.pk)
However, you can reuse that context manager to provide the same execution options to all tasks within the block. So we can clean up & shorten our code to:
with gator.options(retries=3) as opts:
    opts.task(send_mass_mail, list_id)
    opts.task(update_follow_counts, request.user.pk)
Two unique tasks will still be created, but both will have the retries=3 provided to better ensure they succeed.
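For intuition about why reuse works, here is a toy model of the pattern. It is an illustrative stand-in under stated assumptions, not Alligator's implementation: `ToyOptions` and the `scheduled` list are invented for this sketch, and the two task functions are empty placeholders.

```python
class ToyOptions:
    """Toy stand-in for the object that ``gator.options(...)`` yields."""

    def __init__(self, scheduled, **options):
        self.scheduled = scheduled
        self.options = options

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Nothing to clean up; let any exception propagate.
        return False

    def task(self, func, *args, **kwargs):
        # Every task created in the block gets a copy of the same options.
        self.scheduled.append((func.__name__, args, kwargs, dict(self.options)))


def send_mass_mail(list_id):
    pass


def update_follow_counts(user_id):
    pass


scheduled = []
with ToyOptions(scheduled, retries=3) as opts:
    opts.task(send_mass_mail, 7)
    opts.task(update_follow_counts, 42)
```

After the block, `scheduled` holds two separate entries, each carrying its own `{'retries': 3}` options, which mirrors the behavior described above.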