Huey Extensions¶
The huey.contrib package contains modules that provide extra functionality beyond the core APIs.
Mini-Huey¶
MiniHuey provides a very lightweight huey-like API that may be useful for certain classes of applications. Unlike Huey, the MiniHuey consumer runs inside a greenlet in your main application process. This means there is no separate consumer process to run, nor is there any persistence for the enqueued or scheduled tasks. Whenever a task is enqueued or is scheduled to run, a new greenlet is spawned to execute the task.
Usage and task declaration:
class MiniHuey([name='huey'[, interval=1[, pool_size=None]]])
Parameters:
- name (str) – Name given to this huey instance.
- interval (int) – How frequently to check for scheduled tasks (seconds).
- pool_size (int) – Limit number of concurrent tasks to given size.
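from urllib.request import urlopen

from huey import crontab
from huey.contrib.minimal import MiniHuey

huey = MiniHuey()

@huey.task()
def fetch_url(url):
    # Runs in its own greenlet and returns the response body.
    return urlopen(url).read()

@huey.task(crontab(minute='0', hour='4'))
def run_backup():
    # Periodic task: runs daily at 04:00.
    pass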
Note
There is no separate decorator for periodic, or crontab, tasks. Just use MiniHuey.task() and pass in a validation function, such as the one returned by crontab().
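As a rough illustration of what such a validation function looks like, the sketch below assumes it is a callable that receives a datetime and returns True when the task should run, which matches how the function returned by crontab() is used. The helper name daily_at is hypothetical and is not part of huey.

```python
from datetime import datetime


def daily_at(hour, minute=0):
    """Return a validation function that matches one minute per day."""
    def validate(dt):
        # True only when the timestamp falls on the requested hour and minute.
        return dt.hour == hour and dt.minute == minute
    return validate


should_run = daily_at(4)
matches_four_am = should_run(datetime(2024, 1, 1, 4, 0))
matches_one_minute_later = should_run(datetime(2024, 1, 1, 4, 1))
```

A function like this could be passed wherever crontab(...) is passed, assuming the scheduler checks it about once per interval.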
When your application starts, be sure to start the MiniHuey scheduler:
from gevent import monkey; monkey.patch_all()
huey.start() # Kicks off scheduler in a new greenlet.
start_wsgi_server() # Or whatever your main application is doing...
Warning
Tasks enqueued manually for immediate execution will be run regardless of whether the scheduler is running. If you want to be able to schedule tasks in the future or run periodic tasks, you will need to call start().
Calling tasks and getting results works about the same as regular huey:
async_result = fetch_url('https://www.google.com/')
html = async_result.get() # Block until task is executed.
# Fetch the Yahoo! page in 30 seconds.
async_result = fetch_url.schedule(args=('https://www.yahoo.com/',), delay=30)
html = async_result.get() # Blocks for ~30s.
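To make the call-then-get pattern concrete without installing gevent, here is a standard-library sketch that swaps greenlets for threads. It is not MiniHuey's implementation. The names ThreadResult and task are hypothetical, and the sketch has no scheduler, no pool size limit, and no error propagation.

```python
import threading


class ThreadResult:
    """Hypothetical result handle: get() blocks until the task has run."""

    def __init__(self):
        self._done = threading.Event()
        self._value = None

    def set(self, value):
        self._value = value
        self._done.set()

    def get(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError('task did not finish in time')
        return self._value


def task(fn):
    """Decorator: calling the function starts a worker and returns a handle."""
    def enqueue(*args, **kwargs):
        result = ThreadResult()
        worker = threading.Thread(
            target=lambda: result.set(fn(*args, **kwargs)), daemon=True)
        worker.start()
        return result
    return enqueue


@task
def add(a, b):
    return a + b


async_result = add(1, 2)
total = async_result.get(timeout=5)  # Blocks until the worker finishes.
```

As in the text above, nothing is persisted: if the process exits before the worker runs, the task is lost.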
SQLite Storage¶
The SqliteHuey class and the associated SqliteStorage can be used instead of the default RedisHuey. SqliteHuey is implemented in such a way that it can safely be used with a multi-process, multi-thread, or multi-greenlet consumer.
Using SqliteHuey is almost exactly the same as using RedisHuey.
Begin by instantiating the Huey object, passing in the name of the queue and the filename of the SQLite database:
from huey.contrib.sqlitedb import SqliteHuey
huey = SqliteHuey('my_app', filename='/var/www/my_app/huey.db')
Note
The SQLite storage engine depends on peewee. For information on installing peewee, see the peewee installation documentation, or simply run: pip install peewee.
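For readers wondering what a SQLite-backed queue amounts to, the following is a minimal sketch using only the standard-library sqlite3 module. It does not reproduce SqliteStorage, its schema, or its locking. The class name SqliteQueue and the table layout are assumptions made for illustration.

```python
import sqlite3


class SqliteQueue:
    """Hypothetical FIFO queue backed by a single SQLite table."""

    def __init__(self, filename):
        self.conn = sqlite3.connect(filename)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS task ('
            'id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB NOT NULL)')
        self.conn.commit()

    def enqueue(self, data):
        with self.conn:  # Commits on success, rolls back on error.
            self.conn.execute('INSERT INTO task (data) VALUES (?)', (data,))

    def dequeue(self):
        with self.conn:  # Commits the delete before the row is returned.
            row = self.conn.execute(
                'SELECT id, data FROM task ORDER BY id LIMIT 1').fetchone()
            if row is None:
                return None  # Queue is empty.
            self.conn.execute('DELETE FROM task WHERE id = ?', (row[0],))
            return row[1]


queue = SqliteQueue(':memory:')
queue.enqueue(b'first')
queue.enqueue(b'second')
oldest = queue.dequeue()
```

This sketch is not safe for several consumers sharing one database file. A real backend would need explicit locking around the read-then-delete step.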
Simple Server¶
Huey supports a simple client/server database that can be used for development and testing. The server design is inspired by redis and implements commands that map to the methods described by the storage API. If you’d like to read a technical post about the implementation, check out this blog post.
The server can optionally use gevent, but if gevent is not available you can use threads (use -t for threads).
To obtain the simple server, you can clone the simpledb repository:
$ git clone https://github.com/coleifer/simpledb
$ cd simpledb
$ python setup.py install
Running the simple server¶
Usage:
Usage: simpledb.py [options]

Options:
  -h, --help            show this help message and exit
  -d, --debug           Log debug messages.
  -e, --errors          Log error messages only.
  -t, --use-threads     Use threads instead of gevent.
  -H HOST, --host=HOST  Host to listen on.
  -m MAX_CLIENTS, --max-clients=MAX_CLIENTS
                        Maximum number of clients.
  -p PORT, --port=PORT  Port to listen on.
  -l LOG_FILE, --log-file=LOG_FILE
                        Log file.
  -x EXTENSIONS, --extensions=EXTENSIONS
                        Import path for Python extension module(s).
By default the server will listen on localhost, port 31337.
Example (with logging):
$ python simpledb.py --debug --log-file=/var/log/huey-simple.log
Using simple server with Huey¶
To use the simple server with Huey, use the SimpleHuey class:
from huey.contrib.simple_storage import SimpleHuey
huey = SimpleHuey('my-app')

@huey.task()
def add(a, b):
    return a + b
The SimpleHuey class relies on a SimpleStorage storage backend, which in turn uses the simple.Client client class.
Django¶
Huey comes with special integration for use with the Django framework. The integration provides:
- Configuration of huey via the Django settings module.
- Running the consumer as a Django management command.
- Auto-discovery of tasks.py modules to simplify task importing.
- Proper management of database connections.
Supported Django versions are those officially supported at https://www.djangoproject.com/download/#supported-versions
Setting things up¶
To use huey with Django, the first step is to add an entry to your project’s settings.INSTALLED_APPS:
# settings.py
# ...
INSTALLED_APPS = (
    # ...
    'huey.contrib.djhuey',  # Add this to the list.
    # ...
)
The above is the bare minimum needed to start using huey’s Django integration. If you like, though, you can also configure both Huey and the consumer using the settings module.
Note
Huey settings are optional. If not provided, Huey will default to using Redis running on localhost:6379 (standard setup).
Configuration is kept in settings.HUEY, which can be either a dictionary or a Huey instance. Here is an example that shows all of the supported options with their default values:
# settings.py
# DATABASES and DEBUG are assumed to be defined earlier in this file.
HUEY = {
    'name': DATABASES['default']['NAME'],  # Use db name for huey.
    'result_store': True,  # Store return values of tasks.
    'events': True,  # Consumer emits events allowing real-time monitoring.
    'store_none': False,  # If a task returns None, do not save to results.
    'always_eager': DEBUG,  # If DEBUG=True, run synchronously.
    'store_errors': True,  # Store error info if task throws exception.
    'blocking': False,  # Poll the queue rather than do blocking pop.
    'backend_class': 'huey.RedisHuey',  # Use path to redis huey by default.
    'connection': {
        'host': 'localhost',
        'port': 6379,
        'db': 0,
        'connection_pool': None,  # Definitely you should use pooling!
        # ... tons of other options, see redis-py for details.

        # huey-specific connection parameters.
        'read_timeout': 1,  # If not polling (blocking pop), use timeout.
        'max_errors': 1000,  # Only store the 1000 most recent errors.
        'url': None,  # Allow Redis config via a DSN.
    },
    'consumer': {
        'workers': 1,
        'worker_type': 'thread',
        'initial_delay': 0.1,  # Smallest polling interval, same as -d.
        'backoff': 1.15,  # Exponential backoff using this rate, -b.
        'max_delay': 10.0,  # Max possible polling interval, -m.
        'utc': True,  # Treat ETAs and schedules as UTC datetimes.
        'scheduler_interval': 1,  # Check schedule every second, -s.
        'periodic': True,  # Enable crontab feature.
        'check_worker_health': True,  # Enable worker health checks.
        'health_check_interval': 1,  # Check worker health every second.
    },
}
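To see how initial_delay, backoff, and max_delay interact, the sketch below lists the polling intervals a worker would use while the queue stays empty. It assumes the interval is multiplied by backoff after each empty poll and capped at max_delay. The function name polling_delays is hypothetical and the consumer's actual loop may differ in detail.

```python
def polling_delays(initial_delay=0.1, backoff=1.15, max_delay=10.0):
    """Yield successive polling intervals until the cap is reached."""
    delay = initial_delay
    while delay < max_delay:
        yield delay
        # Grow the interval geometrically, but never past the cap.
        delay = min(delay * backoff, max_delay)
    yield max_delay


delays = list(polling_delays())
print('first interval: %.3fs' % delays[0])
print('empty polls before reaching the cap: %d' % (len(delays) - 1))
```

A larger backoff reaches the cap in fewer polls, which reduces load on an idle queue at the cost of slower pickup for the first task that arrives afterwards.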
Alternatively, you can simply set settings.HUEY to a Huey instance and do your configuration directly. In the example below, I’ve also shown how you can create a connection pool:
# settings.py -- alternative configuration method
from huey import RedisHuey
from redis import ConnectionPool
pool = ConnectionPool(host='my.redis.host', port=6379, max_connections=20)
HUEY = RedisHuey('my-app', connection_pool=pool)
Running the Consumer¶
To run the consumer, use the run_huey management command. This command will automatically import any modules in your INSTALLED_APPS named tasks.py. The consumer can be configured using the Django settings module, options specified on the command line, or both.
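The auto-import step can be pictured with a short standard-library sketch. This is an illustration under assumptions, not the management command's code: the function name autodiscover is hypothetical, and two standard-library packages stand in for Django apps so the example runs without Django.

```python
import importlib


def autodiscover(app_names, module_name='tasks'):
    """Try to import `<app>.<module_name>` for each app; return what loaded."""
    loaded = []
    for app in app_names:
        path = '%s.%s' % (app, module_name)
        try:
            importlib.import_module(path)
        except ImportError:
            continue  # This app has no such module.
        loaded.append(path)
    return loaded


# Stand-ins for INSTALLED_APPS: asyncio ships a `tasks` submodule,
# json does not.
found = autodiscover(['asyncio', 'json'])
```

Importing the module is what registers its decorated functions. Note that catching ImportError this broadly would also hide a broken import inside an existing tasks module.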
Note
Options specified on the command line take precedence over those specified in the settings module.
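The precedence rule can be sketched as a simple merge. The function name resolve_consumer_options is hypothetical, and the sketch assumes an option left off the command line shows up as None.

```python
def resolve_consumer_options(settings_options, cli_options):
    """Merge consumer options, letting command-line values win."""
    merged = dict(settings_options)
    for key, value in cli_options.items():
        if value is not None:  # None means the flag was not given.
            merged[key] = value
    return merged


from_settings = {'workers': 4, 'worker_type': 'thread'}
from_cli = {'workers': 8, 'worker_type': None}  # Only -w 8 was passed.
options = resolve_consumer_options(from_settings, from_cli)
```

Here the worker count comes from the command line while the worker type falls back to the settings module.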
To start the consumer, you simply run:
$ ./manage.py run_huey
In addition to the HUEY.consumer setting dictionary, the management command supports all the same options as the standalone consumer. These options are listed and described in the Options for the consumer section.
For quick reference, the most important command-line options are briefly listed here.
- -w, --workers: Number of worker threads/processes/greenlets. Default is 1, but most applications should use at least 2.
- -k, --worker-type: Worker type, must be “thread”, “process” or “greenlet”. The default is thread, which provides good all-around performance. For CPU-intensive workloads, process is likely to be more performant. The greenlet worker type is suited for IO-heavy workloads. When using greenlet you can specify tens or hundreds of workers since they are extremely lightweight compared to threads/processes. See the section below on using gevent.
Note
Due to a conflict with Django’s base option list, the “verbose” option is set using -V or --huey-verbose. When enabled, huey logs at the DEBUG level.
For more information, read the Options for the consumer section.
Using gevent¶
When using worker type greenlet, it’s necessary to apply a monkey-patch before any libraries or system modules are imported. Gevent monkey-patches things like socket to provide non-blocking I/O, and if those modules are loaded before the patch is applied, then the resulting code will execute synchronously.
Unfortunately, because of Django’s design, the only way to reliably apply this patch is to create a custom bootstrap script that mimics the functionality of manage.py. Here is the patched manage.py code:
#!/usr/bin/env python
import os
import sys

# Apply monkey-patch if we are running the huey consumer.
if 'run_huey' in sys.argv:
    from gevent import monkey
    monkey.patch_all()

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "conf")
    from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)
How to create tasks¶
The task() and periodic_task() decorators can be imported from the huey.contrib.djhuey module. Here is how you might define two tasks:
from huey import crontab
from huey.contrib.djhuey import periodic_task, task

@task()
def count_beans(number):
    print('-- counted %s beans --' % number)
    return 'Counted %s beans' % number

@periodic_task(crontab(minute='*/5'))
def every_five_mins():
    print('Every five minutes this will be printed by the consumer')
Tasks that execute queries¶
If you plan on executing queries inside your task, it is a good idea to close the connection once your task finishes. To make this easier, huey provides two special decorators, db_task and db_periodic_task, to use in place of task and periodic_task. They automatically close the connection for you.
from huey import crontab
from huey.contrib.djhuey import db_periodic_task, db_task

@db_task()
def do_some_queries():
    # This task executes queries. Once the task finishes, the connection
    # will be closed.
    pass

@db_periodic_task(crontab(minute='*/5'))
def every_five_mins():
    # This is a periodic task that executes queries.
    pass
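The idea behind closing the connection can be sketched with a plain decorator. This is not how db_task is implemented. It uses the standard-library sqlite3 module as a stand-in for Django's connection handling, and the name close_when_done is hypothetical.

```python
import functools
import sqlite3


def close_when_done(get_connection):
    """Hypothetical decorator factory: close the connection after the call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            conn = get_connection()
            try:
                return fn(conn, *args, **kwargs)
            finally:
                conn.close()  # Runs whether the task succeeded or raised.
        return wrapper
    return decorator


@close_when_done(lambda: sqlite3.connect(':memory:'))
def count_rows(conn):
    conn.execute('CREATE TABLE bean (id INTEGER PRIMARY KEY)')
    conn.executemany('INSERT INTO bean (id) VALUES (?)', [(1,), (2,), (3,)])
    return conn.execute('SELECT COUNT(*) FROM bean').fetchone()[0]


row_count = count_rows()
```

Putting the close call in a finally block matters for long-running consumers, since a task that raises would otherwise leave its connection open.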
DEBUG and Synchronous Execution¶
When settings.DEBUG = True, tasks will be executed synchronously just like regular function calls. The purpose of this is to avoid running both Redis and an additional consumer process while developing or running tests. If, however, you would like to enqueue tasks regardless of whether DEBUG = True, then explicitly specify always_eager=False in your huey settings:
# settings.py
HUEY = {
    'name': 'my-app',
    # Other settings ...
    'always_eager': False,
}
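The rule described above boils down to "an explicit always_eager wins, otherwise follow DEBUG". A one-function sketch, with the hypothetical name resolve_always_eager, makes the cases easy to check:

```python
def resolve_always_eager(huey_settings, debug):
    """Use the explicit setting if present, otherwise fall back to DEBUG."""
    return huey_settings.get('always_eager', debug)


# No explicit setting: behaviour follows DEBUG.
eager_in_dev = resolve_always_eager({'name': 'my-app'}, debug=True)
# Explicit False: tasks are enqueued even though DEBUG is on.
eager_overridden = resolve_always_eager(
    {'name': 'my-app', 'always_eager': False}, debug=True)
```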
Configuration Examples¶
This section contains example HUEY configurations.
# Redis running locally with four worker threads.
HUEY = {
    'name': 'my-app',
    'consumer': {'workers': 4, 'worker_type': 'thread'},
}
# Redis on network host with 64 worker greenlets and connection pool
# supporting up to 100 connections.
from redis import ConnectionPool
pool = ConnectionPool(
    host='192.168.1.123',
    port=6379,
    max_connections=100)

HUEY = {
    'name': 'my-app',
    'connection': {'connection_pool': pool},
    'consumer': {'workers': 64, 'worker_type': 'greenlet'},
}
It is also possible to specify the connection using a Redis URL, making it easy to configure this setting using a single environment variable:
import os

HUEY = {
    'name': 'my-app',
    'url': os.environ.get('REDIS_URL', 'redis://localhost:6379/?db=1'),
}
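To show which pieces of information such a URL carries, the sketch below splits it with the standard library. The Redis client performs its own parsing and accepts more forms than this, so the helper describe_redis_url is purely illustrative. Its fallback to database 0 when no db parameter is given is an assumption of the sketch.

```python
from urllib.parse import parse_qs, urlparse


def describe_redis_url(url):
    """Split a redis:// URL into host, port, and database number."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    return {
        'host': parsed.hostname,
        'port': parsed.port,
        # Assumption for this sketch: default to database 0.
        'db': int(query['db'][0]) if 'db' in query else 0,
    }


info = describe_redis_url('redis://localhost:6379/?db=1')
```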
Alternatively, you can just assign a Huey instance to the HUEY setting:
from huey import RedisHuey
HUEY = RedisHuey('my-app')