Tasks¶
Tasks are the building blocks of Celery applications.
A task is a class that can be created out of any callable. It performs dual roles in that it defines both what happens when a task is called (sends a message), and what happens when a worker receives that message.
Every task class has a unique name, and this name is referenced in messages so the worker can find the right function to execute.
A task message is not removed from the queue until that message has been acknowledged by a worker. A worker can reserve many messages in advance and even if the worker is killed -- by power failure or some other reason -- the message will be redelivered to another worker.
Ideally task functions should be idempotent: meaning the function won't cause unintended effects even if called multiple times with the same arguments. Since the worker cannot detect if your tasks are idempotent, the default behavior is to acknowledge the message in advance, just before it's executed, so that a task invocation that already started is never executed again.
If your task is idempotent you can set the acks_late option to have the worker acknowledge the message after the task returns instead. See also the FAQ entry Should I use retry or acks_late?.
Note that the worker will acknowledge the message if the child process executing the task is terminated (either by the task calling sys.exit(), or by signal) even when acks_late is enabled. This behavior is intentional as...
- We don't want to rerun tasks that force the kernel to send a SIGSEGV (segmentation fault) or similar signals to the process.
- We assume that a system administrator deliberately killing the task does not want it to automatically restart.
- A task that allocates too much memory is in danger of triggering the kernel OOM killer, the same may happen again.
- A task that always fails when redelivered may cause a high-frequency message loop taking down the system.
If you really want a task to be redelivered in these scenarios you should
consider enabling the task_reject_on_worker_lost
setting.
Warning
A task that blocks indefinitely may eventually stop the worker instance from doing any other work.
If your task does I/O then make sure you add timeouts to these operations, like adding a timeout to a web request using the https://pypi.org/project/requests/ library:
import requests

connect_timeout, read_timeout = 5.0, 30.0
response = requests.get(URL, timeout=(connect_timeout, read_timeout))
Time limits are convenient for making sure all tasks return in a timely manner, but a time limit event will actually kill the process by force so only use them to detect cases where you haven't used manual timeouts yet.
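For reference, a minimal sketch of pairing a soft and a hard time limit on a single task (the task name and the limit values here are illustrative only):
from celery.exceptions import SoftTimeLimitExceeded

@app.task(soft_time_limit=50, time_limit=60)
def process_report(report_id):
    try:
        ...  # long-running work goes here
    except SoftTimeLimitExceeded:
        ...  # the soft limit fired: clean up before the hard limit kills the process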
In previous versions, the default prefork pool scheduler was not friendly
to long-running tasks, so if you had tasks that ran for minutes/hours, it
was advised to enable the -Ofair
command-line
argument to the celery worker. However, as of version 4.0,
-Ofair is now the default scheduling strategy. See Prefetch Limits for more information, and for the best performance route long-running and short-running tasks to dedicated workers (Automatic routing).
If your worker hangs then please investigate what tasks are running before submitting an issue, as most likely the hanging is caused by one or more tasks hanging on a network operation.
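A quick way to check this is the worker inspect API; a minimal sketch, assuming app is your Celery application instance:
active = app.control.inspect().active()
print(active)  # mapping of worker node name -> list of tasks currently being executed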
Basics¶
You can easily create a task from any callable by using
the app.task()
decorator:
from .models import User
@app.task
def create_user(username, password):
User.objects.create(username=username, password=password)
There are also many options that can be set for the task, these can be specified as arguments to the decorator:
@app.task(serializer='json')
def create_user(username, password):
User.objects.create(username=username, password=password)
How do I import the task decorator?¶
The task decorator is available on your Celery
application instance,
if you don't know what this is then please read First Steps with Celery.
If you're using Django (see First steps with Django), or you're the author
of a library then you probably want to use the shared_task()
decorator:
from celery import shared_task
@shared_task
def add(x, y):
return x + y
Multiple decorators¶
When using multiple decorators in combination with the task decorator you must make sure that the task decorator is applied last (oddly, in Python this means it must be first in the list):
@app.task
@decorator2
@decorator1
def add(x, y):
return x + y
Bound tasks¶
A task being bound means the first argument to the task will always
be the task instance (self
), just like Python bound methods:
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)
@app.task(bind=True)
def add(self, x, y):
logger.info(self.request.id)
Bound tasks are needed for retries (using app.Task.retry()
),
for accessing information about the current task request, and for any
additional functionality you add to custom task base classes.
Task inheritance¶
The base
argument to the task decorator specifies the base class of the task:
import celery
class MyTask(celery.Task):
def on_failure(self, exc, task_id, args, kwargs, einfo):
print('{0!r} failed: {1!r}'.format(task_id, exc))
@app.task(base=MyTask)
def add(x, y):
raise KeyError()
Names¶
Every task must have a unique name.
If no explicit name is provided the task decorator will generate one for you, and this name will be based on 1) the module the task is defined in, and 2) the name of the task function.
Example setting explicit name:
>>> @app.task(name='sum-of-two-numbers')
>>> def add(x, y):
... return x + y
>>> add.name
'sum-of-two-numbers'
A best practice is to use the module name as a name-space, this way names won't collide if there's already a task with that name defined in another module.
>>> @app.task(name='tasks.add')
>>> def add(x, y):
... return x + y
You can tell the name of the task by investigating its .name
attribute:
>>> add.name
'tasks.add'
The name we specified here (tasks.add
) is exactly the name that would've
been automatically generated for us if the task was defined in a module
named tasks.py:
tasks.py:
@app.task
def add(x, y):
return x + y
>>> from tasks import add
>>> add.name
'tasks.add'
Note
You can use the inspect command in a worker to view the names of all registered tasks. See the inspect registered command in the Management Command-line Utilities (inspect/control) section of the User Guide.
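A minimal sketch of doing the same from Python, assuming app is your Celery application instance:
registered = app.control.inspect().registered()
print(registered)  # mapping of worker node name -> list of registered task names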
Changing the automatic naming behavior¶
Added in version 4.0.
There are some cases when the default automatic naming isn't suitable. Consider having many tasks within many different modules:
project/
/__init__.py
/celery.py
/moduleA/
/__init__.py
/tasks.py
/moduleB/
/__init__.py
/tasks.py
Using the default automatic naming, each task will have a generated name
like moduleA.tasks.taskA, moduleA.tasks.taskB, moduleB.tasks.test,
and so on. You may want to get rid of having tasks in all task names.
As pointed above, you can explicitly give names for all tasks, or you
can change the automatic naming behavior by overriding
app.gen_task_name()
. Continuing with the example, celery.py
may contain:
from celery import Celery
class MyCelery(Celery):
def gen_task_name(self, name, module):
if module.endswith('.tasks'):
module = module[:-6]
return super().gen_task_name(name, module)
app = MyCelery('main')
So each task will have a name like moduleA.taskA, moduleA.taskB and moduleB.test.
Warning
Make sure that your app.gen_task_name()
is a pure function: meaning
that for the same input it must always return the same output.
Task Request¶
app.Task.request
contains information and state
related to the currently executing task.
The request defines the following attributes:
- id:
The unique id of the executing task.
- group:
The unique id of the task's group, if this task is a member.
- chord:
The unique id of the chord this task belongs to (if the task is part of the header).
- correlation_id:
Custom ID used for things like de-duplication.
- args:
Positional arguments.
- kwargs:
Keyword arguments.
- origin:
Name of host that sent this task.
- retries:
How many times the current task has been retried. An integer starting at 0.
- is_eager:
Set to True if the task is executed locally in the client, not by a worker.
- eta:
The original ETA of the task (if any). This is in UTC time (depending on the enable_utc setting).
- expires:
The original expiry time of the task (if any). This is in UTC time (depending on the enable_utc setting).
- hostname:
Node name of the worker instance executing the task.
- delivery_info:
Additional message delivery information. This is a mapping containing the exchange and routing key used to deliver this task. Used by, for example, app.Task.retry() to resend the task to the same destination queue. Availability of keys in this dict depends on the message broker used.
- reply-to:
Name of queue to send replies back to (used with the RPC result backend, for example).
- called_directly:
This flag is set to true if the task wasn't executed by the worker.
- timelimit:
A tuple of the current (soft, hard) time limits active for this task (if any).
- callbacks:
A list of signatures to be called if this task returns successfully.
- errbacks:
A list of signatures to be called if this task fails.
- utc:
Set to true if the caller has UTC enabled (enable_utc).
Added in version 3.1.
- headers:
Mapping of message headers sent with this task message (may be None).
- reply_to:
Where to send reply to (queue name).
- correlation_id:
Usually the same as the task id, often used in amqp to keep track of what a reply is for.
Added in version 4.0.
- root_id:
The unique id of the first task in the workflow this task is part of (if any).
- parent_id:
The unique id of the task that called this task (if any).
- chain:
Reversed list of tasks that form a chain (if any). The last item in this list will be the next task to succeed the current task. If using version one of the task protocol the chain tasks will be in request.callbacks instead.
Added in version 5.2.
- properties:
Mapping of message properties received with this task message (may be None or {}).
- replaced_task_nesting:
How many times the task was replaced, if at all (may be 0).
Example¶
An example task accessing information in the context is:
@app.task(bind=True)
def dump_context(self, x, y):
print('Executing task id {0.id}, args: {0.args!r} kwargs: {0.kwargs!r}'.format(
self.request))
The bind
argument means that the function will be a "bound method" so
that you can access attributes and methods on the task type instance.
Logging¶
The worker will automatically set up logging for you, or you can configure logging manually.
A special logger is available named "celery.task", you can inherit from this logger to automatically get the task name and unique id as part of the logs.
The best practice is to create a common logger for all of your tasks at the top of your module:
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
@app.task
def add(x, y):
logger.info('Adding {0} + {1}'.format(x, y))
return x + y
Celery uses the standard Python logging library, and its documentation can be found in the standard library's logging documentation.
You can also use print()
, as anything written to standard
out/-err will be redirected to the logging system (you can disable this,
see worker_redirect_stdouts
).
Note
The worker won't update the redirection if you create a logger instance somewhere in your task or task module.
If you want to redirect sys.stdout
and sys.stderr
to a custom
logger you have to enable this manually, for example:
import sys

from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)
@app.task(bind=True)
def add(self, x, y):
old_outs = sys.stdout, sys.stderr
rlevel = self.app.conf.worker_redirect_stdouts_level
try:
self.app.log.redirect_stdouts_to_logger(logger, rlevel)
print('Adding {0} + {1}'.format(x, y))
return x + y
finally:
sys.stdout, sys.stderr = old_outs
Note
If a specific Celery logger you need is not emitting logs, you should check that the logger is propagating properly. In this example "celery.app.trace" is enabled so that "succeeded in" logs are emitted:
import celery
import logging
@celery.signals.after_setup_logger.connect
def on_after_setup_logger(**kwargs):
logger = logging.getLogger('celery')
logger.propagate = True
logger = logging.getLogger('celery.app.trace')
logger.propagate = True
Note
If you want to completely disable Celery logging configuration,
use the setup_logging
signal:
import celery
@celery.signals.setup_logging.connect
def on_setup_logging(**kwargs):
pass
Argument checking¶
Added in version 4.0.
Celery will verify the arguments passed when you call the task, just like Python does when calling a normal function:
>>> @app.task
... def add(x, y):
... return x + y
# Calling the task with two arguments works:
>>> add.delay(8, 8)
<AsyncResult: f59d71ca-1549-43e0-be41-4e8821a83c0c>
# Calling the task with only one argument fails:
>>> add.delay(8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "celery/app/task.py", line 376, in delay
return self.apply_async(args, kwargs)
File "celery/app/task.py", line 485, in apply_async
check_arguments(*(args or ()), **(kwargs or {}))
TypeError: add() takes exactly 2 arguments (1 given)
You can disable the argument checking for any task by setting its
typing
attribute to False
:
>>> @app.task(typing=False)
... def add(x, y):
... return x + y
# Works locally, but the worker receiving the task will raise an error.
>>> add.delay(8)
<AsyncResult: f59d71ca-1549-43e0-be41-4e8821a83c0c>
Hiding sensitive information in arguments¶
Added in version 4.0.
When using task_protocol
2 or higher (default since 4.0), you can
override how positional arguments and keyword arguments are represented in logs
and monitoring events using the argsrepr
and kwargsrepr
calling
arguments:
>>> add.apply_async((2, 3), argsrepr='(<secret-x>, <secret-y>)')
>>> charge.s(account, card='1234 5678 1234 5678').set(
... kwargsrepr=repr({'card': '**** **** **** 5678'})
... ).delay()
Warning
Sensitive information will still be accessible to anyone able to read your task message from the broker, or otherwise able to intercept it.
For this reason you should probably encrypt your message if it contains sensitive information, or in this example with a credit card number the actual number could be stored encrypted in a secure store that you retrieve and decrypt in the task itself.
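As a rough sketch of that idea (not part of the Celery API; FERNET_KEY and the charge task body are hypothetical, and the key would live in a secure secret store):
from cryptography.fernet import Fernet

fernet = Fernet(FERNET_KEY)  # FERNET_KEY is a hypothetical value loaded from a secret store

@app.task
def charge(account, card_token):
    card_number = fernet.decrypt(card_token.encode()).decode()  # decrypt only inside the task
    ...  # perform the charge with card_number

# Caller side: send only the encrypted token and mask it in logs/monitoring events.
token = fernet.encrypt(b'1234 5678 1234 5678').decode()
charge.s(account, card_token=token).set(
    kwargsrepr=repr({'card_token': '<encrypted>'})
).delay()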
Retrying¶
app.Task.retry()
can be used to re-execute the task,
for example in the event of recoverable errors.
When you call retry
it'll send a new message, using the same
task-id, and it'll take care to make sure the message is delivered
to the same queue as the originating task.
When a task is retried this is also recorded as a task state, so that you can track the progress of the task using the result instance (see States).
Here's an example using retry
:
@app.task(bind=True)
def send_twitter_status(self, oauth, tweet):
try:
twitter = Twitter(oauth)
twitter.update_status(tweet)
except (Twitter.FailWhaleError, Twitter.LoginError) as exc:
raise self.retry(exc=exc)
Note
The app.Task.retry()
call will raise an exception so any
code after the retry won't be reached. This is the Retry
exception, it isn't handled as an error but rather as a semi-predicate
to signify to the worker that the task is to be retried,
so that it can store the correct state when a result backend is enabled.
This is normal operation and always happens unless the
throw
argument to retry is set to False
.
The bind argument to the task decorator will give access to self
(the
task type instance).
The exc
argument is used to pass exception information that's
used in logs, and when storing task results.
Both the exception and the traceback will
be available in the task state (if a result backend is enabled).
If the task has a max_retries
value the current exception
will be re-raised if the max number of retries has been exceeded,
but this won't happen if:
- An exc argument wasn't given.
  In this case the MaxRetriesExceededError exception will be raised.
- There's no current exception.
  If there's no original exception to re-raise the exc argument will be used instead, so:
  self.retry(exc=Twitter.LoginError())
  will raise the exc argument given.
Using a custom retry delay¶
When a task is to be retried, it can wait for a given amount of time
before doing so, and the default delay is defined by the
default_retry_delay
attribute. By default this is set to 3 minutes. Note that the
unit for setting the delay is in seconds (int or float).
You can also provide the countdown argument to retry()
to
override this default.
@app.task(bind=True, default_retry_delay=30 * 60) # retry in 30 minutes.
def add(self, x, y):
try:
something_raising()
except Exception as exc:
# overrides the default delay to retry after 1 minute
raise self.retry(exc=exc, countdown=60)
Automatic retry for known exceptions¶
Added in version 4.0.
Sometimes you just want to retry a task whenever a particular exception is raised.
Fortunately, you can tell Celery to automatically retry a task using
autoretry_for argument in the app.task()
decorator:
from twitter.exceptions import FailWhaleError
@app.task(autoretry_for=(FailWhaleError,))
def refresh_timeline(user):
return twitter.refresh_timeline(user)
If you want to specify custom arguments for an internal retry()
call, pass retry_kwargs argument to app.task()
decorator:
@app.task(autoretry_for=(FailWhaleError,),
retry_kwargs={'max_retries': 5})
def refresh_timeline(user):
return twitter.refresh_timeline(user)
This is provided as an alternative to manually handling the exceptions,
and the example above will do the same as wrapping the task body
in a try
... except
statement:
@app.task
def refresh_timeline(user):
try:
twitter.refresh_timeline(user)
except FailWhaleError as exc:
raise refresh_timeline.retry(exc=exc, max_retries=5)
If you want to automatically retry on any error, simply use:
@app.task(autoretry_for=(Exception,))
def x():
...
Added in version 4.2.
If your tasks depend on another service, like making a request to an API,
then it's a good idea to use exponential backoff to avoid overwhelming the
service with your requests. Fortunately, Celery's automatic retry support
makes it easy. Just specify the retry_backoff
argument, like this:
from requests.exceptions import RequestException
@app.task(autoretry_for=(RequestException,), retry_backoff=True)
def x():
...
By default, this exponential backoff will also introduce random jitter to avoid having all the tasks run at the same moment. It will also cap the maximum backoff delay to 10 minutes. All these settings can be customized via options documented below.
Added in version 4.4.
You can also set autoretry_for, max_retries, retry_backoff, retry_backoff_max and retry_jitter options in class-based tasks:
class BaseTaskWithRetry(Task):
autoretry_for = (TypeError,)
max_retries = 5
retry_backoff = True
retry_backoff_max = 700
retry_jitter = False
- Task.autoretry_for
A list/tuple of exception classes. If any of these exceptions are raised during the execution of the task, the task will automatically be retried. By default, no exceptions will be autoretried.
- Task.max_retries
A number. Maximum number of retries before giving up. A value of None means the task will retry forever. By default, this option is set to 3.
- Task.retry_backoff
A boolean, or a number. If this option is set to True, autoretries will be delayed following the rules of exponential backoff. The first retry will have a delay of 1 second, the second retry will have a delay of 2 seconds, the third will delay 4 seconds, the fourth will delay 8 seconds, and so on. (However, this delay value is modified by retry_jitter, if it is enabled.) If this option is set to a number, it is used as a delay factor. For example, if this option is set to 3, the first retry will delay 3 seconds, the second will delay 6 seconds, the third will delay 12 seconds, the fourth will delay 24 seconds, and so on. By default, this option is set to False, and autoretries will not be delayed.
- Task.retry_backoff_max
A number. If retry_backoff is enabled, this option will set a maximum delay in seconds between task autoretries. By default, this option is set to 600, which is 10 minutes.
- Task.retry_jitter
A boolean. Jitter is used to introduce randomness into exponential backoff delays, to prevent all tasks in the queue from being executed simultaneously. If this option is set to True, the delay value calculated by retry_backoff is treated as a maximum, and the actual delay value will be a random number between zero and that maximum. By default, this option is set to True.
Added in version 5.3.0.
- Task.dont_autoretry_for
A list/tuple of exception classes. These exceptions won't be autoretried. This allows excluding some exceptions that match autoretry_for but for which you don't want a retry.
Argument validation with Pydantic¶
Added in version 5.5.0.
You can use Pydantic to validate and convert arguments as well as serializing
results based on typehints by passing pydantic=True
.
Note
Argument validation only covers arguments/return values on the task side. You still have to serialize arguments yourself when invoking a task with delay() or apply_async().
For example:
from pydantic import BaseModel
class ArgModel(BaseModel):
value: int
class ReturnModel(BaseModel):
value: str
@app.task(pydantic=True)
def x(arg: ArgModel) -> ReturnModel:
# args/kwargs type hinted as Pydantic model will be converted
assert isinstance(arg, ArgModel)
# The returned model will be converted to a dict automatically
return ReturnModel(value=f"example: {arg.value}")
The task can then be called using a dict matching the model, and you'll receive
the returned model "dumped" (serialized using BaseModel.model_dump()
):
>>> result = x.delay({'value': 1})
>>> result.get(timeout=1)
{'value': 'example: 1'}
Union types, arguments to generics¶
Union types (e.g. Union[SomeModel, OtherModel]
) or arguments to generics (e.g.
list[SomeModel]
) are not supported.
In case you want to support a list or similar types, it is recommended to use
pydantic.RootModel
.
Optional parameters/return values¶
Optional parameters or return values are also handled properly. For example, given this task:
from typing import Optional
# models are the same as above
@app.task(pydantic=True)
def x(arg: Optional[ArgModel] = None) -> Optional[ReturnModel]:
if arg is None:
return None
return ReturnModel(value=f"example: {arg.value}")
You'll get the following behavior:
>>> result = x.delay()
>>> result.get(timeout=1) is None
True
>>> result = x.delay({'value': 1})
>>> result.get(timeout=1)
{'value': 'example: 1'}
Return value handling¶
Return values will only be serialized if the returned model matches the annotation. If you pass a
model instance of a different type, it will not be serialized. mypy
should already catch such
errors and you should fix your typehints then.
Pydantic parameters¶
There are a few more options influencing Pydantic behavior:
- Task.pydantic_strict
By default, strict mode is disabled. You can pass
True
to enable strict model validation.
- Task.pydantic_context
Pass additional validation context during Pydantic model validation. The context already includes the application object as celery_app and the task name as celery_task_name by default.
- Task.pydantic_dump_kwargs
When serializing a result, pass these additional arguments to dump_kwargs(). By default, only mode='json' is passed.
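A minimal sketch of passing these options through the decorator, reusing the ArgModel/ReturnModel from the earlier example (the context key shown is hypothetical):
@app.task(
    pydantic=True,
    pydantic_strict=True,                     # enable Pydantic strict mode
    pydantic_context={'tenant': 'example'},   # extra validation context (hypothetical key)
    pydantic_dump_kwargs={'mode': 'json', 'exclude_none': True},
)
def x(arg: ArgModel) -> ReturnModel:
    return ReturnModel(value=f"example: {arg.value}")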
List of Options¶
The task decorator can take a number of options that change the way
the task behaves, for example you can set the rate limit for a task
using the rate_limit
option.
Any keyword argument passed to the task decorator will actually be set as an attribute of the resulting task class, and this is a list of the built-in attributes.
General¶
- Task.name
The name the task is registered as.
You can set this name manually, or a name will be automatically generated using the module and class name.
See also Names.
- Task.request
If the task is being executed this will contain information about the current request. Thread local storage is used.
See Task Request.
- Task.max_retries
Only applies if the task calls self.retry or if the task is decorated with the autoretry_for argument.
The maximum number of attempted retries before giving up. If the number of retries exceeds this value a MaxRetriesExceededError exception will be raised.
Note
You have to call retry() manually, as it won't automatically retry on exception.
The default is 3. A value of None will disable the retry limit and the task will retry forever until it succeeds.
- Task.throws
Optional tuple of expected error classes that shouldn't be regarded as an actual error.
Errors in this list will be reported as a failure to the result backend, but the worker won't log the event as an error, and no traceback will be included.
Example:
@task(throws=(KeyError, HttpNotFound))
def get_foo():
    something()
Error types:
- Expected errors (in Task.throws)
  Logged with severity INFO, traceback excluded.
- Unexpected errors
  Logged with severity ERROR, with traceback included.
- Task.default_retry_delay
Default time in seconds before a retry of the task should be executed. Can be either an int or a float. The default is a three-minute delay (see Using a custom retry delay above).
- Task.rate_limit
Set the rate limit for this task type (limits the number of tasks that can be run in a given time frame). Tasks will still complete when a rate limit is in effect, but it may take some time before it's allowed to start.
If this is None no rate limit is in effect. If it is an integer or float, it is interpreted as "tasks per second".
The rate limits can be specified in seconds, minutes or hours by appending "/s", "/m" or "/h" to the value. Tasks will be evenly distributed over the specified time frame.
Example: "100/m" (a hundred tasks a minute). This will enforce a minimum delay of 600ms between starting two tasks on the same worker instance.
Default is the task_default_rate_limit setting: if not specified, rate limiting for tasks is disabled by default.
Note that this is a per worker instance rate limit, and not a global rate limit. To enforce a global rate limit (e.g., for an API with a maximum number of requests per second), you must restrict to a given queue.
- Task.time_limit
The hard time limit, in seconds, for this task. When not set the workers default is used.
- Task.soft_time_limit
The soft time limit for this task. When not set the workers default is used.
- Task.ignore_result
Don't store task state. Note that this means you can't use
AsyncResult
to check if the task is ready, or get its return value.
Note: Certain features will not work if task results are disabled. For more details check the Canvas documentation.
- Task.store_errors_even_if_ignored
If
True
, errors will be stored even if the task is configured to ignore results.
- Task.serializer
A string identifying the default serialization method to use. Defaults to the task_serializer setting. Can be pickle, json, yaml, or any custom serialization methods that have been registered with kombu.serialization.registry.
Please see Serializers for more information.
- Task.compression
A string identifying the default compression scheme to use.
Defaults to the task_compression setting. Can be gzip, or bzip2, or any custom compression schemes that have been registered with the kombu.compression registry.
Please see Compression for more information.
- Task.backend
The result store backend to use for this task. An instance of one of the backend classes in celery.backends. Defaults to app.backend, defined by the
result_backend
setting.
- Task.acks_late
If set to
True
messages for this task will be acknowledged after the task has been executed, not just before (the default behavior).
Note: This means the task may be executed multiple times should the worker crash in the middle of execution. Make sure your tasks are idempotent.
The global default can be overridden by the
task_acks_late
setting.
- Task.track_started
If True the task will report its status as "started" when the task is executed by a worker. The default value is False as the normal behavior is to not report that level of granularity. Tasks are either pending, finished, or waiting to be retried. Having a "started" status can be useful for when there are long running tasks and there's a need to report what task is currently running.
The host name and process id of the worker executing the task will be available in the state meta-data (e.g., result.info['pid']).
The global default can be overridden by the
task_track_started
setting.
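As an illustration only, a sketch combining a few of the options above in the task decorator (the option values are arbitrary):
@app.task(rate_limit='100/m', acks_late=True, ignore_result=True, serializer='json')
def sync_external_data(source_id):
    ...  # hypothetical task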
See also
The API reference for Task.
States¶
Celery can keep track of the tasks current state. The state also contains the result of a successful task, or the exception and traceback information of a failed task.
There are several result backends to choose from, and they all have different strengths and weaknesses (see Result Backends).
During its lifetime a task will transition through several possible states,
and each state may have arbitrary meta-data attached to it. When a task
moves into a new state the previous state is
forgotten about, but some transitions can be deduced, (e.g., a task now
in the FAILED
state, is implied to have been in the
STARTED
state at some point).
There are also sets of states, like the set of
FAILURE_STATES
, and the set of READY_STATES
.
The client uses the membership of these sets to decide whether
the exception should be re-raised (PROPAGATE_STATES
), or whether
the state can be cached (it can if the task is ready).
You can also define Custom states.
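A minimal client-side sketch of using these sets, assuming result is an AsyncResult obtained from delay():
from celery import states

if result.state in states.READY_STATES:
    ...  # the task has finished; its state/result can safely be cached
if result.state in states.PROPAGATE_STATES:
    ...  # FAILURE or REVOKED: result.get() would re-raise the stored exception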
Result Backends¶
If you want to keep track of tasks or need the return values, then Celery
must store or send the states somewhere so that they can be retrieved later.
There are several built-in result backends to choose from: SQLAlchemy/Django ORM,
Memcached, RabbitMQ/QPid (rpc
), and Redis -- or you can define your own.
No backend works well for every use case. You should read about the strengths and weaknesses of each backend, and choose the most appropriate for your needs.
Warning
Backends use resources to store and transmit results. To ensure
that resources are released, you must eventually call
get()
or forget()
on
EVERY AsyncResult
instance returned after calling
a task.
RPC Result Backend (RabbitMQ/QPid)¶
The RPC result backend (rpc://) is special as it doesn't actually store the states, but rather sends them as messages. This is an important difference as it means that a result can only be retrieved once, and only by the client that initiated the task. Two different processes can't wait for the same result.
Even with that limitation, it is an excellent choice if you need to receive state changes in real-time. Using messaging means the client doesn't have to poll for new states.
The messages are transient (non-persistent) by default, so the results will
disappear if the broker restarts. You can configure the result backend to send
persistent messages using the result_persistent
setting.
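A minimal configuration sketch (the broker URL is a placeholder):
from celery import Celery

app = Celery('tasks', broker='amqp://localhost', backend='rpc://')
app.conf.result_persistent = True  # send persistent result messages so they survive a broker restart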
Database Result Backend¶
Keeping state in the database can be convenient for many, especially for web applications with a database already in place, but it also comes with limitations.
Polling the database for new states is expensive, and so you should increase the polling intervals of operations, such as result.get().
Some databases use a default transaction isolation level that isn't suitable for polling tables for changes.
In MySQL the default transaction isolation level is REPEATABLE-READ: meaning the transaction won't see changes made by other transactions until the current transaction is committed.
Changing that to the READ-COMMITTED isolation level is recommended.
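A configuration sketch, assuming a MySQL database accessed through SQLAlchemy (the connection string is a placeholder):
app.conf.result_backend = 'db+mysql://user:password@localhost/celery_results'

# On the MySQL side, the recommended isolation level would be set with:
# SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;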
Built-in States¶
PENDING¶
Task is waiting for execution or unknown. Any task id that's not known is implied to be in the pending state.
STARTED¶
Task has been started.
Not reported by default, to enable please see app.Task.track_started
.
- meta-data:
pid and hostname of the worker process executing the task.
SUCCESS¶
Task has been successfully executed.
- meta-data:
result contains the return value of the task.
- propagates:
Yes
- ready:
Yes
FAILURE¶
Task execution resulted in failure.
- meta-data:
result contains the exception that occurred, and traceback contains the backtrace of the stack at the point when the exception was raised.
- propagates:
Yes
RETRY¶
Task is being retried.
- meta-data:
result contains the exception that caused the retry, and traceback contains the backtrace of the stack at the point when the exception was raised.
- propagates:
No
REVOKED¶
Task has been revoked.
- propagates:
Yes
Custom states¶
You can easily define your own states, all you need is a unique name.
The name of the state is usually an uppercase string. As an example
you could have a look at the abortable tasks
which defines a custom ABORTED
state.
Use update_state() to update a task's state:
@app.task(bind=True)
def upload_files(self, filenames):
for i, file in enumerate(filenames):
if not self.request.called_directly:
self.update_state(state='PROGRESS',
meta={'current': i, 'total': len(filenames)})
Here I created the state "PROGRESS", telling any application aware of this state that the task is currently in progress, and also where it is in the process by having current and total counts as part of the state meta-data. This can then be used to create progress bars for example.
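A minimal sketch of reading that custom state from the calling side (filenames, the polling interval, and the formatting are arbitrary):
import time

result = upload_files.delay(filenames)
while not result.ready():
    if result.state == 'PROGRESS':
        meta = result.info  # the meta dict passed to update_state()
        print(f"{meta['current']}/{meta['total']} files uploaded")
    time.sleep(1)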
Creating pickleable exceptions¶
A rarely known Python fact is that exceptions must conform to some simple rules to support being serialized by the pickle module.
Tasks that raise exceptions that aren't pickleable won't work properly when Pickle is used as the serializer.
To make sure that your exceptions are pickleable the exception
MUST provide the original arguments it was instantiated
with in its .args
attribute. The simplest way
to ensure this is to have the exception call Exception.__init__
.
Let's look at some examples that work, and one that doesn't:
# OK:
class HttpError(Exception):
pass
# BAD:
class HttpError(Exception):
def __init__(self, status_code):
self.status_code = status_code
# OK:
class HttpError(Exception):
def __init__(self, status_code):
self.status_code = status_code
Exception.__init__(self, status_code) # <-- REQUIRED
So the rule is:
For any exception that supports custom arguments *args
,
Exception.__init__(self, *args)
must be used.
There's no special support for keyword arguments, so if you want to preserve keyword arguments when the exception is unpickled you have to pass them as regular args:
class HttpError(Exception):
def __init__(self, status_code, headers=None, body=None):
self.status_code = status_code
self.headers = headers
self.body = body
super(HttpError, self).__init__(status_code, headers, body)
Semipredicates¶
The worker wraps the task in a tracing function that records the final state of the task. There are a number of exceptions that can be used to signal this function to change how it treats the return of the task.
Ignore¶
The task may raise Ignore
to force the worker to ignore the
task. This means that no state will be recorded for the task, but the
message is still acknowledged (removed from queue).
This can be used if you want to implement custom revoke-like functionality, or manually store the result of a task.
Example keeping revoked tasks in a Redis set:
from celery.exceptions import Ignore
@app.task(bind=True)
def some_task(self):
if redis.ismember('tasks.revoked', self.request.id):
raise Ignore()
Example that stores results manually:
from celery import states
from celery.exceptions import Ignore
@app.task(bind=True)
def get_tweets(self, user):
timeline = twitter.get_timeline(user)
if not self.request.called_directly:
self.update_state(state=states.SUCCESS, meta=timeline)
raise Ignore()
Reject¶
The task may raise Reject
to reject the task message using
AMQPs basic_reject
method. This won't have any effect unless
Task.acks_late
is enabled.
Rejecting a message has the same effect as acking it, but some brokers may implement additional functionality that can be used. For example RabbitMQ supports the concept of Dead Letter Exchanges where a queue can be configured to use a dead letter exchange that rejected messages are redelivered to.
Reject can also be used to re-queue messages, but please be very careful when using this as it can easily result in an infinite message loop.
Example using reject when a task causes an out of memory condition:
import errno
from celery.exceptions import Reject
@app.task(bind=True, acks_late=True)
def render_scene(self, path):
file = get_file(path)
try:
renderer.render_scene(file)
# if the file is too big to fit in memory
# we reject it so that it's redelivered to the dead letter exchange
# and we can manually inspect the situation.
except MemoryError as exc:
raise Reject(exc, requeue=False)
except OSError as exc:
if exc.errno == errno.ENOMEM:
raise Reject(exc, requeue=False)
# For any other error we retry after 10 seconds.
except Exception as exc:
        raise self.retry(exc=exc, countdown=10)
Example re-queuing the message:
from celery.exceptions import Reject
@app.task(bind=True, acks_late=True)
def requeues(self):
if not self.request.delivery_info['redelivered']:
raise Reject('no reason', requeue=True)
print('received two times')
Consult your broker documentation for more details about the basic_reject
method.
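The render_scene example above relies on the queue it consumes from being configured with a dead-letter exchange. As a rough sketch of that configuration (not part of the original example; the queue, exchange, and routing-key names are made up), the work queue can declare a dead-letter exchange through kombu's queue_arguments:
from kombu import Exchange, Queue

# Hypothetical names: 'render' is the work queue, 'render.dlx' the dead-letter
# exchange, and 'render.dead' the queue rejected messages end up in.
app.conf.task_queues = [
    Queue(
        'render',
        Exchange('render'),
        routing_key='render',
        queue_arguments={
            'x-dead-letter-exchange': 'render.dlx',
            'x-dead-letter-routing-key': 'render.dead',
        },
    ),
    Queue('render.dead', Exchange('render.dlx'), routing_key='render.dead'),
]
With such a setup, Reject(exc, requeue=False) on the 'render' queue makes RabbitMQ redeliver the message to 'render.dead', where it can be inspected manually.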
重试¶
Retry
任务可以抛出 Retry 异常(通常通过调用 retry() 来完成),以告知 worker 该任务正在被重试。
The task may raise Retry (typically by calling retry()) to tell the worker that the task is being retried.
自定义任务类¶
Custom task classes
实例化¶
Instantiation
任务在每次请求时 不会 重新实例化,而是以全局实例的形式注册在任务注册表中。
这意味着 __init__
构造函数在每个进程中只会被调用一次,并且任务类在语义上更接近于一个 Actor 模型。
如果你定义了如下任务:
from celery import Task
class NaiveAuthenticateServer(Task):
def __init__(self):
self.users = {'george': 'password'}
def run(self, username, password):
try:
return self.users[username] == password
except KeyError:
return False
并且将每次请求都路由到同一个进程中执行,那么它将在请求之间保持状态。
这对于缓存资源也很有用,例如以下用于缓存数据库连接的 Task 基类:
from celery import Task
class DatabaseTask(Task):
_db = None
@property
def db(self):
if self._db is None:
self._db = Database.connect()
return self._db
A task is not instantiated for every request, but is registered in the task registry as a global instance.
This means that the __init__
constructor will only be called
once per process, and that the task class is semantically closer to an
Actor.
If you have a task,
from celery import Task
class NaiveAuthenticateServer(Task):
def __init__(self):
self.users = {'george': 'password'}
def run(self, username, password):
try:
return self.users[username] == password
except KeyError:
return False
And you route every request to the same process, then it will keep state between requests.
This can also be useful to cache resources. For example, a base Task class that caches a database connection:
from celery import Task
class DatabaseTask(Task):
_db = None
@property
def db(self):
if self._db is None:
self._db = Database.connect()
return self._db
每个任务的使用¶
Per task usage
可以像下面这样将上述基类应用于每个任务中:
from celery.app import task
@app.task(base=DatabaseTask, bind=True)
def process_rows(self: task):
for row in self.db.table.all():
process_row(row)
这样, process_rows
任务中的 db
属性在每个进程中将始终保持一致。
The above can be added to each task like this:
from celery.app import task
@app.task(base=DatabaseTask, bind=True)
def process_rows(self: task):
for row in self.db.table.all():
process_row(row)
The db
attribute of the process_rows
task will then
always stay the same in each process.
App范围的使用¶
App-wide usage
你还可以通过在实例化应用时传入 task_cls
参数,指定整个 Celery 应用使用你自定义的任务类。
该参数可以是表示任务类的 Python 路径字符串,或是类对象本身:
from celery import Celery
app = Celery('tasks', task_cls='your.module.path:DatabaseTask')
这样,所有使用装饰器语法声明的任务都将使用你自定义的 DatabaseTask
类,并且都将具备一个 db
属性。
默认值是 Celery 提供的类: 'celery.app.task:Task'
。
You can also use your custom class in your whole Celery app by passing it as
the task_cls
argument when instantiating the app. This argument should be
either a string giving the Python path to your Task class or the class itself:
from celery import Celery
app = Celery('tasks', task_cls='your.module.path:DatabaseTask')
This will make all your tasks declared using the decorator syntax within your app use your DatabaseTask class, and they will all have a db attribute.
The default value is the class provided by Celery: 'celery.app.task:Task'
.
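Equivalently (a small sketch, assuming DatabaseTask is importable from that module path), the class object itself can be passed instead of the dotted path:
from celery import Celery

from your.module.path import DatabaseTask

app = Celery('tasks', task_cls=DatabaseTask)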
处理程序¶
Handlers
- before_start(self, task_id, args, kwargs)¶
任务在开始执行前由 Worker 调用。
Added in version 5.2.
- 参数:
task_id -- 待执行任务的唯一标识符。
args -- 待执行任务的原始位置参数。
kwargs -- 待执行任务的原始关键字参数。
此处理器的返回值将被忽略。
- after_return(self, status, retval, task_id, args, kwargs, einfo)¶
任务返回后调用的处理器。
- 参数:
status -- 当前任务的状态。
retval -- 任务的返回值或异常对象。
task_id -- 任务的唯一标识符。
args -- 已返回任务的原始位置参数。
kwargs -- 已返回任务的原始关键字参数。
- 关键字参数:
einfo --
ExceptionInfo
实例,包含 traceback(若存在)。
此处理器的返回值将被忽略。
- on_failure(self, exc, task_id, args, kwargs, einfo)¶
当任务失败时由 Worker 调用。
- 参数:
exc -- 任务抛出的异常。
task_id -- 失败任务的唯一标识符。
args -- 失败任务的原始位置参数。
kwargs -- 失败任务的原始关键字参数。
- 关键字参数:
einfo --
ExceptionInfo
实例,包含 traceback。
此处理器的返回值将被忽略。
- on_retry(self, exc, task_id, args, kwargs, einfo)¶
当任务即将重试时由 Worker 调用。
- 参数:
exc -- 传递给
retry()
的异常。
task_id -- 被重试任务的唯一标识符。
args -- 被重试任务的原始位置参数。
kwargs -- 被重试任务的原始关键字参数。
- 关键字参数:
einfo --
ExceptionInfo
实例,包含 traceback。
此处理器的返回值将被忽略。
- on_success(self, retval, task_id, args, kwargs)¶
当任务成功执行时由 Worker 调用。
- 参数:
retval -- 任务的返回值。
task_id -- 已执行任务的唯一标识符。
args -- 已执行任务的原始位置参数。
kwargs -- 已执行任务的原始关键字参数。
此处理器的返回值将被忽略。
- before_start(self, task_id, args, kwargs)
Run by the worker before the task starts executing.
Added in version 5.2.
- 参数:
task_id -- Unique id of the task to execute.
args -- Original arguments for the task to execute.
kwargs -- Original keyword arguments for the task to execute.
The return value of this handler is ignored.
- after_return(self, status, retval, task_id, args, kwargs, einfo)
Handler called after the task returns.
- 参数:
status -- Current task state.
retval -- Task return value/exception.
task_id -- Unique id of the task.
args -- Original arguments for the task that returned.
kwargs -- Original keyword arguments for the task that returned.
- 关键字参数:
einfo --
ExceptionInfo
instance, containing the traceback (if any).
The return value of this handler is ignored.
- on_failure(self, exc, task_id, args, kwargs, einfo)
This is run by the worker when the task fails.
- 参数:
exc -- The exception raised by the task.
task_id -- Unique id of the failed task.
args -- Original arguments for the task that failed.
kwargs -- Original keyword arguments for the task that failed.
- 关键字参数:
einfo --
ExceptionInfo
instance, containing the traceback.
The return value of this handler is ignored.
- on_retry(self, exc, task_id, args, kwargs, einfo)
This is run by the worker when the task is to be retried.
- 参数:
exc -- The exception sent to
retry()
.
task_id -- Unique id of the retried task.
args -- Original arguments for the retried task.
kwargs -- Original keyword arguments for the retried task.
- 关键字参数:
einfo --
ExceptionInfo
instance, containing the traceback.
The return value of this handler is ignored.
- on_success(self, retval, task_id, args, kwargs)
Run by the worker if the task executes successfully.
- 参数:
retval -- The return value of the task.
task_id -- Unique id of the executed task.
args -- Original arguments for the executed task.
kwargs -- Original keyword arguments for the executed task.
The return value of this handler is ignored.
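Since the handlers above are plain methods on the task class, a common pattern is to override them on a custom base class. The following is a minimal sketch (the class and task names are made up, and an existing app instance is assumed):
import logging

from celery import Task

logger = logging.getLogger(__name__)

class AuditedTask(Task):
    """Hypothetical base class that logs the task lifecycle via the handlers."""

    def before_start(self, task_id, args, kwargs):
        logger.info('task %s starting', task_id)

    def on_success(self, retval, task_id, args, kwargs):
        logger.info('task %s succeeded with %r', task_id, retval)

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        logger.error('task %s failed: %r', task_id, exc)

    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        logger.info('task %s finished in state %s', task_id, status)

@app.task(base=AuditedTask, bind=True)
def divide(self, x, y):
    return x / y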
请求和自定义请求¶
Requests and custom requests
在收到执行任务的消息时, worker 会创建一个
request
来表示这种需求。
自定义任务类可以通过更改属性 celery.app.task.Task.Request
来重写使用的请求类。你可以直接分配自定义请求类本身,或者
它的完全限定名称。
请求有多个职责。自定义请求类应覆盖所有这些职责——它们负责实际
执行和跟踪任务。我们强烈推荐继承 celery.worker.request.Request
。
当使用 pre-forking worker 时,on_timeout()
和
on_failure()
方法将在主工作进程中执行。
应用程序可以利用这一功能来检测 celery.app.task.Task.on_failure()
未检测到的失败。
例如,以下自定义请求类会检测并记录硬时间限制和其他失败。
import logging
from celery import Task
from celery.worker.request import Request
logger = logging.getLogger('my.package')
class MyRequest(Request):
'一个最小的自定义请求,用于记录失败和硬时间限制。'
def on_timeout(self, soft, timeout):
super(MyRequest, self).on_timeout(soft, timeout)
if not soft:
logger.warning(
'任务 %s 强制执行了硬超时',
self.task.name
)
def on_failure(self, exc_info, send_failed_event=True, return_ok=False):
super().on_failure(
exc_info,
send_failed_event=send_failed_event,
return_ok=return_ok
)
logger.warning(
'任务 %s 检测到失败',
self.task.name
)
class MyTask(Task):
Request = MyRequest # 你可以使用 FQN 'my.package:MyRequest'
@app.task(base=MyTask)
def some_longrunning_task():
# 发挥你的想象力
Upon receiving a message to run a task, the worker
creates a request
to represent such
demand.
Custom task classes may override which request class to use by changing the
attribute celery.app.task.Task.Request
. You may either assign the
custom request class itself, or its fully qualified name.
The request has several responsibilities. Custom request classes should cover
them all -- they are responsible for actually running and tracing the task. We
strongly recommend inheriting from celery.worker.request.Request
.
When using the pre-forking worker, the methods
on_timeout()
and
on_failure()
are executed in the main
worker process. An application may leverage such facility to detect failures
which are not detected using celery.app.task.Task.on_failure()
.
As an example, the following custom request detects and logs hard time limits, and other failures.
import logging
from celery import Task
from celery.worker.request import Request
logger = logging.getLogger('my.package')
class MyRequest(Request):
'A minimal custom request to log failures and hard time limits.'
def on_timeout(self, soft, timeout):
super(MyRequest, self).on_timeout(soft, timeout)
if not soft:
logger.warning(
'A hard timeout was enforced for task %s',
self.task.name
)
def on_failure(self, exc_info, send_failed_event=True, return_ok=False):
super().on_failure(
exc_info,
send_failed_event=send_failed_event,
return_ok=return_ok
)
logger.warning(
'Failure detected for task %s',
self.task.name
)
class MyTask(Task):
Request = MyRequest # you can use a FQN 'my.package:MyRequest'
@app.task(base=MyTask)
def some_longrunning_task():
# use your imagination
工作原理¶
How it works
以下是技术细节。虽然这部分内容不是你必须了解的,但你可能会感兴趣。
所有定义的任务都会列在一个注册表中。注册表包含任务名称和对应的任务类的列表。 你可以自己检查这个注册表:
>>> from proj.celery import app
>>> app.tasks
{'celery.chord_unlock':
<@task: celery.chord_unlock>,
'celery.backend_cleanup':
<@task: celery.backend_cleanup>,
'celery.chord':
<@task: celery.chord>}
这是 Celery 内置任务的列表。请注意,只有在定义任务的模块被导入时, 这些任务才会被注册。
默认加载器会导入 imports
配置项中列出的所有模块。
app.task()
装饰器负责将你的任务注册到应用程序的任务注册表中。
当任务被发送时,实际上并不会发送函数代码,只会发送任务名称以便执行。 当 Worker 收到消息时,它可以在任务注册表中查找该名称,找到执行代码。
这意味着你的 Worker 应该始终与客户端使用相同的软件版本。这是一个缺点, 但替代方案是一个尚未解决的技术挑战。
Here come the technical details. This part isn't something you need to know, but you may be interested.
All defined tasks are listed in a registry. The registry contains a list of task names and their task classes. You can investigate this registry yourself:
>>> from proj.celery import app
>>> app.tasks
{'celery.chord_unlock':
<@task: celery.chord_unlock>,
'celery.backend_cleanup':
<@task: celery.backend_cleanup>,
'celery.chord':
<@task: celery.chord>}
This is the list of tasks built into Celery. Note that tasks will only be registered when the module they're defined in is imported.
The default loader imports any modules listed in the
imports
setting.
The app.task()
decorator is responsible for registering your task
in the application's task registry.
When tasks are sent, no actual function code is sent with it, just the name of the task to execute. When the worker then receives the message it can look up the name in its task registry to find the execution code.
This means that your workers should always be updated with the same software as the client. This is a drawback, but the alternative is a technical challenge that's yet to be solved.
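As a small illustration (assuming the add task below lives in a module named tasks), the registered name is all that travels in the message, and the worker resolves it through the same registry:
@app.task
def add(x, y):
    return x + y

print(add.name)             # e.g. 'tasks.add', derived from the module and function name
print(app.tasks[add.name])  # the worker performs an equivalent lookup to find the code to run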
技巧和最佳实践¶
Tips and Best Practices
忽略不需要的结果¶
Ignore results you don't want
如果你不关心任务的结果,务必设置 ignore_result
选项,
因为存储结果会浪费时间和资源。
@app.task(ignore_result=True)
def mytask():
something()
也可以使用 task_ignore_result
配置项全局禁用结果存储。
通过在调用 apply_async
时传递 ignore_result
布尔参数,可以在每次执行时启用/禁用结果存储。
@app.task
def mytask(x, y):
return x + y
# 不会存储结果
result = mytask.apply_async((1, 2), ignore_result=True)
print(result.get()) # -> None
# 会存储结果
result = mytask.apply_async((1, 2), ignore_result=False)
print(result.get()) # -> 3
默认情况下,当配置了结果后端时,任务不会忽略结果(ignore_result=False)。
选项优先级顺序如下:
全局 task_ignore_result 配置项
任务级 ignore_result 选项
任务执行选项 ignore_result
If you don't care about the results of a task, be sure to set the
ignore_result
option, as storing results
wastes time and resources.
@app.task(ignore_result=True)
def mytask():
something()
Results can even be disabled globally using the task_ignore_result
setting.
Results can be enabled/disabled on a per-execution basis, by passing the ignore_result
boolean parameter,
when calling apply_async
.
@app.task
def mytask(x, y):
return x + y
# No result will be stored
result = mytask.apply_async((1, 2), ignore_result=True)
print(result.get()) # -> None
# Result will be stored
result = mytask.apply_async((1, 2), ignore_result=False)
print(result.get()) # -> 3
By default tasks will not ignore results (ignore_result=False
) when a result backend is configured.
The option precedence order is the following:
Global task_ignore_result
ignore_result option
Task execution option ignore_result
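A small sketch of how these levels interact (assuming an existing app instance): the global default can be overridden per task, which in turn can be overridden per call:
app.conf.task_ignore_result = True   # global default: do not store results

@app.task(ignore_result=False)       # this task stores results despite the global default
def audited(x):
    return x

# A single call can still opt out again:
audited.apply_async((1,), ignore_result=True)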
更多优化技巧¶
More optimization tips
您可以在 优化指南 中找到更多优化技巧。
You can find additional optimization tips in the Optimizing Guide.
避免启动同步子任务¶
Avoid launching synchronous subtasks
让一个任务等待另一个任务的结果是非常低效的, 如果工作进程池耗尽,甚至可能导致死锁。
相反,应该使你的设计异步化,例如使用 回调。
不推荐的做法:
@app.task
def update_page_info(url):
page = fetch_page.delay(url).get()
info = parse_page.delay(page).get()
store_page_info.delay(url, info)
@app.task
def fetch_page(url):
return myhttplib.get(url)
@app.task
def parse_page(page):
return myparser.parse_document(page)
@app.task
def store_page_info(url, info):
return PageInfo.objects.create(url, info)
推荐的做法:
def update_page_info(url):
# fetch_page -> parse_page -> store_page
chain = fetch_page.s(url) | parse_page.s() | store_page_info.s(url)
chain()
@app.task()
def fetch_page(url):
return myhttplib.get(url)
@app.task()
def parse_page(page):
return myparser.parse_document(page)
@app.task(ignore_result=True)
def store_page_info(info, url):
PageInfo.objects.create(url=url, info=info)
在这里,我通过将不同的 signature()
任务链接起来创建了任务链。
你可以在 Canvas: 设计工作流 中阅读有关任务链和其他强大构造的信息。
默认情况下,Celery 不允许在任务内同步运行子任务, 但在极少数或极端情况下,你可能需要这样做。 警告: 启用子任务同步执行不推荐!
@app.task
def update_page_info(url):
page = fetch_page.delay(url).get(disable_sync_subtasks=False)
info = parse_page.delay(page).get(disable_sync_subtasks=False)
store_page_info.delay(url, info)
@app.task
def fetch_page(url):
return myhttplib.get(url)
@app.task
def parse_page(page):
return myparser.parse_document(page)
@app.task
def store_page_info(url, info):
return PageInfo.objects.create(url, info)
Having a task wait for the result of another task is really inefficient, and may even cause a deadlock if the worker pool is exhausted.
Make your design asynchronous instead, for example by using callbacks.
Bad:
@app.task
def update_page_info(url):
page = fetch_page.delay(url).get()
info = parse_page.delay(page).get()
store_page_info.delay(url, info)
@app.task
def fetch_page(url):
return myhttplib.get(url)
@app.task
def parse_page(page):
return myparser.parse_document(page)
@app.task
def store_page_info(url, info):
return PageInfo.objects.create(url, info)
Good:
def update_page_info(url):
# fetch_page -> parse_page -> store_page
chain = fetch_page.s(url) | parse_page.s() | store_page_info.s(url)
chain()
@app.task()
def fetch_page(url):
return myhttplib.get(url)
@app.task()
def parse_page(page):
return myparser.parse_document(page)
@app.task(ignore_result=True)
def store_page_info(info, url):
PageInfo.objects.create(url=url, info=info)
Here I instead created a chain of tasks by linking together
different signature()
's.
You can read about chains and other powerful constructs
at Canvas: 设计工作流.
By default Celery will not allow you to run subtasks synchronously within a task, but in rare or extreme cases you might need to do so. WARNING: enabling subtasks to run synchronously is not recommended!
@app.task
def update_page_info(url):
page = fetch_page.delay(url).get(disable_sync_subtasks=False)
info = parse_page.delay(page).get(disable_sync_subtasks=False)
store_page_info.delay(url, info)
@app.task
def fetch_page(url):
return myhttplib.get(url)
@app.task
def parse_page(page):
return myparser.parse_document(page)
@app.task
def store_page_info(url, info):
return PageInfo.objects.create(url, info)
性能和策略¶
Performance and Strategies
粒度¶
Granularity
任务粒度是每个子任务所需的计算量。 通常来说,将问题拆分成许多小任务比将其拆分成少数几个长时间运行的任务要好。
使用较小的任务,你可以并行处理更多的任务,并且这些任务的运行时间不会长到阻塞工作进程,进而影响其它等待任务的处理。
然而,执行任务确实会有开销。需要发送消息,数据可能不在本地等等。所以,如果任务过于细粒度,添加的开销可能会消除任何潜在的好处。
The task granularity is the amount of computation needed by each subtask. In general it is better to split the problem up into many small tasks rather than have a few long running tasks.
With smaller tasks you can process more tasks in parallel and the tasks won't run long enough to block the worker from processing other waiting tasks.
However, executing a task does have overhead. A message needs to be sent, data may not be local, etc. So if the tasks are too fine-grained the overhead added probably removes any benefit.
参见
The book Art of Concurrency has a section dedicated to the topic of task granularity [AOC1].
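As a rough sketch of the trade-off (process_item and items are hypothetical), one task per item maximizes parallelism, while batching very small items, for example with chunks, keeps the per-message overhead reasonable:
from celery import group

# One task per item: maximum parallelism, one message per item.
group(process_item.s(item) for item in items)()

# Ten items per task: fewer messages, less overhead per item.
process_item.chunks(((item,) for item in items), 10)()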
数据本地性¶
Data locality
处理任务的工作进程应该尽可能接近数据。最好的情况是数据存在内存中,最坏的情况是需要从另一个大陆进行完全传输。
如果数据距离较远,可以尝试在该位置运行另一个工作进程,或者如果不可能的话,可以缓存经常使用的数据,或者预加载你知道将会使用的数据。
在工作进程之间共享数据的最简单方法是使用分布式缓存系统,比如 memcached。
参见
Jim Gray 的论文 分布式计算经济 是关于数据本地化主题的极好介绍。
The worker processing the task should be as close to the data as possible. The best would be to have a copy in memory, the worst would be a full transfer from another continent.
If the data is far away, you could try to run another worker at location, or if that's not possible - cache often used data, or preload data you know is going to be used.
The easiest way to share data between workers is to use a distributed cache system, like memcached.
参见
The paper Distributed Computing Economics by Jim Gray is an excellent introduction to the topic of data locality.
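A minimal sketch of the caching idea (the memcached client, key layout, and helper functions are assumptions; any shared cache works the same way): look the data up in the shared cache first and only fall back to the slow source on a miss:
import memcache  # python-memcached; any shared cache client would do

cache = memcache.Client(['127.0.0.1:11211'])

@app.task
def summarize(dataset_id):
    key = 'dataset:{0}'.format(dataset_id)
    data = cache.get(key)
    if data is None:
        data = load_dataset_from_warehouse(dataset_id)  # hypothetical slow fetch
        cache.set(key, data, time=300)  # share with other workers for 5 minutes
    return compute_summary(data)  # hypothetical pure function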
状态¶
State
由于 Celery 是一个分布式系统,你无法知道任务将在哪个进程或机器上执行。你甚至无法知道任务是否会及时运行。
古老的异步格言告诉我们:“确认世界的状态是任务的责任”。这意味着,世界观可能在任务请求之后发生变化,因此任务有责任确保世界是应该有的样子;如果你有一个重新索引搜索引擎的任务,并且该搜索引擎应该每隔最多 5 分钟才重新索引一次,那么它必须由任务来确认,而不是调用者。
另一个问题是 Django 模型对象。它们不应该作为任务的参数传递。通常最好在任务运行时重新从数据库中获取对象,因为使用旧数据可能会导致竞争条件。
设想以下场景,你有一篇文章和一个任务,该任务自动扩展其中的一些缩写:
class Article(models.Model):
title = models.CharField()
body = models.TextField()
@app.task
def expand_abbreviations(article):
    article.body = article.body.replace('MyCorp', 'My Corporation')
article.save()
首先,一个作者创建并保存了一篇文章,然后该作者点击按钮,启动了缩写扩展任务:
>>> article = Article.objects.get(id=102)
>>> expand_abbreviations.delay(article)
现在,队列非常忙碌,因此任务将在 2 分钟后才执行。 与此同时,另一个作者对文章进行了修改,因此,当任务最终执行时,文章的正文被恢复到了旧版本,因为任务参数中的文章正文是旧的。
解决竞争条件很容易,只需改用文章的 ID,并在任务体内重新获取文章:
@app.task
def expand_abbreviations(article_id):
article = Article.objects.get(id=article_id)
    article.body = article.body.replace('MyCorp', 'My Corporation')
article.save()
>>> expand_abbreviations.delay(article_id)
这种方法甚至可能带来性能上的好处,因为发送大消息可能非常昂贵。
Since Celery is a distributed system, you can't know which process, or on what machine the task will be executed. You can't even know if the task will run in a timely manner.
The ancient async sayings tell us that “asserting the world is the responsibility of the task”. What this means is that the world view may have changed since the task was requested, so the task is responsible for making sure the world is how it should be; if you have a task that re-indexes a search engine, and the search engine should only be re-indexed at most every 5 minutes, then it must be the task's responsibility to assert that, not the caller's.
Another gotcha is Django model objects. They shouldn't be passed on as arguments to tasks. It's almost always better to re-fetch the object from the database when the task is running instead, as using old data may lead to race conditions.
Imagine the following scenario where you have an article and a task that automatically expands some abbreviations in it:
class Article(models.Model):
title = models.CharField()
body = models.TextField()
@app.task
def expand_abbreviations(article):
    article.body = article.body.replace('MyCorp', 'My Corporation')
article.save()
First, an author creates an article and saves it, then the author clicks on a button that initiates the abbreviation task:
>>> article = Article.objects.get(id=102)
>>> expand_abbreviations.delay(article)
Now, the queue is very busy, so the task won't be run for another 2 minutes. In the meantime another author makes changes to the article, so when the task is finally run, the body of the article is reverted to the old version because the task had the old body in its argument.
Fixing the race condition is easy, just use the article id instead, and re-fetch the article in the task body:
@app.task
def expand_abbreviations(article_id):
article = Article.objects.get(id=article_id)
    article.body = article.body.replace('MyCorp', 'My Corporation')
article.save()
>>> expand_abbreviations.delay(article_id)
There might even be performance benefits to this approach, as sending large messages may be expensive.
数据库事务¶
Database transactions
我们来看另一个例子:
from django.db import transaction
from django.http import HttpResponseRedirect
@transaction.atomic
def create_article(request):
article = Article.objects.create()
expand_abbreviations.delay(article.pk)
return HttpResponseRedirect('/articles/')
这是一个 Django 视图,它在数据库中创建了一个文章对象, 然后将主键传递给任务。它使用了 transaction.atomic 装饰器,当视图返回时,事务会被提交,或者如果视图抛出异常,事务会被回滚。
这里有一个竞争条件,因为事务是原子的。这意味着文章对象在视图函数返回响应之前不会持久化到数据库。如果异步任务在事务提交之前开始执行,它可能会尝试在文章对象不存在时查询该对象。为了解决这个问题,我们需要确保在触发任务之前提交事务。
解决方法是使用 delay_on_commit()
:
from django.db import transaction
from django.http import HttpResponseRedirect
@transaction.atomic
def create_article(request):
article = Article.objects.create()
expand_abbreviations.delay_on_commit(article.pk)
return HttpResponseRedirect('/articles/')
该方法在 Celery 5.4 中添加。它是一个快捷方式,使用 Django 的 on_commit
回调,在所有事务成功提交后启动你的 Celery 任务。
Let's have a look at another example:
from django.db import transaction
from django.http import HttpResponseRedirect
@transaction.atomic
def create_article(request):
article = Article.objects.create()
expand_abbreviations.delay(article.pk)
return HttpResponseRedirect('/articles/')
This is a Django view creating an article object in the database, then passing the primary key to a task. It uses the transaction.atomic decorator, that will commit the transaction when the view returns, or roll back if the view raises an exception.
There is a race condition because transactions are atomic. This means the article object is not persisted to the database until after the view function returns a response. If the asynchronous task starts executing before the transaction is committed, it may attempt to query the article object before it exists. To prevent this, we need to ensure that the transaction is committed before triggering the task.
The solution is to use delay_on_commit()
instead:
from django.db import transaction
from django.http import HttpResponseRedirect
@transaction.atomic
def create_article(request):
article = Article.objects.create()
expand_abbreviations.delay_on_commit(article.pk)
return HttpResponseRedirect('/articles/')
This method was added in Celery 5.4. It's a shortcut that uses Django's
on_commit
callback to launch your Celery task once all transactions
have been committed successfully.
使用低于 5.4 版本的 Celery¶
With Celery <5.4
如果你使用的是较旧版本的 Celery,可以直接使用 Django 的回调来复制这种行为,如下所示:
import functools
from django.db import transaction
from django.http import HttpResponseRedirect
@transaction.atomic
def create_article(request):
article = Article.objects.create()
transaction.on_commit(
functools.partial(expand_abbreviations.delay, article.pk)
)
return HttpResponseRedirect('/articles/')
备注
on_commit
在 Django 1.9 及以上版本中可用。如果你使用的是更早版本,则可以通过 django-transaction-hooks 库来添加对其的支持。
If you're using an older version of Celery, you can replicate this behaviour using the Django callback directly as follows:
import functools
from django.db import transaction
from django.http import HttpResponseRedirect
@transaction.atomic
def create_article(request):
article = Article.objects.create()
transaction.on_commit(
functools.partial(expand_abbreviations.delay, article.pk)
)
return HttpResponseRedirect('/articles/')
备注
on_commit
is available in Django 1.9 and above, if you are using a
version prior to that then the django-transaction-hooks library
adds support for this.
示例¶
Example
让我们看一个实际的例子:一个博客,评论需要进行垃圾邮件过滤。当评论被创建时,垃圾邮件过滤器在后台运行,因此用户无需等待其完成。
我有一个 Django 博客应用,允许对博客文章进行评论。下面我将介绍这个应用中部分模型、视图和任务的实现。
Let's take a real world example: a blog where comments posted need to be filtered for spam. When the comment is created, the spam filter runs in the background, so the user doesn't have to wait for it to finish.
I have a Django blog application allowing comments on blog posts. I'll describe parts of the models/views and tasks for this application.
blog/models.py
¶
评论模型如下所示:
from django.db import models
from django.utils.translation import ugettext_lazy as _
class Comment(models.Model):
name = models.CharField(_('name'), max_length=64)
email_address = models.EmailField(_('email address'))
homepage = models.URLField(_('home page'),
blank=True, verify_exists=False)
comment = models.TextField(_('comment'))
pub_date = models.DateTimeField(_('Published date'),
                                    editable=False, auto_now_add=True)
is_spam = models.BooleanField(_('spam?'),
default=False, editable=False)
class Meta:
verbose_name = _('comment')
verbose_name_plural = _('comments')
在发表评论的视图中,我首先将评论写入数据库,然后在后台启动垃圾邮件过滤任务。
The comment model looks like this:
from django.db import models
from django.utils.translation import ugettext_lazy as _
class Comment(models.Model):
name = models.CharField(_('name'), max_length=64)
email_address = models.EmailField(_('email address'))
homepage = models.URLField(_('home page'),
blank=True, verify_exists=False)
comment = models.TextField(_('comment'))
pub_date = models.DateTimeField(_('Published date'),
                                    editable=False, auto_now_add=True)
is_spam = models.BooleanField(_('spam?'),
default=False, editable=False)
class Meta:
verbose_name = _('comment')
verbose_name_plural = _('comments')
In the view where the comment is posted, I first write the comment to the database, then I launch the spam filter task in the background.
blog/views.py
¶
from django import forms
from django.http import HttpResponseRedirect
from django.template.context import RequestContext
from django.shortcuts import get_object_or_404, render_to_response
from blog import tasks
from blog.models import Comment
class CommentForm(forms.ModelForm):
class Meta:
model = Comment
def add_comment(request, slug, template_name='comments/create.html'):
post = get_object_or_404(Entry, slug=slug)
remote_addr = request.META.get('REMOTE_ADDR')
    if request.method == 'POST':
form = CommentForm(request.POST, request.FILES)
if form.is_valid():
comment = form.save()
# 异步检查垃圾邮件。
tasks.spam_filter.delay(comment_id=comment.id,
remote_addr=remote_addr)
return HttpResponseRedirect(post.get_absolute_url())
else:
form = CommentForm()
context = RequestContext(request, {'form': form})
return render_to_response(template_name, context_instance=context)
为了过滤评论中的垃圾邮件,我使用了 Akismet ,这是一个用于过滤 Wordpress 免费博客平台中发布评论的垃圾邮件服务。 Akismet 对个人使用是免费的,但商业使用需要付费。你需要注册该服务以获得 API 密钥。
为了调用 Akismet 的 API,我使用了 Michael Foord 编写的 akismet.py 库。
from django import forms
from django.http import HttpResponseRedirect
from django.template.context import RequestContext
from django.shortcuts import get_object_or_404, render_to_response
from blog import tasks
from blog.models import Comment
class CommentForm(forms.ModelForm):
class Meta:
model = Comment
def add_comment(request, slug, template_name='comments/create.html'):
post = get_object_or_404(Entry, slug=slug)
remote_addr = request.META.get('REMOTE_ADDR')
    if request.method == 'POST':
form = CommentForm(request.POST, request.FILES)
if form.is_valid():
comment = form.save()
# Check spam asynchronously.
tasks.spam_filter.delay(comment_id=comment.id,
remote_addr=remote_addr)
return HttpResponseRedirect(post.get_absolute_url())
else:
form = CommentForm()
context = RequestContext(request, {'form': form})
return render_to_response(template_name, context_instance=context)
To filter spam in comments I use Akismet, the service used to filter spam in comments posted to the free blog platform Wordpress. Akismet is free for personal use, but for commercial use you need to pay. You have to sign up to their service to get an API key.
To make API calls to Akismet I use the akismet.py library written by Michael Foord.
blog/tasks.py
¶
from celery import Celery
from akismet import Akismet
from django.conf import settings
from django.core.exceptions import ImproperlyConfigured
from django.contrib.sites.models import Site
from blog.models import Comment
app = Celery(broker='amqp://')
@app.task
def spam_filter(comment_id, remote_addr=None):
logger = spam_filter.get_logger()
logger.info('Running spam filter for comment %s', comment_id)
comment = Comment.objects.get(pk=comment_id)
current_domain = Site.objects.get_current().domain
    akismet = Akismet(settings.AKISMET_KEY, 'http://{0}'.format(current_domain))
if not akismet.verify_key():
raise ImproperlyConfigured('Invalid AKISMET_KEY')
is_spam = akismet.comment_check(user_ip=remote_addr,
comment_content=comment.comment,
comment_author=comment.name,
comment_author_email=comment.email_address)
if is_spam:
comment.is_spam = True
comment.save()
return is_spam