Skip to content

Error Handling

Warning

This feature is in beta mode and subject to changes. Any feedback is appreciated.

What follows applies to the SWF executor; the local one only handles raises_on_failure.

A workflow can declare a method to be called when a task or child workflow fails or times out. By default, errors are handled as follows:

  • aretry attribute defines how many times the task or workflow is retried; it defaults to 0 (no retry)
  • if too many retries were attempted, the raises_on_failure attribute is checked. If False, a future is returned with its exception set; if True, the workflow is aborted.

Groups and Chains have raises_on_failure too, along with bubbles_exception_on_failure and (for Chains) break_on_failure, allowing for more fine-grained control.

If defined, Workflow.on_task_failure is called before the Executor.default_failure_handling default error handling. It is passed a TaskFailureContext object with error details and can modify it or return a new TaskFailureContext instance.

The TaskFailureContext currently have these members:

  • a_task: failed ActivityTask or WorkflowTask
  • task_name: activity or workflow name
  • exception_class: TaskException or WorkflowException
  • exception: raised exception (shortcut to future.exception); None for unfinished tasks
  • retry_count: current retry count (0 for first retry)
  • task_error: for a TaskFailed exception, name of the inner exception if available
  • reason: TaskFailed.reason or str(exception)
  • details: TaskFailed.details or None
  • future: failed future
  • event: quite opaque dict, experimental
  • history: History object, experimental
  • decision: described below
  • retry_wait_timeout: ditto

The TaskFailureContext.decision (nothing in common with SWF’s decisions) should be set by on_task_failure. It can be:

  • none: default value, continue with default handling (potential retries then abort)
  • abort: use default abort strategy
  • ignore: discard the exception, consider the task finished; the future’s result may be user-modified
  • cancel: mark the future as canceled
  • retry: schedule the task again; its args and kwargs may be user-modified (well, the whole task)
  • retry_later: schedule the task after retry_wait_timeout seconds, with args and kwargs potentially altered
  • handled: on_task_failure somewhat handled the failure; use the future and task it has possibly modified (one strategy here is for on_task_failure to call the executor’s default_failure_handling method, and make workflow-specific processing according to its return value)

The Workflow.on_task_failure method is guaranteed to be called only once on a given replay, but may be called again with the same failure in subsequent replays: it must thus be idempotent.