Scheduler

The scheduler is a very simple Python program (single process, single thread, synchronous) that behaves like Unix's cron daemon:

  • every minute, it wakes up to see if there is something to do
  • then, it goes back to sleep

It doesn't process the actions itself; it sends them to Celery.

The scheduler's entrypoint is scheduler_server.py.
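The wake, check, sleep cycle described above can be sketched as follows. This is a minimal illustration, not the contents of scheduler_server.py: run_scheduler, check_and_dispatch, and the injectable sleep and max_ticks parameters are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Optional


def run_scheduler(
    check_and_dispatch: Callable[[], None],
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    max_ticks: Optional[int] = None,
) -> int:
    """Single-threaded, synchronous loop: wake up, check, go back to sleep.

    All names are hypothetical. `max_ticks` exists only so that the loop
    can be stopped in a demo or a test; None means "run forever".
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        # Look for work. The heavy lifting is delegated elsewhere
        # (to Celery, in the real scheduler).
        check_and_dispatch()
        ticks += 1
        sleep(interval_seconds)
    return ticks
```

Passing a fake sleep function (for example a list's append method) makes it possible to exercise the loop without actually waiting a minute per tick.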

Scheduled actions structure

Scheduled actions are handled by a pydantic model, ScheduledActionModel, which performs all the validation needed to build the action.

Here is an example:

{
    "name": "weekly update",
    "active": "true",
    "operations": {
        "refresh": {"domains": ["a", "b"]},
        "publish": {"data": "true", "design": "false", "permissions": "false"},
        "send_execsum": "true",
    },
    "when": {"cron_rule": "0 7 * * 1", "timezone": "Europe/Paris"},
    "recipients": {"users": ["toco@tou.can"], "groups": ["manager"]},
}

In this example, a task is scheduled every Monday at 7 AM, Paris time (cron rule 0 7 * * 1). It has 3 steps:

  • Refresh 2 domains, "a" and "b"
  • Publish data
  • Send the small app's executive summary to the recipients listed in the recipients field

The recipients are a single user with the id toco@tou.can, plus all the users who belong to the manager group.
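To make the cron rule of the example concrete, here is a minimal sketch of how a five-field rule can be checked against a point in time. It is not the scheduler's own matching logic (which is not shown here): cron_matches is a hypothetical helper that only understands "*" and plain integers, and it assumes the datetime it receives is already expressed in the rule's timezone.

```python
from datetime import datetime


def cron_matches(rule: str, local_time: datetime) -> bool:
    """Return True if `local_time` matches a simplified 5-field cron rule.

    Hypothetical helper: supports only "*" and single integers, and expects
    `local_time` to already be in the rule's timezone.
    """
    fields = rule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    actual = (
        local_time.minute,
        local_time.hour,
        local_time.day,
        local_time.month,
        local_time.isoweekday() % 7,  # cron convention: 0 = Sunday, 1 = Monday
    )
    return all(
        field == "*" or int(field) == value
        for field, value in zip(fields, actual)
    )


# 2024-01-01 is a Monday, 2024-01-02 a Tuesday.
on_monday = cron_matches("0 7 * * 1", datetime(2024, 1, 1, 7, 0))   # True
on_tuesday = cron_matches("0 7 * * 1", datetime(2024, 1, 2, 7, 0))  # False
```

The last field is what makes the rule weekly: 1 stands for Monday, so the action fires once a week rather than every day.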

Features available

Operations:

The operations are defined in the OperationModel:

  • Data can be refreshed (for every domain or only the specified ones)
  • Data, design and permissions can be published (all of them or only the specified ones)
  • Email templates can be sent to the specified recipients
  • Executive summaries can be sent to the specified recipients
  • Maintenance actions are executed daily at 4 AM by default

When:

The rules that determine when actions are executed are defined in the WhenModel:

  • Based on a cron rule in a specified timezone
  • Based on a due date
  • On or after an event occurrence (see Events)

Recipients:

The recipients are defined in the RecipientFilterModel:

  • Multiple users named by their id
  • Every user who belongs to one of the specified groups

By default, inactive and inactivated users are removed from the recipient list. It is however possible to include these users, or to target them exclusively, with the following optional booleans (False by default):

  • include_inactivated_users
  • include_inactive_users
  • only_inactivated_users
  • only_inactive_users
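The effect of these four booleans can be sketched with plain dataclasses. This is an illustration under assumptions, not the RecipientFilterModel itself: the User and RecipientFilter classes and the select_recipients function are hypothetical, the inactive and inactivated attributes are assumed to be simple flags on each user, and combining both only_* flags is assumed to select users matching either one.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class User:
    # Hypothetical user record; the flags are assumptions for the sketch.
    id: str
    groups: Set[str] = field(default_factory=set)
    inactive: bool = False
    inactivated: bool = False


@dataclass
class RecipientFilter:
    # Field names mirror the options listed above; the class is hypothetical.
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)
    include_inactivated_users: bool = False
    include_inactive_users: bool = False
    only_inactivated_users: bool = False
    only_inactive_users: bool = False


def select_recipients(all_users: List[User], f: RecipientFilter) -> List[str]:
    selected = []
    for user in all_users:
        # Candidate if named explicitly or member of a listed group.
        if user.id not in f.users and not (user.groups & f.groups):
            continue
        if f.only_inactive_users or f.only_inactivated_users:
            # "only_*" flags target exclusively the matching users.
            keep = (f.only_inactive_users and user.inactive) or (
                f.only_inactivated_users and user.inactivated
            )
        else:
            # Default: drop inactive/inactivated users unless included.
            keep = (not user.inactive or f.include_inactive_users) and (
                not user.inactivated or f.include_inactivated_users
            )
        if keep:
            selected.append(user.id)
    return selected
```

With no flag set, only active members of the requested groups and active named users come back; each include_* flag widens the result, while the only_* flags narrow it to the targeted population.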

Events

Actions can be event-driven, running either on an event or after it (see OnEventModel and AfterEventModel). The available events are:

  • On user activation
  • On user created
  • On user last seen date

Example of an event-driven action:

{
    "name": "bring back inactive users",
    "active": "true",
    "operations": {
        "send_email": {"template": "inactive"},
    },
    "when": {"after": "instance.user.last_seen", "delay": "P7DT30S"},
    "recipients": {"groups": ["all"]},
}
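The delay value "P7DT30S" in this example reads as an ISO 8601 duration: 7 days and 30 seconds. Assuming that is the intended format, the sketch below shows how such a string maps to a timedelta. parse_iso_duration is a hypothetical helper that only supports days, hours, minutes and seconds; how the real model parses the field is not shown here.

```python
import re
from datetime import datetime, timedelta

# Simplified ISO 8601 duration: PnDTnHnMnS (no years, months or weeks).
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_iso_duration(value: str) -> timedelta:
    """Hypothetical helper: convert e.g. "P7DT30S" to a timedelta."""
    match = _DURATION_RE.match(value)
    # Reject non-matching strings and empty designators such as "P" or "P7DT".
    if match is None or value.endswith(("P", "T")):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {name: int(num) for name, num in match.groupdict().items() if num}
    return timedelta(**parts)


# An "after last seen" action would become due this long after the event.
last_seen = datetime(2024, 1, 1, 9, 0, 0)
due_at = last_seen + parse_iso_duration("P7DT30S")  # 2024-01-08 09:00:30
```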

Fetching scheduled actions

Every minute, the scheduler fetches all the active scheduled actions of every small app as ScheduledActionModel instances, then processes them one by one to find the actions that are due for execution (cf. get_scheduled_action_last_occurence).

Running tasks

Tasks are sent to Celery. The scheduled action is wrapped into an operation and executed with either execute_scheduled_action_quick or execute_scheduled_action, depending on whether it contains a heavy operation (refresh and/or publish).
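The choice between the two tasks can be sketched as below. This is an assumption-laden illustration: pick_task_name is a hypothetical helper, the mapping of heavy operations to execute_scheduled_action (and of everything else to execute_scheduled_action_quick) is inferred from the task names, and the mere presence of a refresh or publish key is assumed to mean that the operation is requested.

```python
from typing import Any, Dict

# Per the text, refresh and publish are the heavy operations.
HEAVY_OPERATIONS = {"refresh", "publish"}


def pick_task_name(operations: Dict[str, Any]) -> str:
    """Hypothetical helper: name of the Celery task suited to `operations`."""
    if HEAVY_OPERATIONS.intersection(operations):
        return "execute_scheduled_action"
    return "execute_scheduled_action_quick"
```

For example, an action that only sends an email would go to the quick task, while the "weekly update" example (refresh, publish, executive summary) would go to the regular one.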

Here is a snippet of code similar to the one used to send the task to celery:

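# Same action + same occurrence => same fingerprint => same task ID
fingerprint = f'{action.uid} | {last_occurence}'
celery_task_id = str_to_uuid(fingerprint)
# Queue the task on Celery instead of running it in the scheduler process
execute_scheduled_action.delay(
    action.uid,
    enforce_task_id=celery_task_id,
    triggerrer=f'Scheduled action "{action.name}"'
)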

Note the celery_task_id variable: we craft a unique ID, derived from the action and its occurrence, so that the operation status can be retrieved afterwards with the get_operation_state function.
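The implementation of str_to_uuid is not shown here. One common way to turn an arbitrary string into a stable UUID is a name-based UUID (version 5), sketched below; fingerprint_to_uuid and the choice of namespace are assumptions made for the illustration, not the project's actual code.

```python
import uuid


def fingerprint_to_uuid(fingerprint: str) -> str:
    """Hypothetical stand-in for str_to_uuid, based on a name-based UUID.

    uuid5 hashes the namespace and the name, so the same fingerprint
    always yields the same UUID string.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_OID, fingerprint))


first = fingerprint_to_uuid("action-1 | 2024-01-01T07:00:00")
again = fingerprint_to_uuid("action-1 | 2024-01-01T07:00:00")
other = fingerprint_to_uuid("action-1 | 2024-01-08T07:00:00")
# first == again, while first != other
```

The property that matters is determinism: recomputing the ID for the same action and occurrence gives the same value, which is what allows the status lookup to find the operation later.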

Handling scheduler downtime

Question

What if the scheduler goes down for a minute or two? Is there a risk that some tasks get skipped?

Nope! When the scheduler looks for tasks to schedule, it doesn't filter only on the current minute, but on the last 5 minutes (cf. the get_cron_last_occurence function).

Question

Then, if it wakes up every minute, doesn't it trigger each action 5 times?

Nope, because before sending a task to Celery, it checks whether an operation with the same celery_task_id already exists in the database.
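The two answers above combine into a simple pattern: look back over a window, then deduplicate on a deterministic ID. The sketch below illustrates it under assumptions: last_occurrence and tick are hypothetical names, the window is taken to be the current minute plus the 4 previous ones, a plain string stands in for the UUID, and an in-memory set stands in for the database lookup.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional, Set

LOOKBACK_MINUTES = 5  # window mentioned in the text


def last_occurrence(
    matches: Callable[[datetime], bool],
    now: datetime,
    lookback: int = LOOKBACK_MINUTES,
) -> Optional[datetime]:
    """Most recent minute in the lookback window for which `matches` holds."""
    current = now.replace(second=0, microsecond=0)
    for offset in range(lookback):
        candidate = current - timedelta(minutes=offset)
        if matches(candidate):
            return candidate
    return None


def tick(
    action_uid: str,
    matches: Callable[[datetime], bool],
    now: datetime,
    already_sent: Set[str],
) -> bool:
    """Return True if a task was (notionally) sent during this wake-up."""
    occurrence = last_occurrence(matches, now)
    if occurrence is None:
        return False
    # Deterministic ID: same action and occurrence give the same ID.
    task_id = f"{action_uid} | {occurrence.isoformat()}"
    if task_id in already_sent:  # stand-in for the database check
        return False
    already_sent.add(task_id)
    return True
```

If the scheduler misses the 07:00 tick and wakes up at 07:02, the 07:00 occurrence is still inside the window and gets sent once; the following wake-ups find the same ID already recorded and skip it.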

Misc

  • to avoid spamming "new data available" email notifications each time a publish operation is scheduled, the scheduler runs with bucket.notify = False. More details here.
  • scheduler tasks (metatask_schedule) are sent to the default celery queue, which has a default maximum concurrency of 2 parallel workers.