Scheduler¶
The scheduler is a very simple Python program (single process, single thread, synchronous) that behaves like Unix's cron daemon:
- every minute, it wakes up to see if there is something to do
- then, it goes back to sleep
It doesn't actually process the actions itself, it sends them to Celery.
The scheduler's entrypoint is scheduler_server.py.
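Conceptually, the loop can be pictured like the sketch below (a minimal illustration, not the actual contents of scheduler_server.py; dispatch_due_actions is a made-up placeholder):
import time
from datetime import datetime, timezone

def dispatch_due_actions(now: datetime) -> None:
    """Hypothetical placeholder: find matching scheduled actions and send them to Celery."""
    print(f"looking for actions to run at {now:%H:%M}")

def run_scheduler_forever():
    """Minute-by-minute loop: wake up, look for work, go back to sleep."""
    while True:
        dispatch_due_actions(datetime.now(timezone.utc))
        time.sleep(60)   # sleep until the next minute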
Scheduled actions structure¶
Scheduled actions are handled using a Pydantic model, ScheduledActionModel,
which performs all the validation needed to build the action.
Here is an example:
{
    "name": "weekly update",
    "active": "true",
    "operations": {
        "refresh": {"domains": ["a", "b"]},
        "publish": {"data": "true", "design": "false", "permissions": "false"},
        "send_execsum": "true",
    },
    "when": {"cron_rule": "0 7 * * 1", "timezone": "Europe/Paris"},
    "recipients": {"users": ["toco@tou.can"], "groups": ["manager"]},
}
In this example, a task is scheduled every Monday at 7 AM, Paris time. It has 3 steps:
- Refresh the 2 domains "a" and "b"
- Publish data
- Send the small app executive summary to the recipients specified below
The recipients consist of a single user with id toco@tou.can plus all the users belonging to the
manager group.
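As a sketch of how such a payload goes through validation (assuming Pydantic v2's model_validate; the import path is made up for the example):
from pydantic import ValidationError
from scheduler.models import ScheduledActionModel   # illustrative import path

payload = {
    "name": "weekly update",
    "active": "true",   # Pydantic coerces "true"/"false" strings into booleans
    "operations": {"refresh": {"domains": ["a", "b"]}, "send_execsum": "true"},
    "when": {"cron_rule": "0 7 * * 1", "timezone": "Europe/Paris"},
    "recipients": {"users": ["toco@tou.can"], "groups": ["manager"]},
}

try:
    action = ScheduledActionModel.model_validate(payload)
except ValidationError as exc:
    print(exc)   # a malformed payload is rejected with a detailed error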
Features available¶
Operations:¶
The operations are defined in the OperationModel (see the sketch after this list):
- Data can be refreshed (on all domains or only on specified ones)
- Data, design and permissions can be published (everything or only the specified parts)
- Email templates can be sent to the specified recipients
- Executive summaries can be sent to the specified recipients
- Maintenance actions are executed daily at 4 AM by default
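A minimal sketch of what OperationModel could look like, inferred from the list above and the example payload (field names and sub-models are assumptions, not the actual definition):
from typing import Optional
from pydantic import BaseModel

class RefreshModel(BaseModel):
    domains: Optional[list[str]] = None   # None could mean "refresh every domain"

class PublishModel(BaseModel):
    data: bool = False
    design: bool = False
    permissions: bool = False

class OperationModel(BaseModel):
    refresh: Optional[RefreshModel] = None
    publish: Optional[PublishModel] = None
    send_email: Optional[dict] = None     # e.g. {"template": "inactive"}
    send_execsum: bool = False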
When:¶
The rules for when the actions are executed are defined in the WhenModel (sketched after this list):
- Based on a cron rule in a specified timezone
- Based on a due date
- On or after an event occurrence (see events)
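Under the same caveats, WhenModel could be pictured like this (the real model may instead nest dedicated OnEventModel / AfterEventModel fields, see the events section below):
from datetime import datetime
from typing import Optional
from pydantic import BaseModel

class WhenModel(BaseModel):
    # cron-based scheduling, e.g. {"cron_rule": "0 7 * * 1", "timezone": "Europe/Paris"}
    cron_rule: Optional[str] = None
    timezone: Optional[str] = None
    # one-shot scheduling on a due date
    due_date: Optional[datetime] = None
    # event-driven scheduling, e.g. {"after": "instance.user.last_seen", "delay": "P7DT30S"}
    on: Optional[str] = None
    after: Optional[str] = None
    delay: Optional[str] = None   # ISO 8601 duration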
Recipients:¶
The recipients are defined in the RecipientFilterModel (sketched after this list):
- Multiple named users, identified by their id
- Every user belonging to one of the specified groups
By default, inactive and inactivated users are removed from the recipient list.
It is however possible to include, or to exclusively target, these inactive
and inactivated users using the following booleans (optional, False by default):
- include_inactivated_users
- include_inactive_users
- only_inactivated_users
- only_inactive_users
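And a sketch of RecipientFilterModel built around the four booleans listed above (the users/groups field types are assumptions):
from pydantic import BaseModel, Field

class RecipientFilterModel(BaseModel):
    users: list[str] = Field(default_factory=list)    # individual user ids, e.g. ["toco@tou.can"]
    groups: list[str] = Field(default_factory=list)   # every user belonging to one of these groups
    include_inactivated_users: bool = False
    include_inactive_users: bool = False
    only_inactivated_users: bool = False
    only_inactive_users: bool = False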
Events¶
Actions can be event-driven, triggered on or after an event (see OnEventModel and AfterEventModel):
- On user activation
- On user creation
- On user last seen date
Example of an event-driven action:
{
    "name": "bring back inactive users",
    "active": "true",
    "operations": {
        "send_email": {"template": "inactive"},
    },
    "when": {"after": "instance.user.last_seen", "delay": "P7DT30S"},
    "recipients": {"groups": ["all"]},
}
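Here the delay is an ISO 8601 duration (7 days and 30 seconds after the user's last seen date). As an illustration of how the trigger time can be derived from the event timestamp (the isodate package is one way to parse such durations, not necessarily the one the scheduler uses):
from datetime import datetime, timezone
import isodate   # third-party parser for ISO 8601 durations

last_seen = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)   # event timestamp
delay = isodate.parse_duration("P7DT30S")                      # timedelta(days=7, seconds=30)
trigger_at = last_seen + delay                                 # 2024-01-08 12:00:30+00:00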
Fetching scheduled actions¶
Every minute, the scheduler fetches all the active scheduled actions of every small app as
ScheduledActionModel instances, then processes them one by one to look for matching actions to execute
(cf. get_scheduled_action_last_occurence).
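In pseudo-Python, that pass over the scheduled actions could look like the sketch below; list_small_apps, fetch_active_scheduled_actions and dispatch_to_celery are hypothetical helpers, and the call signature of get_scheduled_action_last_occurence is also a guess:
def tick(now):
    """One scheduler iteration: find actions whose schedule matched and send them to Celery."""
    for small_app in list_small_apps():                               # hypothetical helper
        for action in fetch_active_scheduled_actions(small_app):      # ScheduledActionModel instances
            last_occurence = get_scheduled_action_last_occurence(action, now)
            if last_occurence is not None:                            # the schedule matched recently
                dispatch_to_celery(action, last_occurence)            # see "Running tasks" below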
Running tasks¶
Tasks are sent to Celery. The scheduled action is wrapped into an operation and
executed using either execute_scheduled_action_quick or execute_scheduled_action, depending on
whether it contains a heavy operation (refresh and/or publish).
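The exact criterion lives in the scheduler's code; as a rough sketch of the idea (attribute access follows the OperationModel sketch above, so treat it as illustrative):
# Illustrative: pick the Celery task according to the weight of the operations
is_heavy = bool(action.operations.refresh or action.operations.publish)
task = execute_scheduled_action if is_heavy else execute_scheduled_action_quick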
Here is a snippet of code similar to the one used to send the task to Celery:
fingerprint = f'{action.uid} | {last_occurence}'
celery_task_id = str_to_uuid(fingerprint)
execute_scheduled_action.delay(
    action.uid,
    enforce_task_id=celery_task_id,
    triggerrer=f'Scheduled action "{action.name}"'
)
Note the celery_task_id variable: we craft a deterministic, unique ID so that the operation status can be
retrieved afterwards, using the get_operation_state function.
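str_to_uuid is the project's own helper; a name-based UUID is one common way to get this kind of deterministic mapping, for example (illustrative, not necessarily the real implementation):
import uuid

def str_to_uuid(fingerprint: str) -> uuid.UUID:
    """Illustrative: derive a stable UUID from a string, so the same
    (action uid, occurrence) pair always maps to the same Celery task id."""
    return uuid.uuid5(uuid.NAMESPACE_URL, fingerprint)

# The same fingerprint always yields the same task id:
assert str_to_uuid("abc | 2024-01-01") == str_to_uuid("abc | 2024-01-01")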
Handling scheduler downtime¶
Question
What if the scheduler goes down for a minute or two? Is there a risk that some tasks get skipped?
Nope! When the scheduler looks for tasks to schedule, it doesn't only
filter on the current minute, but on the last 5 minutes (cf. the get_cron_last_occurence function).
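A runnable sketch of this kind of check, using the croniter package to compute a cron rule's latest occurrence and test whether it falls within the catch-up window (the real get_cron_last_occurence may be implemented differently):
from datetime import datetime, timedelta, timezone
from croniter import croniter

def cron_matched_recently(cron_rule: str, now: datetime, window_minutes: int = 5) -> bool:
    """Return True if the rule's most recent occurrence is within the last `window_minutes`."""
    last_occurence = croniter(cron_rule, now).get_prev(datetime)
    return now - last_occurence <= timedelta(minutes=window_minutes)

now = datetime(2024, 1, 1, 7, 3, tzinfo=timezone.utc)
print(cron_matched_recently("0 7 * * 1", now))   # True: Monday 2024-01-01 07:00 was 3 minutes ago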
Question
Then, if it wakes up every minute, doesn't it trigger each action 5 times?
Nope, because before sending a task to Celery, it checks whether an operation with the
same celery_task_id already exists in the database.
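Put together with the snippet above, the guard amounts to something like this (operation_exists stands in for whatever database lookup is really performed):
def dispatch_once(action, last_occurence):
    """Send an action to Celery at most once per occurrence, even if several ticks see it."""
    celery_task_id = str_to_uuid(f'{action.uid} | {last_occurence}')
    if operation_exists(celery_task_id):      # hypothetical lookup in the operations table
        return                                # already dispatched by a previous tick
    execute_scheduled_action.delay(
        action.uid,
        enforce_task_id=celery_task_id,
        triggerrer=f'Scheduled action "{action.name}"',
    )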
Misc¶
- To avoid spamming "new data available" email notifications each time a publish operation
  is scheduled, the scheduler runs with bucket.notify = False. More details here.
- Scheduler tasks (metatask_schedule) are sent to the default Celery queue, which has a default
  maximum concurrency of 2 parallel workers.
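As an illustration of that kind of setup (purely indicative Celery settings, not the project's actual configuration):
from celery import Celery

app = Celery("scheduler")   # illustrative app name
app.conf.task_routes = {"metatask_schedule": {"queue": "default"}}   # scheduler tasks go to "default"
app.conf.worker_concurrency = 2   # at most 2 tasks processed in parallel by a worker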