Query Cache¶
As a well-behaved internet citizen, Toucan avoids repeatedly requesting data from other services.
To avoid asking other servers for the same data many times, we use a cache.
Configuration¶
The cache¶
The cache is configured in config.yml
app_params:
  [...]
  query_cache:
    status: enable
    min_time: 50 ms
There are two different configuration keys:
* status: can be enable or disable
* min_time: we do not want to cache each and every request. This is the minimum amount of time
that a query must take before being put in cache. The default value is 50 ms.
min_time units¶
min_time is a time delta represented as a string. Usable units are:
* ms: milliseconds
* s, sec, seconds: seconds
* m, min, minutes: minutes
* h, hours: hours
* d, day, days: days
* w, weeks: weeks
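As a rough sketch of how such a string maps to an actual duration (the parsing helper below is hypothetical, not Toucan's implementation), the unit table can drive a conversion to a timedelta:

```python
import re
from datetime import timedelta

# Unit table from the documentation above, mapped to timedelta keyword names
UNITS = {
    'ms': 'milliseconds',
    's': 'seconds', 'sec': 'seconds', 'seconds': 'seconds',
    'm': 'minutes', 'min': 'minutes', 'minutes': 'minutes',
    'h': 'hours', 'hours': 'hours',
    'd': 'days', 'day': 'days', 'days': 'days',
    'w': 'weeks', 'weeks': 'weeks',
}

def parse_min_time(value: str) -> timedelta:
    """Parse a time-delta string such as '50 ms' or '2min' (hypothetical helper)."""
    match = re.fullmatch(r'\s*(\d+)\s*([a-z]+)\s*', value)
    if not match:
        raise ValueError(f'invalid time delta: {value!r}')
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{UNITS[unit]: amount})
```

With this sketch, both '50 ms' and '50ms' parse to the same 50-millisecond delta.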
The cache database¶
The cache is stored in a redis database.
The connection to this database can also be configured in config.yml
databases:
  [...]
  cache:
    host: redis
    port: 6379
    ssl: false
    database: 1
    default_ttl: 300
Usage¶
The cache should be entirely automatic, and no human action should be required.
Storing data in cache¶
If a query takes more than min_time (cf. configuration) to fetch its data,
the data will be stored in the Redis instance defined in the database named cache in config.yml.
Data will be stored at the key: {small_app_id}:{stage}:{local_or_live}:{hash(connector, datasource)}
The hash¶
The hash(connector, datasource) part is connector-dependent.
This is what is hashed:
unique_identifier = {
    'connector': self.get_unique_identifier(),
    'permissions': permissions,
    'offset': offset,
    'limit': limit,
    'datasource': self._render_datasource(data_source)
}
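One way to turn such a dictionary into a stable cache-key component is to serialize it deterministically and hash the result. The helper below is an illustrative sketch, not Toucan's exact hashing scheme:

```python
import hashlib
import json

def hash_identifier(unique_identifier: dict) -> str:
    """Deterministically hash an identifier dict (illustrative sketch)."""
    # sort_keys makes the serialization independent of key insertion order,
    # so two structurally equal dicts always hash the same way
    payload = json.dumps(unique_identifier, sort_keys=True)
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()
```

The important property is determinism: two connectors built from the same configuration must produce the same hash, which is exactly why get_unique_identifier excludes volatile fields (see below).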
get_unique_identifier¶
This method returns a unique identifier for a connector. If two connectors are instantiated from the same configuration, their identifier should be equal.
The default implementation of get_unique_identifier() returns a JSON representation of ALL the data from the connector.
Some of this data might be irrelevant for hashing purposes, so connectors can reimplement it.
Example: the MongoConnector contains a MongoClient instance, called client, that is used to communicate with the Mongo server.
With the default implementation, two Mongo connectors instantiated with the same parameters would have two different hashes.
This is not what we want, so we explicitly exclude this client from the hash:
def get_unique_identifier(self) -> dict:
    return self.json(
        exclude={'client'}
    )
_render_datasource¶
In Toucan, datasources are parametrized. For instance, a pseudo-datasource such as:
query: 'select * from {{ TABLE_NAME }}'
parameters:
  TABLE_NAME: foobar
should be equivalent to:
query: 'select * from {{ TABLE_NAME }}'
parameters:
  TABLE_NAME: foobar
  useless: 4242
The canonical version of such a datasource would be:
query: 'select * from foobar'
The _render_datasource method canonizes the datasource: it applies the parameters to the query, then drops them.
The default implementation uses nosql_apply_parameters_to_query to render the parameters,
but connectors can override this. The Snowflake connector uses SqlQueryHelper.prepare_query.
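To make the canonization step concrete, here is a simplified sketch of the idea (the real nosql_apply_parameters_to_query supports far more than plain placeholder substitution, and this render_datasource helper is hypothetical):

```python
import re

def render_datasource(datasource: dict) -> dict:
    """Apply `parameters` to `{{ NAME }}` placeholders in `query`, then drop `parameters`.
    Simplified sketch of the canonization performed by _render_datasource."""
    params = datasource.get('parameters', {})

    def substitute(match: re.Match) -> str:
        # Replace each placeholder with its parameter value
        return str(params[match.group(1)])

    rendered = dict(datasource)
    rendered['query'] = re.sub(r'\{\{\s*(\w+)\s*\}\}', substitute, datasource['query'])
    # Unused parameters (like 'useless' above) simply disappear with the rest
    rendered.pop('parameters', None)
    return rendered
```

Under this sketch, both example datasources above render to the same canonical form, so they share the same cache entry.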
Cache invalidation¶
Automatic invalidation¶
The following events will invalidate the cache:
- TTL (Time To Live): each cache entry will remain valid for a given time
- refresh data: the refresh data operation will invalidate local staging cache entries for the small app.
- publish: the publish operation will invalidate local production cache entries for the small app.
Live data is invalidated ONLY when the TTL expires.
Local data is invalidated when the TTL expires,
OR when data is refreshed (staging only),
OR when the app is published (production only).
Manual invalidation¶
If needed, the cache can be wiped manually by calling the following route:
DELETE /query-cache to wipe the cache for all small apps
This can only be done by a super admin.
TTL¶
The TTL can be configured in three different places,
in order of priority:
* The query itself can have a TTL configured in the field cache_ttl:
domain: "connectors_velib_live"
name: "OpenDataParis"
load: false
live_data: true
parameters: {}
url: "velib-disponibilite-en-temps-reel/records?rows=100"
filter: ".records[].record.fields | {capacity: .capacity, name: .name, available_bikes: .numbikesavailable, lat: .coordonnees_geo.lat, lon: .coordonnees_geo.lon }"
cache_ttl: 1s
* The connector can have a TTL, also in the field cache_ttl
* Finally, a default TTL can be configured in config.yml
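This priority chain amounts to picking the first TTL that is actually defined. As a sketch (the resolve_ttl helper is hypothetical; the real lookup lives inside Toucan):

```python
from typing import Optional

def resolve_ttl(query_ttl: Optional[int],
                connector_ttl: Optional[int],
                default_ttl: int) -> int:
    """Return the effective TTL in seconds: query beats connector, connector beats default."""
    if query_ttl is not None:
        return query_ttl
    if connector_ttl is not None:
        return connector_ttl
    return default_ttl
```

For the Velib example above, cache_ttl: 1s on the query wins over any connector or config.yml value, so each entry lives only one second.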