Data fetching¶

Small app data must be retrieved by clients to fill our beautiful charts. This is mostly done using <small-app>/query/<query-id>/execution and <small-app>/query/preview routes.

A new data execution service is being developed to replace this API: https://github.com/ToucanToco/data-execution-service/

Queries¶

Some queries are directly provided by end-users, as stored in the front_config. This is OK for local mongo queries, but not for live-ones to other sources, which could suffer from injection. These second type of queries are saved in a dedicated collection, and are only editable by app designers via the /query and /query/<uid> routes.

Query structure¶

Code mode queries over mongo¶

{
  domain: "widget_leaderboard",
  country: {
    $nin: [
      "Espagne",
      "Netherland"
    ]
   }
 }

Manually written Mongo Queries were the first to exists in Toucan. The query is directly POSTED to /data Those queries are either a $match object or an aggregation pipeline but the former is normalized into the later so everything is ran using mongo db.collection.aggregate() method

Code mode query over live domain¶

Connector queries are objects with some common fields, like the name of the connector used, the domain (acting as an ID), some boolean (live_data & load), and some others fields specific to the connector.

Because there is a Mongo Connector, queries for this specific connector will look exactly like what we’ve referred to as “Code mode query over Mongo” before. The only difference will be the presence of common fields next to the query and in the context.

{
  name: "microstrategy"
  domain: "connectors_microstrategy_live"
  live_data: false
  load: true
  parameters: {}
  id: "42FF415D4E162846C87D4FAD8B26BF4E"
  dataset: "cube"
}

{
  name: "OpenDataParis"
  domain: "connectors_velib_live"
  live_data: true
  load: false
  parameters: {}
  url: "velib-disponibilite-en-temps-reel/records?rows=100"
  filter: ".records[].record.fields | {capacity: .capacity, name: .name, available_bikes: .numbikesavailable, lat: .coordonnees_geo.lat, lon: .coordonnees_geo.lon }"
}

{
  name: "denodo"
  domain: "connectors_denodo_live"
  live_data: true
  load: false
  parameters: {}
  query: '''
      SELECT department, sum(porfolio_value_currency)
      FROM contact_portfolio_currencies
      WHERE currency_code = %(currency_code)s AND department IS NOT NULL
    GROUP BY department
  '''
  database: "toucan"
}

These objects are defined in the ETL config.

The data is then requested like a normal mongo query.

{
    domain: "connectors_microstrategy_live"
}

Variables¶

Notice in the previous snippet that the denodo query has live_data: true and defines a variable: WHERE currency_code = %(currency_code)s. Such connector query is not ready to use, it require some variable mapping that look like this :

{
  domain: "connectors_denodo_live",
  currency_code: "EUR"
}

VQB query¶

These queries are created with the YouPrep GUI (weaverbird). A VQB query is a pipeline. Each step transforms the data. cf https://weaverbird.toucantoco.com/docs/steps/

The acronym VQB means Visual Query Builder, which was the first project name of YouPrep/weaverbird. We tend to use it less, in favor of YouPrep.

pipeline: [
  { name: 'domain', domain: 'my_domain' },
  { name: 'filter', condition: {'column': 'price', 'operator': 'le', 'value': 42} },
]

Over mongo¶

VQB queries over mongo are translated by the front-end to mongo queries. The query is then POSTed to /data like a code-mode mongo query.

Over live data¶

The pipeline is posted as-is to /data along with extra domains.

The backend uses Pandas to execute the pipeline

# - with inline variables
extraDomains: {
  ConnectFour: {
    connectionUid: 'ConnectFour',
    config: {
      by: 'The Cool Kids',
      and: '<%= existing_front_var %>',
    },
  },
}
pipeline: [
  { name: 'domain', domain: 'ConnectFour' },
  { name: 'rename', toRename: [['foo', '<%= existing_front_var %>']] },
]

Extra domains¶

When creating a live data query via the GUI, a user must first configure the datasource. This datasource will be an extraDomain that will be used for the domain step of the query.

There is no other way to create extraDomains. queries created via the GUI can only have a single extraDomain

The extra domains is a dictionnary sent along with the pipeline that contains ad hoc connections configurations.

The key is the domain name. this name can be referenced in the domain, join and append steps in the pipeline.

The value is a dictionnary with 2 entries:

connectionUid¶

the extra domain must use an already configured connection. It is referenced by its Uid

config¶

The config is an inlined Code mode query over live domain minus the name, domain, load and live_data attributes.

The uid, which is used to identify the connection that will be used, is in the connectionUid key, at the root of the extra domain definition. The domain is already the key under extraDomains. load and live_data have no point being here.

Query collection¶

Storage¶

Queries can be stored in a collection, and executed later.

Queries are indexed using a uid field. Query models can be found in models/small_app_db/query/model.py. Different models exist for different type of queries, and the QueryType enum lists all types of queries available. Each query document has a type field, and its value is a member of QueryType.

Here are all the existing types available (described above in the query structure section): - vqb_pipeline_over_connection - vqb_pipeline_over_datastore - vqb_pipeline_over_existing_dataset - code_mode_query_over_live_domain - code_mode_query_over_datastore

These queries are stored in the queries-staging collection in the relevant smallApp database, and are copied to queries when the app is released to production.

The CRUD operations for these queries are done using the following routes: - /<small_app_id>/query (create) - /<small_app_id>/query/<uid> (read, update, delete).

Execution¶

To execute stored queries, one simply has to POST its UID to /data

{
  "query_uid": "aaaabe-4242-fewfwg",
  "variables": {}
}

variables¶

stored queries can use variables. Let's say that we stored the following query

{pipeline: [
  { name: 'domain', domain: 'a_domain' },
  { name: 'filter',  "condition": {"column": 'name', "operator": "eq", "value": "{{ foobar }}" }},
]}

to execute the query, the following is posted to /data:

{ 
  'query_uid': 'some_uid',
  'variables: {
     'foobar': 'John'
   }
}

some variables, such as the username, or the current date, are provided directly by the backend.

New query execution API¶

Although using /data endpoint still works for executing all type queries, it will be deprecated in favor of new routes for query execution:

/<small_app_id>/query/<query_uid>/execution-context to retrieve the execution context of a query stored in database and referenced by its uid (GET)
/<small_app_id>/query/<query_uid>/execute to execute a query referenced by its uid (this route expects a POST with a json body containing the execution context)

For now, the execution context only contains the value of the variables required for query interpolation and execution (cf. section "variables" above).

Example of response received when calling the execution-context endpoint:

{
  "variables": {
    "__front_var_0__": "<%= someExpression %>"
  }
}

Example of payload accepted by the execute endpoint:

{
  "variables": {
    "__front_var_0__": "France"
  }
}

Preview endpoints¶

The 2 endpoints presented above have their counterpart for previews:

/<small_app_id>/query/preview-context
/<small_app_id>/query/preview

Unlike the previous ones, these routes don't have a query_uid parameter. This is because these routes are meant to be used by app builders writing new queries, hence not being stored in the database yet. These routes require at least contributor privileges, because we don't want unprivileged users to be able to execute arbitrary queries using any connector configured.

The payload expected by these endpoints is also different: they expect a full query definition (without uid) in the POSTed json body. At the time of writing, only queries with pipeline (VQBPipelineOverXXX) are accepted.

Example of payload sent to endpoint /<small_app_id>/query/preview:

{
    "type": "vqb_pipeline_over_connection",
    "extraDomains": {
        "mydomain": {
            "connectionUid": "rustybeer",
            "config": {
                "url": "/hops"
            }
        }
    },
    "pipeline": [{
        "name": "domain",
        "domain": "mydomain"
    }, {
        "name": "filter",
        "condition": {
            "column": "country",
            "operator": "eq",
            "value": "France"
        }
    }]
}

Pagination¶

:x:

Search among data¶

:x:

Caching strategy¶

See query cache

Permission / filtering¶

:x: