5. Filters

Filters are used to constrain the dataset. Filter definitions define

  • how a dataset column should be transformed before it can be filtered [1]

  • what canned filter values are possible

  • a user expression parser, if allowed

Footnotes

Simply put; they define types of where clauses and how the filter information is to be presented to the end-user.

5.1. General structure

Filters are defined in the configuration directory under filters as yaml files. Consider the following directory structure:

Directory

Description

config/filters

Repository for all filter definitions.

config/filters/name.yaml

A single filter definition.

5.2. Filter definition

A filter is constructed from a filter definition. The following attributes may be used in the filter definition:

  • name – presentation name as displayed to the end-user, and when searching for filters.

  • description – presentation description, used as pop-up and when searching for filters.

  • columnname – filter identifier, should be unique across all filters, used in query composition. Convention is to also use this value as the filename.

  • select – the expression that should be used on the source table to obtain the value to be filtered. If multiple expressions are required, a list of expressions can be used here.

  • filtersOptional any number of key: value statements where key is presented to the user, and value represents the where clause to be used in the actual selection. %t can be used in the where clause to denote the column to be filtered. %t0, %t1 can be used in case of multiple select expressions. Make sure to consider NULL values in the filter clause because for example %t0 < %t1 will result in NULL and exclusion if one of the select expressions (%t0 or %t1) is NULL which is not always intended. In this example an alternative would be to use LEAST(%t0, %t1) to ignore NULL values, or avoid NULL values altogether by using COALESCE with a default value in the select expression.

  • userexprOptional function that defines a parser that returns a where clause given a user expression.

  • distinctlistOptional an expression that retrieves all known values. This provides filters should be used instead of those. Typical usecase: get all known doctors.

  • distinctoperatorOptional an operator that is used for the construction of ordering of the found filters after they have been retrieved using the distinctlist. Typical usecase: get a list of known dates, prefixed with a <=.

  • distinctorderOptional Sort the distinctlist in either ascending (asc) or descending (desc) order (as presented to the user). Default is ascending order.

  • exclusiveOptional a boolean that is use to construct exclusive filters. This enables the creation of NOT(value=x OR value=y) filters instead of the normal (value=x OR value=y) filters. If not set it defaults to False. Make sure to consider NULL values in the filter clause because for example NOT(value=NULL) will result in NULL and exclusion which is probably not what is intended.

  • isarrayOptional a boolean that is used to indicate that the distinctlist will return an array of values that needs to be unpacked. This can be used when a row returns a number of items that need to be put into the filters not as array but as singular values.

  • highlowOptional an two element array that specifies the name for the highest and lowest item to be added to the filters array. This allows users to quickly select the highest or lowest item as the filter condition.

5.2.1. User expressions

If a filter has a userexpr statement, users can add new values to be filtered on simply by typing them in. userexpr then defines the types of user expression possible, by pointing to one these parsers:

  • parse_comparator_numeric – expressions of the form comparator number. E.g. <32, >= 5.6, =10 or 10.

  • parse_range_numeric – expressions of ranges in the forms number-number, [number;number[, comparator number or number. Where brackets denote that a range bound is included or excluded. E.g. 1-2, [1;2], [1;3[, >9, 10.

  • parse_dutch_postal_code – expressions of the form 0123AB where a smaller string or inclusion of the _ or % denotes a like query. E.g. 0123AB, 0123_, 0123%.

  • parse_comparator_date – expressions of the form comparator date. E.g. < 2020-01-01, >= 2021-01-01. Date formats supported are year-month-day, day.month.year and day/month/year. Dates without an explicit comparator are understood as = date.

  • parse_date_from – takes in the same date formats as parse_comparator_date, but always takes the unadorned date value as the start. E.g. say 2020-01-01 to mean >= 2020-01-01.

  • parse_date_to – takes in the same date formats as parse_comparator_date, but takes the given date value as the end date. E.g. input 2021-01-01 to mean <= 2021-01-01. Often used together with another filter that supplies a parse_date_from.

  • parse_range_date – expressions of ranges in the forms date,date, [date,date], comparator date or date. Where brackets denote that a range bound is included or excluded. E.g. 2020-01-01,2021-01-01, [2020-01-01,2021-01-01[, > 2020-01-01, 2020-01-01.

5.3. Filter Examples

5.3.1. Filter with userexpr

filter:
  name: Leeftijd
  description: |-
    Leeftijd van de patient (ten tijde van de referentiedatum).
    U kunt hier ook een range intypen zoals 70-80.
  select: extract(year from age(reference_date, patient_birthdate))
  columnname: age
  filters:
    "All": null
    "<30": "%t <30"
    "<40": "%t <40"
    "<50": "%t <50"
    "<60": "%t <60"
    "<70": "%t <70"
    "<80": "%t <80"
    ">80": "%t >80"
    ">70": "%t >70"
    ">60": "%t >60"
    ">50": "%t >50"
    ">40": "%t >40"
    ">30": "%t >30"
    "70-80": "(%t >=70 AND %t <= 80)"
  userexpr: parserangenumeric

Interesting to note:

  • Arguments can be quoted when convenient. Filter keys and values are quoted by MGRID in all examples because the lower-than and greater-than sign are special characters in yaml.

  • Arguments may be multiline. One way to indicate that is by inserting |- and then following with an indented text block.

  • The filter clause (e.g., %t <30) should consider NULL values as needed. For example if comparisons should return TRUE in the presence of NULL (i.e. be included for the specific filter key), explicitly check for NULL or use NULL-aware functions such as COALESCE, GREATEST, LEAST, etc., or avoid NULL values altogether by using COALESCE with a default value in the select expression.

  • filters "All": null makes All in the ui map to no selection, i.e. we want all values possible.

  • a userexpr that shows one of the parse functions that can be used to convert user input to SQL where clauses.

5.3.2. Filter with Distinctlist

filter:
  name: Referentiedatum - Einde Periode
  description: |-
    Met dit filter kan een reeks aan referentiedata geselecteerd worden die in dit geval voor Einde Periode liggen.
    Kan gebruikt worden in een pivot, om meerdere kolommen referentiedata weer te geven en het verloop van een waarde te zien.
  select: reference_date
  columnname: enddate
  distinctlist: reference_date
  distinctoperator: <=
This describes an End date filter that allows an end-user to select all rows

leading up to a particular end date.

Items of interest here:

  • distinctlist retrieves all possible dates from the database, and defines select and filters entries implicitly

  • distinctoperator defines the operator to be used in the filters construction.

5.3.3. Filter with multiple select

filter:
  name: Datum Laatste Consult
  description: |
    Datum laatste consult. Dit betreft het algemene onderzoek genaamd "Consult".
  select:
  - reference_date
  - date_last_consult
  columnname: date_last_consult
  filters:
    "All": null
    "Determined": "%t1 IS NOT NULL"
    "Never": "%t1 IS NULL"
    ">6 months": "((age(%t0, %t1) > '6 month') OR %t1 IS NULL)"
    "<6 months": "age(%t0, %t1) <= '6 month'"
    ">1 year": "((age(%t0, %t1) > '1 year') OR %t1 IS NULL)"
    "<1 year": "age(%t0, %t1) <= '1 year'"
    ">18 months": "((age(%t0, %t1) > '18 month') OR %t1 IS NULL)"
    "<18 months": "age(%t0, %t1) <= '18 month'"
    ">2 years": "((age(%t0, %t1) > '2 year') OR %t1 IS NULL)"
    "<2 years": "age(%t0, %t1) <= '2 year'"
    "Year to date": "date_trunc('year', %t0) = date_trunc('year', %t1)"

This describes a Date last consult filter that allows an end-user to select all rows for patients that have not been seen by their treating physician for some time. Of interest here is the use of the select as a list, because multiple source columns are needed in the filters clauses.

5.4. Advanced Filter Examples

5.4.1. Filter to select for regular expression item in an array

Consider the following usecase:

We keep an array with administered drugs. These drug descriptions contain ATC codes. ATC codes are hierarchical in nature, where leading characters provide drug classification.

Given this structure it stands to reason that we want to:

  • select for a particular drug amongst the drugs:

    SELECT drugs FROM table WHERE particular = ANY(drugs)
    
  • select for a type of drug, where type is indicated by abbreviated ATC code:

    SELECT drugs FROM table WHERE abbrev LIKE ANY(drugs)
    

There are two issues with the last statement;

  • like is written as column LIKE %abbrev%, so abbrev in our example should be the second argument

  • Similar reasoning holds for the regexp operator ~* ANY(array) can only be provided in the form operand operator ANY(array)

To work around this we introduce a new operator called ~*@ that allows ANY to stay in current position, but commutes the arguments before executing the regexp operator.

As a mental note to the name consider that ~* is the case insensitive regular expression operator, and that @ is used for many array operators. So ~*@ denotes the case insensitive regular expression operator for array use.

With the new operator ~*@ the query can be written as:

SELECT drugs FROM table WHERE '^abbrev' ~*@ ANY(drugs)

A complete example that details use in a filter follows:

filter:
  name: Prescription medications
  description: |-
    Prescription medication by (partial) Anatomical Therapeutic Chemical (ATC) classification code.
  select: prescriptionmedications
  columnname: prescriptionmedications
  filters:
    "A: Alimentary tract and metabolism": "'^A' ~*@ ANY(%t)"
    "A01: Stomatologicals preparations": "'^A01' ~*@ ANY(%t)"
    "A01A: Stomatologicals preparations": "'^A01A' ~*@ ANY(%t)"
    ... lines omitted for brevity
    "V10XA: Iodine (131 i) compounds": "'^V10XA' ~*@ ANY(%t)"
    "V10XX: Various therapeutic radiopharmaceuticals": "'^V10XX' ~*@ ANY(%t)"
    "V20: Surgical dressings": "'^V20' ~*@ ANY(%t)"

Similarly the physician needs to be able to select all patients that have not had a particular medication. The negation of the filter is described below:

filter:
  name: Exclude Prescription medications
  description: |-
    Exclude prescription medication by (partial) Anatomical Therapeutic Chemical (ATC) classification code.
  select: prescriptionmedications
  columnname: excludeprescriptionmedications
  filters:
    "A: Alimentary tract and metabolism": "('^A' ~*@ ANY(%t)) IS NOT TRUE"
    "A01: Stomatologicals preparations": "('^A01' ~*@ ANY(%t)) IS NOT TRUE"
    "A01A: Stomatologicals preparations": "('^A01A' ~*@ ANY(%t)) IS NOT TRUE"
    ... lines omitted for brevity
    "V10XA: Iodine (131 i) compounds": "('^V10XA' ~*@ ANY(%t)) IS NOT TRUE"
    "V10XX: Various therapeutic radiopharmaceuticals": "('^V10XX' ~*@ ANY(%t)) IS NOT TRUE"
    "V20: Surgical dressings": "('^V20' ~*@ ANY(%t)) IS NOT TRUE"

This filter expression is true when no medication matches, but also when no medications have been given at all (NULL column).