5. Filters¶
Filters are used to constrain the dataset. Filter definitions define
how a dataset column should be transformed before it can be filtered [1]
what canned filter values are possible
a user expression parser, if allowed
Footnotes
Simply put; they define types of where clauses and how the filter information is to be presented to the end-user.
5.1. General structure¶
Filters are defined in the configuration directory under filters
as yaml
files. Consider the following directory structure:
Directory |
Description |
---|---|
|
Repository for all filter definitions. |
|
A single filter definition. |
5.2. Filter definition¶
A filter is constructed from a filter definition. The following attributes may be used in the filter definition:
name
– presentation name as displayed to the end-user, and when searching for filters.description
– presentation description, used as pop-up and when searching for filters.columnname
– filter identifier, should be unique across all filters, used in query composition. Convention is to also use this value as the filename.select
– the expression that should be used on the source table to obtain the value to be filtered. If multiple expressions are required, a list of expressions can be used here.filters
– Optional any number ofkey: value
statements wherekey
is presented to the user, andvalue
represents the where clause to be used in the actual selection.%t
can be used in the where clause to denote the column to be filtered.%t0
,%t1
can be used in case of multipleselect
expressions. Make sure to consider NULL values in the filter clause because for example%t0 < %t1
will result in NULL and exclusion if one of the select expressions (%t0
or%t1
) is NULL which is not always intended. In this example an alternative would be to useLEAST(%t0, %t1)
to ignore NULL values, or avoid NULL values altogether by using COALESCE with a default value in the select expression.userexpr
– Optional function that defines a parser that returns a where clause given a user expression.distinctlist
– Optional an expression that retrieves all known values. This providesfilters
should be used instead of those. Typical usecase: get all known doctors.distinctoperator
– Optional an operator that is used for the construction of ordering of the foundfilters
after they have been retrieved using thedistinctlist
. Typical usecase: get a list of known dates, prefixed with a<=
.distinctorder
– Optional Sort the distinctlist in either ascending (asc
) or descending (desc
) order (as presented to the user). Default is ascending order.exclusive
– Optional a boolean that is use to construct exclusive filters. This enables the creation ofNOT(value=x OR value=y)
filters instead of the normal(value=x OR value=y)
filters. If not set it defaults toFalse
. Make sure to consider NULL values in the filter clause because for exampleNOT(value=NULL)
will result in NULL and exclusion which is probably not what is intended.isarray
– Optional a boolean that is used to indicate that thedistinctlist
will return an array of values that needs to be unpacked. This can be used when a row returns a number of items that need to be put into thefilters
not as array but as singular values.highlow
– Optional an two element array that specifies the name for the highest and lowest item to be added to thefilters
array. This allows users to quickly select the highest or lowest item as the filter condition.
5.2.1. User expressions¶
If a filter has a userexpr
statement, users can add new values to be
filtered on simply by typing them in. userexpr
then defines the types of
user expression possible, by pointing to one these parsers:
parse_comparator_numeric
– expressions of the formcomparator number
. E.g. <32, >= 5.6, =10 or 10.parse_range_numeric
– expressions of ranges in the formsnumber-number
,[number;number[
,comparator number
ornumber
. Where brackets denote that a range bound is included or excluded. E.g. 1-2, [1;2], [1;3[, >9, 10.parse_dutch_postal_code
– expressions of the form0123AB
where a smaller string or inclusion of the_
or%
denotes a like query. E.g. 0123AB, 0123_, 0123%.parse_comparator_date
– expressions of the formcomparator date
. E.g. < 2020-01-01, >= 2021-01-01. Date formats supported are year-month-day, day.month.year and day/month/year. Dates without an explicit comparator are understood as = date.parse_date_from
– takes in the same date formats asparse_comparator_date
, but always takes the unadorned date value as the start. E.g. say 2020-01-01 to mean >= 2020-01-01.parse_date_to
– takes in the same date formats asparse_comparator_date
, but takes the given date value as the end date. E.g. input 2021-01-01 to mean <= 2021-01-01. Often used together with another filter that supplies aparse_date_from
.parse_range_date
– expressions of ranges in the formsdate,date
,[date,date]
,comparator date
ordate
. Where brackets denote that a range bound is included or excluded. E.g. 2020-01-01,2021-01-01, [2020-01-01,2021-01-01[, > 2020-01-01, 2020-01-01.
5.3. Filter Examples¶
5.3.1. Filter with userexpr¶
filter:
name: Leeftijd
description: |-
Leeftijd van de patient (ten tijde van de referentiedatum).
U kunt hier ook een range intypen zoals 70-80.
select: extract(year from age(reference_date, patient_birthdate))
columnname: age
filters:
"All": null
"<30": "%t <30"
"<40": "%t <40"
"<50": "%t <50"
"<60": "%t <60"
"<70": "%t <70"
"<80": "%t <80"
">80": "%t >80"
">70": "%t >70"
">60": "%t >60"
">50": "%t >50"
">40": "%t >40"
">30": "%t >30"
"70-80": "(%t >=70 AND %t <= 80)"
userexpr: parserangenumeric
Interesting to note:
Arguments can be quoted when convenient. Filter keys and values are quoted by MGRID in all examples because the lower-than and greater-than sign are special characters in yaml.
Arguments may be multiline. One way to indicate that is by inserting
|-
and then following with an indented text block.The filter clause (e.g.,
%t <30
) should consider NULL values as needed. For example if comparisons should return TRUE in the presence of NULL (i.e. be included for the specific filter key), explicitly check for NULL or use NULL-aware functions such as COALESCE, GREATEST, LEAST, etc., or avoid NULL values altogether by using COALESCE with a default value in the select expression.filters
"All": null
makes All in the ui map to no selection, i.e. we want all values possible.a
userexpr
that shows one of the parse functions that can be used to convert user input to SQL where clauses.
5.3.2. Filter with Distinctlist¶
filter:
name: Referentiedatum - Einde Periode
description: |-
Met dit filter kan een reeks aan referentiedata geselecteerd worden die in dit geval voor Einde Periode liggen.
Kan gebruikt worden in een pivot, om meerdere kolommen referentiedata weer te geven en het verloop van een waarde te zien.
select: reference_date
columnname: enddate
distinctlist: reference_date
distinctoperator: <=
- This describes an End date filter that allows an end-user to select all rows
leading up to a particular end date.
Items of interest here:
distinctlist
retrieves all possible dates from the database, and definesselect
andfilters
entries implicitlydistinctoperator
defines the operator to be used in thefilters
construction.
5.3.3. Filter with multiple select¶
filter:
name: Datum Laatste Consult
description: |
Datum laatste consult. Dit betreft het algemene onderzoek genaamd "Consult".
select:
- reference_date
- date_last_consult
columnname: date_last_consult
filters:
"All": null
"Determined": "%t1 IS NOT NULL"
"Never": "%t1 IS NULL"
">6 months": "((age(%t0, %t1) > '6 month') OR %t1 IS NULL)"
"<6 months": "age(%t0, %t1) <= '6 month'"
">1 year": "((age(%t0, %t1) > '1 year') OR %t1 IS NULL)"
"<1 year": "age(%t0, %t1) <= '1 year'"
">18 months": "((age(%t0, %t1) > '18 month') OR %t1 IS NULL)"
"<18 months": "age(%t0, %t1) <= '18 month'"
">2 years": "((age(%t0, %t1) > '2 year') OR %t1 IS NULL)"
"<2 years": "age(%t0, %t1) <= '2 year'"
"Year to date": "date_trunc('year', %t0) = date_trunc('year', %t1)"
This describes a Date last consult filter that allows an end-user to select
all rows for patients that have not been seen by their treating physician for
some time. Of interest here is the use of the select
as a list, because
multiple source columns are needed in the filters
clauses.
5.4. Advanced Filter Examples¶
5.4.1. Filter to select for regular expression item in an array¶
Consider the following usecase:
We keep an array with administered drugs. These drug descriptions contain ATC codes. ATC codes are hierarchical in nature, where leading characters provide drug classification.
Given this structure it stands to reason that we want to:
select for a particular drug amongst the drugs:
SELECT drugs FROM table WHERE particular = ANY(drugs)
select for a type of drug, where type is indicated by abbreviated ATC code:
SELECT drugs FROM table WHERE abbrev LIKE ANY(drugs)
There are two issues with the last statement;
like is written as
column LIKE %abbrev%
, so abbrev in our example should be the second argumentSimilar reasoning holds for the regexp operator
~* ANY(array)
can only be provided in the formoperand operator ANY(array)
To work around this we introduce a new operator called ~*@
that allows ANY
to stay in current position, but commutes the arguments before executing the
regexp operator.
As a mental note to the name consider that ~*
is the case insensitive
regular expression operator, and that @
is used for many array
operators. So ~*@
denotes the case insensitive regular expression operator
for array use.
With the new operator ~*@
the query can be written as:
SELECT drugs FROM table WHERE '^abbrev' ~*@ ANY(drugs)
A complete example that details use in a filter follows:
filter:
name: Prescription medications
description: |-
Prescription medication by (partial) Anatomical Therapeutic Chemical (ATC) classification code.
select: prescriptionmedications
columnname: prescriptionmedications
filters:
"A: Alimentary tract and metabolism": "'^A' ~*@ ANY(%t)"
"A01: Stomatologicals preparations": "'^A01' ~*@ ANY(%t)"
"A01A: Stomatologicals preparations": "'^A01A' ~*@ ANY(%t)"
... lines omitted for brevity
"V10XA: Iodine (131 i) compounds": "'^V10XA' ~*@ ANY(%t)"
"V10XX: Various therapeutic radiopharmaceuticals": "'^V10XX' ~*@ ANY(%t)"
"V20: Surgical dressings": "'^V20' ~*@ ANY(%t)"
Similarly the physician needs to be able to select all patients that have not had a particular medication. The negation of the filter is described below:
filter:
name: Exclude Prescription medications
description: |-
Exclude prescription medication by (partial) Anatomical Therapeutic Chemical (ATC) classification code.
select: prescriptionmedications
columnname: excludeprescriptionmedications
filters:
"A: Alimentary tract and metabolism": "('^A' ~*@ ANY(%t)) IS NOT TRUE"
"A01: Stomatologicals preparations": "('^A01' ~*@ ANY(%t)) IS NOT TRUE"
"A01A: Stomatologicals preparations": "('^A01A' ~*@ ANY(%t)) IS NOT TRUE"
... lines omitted for brevity
"V10XA: Iodine (131 i) compounds": "('^V10XA' ~*@ ANY(%t)) IS NOT TRUE"
"V10XX: Various therapeutic radiopharmaceuticals": "('^V10XX' ~*@ ANY(%t)) IS NOT TRUE"
"V20: Surgical dressings": "('^V20' ~*@ ANY(%t)) IS NOT TRUE"
This filter expression is true when no medication matches, but also when no
medications have been given at all (NULL
column).