4. Dataset Types
4.1. Dataset types
The Dataset Builder allows creation of three distinct types of datasets:
unit datasets
unit-day datasets
unit-event datasets
4.1.1. Unit datasets
Unit datasets consist of one row per unit or per combination of units. This type can be used to build datasets that contain, for instance:
a number of aggregates per patient (e.g. last measured blood pressure, average weight of the patient)
a number of aggregates per encounter (e.g. average measured heart rate per encounter)
a number of aggregates per combination of organisation and location (e.g. average number of complications per organisation-location)
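As an illustration of the "one row per unit" shape, the sketch below aggregates raw measurements into one row per patient. It does not use the Dataset Builder itself; the `measurements` tuples, the feature names, and the `build_unit_dataset` helper are all hypothetical and only serve to show the resulting table shape.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw measurements: (patient_id, ISO timestamp, feature, value)
measurements = [
    ("p1", "2024-01-01T09:00", "systolic_bp", 120.0),
    ("p1", "2024-01-03T09:00", "systolic_bp", 130.0),
    ("p1", "2024-01-01T09:00", "weight", 80.0),
    ("p1", "2024-01-05T09:00", "weight", 82.0),
    ("p2", "2024-01-02T10:00", "systolic_bp", 110.0),
]


def build_unit_dataset(rows):
    """Return one row per patient: last blood pressure and average weight."""
    by_patient = defaultdict(lambda: defaultdict(list))
    for patient_id, timestamp, feature, value in rows:
        by_patient[patient_id][feature].append((timestamp, value))

    dataset = []
    for patient_id in sorted(by_patient):
        features = by_patient[patient_id]
        # ISO timestamps sort chronologically, so the last tuple is the latest.
        bp = sorted(features.get("systolic_bp", []))
        weights = [value for _, value in features.get("weight", [])]
        dataset.append({
            "patient_id": patient_id,
            "last_systolic_bp": bp[-1][1] if bp else None,
            "avg_weight": mean(weights) if weights else None,
        })
    return dataset


unit_dataset = build_unit_dataset(measurements)
```

Each patient appears exactly once in `unit_dataset`, and a feature with no measurements for that patient ends up as `None`.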
4.1.2. Unit-day datasets
Unit-day datasets consist of one row per unit per day. This type can be used to build datasets that contain, for instance:
per patient, per day on which an INR was recorded for that patient, the last measured INR
per patient, per day, the average blood pressure
Note
Unit-day datasets with multiple feature columns will mostly contain empty (NULL) values. If, for instance, a patient has a blood pressure measurement on the 1st of January and an INR on the 2nd of January, the result will be a row for each date, with an empty INR column in the 1st of January row and an empty blood pressure column in the 2nd of January row.
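The sparsity described in this note can be reproduced with a small sketch. It is not the Dataset Builder's own implementation; the `measurements` data, the column names, and the `build_unit_day_dataset` helper are hypothetical and only mimic the one-row-per-patient-per-day shape.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw measurements: (patient_id, ISO timestamp, feature, value)
measurements = [
    ("p1", "2024-01-01T08:00", "systolic_bp", 120.0),
    ("p1", "2024-01-01T20:00", "systolic_bp", 130.0),
    ("p1", "2024-01-02T09:00", "inr", 2.5),
]


def build_unit_day_dataset(rows):
    """Return one row per (patient, day) with daily aggregates per feature."""
    grouped = defaultdict(lambda: defaultdict(list))
    for patient_id, timestamp, feature, value in rows:
        day = timestamp[:10]  # the date part of the ISO timestamp
        grouped[(patient_id, day)][feature].append((timestamp, value))

    dataset = []
    for patient_id, day in sorted(grouped):
        features = grouped[(patient_id, day)]
        bp_values = [value for _, value in features.get("systolic_bp", [])]
        inr = sorted(features.get("inr", []))
        dataset.append({
            "patient_id": patient_id,
            "day": day,
            # A feature that was not measured on this day stays empty (None).
            "avg_systolic_bp": mean(bp_values) if bp_values else None,
            "last_inr": inr[-1][1] if inr else None,
        })
    return dataset


unit_day_dataset = build_unit_day_dataset(measurements)
```

In this sketch the row for 2024-01-01 has no INR and the row for 2024-01-02 has no blood pressure, which matches the pattern the note describes.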
4.1.3. Unit-event datasets
Unit-event datasets consist of one row per event. This results in the largest tables, since no aggregation is applied to the features. The resulting dataset contains one row per feature row (e.g. one row per measured blood pressure). The Unit Columns are also present in each row.
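A minimal sketch of the unit-event shape follows, assuming a hypothetical `patients` lookup holding the Unit Columns and a hypothetical list of blood pressure events. The helper `build_unit_event_dataset` is illustrative only and is not part of the Dataset Builder.

```python
# Hypothetical Unit Columns, keyed by patient id
patients = {
    "p1": {"patient_id": "p1", "birth_year": 1950},
    "p2": {"patient_id": "p2", "birth_year": 1980},
}

# Hypothetical events: (patient_id, ISO timestamp, systolic blood pressure)
bp_events = [
    ("p1", "2024-01-01T08:00", 120.0),
    ("p1", "2024-01-01T20:00", 130.0),
    ("p2", "2024-01-02T10:00", 110.0),
]


def build_unit_event_dataset(units, events):
    """Return one row per event, with the Unit Columns repeated in each row."""
    return [
        {**units[patient_id], "measured_at": timestamp, "systolic_bp": value}
        for patient_id, timestamp, value in events
    ]


unit_event_dataset = build_unit_event_dataset(patients, bp_events)
```

No aggregation takes place, so the output has exactly as many rows as there are events, and the Unit Columns of a patient are duplicated once per event.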
4.2. Which one to pick
Different use cases require different dataset types. For instance, a dataset that needs one row per patient containing the last known status of a number of aggregated features (last measured blood pressure, average INR over the last month, number of foot checkups this year) can be built as a unit dataset. A dataset that should contain, for each patient (or research subject), all of their measured INRs (a long rather than wide table) can be built as a unit-event dataset. The unit-day dataset is used where the output consists of measurements (per patient, or perhaps per practitioner) that have to be aggregated on a daily basis.
4.3. Multiple datasets, simple datamarts
Often, the Dataset Builder is used to return not one but multiple datasets. Combining unit datasets with unit-event datasets allows for the creation of simple dimensional datamarts containing a few dimensions and facts. E.g., to build a datamart that links patients and practitioners to visits, medications and observations, you would create two unit datasets (patients and practitioners) and three unit-event datasets (visits, medications and observations). The unit-event datasets will only contain ids for the patient and practitioner units, thus mitigating the redundancy of having all Unit Columns present in every unit-event table.
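To show how such a datamart might be consumed, the sketch below treats two unit datasets as dimensions and one unit-event dataset (visits) as a fact table that carries only ids. All table contents, column names, and the `index_by` and `join_visits` helpers are hypothetical; medications and observations would follow the same pattern.

```python
# Hypothetical unit datasets (dimensions)
patients = [
    {"patient_id": "p1", "birth_year": 1950},
    {"patient_id": "p2", "birth_year": 1980},
]
practitioners = [
    {"practitioner_id": "d1", "specialty": "cardiology"},
    {"practitioner_id": "d2", "specialty": "general practice"},
]

# Hypothetical unit-event dataset (facts): only ids refer to the units
visits = [
    {"patient_id": "p1", "practitioner_id": "d1", "visit_date": "2024-01-01"},
    {"patient_id": "p1", "practitioner_id": "d2", "visit_date": "2024-01-08"},
    {"patient_id": "p2", "practitioner_id": "d2", "visit_date": "2024-01-09"},
]


def index_by(rows, key):
    """Build a lookup from the value of `key` to the full row."""
    return {row[key]: row for row in rows}


def join_visits(visit_rows, patient_rows, practitioner_rows):
    """Enrich each visit with the columns of its patient and practitioner."""
    patient_by_id = index_by(patient_rows, "patient_id")
    practitioner_by_id = index_by(practitioner_rows, "practitioner_id")
    return [
        {
            **patient_by_id[visit["patient_id"]],
            **practitioner_by_id[visit["practitioner_id"]],
            "visit_date": visit["visit_date"],
        }
        for visit in visit_rows
    ]


joined_visits = join_visits(visits, patients, practitioners)
```

Because the fact rows store only ids, patient and practitioner attributes live in one place each and are attached at query time, rather than being repeated in every event row.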