User Guide

This guide describes general topics about the HL7v3 and SQL integration. We assume general knowledge about HL7v3 and SQL. No particular Unix or programming experience is required.

General Information

The Healthcare Datatypes is a specification of datatypes drafted for the purpose of information exchange, with XML as the target information medium. To be able to use data expressed in HL7v3 format as close as possible in a relational database system, MGRID has created the HDL that consists of a set of user defined types (UDT) and user defined functions (UDF).

The following aspects of the type implementation require special attention:

  1. The object oriented type hierarchy, see Type hierarchy.
  2. The NullFlavors, see NullFlavor Handling.
  3. The different type categories; simple and complex. See Complex and Simple types.

Complex and Simple types

  1. All Healthcare Datatypes have a complex type implementation. Complex types have a JSON literal form. Complex types can be recognized by their uppercase typename, such as CD, PQ and ANY. Complex types exist to persist input transformed from XML HL7v3 messages.

    To date two different versions of the Healthcare Datatypes have been defined by HL7; datatypes R1 and R2. Starting with Normative Edition 2012, the RIM has switched from the R1 types to the R2 types. The HDL implementations are bound to schema r1 and r2. Users can select the release they want by adding the appropriate schema to their search_path.

    See Complex Types for more information and a complete list.

    SELECT cvinval(code := 'DOCBODY', "codeSystem" := '2.16.840.1.113883.5.6');
                                       cvinval
    ------------------------------------------------------------------------------
     {"code": "DOCBODY", "dataType": "CV", "codeSystem": "2.16.840.1.113883.5.6"}
    (1 row)
    
  2. Some Healthcare Datatypes have a simple type implementation. Simple types, that have a lowercase typename, are scalar types that resemble the built-in types of the database server, such as integer, text, and date. See Simple Types for more information and a complete list. These types are bound to the schema hl7.

    The simple types exist to allow in-database computations and additional indexing on Healthcare Datatypes, such as physical quantity conversions and partial ordering, concept descriptor subtree search, and sets of interval of time containment and overlap queries.

    For types that have a simple type implementation, CASTs are implemented that allow conversion from complex to simple form, to enable computations.

    SELECT cvinval(code := 'DOCBODY', "codeSystem" := '2.16.840.1.113883.5.6')::cv;
                cvinval
    -------------------------------
     DOCBODY:2.16.840.1.113883.5.6
    (1 row)
    

    ... to enable computations.

    SELECT cvinval(code := 'DOCBODY', "codeSystem" := '2.16.840.1.113883.5.6')::cv
              << 'CONTAINER'::cv('ActClass');
     ?column?
    ----------
     t
    (1 row)
    

    There is no R1 or R2 distinction for simple types: it is possible to cast both the R1 and the R2 version of a complex type to the simple type.

Schemas (namespaces)

The Healthcare Datatypes are created in the schemas

  1. hl7 for the simple types
  2. r1 for the complex R1 types
  3. r2 for the complex R2 types
  4. hdl for functions and support types

It is necessary to add these schemas to the database its default search_path, by using the ALTER DATABASE command. See ERROR: type “<name>” does not exist.

A few of this HDL named types conflict with the reserved words of PostgreSQL. These are any, and, or, not and precision. For these names it is necessary to always prefix the typename with hl7.:

show search_path;
             search_path
-------------------------------------
 "",public,hl7,hdl,r1

Since any is a reserved word, the type any cannot be named directly:

create table ta (a any);
ERROR:  syntax error at or near "any"
LINE 1: create table ta (a any);

Since any also conflicts with PostgreSQLs own any, adding quotes does not work either:

create table ta (a "any");
ERROR:  column "a" has pseudo-type "any"

The correct way to name HL7s any is to prefix it with the schema name:

create table ta (a hl7.any);
CREATE TABLE

Type hierarchy

The Healthcare Datatypes are specified using a type hierarchy. The simple types, R1 complex types and R2 complex types each have their own hierarchy. The root of the simple type hierarchy is the any type. All simple types can be cast to any. The root type of the complex types are ANY and HXIT, for the R1 and R2 types respectively.

If a type is a subtype of another type, conversion from one type to the other is implemented with a type cast. With the type cast, it is possible to insert e.g. a PQ value in a database column with type ANY. With the type cast, it is also possible to convert the ANY datavalue back to its original type PQ. Examples where ANY values are cast back to their original type can be found in Appending operators to select paths and Does complex data value contain a value?.

It is also possible to switch from the complex type hierarchy to the simple type hierarchy with casts. For instance, it is possible to cast the complex type PQ to the simple type pq.

NullFlavor Handling

NullFlavors, such as NullFlavor.UNK, extend the normal domain of each healthcare datatype. Almost every datavalue can be a NullFlavor instead of a normal value. In every aspect of working with the Healthcare Datatypes, NullFlavor handling plays a role. NullFlavors are defined in NullFlavors and Interaction between database NULL and NullFlavors describes the interaction between the database servers NULL and the NullFlavors.

BL returning functions vs operators

For each BL returning property of the Healthcare Datatypes such as equal and lessthan, two implementation forms exist:

  1. A BL-returning user defined function (UDF) with the name of the property, such as equal. This function may be used in SELECT lists. It is not endorsed to use these functions in WHERE clauses, where operators are preferred instead.
  2. An operator, such as =. In WHERE clauses the operator form of properties is preferred, since operators enable the use of indexes. Though operators may be used in SELECT lists, it is not recommended to do so, since the operators do not return NullFlavors. In SELECT lists, the function form of the property is recommended.

Proper use of functions and operators in select lists and where clauses

 equal
-------
 true

=> SELECT equal('1 l'::pq, '1 dm3')  -- use functions in select lists
    WHERE '2 l'::pq > '0.1 ml';      -- use operators in where clause

Using indexes

The default behaviour of the CREATE INDEX command is to create an index with the default btree operator class for that datatype. While the btree indexes are useful for primary key and total ordering purposes, the most useful operators on Healthcare Datatypes can be indexed by gist indexes. This section shows how to create gist indexes. Section Index support lists all the index types and operators. An example subset of that table is repeated below.

 indexed_type  | index_method |     opclass_name      | is_default |        operators
---------------+--------------+-----------------------+------------+--------------------------
 cv            | gist         | gist_cv_ops           | t          | = >> <<
 ii            | gist         | gist_ii_ops           | t          | =
 ii[]          | gin          | gin__ii_ops           | t          | && <@ @> =
 ivl_pq        | gist         | gist_ivl_pqops        | t          | && = ~ @
 ivl_ts        | gist         | gist_ivl_tsops        | t          | && = ~ @
 ivl_ts        | gist         | gist_ivl_tsunionops   | f          | && = ~ @
 pq            | btree        | pq_ops_identical      | f          | ==
 pq            | gist         | gist_pq_ops           | t          | < <= = >= >
 pq[]          | gin          | gin__pq_ops           | t          | && <@ @> =

For operator classes that are not default for the index type and index method, it is necessary to specify the operator class name when creating the index, as is shown in the example below.

Use an index range scan on pq

This example shows how to create an index for a table with a pq column, and how to check if a query uses the index for a range scan with <.

CREATE TABLE mytable (mycol pq);
INSERT INTO mytable SELECT a::text || ' [yd_us]' FROM generate_series(1,10000) AS a(a);
INSERT INTO mytable SELECT a::text || ' [ft_us]' FROM generate_series(1,10000) AS a(a);
CREATE INDEX myindex ON mytable USING GIST(mycol gist_pq_ops);
ANALYZE mytable;
EXPLAIN SELECT * FROM mytable WHERE mycol < '10 [ft_us]';
                              QUERY PLAN
----------------------------------------------------------------------
 Bitmap Heap Scan on mytable  (cost=4.18..18.81 rows=4 width=86)
   Recheck Cond: (mycol < '10 [ft_us]'::pq)
   ->  Bitmap Index Scan on myindex  (cost=0.00..4.18 rows=4 width=0)
         Index Cond: (mycol < '10 [ft_us]'::pq)
 Planning time: 0.165 ms
(5 rows)

SELECT * FROM mytable WHERE mycol < '4 [ft_us]' ORDER BY mycol;
   mycol
-----------
 1 [ft_us]
 2 [ft_us]
 3 [ft_us]
 1 [yd_us]
(4 rows)

Create a multicolumn gist index on ii and cv

This example shows how to create a multicolumn index. Because the operator class names are not specified, the default operator classes gist_ii_ops and gist_cv_ops are used.

CREATE TABLE mytable (id ii, code cv);
CREATE INDEX myindex ON mytable USING gist(id, code);
INSERT INTO mytable VALUES ('38e0055e-3e3d-11de-ba73-f72d0933defc','274589008'::cv('SNOMED-CT'));
SET enable_seqscan = false;
EXPLAIN ANALYZE SELECT * FROM mytable
  WHERE id = '38e0055e-3e3d-11de-ba73-f72d0933defc'
  AND code << '74627003'::cv('SNOMED-CT');
                                QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using myindex on mytable  (cost=2.50..10.77 rows=1 width=43)
                                      (actual time=25.716..25.719 rows=1 loops=1)
   Index Cond: ((id = '38e0055e-3e3d-11de-ba73-f72d0933defc'::ii)
                AND (code << '74627003:2.16.840.1.113883.6.96'::cv('SNOMED-CT')))
 Total runtime: 0.128 ms
(3 rows)

Working with code systems

This section describe how to perform common actions when working with codesystems in the database.

Installing a codesystem

Before a codesystem can be used to instantiate CV coded values, the codesystem must be loaded into the tables created by the HL7 base table module.

Installing the codesystem files on the host operating system is described in Installing the software. Once the files are installed, a codesystem can be loaded in a database by loading the corresponding extension.

Installing a codesystem in a database

CREATE EXTENSION snomedctvocab_20110731;
CREATE EXTENSION

Once the codesystem is loaded, it can be used by instantiating CV data values that use the codesystem, in the same database.

Codesystems are scoped within a single database. If the codesystem needs to be used in another database, the codesystem must be loaded in the other database as well. If different versions of the same codesystem must be used in parallel, this can be achieved by creating a separate database for each codesystem version that must be loaded.

Exploring a codesystem

Use the function codesystem to query contents of HL7v3 codesystems.

Display a tree of the full codesystem

With the codesystem function a complete codesystem is shown as text. This is useful relatively small codesystems, like most of HL7v3s codesystems. The codesystem name is matched case sensitive. When no codesystem is found, no results and no error is shown.

select codesystem('EntityClass');
              codesystem
--------------------------------------
 entity (ENT)
  health chart entity (HCE)
  living subject (LIV)
   non-person living subject (NLIV)
    animal (ANM)
    microorganism (MIC)
    plant (PLNT)
   person (PSN)
  material (MAT)
   chemical substance (CHEM)
   food (FOOD)
   manufactured material (MMAT)
    container (CONT)
     holder (HOLD)
    device (DEV)
     certificate representation (CER)
     imaging modality (MODDV)
  organization (ORG)
   public institution (PUB)
   state (STATE)
    Nation (NAT)
  place (PLC)
   city or town (CITY)
   country (COUNTRY)
   county or parish (COUNTY)
   state or province (PROVINCE)
  group (RGRP)
(27 rows)

Search a codesystem for a text

It is also possible to query a codesystem for codes that have a certain text as part of their description:

SELECT codesystem('ActClass','obs');
                codesystem
-------------------------------------------
 genomic observation (GEN)
 observation (OBS)
 correlated observation sequences (OBSCOR)
 observation series (OBSSER)
 specimen observation (SPCOBS)
(5 rows)

Instead of using a substring search on the designation, it is also possible to query for subtrees of hierarchical codesystems ‘above’ or ‘under’ a given coded value. This is shown in examples Show the ancestor hierarchy of a coded value and Show the descendants of a coded value.

Working with coded values

MGRID HDL contains CV implemented as a base type, that together with its functions and operators provide robust in-database vocabulary support. Functions on ‘cv’ lists the functions and operators on CV.

Creating coded values

Constructor methods for ‘cv’ describes how coded values can be instantiated with the standard literal form, as well as using a type modifier to specify the conceptdomain. A CV constrained to a particular conceptdomain, such as CV('ActClass'), can be used to create tables:

Create a table column for codes from a specific conceptdomain

This example requires that a HL7v3 vocabulary extension is loaded.

CREATE TABLE examplecv (a cv('ActClass'));
INSERT INTO examplecv VALUES ('GEN'),('OBS'),('PCPR'),('SBADM');
SELECT * from examplecv;

                                       a
--------------------------------------------------------------------------------
 GEN:2.16.840.1.113883.5.6@2009-10-20:2.16.840.1.113883.1.11.11527@2009-10-20
 OBS:2.16.840.1.113883.5.6@2009-10-20:2.16.840.1.113883.1.11.11527@2009-10-20
 PCPR:2.16.840.1.113883.5.6@2009-10-20:2.16.840.1.113883.1.11.11527@2009-10-20
 SBADM:2.16.840.1.113883.5.6@2009-10-20:2.16.840.1.113883.1.11.11527@2009-10-20
(4 rows)

Lookup information on a coded value

The example below shows how to lookup the displayname of a code. Note that this example requires that a SNOMED-CT vocabulary extension is loaded.

Lookup information on a coded value
select displayname('409586006'::cv('SNOMED-CT'));
 displayname
-------------
 Complaint
Lookup information on coded values in a table

The example below shows how to get code and codesystem information from a CV stored in the table. Note: this table is created by example Create a table column for codes from a specific conceptdomain. See Functions on ‘cv’ for a complete list of functions that can be used.

SELECT displayname(a), codesystemname(a),
code(a), codesystem(a), implies(a,'OBS'::cv('ActClass'))
FROM examplecv;
       displayname        | codesystemname | code  |      codesystem       | implies
--------------------------+----------------+-------+-----------------------+---------
 genomic observation      | ActClass       | GEN   | 2.16.840.1.113883.5.6 | true
 observation              | ActClass       | OBS   | 2.16.840.1.113883.5.6 | true
 care provision           | ActClass       | PCPR  | 2.16.840.1.113883.5.6 | false
 substance administration | ActClass       | SBADM | 2.16.840.1.113883.5.6 | false
(4 rows)

For codesystems with an IsA hierarchy, it is possible to view the ancestor hierachy with the function ancestors. This function is provided for browse functionality only, and is not optimized to be used in other queries.

Show the ancestor hierarchy of a coded value
select ancestors('409586006'::cv('SNOMED-CT'));
                        ancestors
----------------------------------------------------------
 SNOMED CT Concept (138875005)
  Clinical finding (404684003)
   Clinical history and observation findings (250171008)
    General finding of observation of patient (118222006)
     General problem AND/OR complaint (105721009)
      Complaint (409586006)
(6 rows)

For codesystems with a polyhierarchy, where concepts can have more than one parent, the ancestor function will show a best effort in displaying the hierarchy, as is shown in the example below. In this example, some of the displaynames are indented more than one position compared to the preceding line. This non-uniform indentation is an indication of the existence of more than one parent, as the number of spaces for each code is the equal to the number of ancestors. All concepts shown are ancestors of the input code, ‘fracture of femur’ in the example below. Since the indenting does not show a straight line, it is clear that one or more concepts in the ancestor hierarchy have more than one parent. The polyhierarchy is branched off under ‘clinical finding’ from ‘finding by site’ and ‘disease’.

select ancestors('71620000'::cv('SNOMED-CT'));
                              ancestors
---------------------------------------------------------------------
 SNOMED CT Concept (138875005)
  Clinical finding (404684003)
   Finding by site (118234003)
    Finding of body region (301857004)
     Finding of limb structure (302293008)
   Disease (64572001)
     Disorder by body site (123946008)
      Disorder of body system (362965005)
        Disorder of extremity (128605003)
      Disorder of connective tissue (105969002)
    Musculoskeletal finding (106028002)
        Disorder of musculoskeletal system (928000)
     Bone finding (118953000)
          Musculoskeletal and connective tissue disorder (312225001)
         Disorder of skeletal system (88230002)
             Disorder of bone (76069003)
    Traumatic AND/OR non-traumatic injury (417163006)
     Traumatic injury (417746004)
      Finding of lower limb (116312005)
       Injury of anatomical site (123397009)
            Injury of musculoskeletal system (105606008)
          Disorder of lower extremity (118937003)
       Finding of thigh (419003001)
          Injury of connective tissue (385424001)
              Injury of lower extremity (127279002)
            Disorder of thigh (128135006)
                 Injury of thigh (7523003)
                   Bone injury (284003005)
                    Fracture of bone (125605004)
                           Fracture of lower limb (46866001)
                               Fracture of femur (71620000)
(31 rows)
Show the descendants of a coded value

Use the descendants function to view the subtree under the given coded value. Note that for a large codesystem as SNOMED-CT, when a general code such as clinical finding is queried, the function will take a lot of memory and time to complete. Non-uniform indentation indicates the presence of concepts with more than one parent.

select descendants('ORG'::cv('EntityClass'));
select descendants('54441004'::cv('SNOMED-CT'));
        descendants
---------------------------
 organization (ORG)
  public institution (PUB)
  state (STATE)
   Nation (NAT)
(4 rows)

                     descendants
------------------------------------------------------
 Fracture of shaft of femur (54441004)
        Closed fracture of shaft of femur (26442006)
            Open fracture of shaft of femur (6628008)
(3 rows)

Importing xml messages

This section covers import and export of information to and from a RIM database. As working with the HL7v3 requires an exact match with the datatype specification and message type at hand, it is recommended to use messages as an interface between the MGRID HDM RIM and an application using it, instead of manually writing software that populates and queries a RIM database. By adopting the method of generating message parsers and exporters from models, the system architecture benefits from a loosly coupled architecture.

Create a message parser

How to create a converter that takes a XML message and transforms it into SQL is described in the documentation of the MGRID Messaging SDK, available from the documentation page on www.mgrid.net.

Using a message parser to populate a RIM database

All message parsers that are generated with the MGRID messaging suite, take a HL7v3 XML file and translate that to a set of SQL statements bundled in an anonymous block. When the output is fed to the database, the anonymous block is either loaded succesfully as a whole, or no records are loaded at all, if an error is raised. The translator can be used as a part of a bigger application, but also run from the command line, as is shown below.

Load a CDA R2 xml instance in a RIM database

cd mgrid-messaging/cda_r2
python convert_CDA_R2.py tests/source/mgrid.1/patientID-PS.1.2.3.xml
DO
$$DECLARE

document0 bigint;
organization1 bigint;
person2 bigint;

... (rows omitted here)

participation147 := Participation_insert(_mif := 'POCD_MT000040', _clonename := 'Performer1', "act" := act65, "functionCode" ...

participation148 := Participation_insert(_mif := 'POCD_MT000040', _clonename := 'RecordTarget', "act" := document0, ...

END$$;

To load this anonymous block in the database, pipe the output to psql:

cd mgrid-messaging/cda_r2
python convert_CDA_R2.py tests/source/mgrid.1/patientID-PS.1.2.3.xml | psql <database>

Querying the RIM

The MGRID HDM is a database implementation of the HL7v3 RIM. With the HDL and HDM, it is possible to query HL7v3 information in a relational database with SQL. This section covers topics that are of importance when querying the RIM.

Safe querying a RIM database

If a database user queries the Observation table, without taking into account the attributes that carry meta information, such as the negation indicators, nullFlavors or uncertainties, information can be misinterpreted. For instance, an Observation with a code for headache and value negation indicator true, carries the meaning that the patient had no headache. But when queried without the negation indicator in the WHERE clause or the SELECT list, the meaning is exactly the opposite. Clearly this is an error. A query that ignores the negation indicator is correct only, when there are no negated acts in the database. As this cannot be assumed for all RIM databases, such a query is considered ‘unsafe’. The topic how to query individual components from the RIM, without losing context or meaning, is called ‘Safe querying of the RIM’.

The following table lists possible places where necessary context can be found, that cannot be omitted without changing meaning. Queries on the RIM that do not take into account all of these attributes will misinterpret information, if there is information in the RIM database where a non-queried attribute is set to a non-default value.

Context Attributes

Attribute
Act.nullFlavor
Act.moodCode
Act.actionNegationInd
Act.negationInd
Act.statusCode
Act.uncertaintyCode
Observation.valueNegationInd
ActRelationship.inversionInd
ActRelationship.negationInd
ActRelationship.uncertaintyCode
Participation.negationInd

Table: Context Attributes

Context conduction

Note that Context Attributes did not list context conduction codes. Context conduction takes care of propagating and overriding participations through relationships of acts. With active context conduction in the database, there is no need to traverse the RIM graph to find, for instance, a record target participation. Instead this record target participation, if specified with a conducting code, will be associated with each act in the context, as follows from the conducting act relationships.

Querying with templates

Another way to limit the effect of misinterpretation is by querying using template ids. If for instance a query selects only data from the RIM from C-CDA templates, it is known a priori that no Act will have a non default value for Act.uncertaintyCode, so that attribute can be ignored.