Ontologies for Longitudinal Health Records

From Research Wiki
Research page: ontologies, terminologies, standards for PHKG and longitudinal records
 
Added AIDAVA Ontology Architecture section — reference KG, PHKG instance model, use case extensions

Revision as of 14:08, 14 April 2026

Ontologies, terminologies, and standards used in longitudinal health records and Personal Health Knowledge Graphs (PHKG). This page maps the technology stack used by AIDAVA and related projects for health data interoperability, FAIRification, and semantic integration.

Core Clinical Terminologies

SNOMED CT

Systematized Nomenclature of Medicine — Clinical Terms

  • Purpose: Comprehensive clinical terminology covering diseases, findings, procedures, body structures, organisms, substances, etc.
  • Scale: 350,000+ concepts, 1M+ relationships
  • Governance: SNOMED International (non-profit)
  • Use in PHKG: Primary concept representation for clinical data. AIDAVA uses SNOMED CT as the backbone ontology for its reference knowledge graph — each PHKG instance maps clinical observations to SNOMED concepts.
  • Key feature: Compositional — can express complex clinical concepts by combining simpler ones (post-coordination)
  • Mapping: Maps to ICD-10, LOINC, Read Codes, and national terminologies
  • Source: https://www.snomed.org
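
To make post-coordination concrete, the following is a minimal sketch of how a refined expression might be assembled and rendered in the style of SNOMED CT compositional grammar (focus concept, colon, then attribute = value pairs). The class names `Concept` and `PostCoordinatedExpression` are hypothetical, and the concept identifiers are illustrative and should be checked against a current SNOMED CT release.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A SNOMED CT concept reference: identifier plus human-readable term."""
    sctid: str
    term: str

    def render(self) -> str:
        return f"{self.sctid} |{self.term}|"


@dataclass
class PostCoordinatedExpression:
    """A focus concept refined by (attribute, value) pairs."""
    focus: Concept
    refinements: list[tuple[Concept, Concept]] = field(default_factory=list)

    def render(self) -> str:
        if not self.refinements:
            return self.focus.render()
        pairs = ", ".join(
            f"{attribute.render()} = {value.render()}"
            for attribute, value in self.refinements
        )
        return f"{self.focus.render()} : {pairs}"


# Illustrative identifiers (assumption: verify against a current release).
fracture = Concept("125605004", "Fracture of bone")
finding_site = Concept("363698007", "Finding site")
femur = Concept("71341001", "Bone structure of femur")

# "Fracture of bone" narrowed to the femur by combining simpler concepts.
expression = PostCoordinatedExpression(fracture, [(finding_site, femur)])
print(expression.render())
```

The point of the design is that the combined meaning is never stored as its own code; it is built from existing concepts at the time of recording.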

LOINC

Logical Observation Identifiers Names and Codes

  • Purpose: Universal standard for identifying medical laboratory observations, clinical measurements, and survey instruments
  • Scale: 100,000+ codes
  • Governance: Regenstrief Institute
  • Use in PHKG: Identifies what was measured (lab test, vital sign, clinical observation). SNOMED describes the concept; LOINC identifies the measurement.
  • Key feature: Every code has 6 axes: component, property, time, system, scale, method
  • Example: LOINC 2345-7 = "Glucose [Mass/volume] in Serum or Plasma"
  • Source: https://loinc.org

ICD-10 / ICD-11

International Classification of Diseases

  • Purpose: Standard diagnostic classification for epidemiology, health management, and clinical purposes
  • Governance: WHO
  • Use in PHKG: Disease classification and mortality coding. Less granular than SNOMED CT, but widely required for statistical reporting and, in many health systems, for billing.
  • Key difference from SNOMED: ICD is a classification (mutually exclusive categories in a single hierarchy, designed for statistical reporting); SNOMED CT is a terminology (multiple parents per concept and rich relationships for clinical reasoning)
  • Mapping: SNOMED CT ↔ ICD-10 maps maintained by SNOMED International
  • Source: https://icd.who.int

RxNorm

Normalized Names for Clinical Drugs

  • Purpose: Standardized nomenclature for clinical drugs in the US, increasingly used globally
  • Governance: NLM (US National Library of Medicine)
  • Use in PHKG: Medication representation in longitudinal records — linking prescriptions, dispensing, and administration
  • Key feature: Provides ingredient, dose form, and strength as separate concepts
  • Source: https://www.nlm.nih.gov/research/umls/rxnorm

Health Data Models & Standards

HL7 FHIR

Fast Healthcare Interoperability Resources

  • Purpose: Standard for exchanging healthcare data electronically
  • Current version: FHIR R4 (Release 4) is the most widely implemented; R5 was published in 2023
  • Governance: HL7 International
  • Use in PHKG: Defines the resource types (Patient, Observation, Condition, MedicationRequest, etc.) that structure health data exchange. AIDAVA maps its PHKG nodes to FHIR resource profiles.
  • Key feature: RESTful API, JSON/XML, modular resources
  • FHIR Shorthand (FSH): Authoring language for FHIR Implementation Guides and profiles
  • Source: https://hl7.org/fhir
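
As an illustration of how a FHIR resource carries a terminology code, here is a minimal sketch of an Observation in its JSON form, built as a Python dictionary. The patient reference, timestamp, and measured value are invented for the example, and the helper `loinc_codes` is hypothetical; the element names follow the FHIR Observation resource.

```python
import json

LOINC_SYSTEM = "http://loinc.org"

# A glucose measurement. All values below are made up for illustration.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [
            {
                "system": LOINC_SYSTEM,
                "code": "2345-7",
                "display": "Glucose [Mass/volume] in Serum or Plasma",
            }
        ]
    },
    "subject": {"reference": "Patient/example-patient"},
    "effectiveDateTime": "2024-03-01T08:30:00Z",
    "valueQuantity": {
        "value": 95,
        "unit": "mg/dL",
        "system": "http://unitsofmeasure.org",
        "code": "mg/dL",
    },
}


def loinc_codes(resource: dict) -> list[str]:
    """Return the LOINC codes attached to a resource's `code` element."""
    codings = resource.get("code", {}).get("coding", [])
    return [c["code"] for c in codings if c.get("system") == LOINC_SYSTEM]


print(json.dumps(observation, indent=2))
print(loinc_codes(observation))
```

The resource type supplies the structure (what kind of statement this is), while the `coding` entry supplies the meaning (which measurement it is).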

OMOP CDM

Observational Medical Outcomes Partnership Common Data Model

  • Purpose: Standardized data model for observational health data — enables multi-site research
  • Governance: OHDSI (Observational Health Data Sciences and Informatics)
  • Use in PHKG: Common representation for longitudinal observational data across institutions. Researchers can run the same analytics across different hospital systems.
  • Key tables: Person, Condition_occurrence, Drug_exposure, Measurement, Observation, Procedure_occurrence
  • Mapped terminologies: SNOMED CT (conditions), RxNorm (drugs), LOINC (measurements)
  • Tools: ATLAS (cohort definition), OHDSI network studies
  • Source: https://ohdsi.org
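
The idea of running the same analytics across sites can be sketched with a heavily simplified subset of two OMOP tables in an in-memory SQLite database. The column names follow the OMOP CDM, but the tables omit most columns, the data is invented, and `GLUCOSE_CONCEPT_ID` is a placeholder rather than a real OMOP concept identifier.

```python
import sqlite3

GLUCOSE_CONCEPT_ID = 1001  # placeholder, not a real OMOP concept_id

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER
    );
    CREATE TABLE measurement (
        measurement_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        measurement_concept_id INTEGER,
        measurement_date TEXT,
        value_as_number REAL
    );
    """
)
conn.execute("INSERT INTO person VALUES (1, 1970)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, GLUCOSE_CONCEPT_ID, "2024-03-01", 95.0),
        (2, 1, GLUCOSE_CONCEPT_ID, "2023-09-15", 110.0),
        (3, 1, GLUCOSE_CONCEPT_ID, "2024-09-20", 102.0),
    ],
)


def measurement_history(connection, person_id, concept_id):
    """Return (date, value) pairs for one person and concept, oldest first."""
    return connection.execute(
        "SELECT measurement_date, value_as_number FROM measurement "
        "WHERE person_id = ? AND measurement_concept_id = ? "
        "ORDER BY measurement_date",
        (person_id, concept_id),
    ).fetchall()


history = measurement_history(conn, 1, GLUCOSE_CONCEPT_ID)
print(history)
```

Because every site stores data in the same tables with the same concept identifiers, a query like this one could be shipped unchanged to each participating database.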

openEHR

Open Electronic Health Record

  • Purpose: Open standard for EHR architecture — archetype-based clinical data modeling
  • Governance: openEHR Foundation
  • Use in PHKG: The Clinical Knowledge Manager (CKM) provides archetypes (reusable clinical data models).
  • Key feature: Two-level modeling: a technical reference model plus clinical archetypes
  • Difference from FHIR: openEHR is storage- and persistence-focused, defining how to store data; FHIR is exchange-focused, defining how to exchange it
  • Source: https://www.openehr.org

Phenopackets

Phenotype Data Exchange Format

  • Purpose: Standard format for representing phenotypic data linked to genomic data
  • Governance: GA4GH
  • Use in PHKG: Structured phenotype representation for rare disease, linking patient phenotypes (HPO terms) to genomic variants
  • Source: https://phenopackets.org
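
A phenopacket is typically serialized as JSON. The sketch below shows a stripped-down example with one HPO-coded phenotypic feature; field names follow the Phenopacket Schema as far as the author of this sketch is aware, while the identifiers, dates, and the helper `hpo_term_ids` are assumptions made for illustration.

```python
import json

# Minimal phenopacket-like document. All values are invented.
phenopacket = {
    "id": "example-phenopacket-1",
    "subject": {"id": "example-patient"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
    ],
    "metaData": {
        "created": "2024-03-01T00:00:00Z",
        "createdBy": "example-curator",
        "resources": [
            {
                "id": "hp",
                "name": "human phenotype ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": "unspecified",
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }
        ],
        "phenopacketSchemaVersion": "2.0",
    },
}


def hpo_term_ids(packet: dict) -> list[str]:
    """Collect the HPO identifiers of all phenotypic features."""
    features = packet.get("phenotypicFeatures", [])
    return [
        f["type"]["id"] for f in features if f["type"]["id"].startswith("HP:")
    ]


print(json.dumps(phenopacket, indent=2))
print(hpo_term_ids(phenopacket))
```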

Domain-Specific Ontologies

Human Phenotype Ontology (HPO)

  • Purpose: Standard vocabulary for phenotypic abnormalities in human disease
  • Scale: 18,000+ terms, 300,000+ annotations to diseases
  • Use in PHKG: Describing patient phenotypes longitudinally — tracking symptoms and signs over time
  • Source: https://hpo.jax.org

Gene Ontology (GO)

  • Purpose: Standard representation of gene function across species
  • Domains: Molecular function, biological process, cellular component
  • Use in PHKG: Linking genomic data to functional annotations in longitudinal genomics records
  • Source: http://geneontology.org

Orphanet Nomenclature

  • Purpose: Standard terminology for rare diseases
  • Scale: 6,000+ rare diseases
  • Use in PHKG: Rare disease identification in longitudinal records, linking to Orphacodes for cross-border data exchange
  • Source: https://www.orpha.net

GA4GH Standards

Global Alliance for Genomics and Health

  • Purpose: Framework for responsible genomic data sharing
  • Key standards:
    • Beacon API: Query whether a dataset contains a particular genomic variant
    • VCF: Variant Call Format for genomic variants
    • Phenopackets: Phenotype data linked to genomics (see above)
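    • Passports and DUO: GA4GH Passports carry a researcher's identity and access permissions; the Data Use Ontology (DUO) encodes the permitted uses of a dataset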
  • Use in PHKG: Genomic data representation and sharing in longitudinal health records
  • Source: https://www.ga4gh.org

FAIRification & Semantic Web

FAIR Principles

Findable, Accessible, Interoperable, Reusable

  • Applied to health data through:
    • Persistent identifiers (DOIs, URIs)
    • Rich metadata (Dublin Core, DCAT)
    • Standard vocabularies (all ontologies above)
    • Open protocols (REST APIs, SPARQL)
  • AIDAVA connection: AIDAVA's first technology pillar is "Automation of quality enhancement and FAIRification" of collected health data

RDF / OWL / SPARQL

  • RDF: Resource Description Framework — graph data model for representing knowledge
  • OWL: Web Ontology Language — for defining ontologies with rich axioms
  • SPARQL: Query language for RDF databases
  • Use in PHKG: PHKGs are typically represented as RDF graphs, with SNOMED/LOINC/FHIR as the ontology layer
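
To show what the graph data model amounts to, here is a dependency-free sketch that stores subject-predicate-object triples in a Python set and answers a pattern query in the spirit of a single SPARQL triple pattern. A real deployment would use an RDF library or triple store; the `EX` namespace, the predicate names, and the `match` helper are hypothetical.

```python
Triple = tuple[str, str, str]

SCT = "http://snomed.info/id/"
EX = "https://example.org/phkg/"  # hypothetical namespace for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph: set[Triple] = {
    (EX + "patient/1", RDF_TYPE, EX + "Patient"),
    (EX + "condition/1", RDF_TYPE, EX + "Condition"),
    (EX + "condition/1", EX + "subject", EX + "patient/1"),
    # The condition is coded with a SNOMED CT concept URI
    # (22298006, myocardial infarction).
    (EX + "condition/1", EX + "code", SCT + "22298006"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly: SELECT ?c WHERE { ?c ex:subject <.../patient/1> }
conditions = [s for s, _, _ in match(graph, p=EX + "subject", o=EX + "patient/1")]
print(conditions)
```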

BioPortal

  • Purpose: Repository of biomedical ontologies
  • Scale: 900+ ontologies, 14M+ terms
  • Governance: Stanford BMIR
  • Use in PHKG: Source for ontology mappings, concept searches, and cross-ontology alignment
  • Source: https://bioportal.bioontology.org

Ontology Integration Architecture (PHKG)

A typical Personal Health Knowledge Graph integrates these ontologies in layers:

  1. Top: Patient-specific nodes (this patient, this observation, this encounter)
  2. Middle: FHIR Resource profiles structuring the data (Observation, Condition, Medication)
  3. Bottom: Terminology codes (SNOMED CT for concepts, LOINC for measurements, RxNorm for drugs)

Cross-cutting: ICD for classification/reporting, OMOP CDM for research analytics, HPO for phenotyping, GA4GH for genomics.

Key Research Papers

  • "An ontology-based rare disease common data model harmonising international registries, FHIR, and Phenopackets" — Nature (2025)
  • "CONNECTED: leveraging digital twins and personal knowledge graphs in healthcare digitalization" — Frontiers (2025)
  • "FAIRification of health-related data using semantic web technologies in the Swiss Personalized Health Network" — Nature (2024)
  • "A multimodal vision knowledge graph of cardiovascular disease" — Nature (2025)
  • "Genomics on FHIR — a feasibility study to support a National Strategy for Genomic Medicine" — Nature (2024)
  • "TIMER: temporal instruction modeling and evaluation for longitudinal clinical records" — npj Digital Medicine (2025)

AIDAVA Ontology Architecture

AIDAVA uses a reference knowledge graph architecture where each Personal Health Knowledge Graph (PHKG) is an instance of a common reference model based on multiple ontologies.

Reference Knowledge Graph

The AIDAVA reference knowledge graph integrates:

  • SNOMED CT — Primary clinical concept representation. All clinical observations, diagnoses, procedures, and findings are mapped to SNOMED CT concepts.
  • HL7 FHIR Resource Profiles — Structural framework. Data is organized according to FHIR resource types (Patient, Observation, Condition, MedicationRequest, Encounter, etc.) with profiles specific to each use case.
  • LOINC — Measurement identification. Each laboratory test, vital sign, and clinical measurement is identified by its LOINC code.
  • Domain-specific terminologies — Additional vocabularies for specific use cases:
    • ICD-10/11 — Disease classification for reporting
    • RxNorm — Medication nomenclature
    • Orphanet — Rare disease coding
    • HPO — Phenotype annotation (use case specific)

PHKG Instance Model

Each patient's PHKG is structured as:

  1. Patient node — central entity with demographic and identifier data
  2. Encounter nodes — healthcare visits, linked to temporal data
  3. Observation nodes — clinical measurements (LOINC-coded, SNOMED-described)
  4. Condition nodes — diagnoses and problems (SNOMED CT coded)
  5. Procedure nodes — treatments and interventions (SNOMED CT coded)
  6. Medication nodes — prescriptions and administrations (RxNorm coded)

Relationships between nodes encode temporal sequences, causal links, and clinical context — enabling longitudinal analysis across the patient's entire health history.
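
The instance model above can be sketched as a small in-memory graph whose nodes are linked to the patient and ordered in time. This is an assumption-laden illustration, not AIDAVA's implementation: the `Node` and `PHKG` classes, the `subject` relation name, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Node:
    node_id: str
    kind: str  # "Patient", "Encounter", "Observation", "Condition", ...
    code: Optional[str] = None  # terminology code, e.g. LOINC or SNOMED CT
    when: Optional[date] = None


@dataclass
class PHKG:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def link(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def timeline(self, patient_id: str) -> list[Node]:
        """Dated nodes about the patient, oldest first (ties broken by id)."""
        linked = {s for s, rel, t in self.edges if rel == "subject" and t == patient_id}
        dated = [self.nodes[n] for n in linked if self.nodes[n].when is not None]
        return sorted(dated, key=lambda n: (n.when, n.node_id))


kg = PHKG()
kg.add(Node("p1", "Patient"))
kg.add(Node("e1", "Encounter", when=date(2023, 9, 15)))
kg.add(Node("o1", "Observation", code="LOINC 2345-7", when=date(2023, 9, 15)))
kg.add(Node("c1", "Condition", code="SNOMED CT 73211009", when=date(2024, 1, 10)))
for node_id in ("e1", "o1", "c1"):
    kg.link(node_id, "subject", "p1")

print([n.node_id for n in kg.timeline("p1")])
```

Keeping dates on the nodes and relations on the edges is one simple way to support longitudinal queries; a production graph would more likely express both as RDF statements.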

Use Case Specific Extensions

Breast Cancer Registry (Use Case 1):

  • Extends with: TNM staging, histology codes (ICD-O-3), treatment protocols
  • Data sources: Structured registry data across 3 university hospitals
  • Languages: Dutch, German, Estonian

Cardiovascular Longitudinal Records (Use Case 2):

  • Extends with: Cardiac imaging codes, biomarker reference ranges, risk scores
  • Data sources: Heterogeneous EHR data integrated over time
  • Languages: Dutch, German, Estonian

FAIRification Pipeline

AIDAVA's ontology architecture enables automated FAIRification:

  1. Findable: Each concept gets a persistent URI linked to the ontology
  2. Accessible: FHIR API endpoints expose data in standard formats
  3. Interoperable: SNOMED/LOINC/FHIR mappings enable cross-institutional data exchange
  4. Reusable: Rich metadata and provenance tracking via ontology relationships
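
The Findable and Reusable steps above can be sketched as minting a persistent URI for each coded concept and attaching a small provenance record. This is a hedged illustration only: the functions `concept_uri` and `fair_record` and the record layout are hypothetical, the SNOMED CT prefix follows the published SNOMED URI pattern, and the LOINC prefix is an assumption.

```python
from datetime import datetime, timezone

# URI prefixes per code system. The LOINC prefix is an assumption.
URI_PREFIXES = {
    "SNOMED CT": "http://snomed.info/id/",
    "LOINC": "https://loinc.org/",
}


def concept_uri(system: str, code: str) -> str:
    """Findable: build a persistent URI for a coded concept."""
    return URI_PREFIXES[system] + code


def fair_record(system: str, code: str, source: str, retrieved_at: datetime) -> dict:
    """Reusable: wrap the URI with minimal provenance metadata."""
    return {
        "identifier": concept_uri(system, code),
        "codeSystem": system,
        "provenance": {
            "source": source,
            "retrievedAt": retrieved_at.isoformat(),
        },
    }


record = fair_record(
    "SNOMED CT",
    "22298006",
    source="hospital-a-ehr",  # hypothetical source system name
    retrieved_at=datetime(2024, 3, 1, tzinfo=timezone.utc),
)
print(record)
```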

See Also