Ontologies for Longitudinal Health Records

Revision as of 14:08, 14 April 2026

Ontologies, terminologies, and standards used in longitudinal health records and Personal Health Knowledge Graphs (PHKG). This page maps the technology stack used by AIDAVA and related projects for health data interoperability, FAIRification, and semantic integration.

Core Clinical Terminologies

SNOMED CT

Systematized Nomenclature of Medicine — Clinical Terms

Purpose: Comprehensive clinical terminology covering diseases, findings, procedures, body structures, organisms, substances, etc.
Scale: 350,000+ concepts, 1M+ relationships
Governance: SNOMED International (non-profit)
Use in PHKG: Primary concept representation for clinical data. AIDAVA uses SNOMED CT as the backbone ontology for its reference knowledge graph — each PHKG instance maps clinical observations to SNOMED concepts.
Key feature: Compositional — can express complex clinical concepts by combining simpler ones (post-coordination)
Mapping: Maps to ICD-10, LOINC, Read Codes, and national terminologies
Source: https://www.snomed.org
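
Post-coordination can be illustrated with a small sketch that renders a focus concept refined by attribute = value pairs, in the style of SNOMED CT compositional grammar. The `Concept` class and `post_coordinate` helper are hypothetical names introduced here, and the concept identifiers are given for illustration only; they should be verified against a licensed SNOMED CT release before any real use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A SNOMED CT concept reference: identifier plus human-readable term."""
    sctid: str
    term: str

    def render(self) -> str:
        return f"{self.sctid} |{self.term}|"


def post_coordinate(focus: Concept, refinements: dict[Concept, Concept]) -> str:
    """Render a focus concept refined by attribute = value pairs."""
    if not refinements:
        return focus.render()
    pairs = ", ".join(
        f"{attr.render()} = {value.render()}" for attr, value in refinements.items()
    )
    return f"{focus.render()} : {pairs}"


# Illustrative identifiers (assumption: check them against your release).
FRACTURE = Concept("125605004", "Fracture of bone")
FINDING_SITE = Concept("363698007", "Finding site")
FEMUR = Concept("71341001", "Bone structure of femur")

expression = post_coordinate(FRACTURE, {FINDING_SITE: FEMUR})
print(expression)
```

The sketch only builds the expression string; it does not check that the attribute is allowed for the focus concept, which a real implementation would validate against the SNOMED CT concept model.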

LOINC

Logical Observation Identifiers Names and Codes

Purpose: Universal standard for identifying medical laboratory observations, clinical measurements, and survey instruments
Scale: 100,000+ codes
Governance: Regenstrief Institute
Use in PHKG: Identifies what was measured (lab test, vital sign, clinical observation). LOINC names the question (which measurement was taken); SNOMED CT can encode the answer when the result is a coded finding.
Key feature: Every code is defined along six axes: component, property, time, system, scale, and (where it distinguishes tests) method
Example: LOINC 2345-7 = "Glucose [Mass/volume] in Serum or Plasma"
Source: https://loinc.org
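
The six axes can be sketched as a small data structure, using the glucose code from the example above. The `LoincCode` class and its field names are hypothetical; the axis abbreviations follow LOINC's usual short forms, and the method axis is left empty because this code does not specify one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoincCode:
    """One LOINC code broken into its defining axes."""
    code: str
    component: str
    measured_property: str
    time: str
    system: str
    scale: str
    method: Optional[str] = None  # only present when the method distinguishes tests

    def fully_specified_name(self) -> str:
        """Join the axes with colons, omitting an empty method."""
        parts = [self.component, self.measured_property, self.time,
                 self.system, self.scale]
        if self.method:
            parts.append(self.method)
        return ":".join(parts)


glucose = LoincCode(
    code="2345-7",
    component="Glucose",
    measured_property="MCnc",  # mass concentration
    time="Pt",                 # point in time
    system="Ser/Plas",         # serum or plasma
    scale="Qn",                # quantitative
)
print(glucose.fully_specified_name())
```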

ICD-10 / ICD-11

International Classification of Diseases

Purpose: Standard diagnostic classification for epidemiology, health management, and clinical purposes
Governance: WHO
Use in PHKG: Disease classification and mortality coding. Less granular than SNOMED CT but widely mandated for billing and statistical reporting.
Key difference from SNOMED CT: ICD is a statistical classification (mutually exclusive categories designed for counting and reporting); SNOMED CT is a clinical terminology (a polyhierarchy with rich relationships for clinical reasoning)
Mapping: SNOMED CT → ICD-10 map maintained by SNOMED International
Source: https://icd.who.int

RxNorm

Normalized Names for Clinical Drugs

Purpose: Standardized nomenclature for clinical drugs in the US, increasingly used globally
Governance: NLM (US National Library of Medicine)
Use in PHKG: Medication representation in longitudinal records — linking prescriptions, dispensing, and administration
Key feature: Provides ingredient, dose form, and strength as separate concepts
Source: https://www.nlm.nih.gov/research/umls/rxnorm

Health Data Models & Standards

HL7 FHIR

Fast Healthcare Interoperability Resources

Purpose: Standard for exchanging healthcare data electronically
Versions: FHIR R4 (Release 4) is the most widely implemented release; R5 was published in 2023
Governance: HL7 International
Use in PHKG: Defines the resource types (Patient, Observation, Condition, MedicationRequest, etc.) that structure health data exchange. AIDAVA maps its PHKG nodes to FHIR resource profiles.
Key feature: RESTful API, JSON/XML, modular resources
FHIR Shorthand (FSH): Authoring language for FHIR Implementation Guides and profiles
Source: https://hl7.org/fhir
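
As a rough illustration of how a FHIR resource carries a terminology code, the sketch below builds a minimal R4 Observation as a plain Python dictionary and serializes it to JSON. The `glucose_observation` helper, the patient identifier, and the values are hypothetical; the result is not validated against any profile.

```python
import json


def glucose_observation(patient_id: str, value_mg_dl: float, when: str) -> dict:
    """Build a minimal FHIR R4 Observation for a serum/plasma glucose result."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "2345-7",
                "display": "Glucose [Mass/volume] in Serum or Plasma",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when,
        "valueQuantity": {
            "value": value_mg_dl,
            "unit": "mg/dL",
            "system": "http://unitsofmeasure.org",  # UCUM
            "code": "mg/dL",
        },
    }


# Hypothetical patient id and measurement, for illustration only.
obs = glucose_observation("example-001", 95.0, "2026-01-15T08:30:00Z")
print(json.dumps(obs, indent=2))
```

In a RESTful exchange, a document like this would typically be sent to a server's `Observation` endpoint; the sketch stops at building the payload.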

OMOP CDM

Observational Medical Outcomes Partnership Common Data Model

Purpose: Standardized data model for observational health data — enables multi-site research
Governance: OHDSI (Observational Health Data Sciences and Informatics)
Use in PHKG: Common representation for longitudinal observational data across institutions. Researchers can run the same analytics across different hospital systems.
Key tables: Person, Condition_occurrence, Drug_exposure, Measurement, Observation, Procedure_occurrence
Mapped terminologies: SNOMED CT (conditions), RxNorm (drugs), LOINC (measurements)
Tools: ATLAS (cohort definition), OHDSI network studies
Source: https://ohdsi.org
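
The idea of running the same analytics across sites can be sketched with an in-memory SQLite database that mimics a small subset of the OMOP tables. Only a few columns are modeled, the data rows are invented, the threshold is arbitrary, and the concept identifier is an assumption that should be checked against the OHDSI vocabulary.

```python
import sqlite3

# Assumed standard concept id for LOINC 2345-7; verify in your vocabulary tables.
GLUCOSE_CONCEPT_ID = 3004501

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    measurement_concept_id INTEGER,
    measurement_date TEXT,
    value_as_number REAL
);
""")
# Invented rows for illustration.
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 1960), (2, 1985)])
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [
        (10, 1, GLUCOSE_CONCEPT_ID, "2025-03-01", 140.0),
        (11, 1, GLUCOSE_CONCEPT_ID, "2025-09-01", 118.0),
        (12, 2, GLUCOSE_CONCEPT_ID, "2025-04-12", 92.0),
    ],
)

# Persons whose highest glucose value exceeds an arbitrary threshold.
rows = conn.execute(
    """
    SELECT p.person_id, MAX(m.value_as_number) AS peak_glucose
    FROM person p
    JOIN measurement m ON m.person_id = p.person_id
    WHERE m.measurement_concept_id = ?
    GROUP BY p.person_id
    HAVING MAX(m.value_as_number) > ?
    ORDER BY p.person_id
    """,
    (GLUCOSE_CONCEPT_ID, 125.0),
).fetchall()
print(rows)
```

Because the query refers only to standard table and column names plus a concept identifier, the same SQL could in principle run unchanged at any site that has mapped its data to the CDM.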

openEHR

Open Electronic Health Record

Purpose: Open standard for EHR architecture — archetype-based clinical data modeling
Governance: openEHR Foundation
Use in PHKG: The openEHR Clinical Knowledge Manager (CKM) publishes archetypes, which are reusable clinical data models.
Key feature: Two-level modeling, with a technical reference model and clinical archetypes defined on top of it
Difference from FHIR: openEHR specifies how clinical data is modeled and stored; FHIR specifies how it is exchanged
Source: https://www.openehr.org

Phenopackets

Phenotype Data Exchange Format

Purpose: Standard format for representing phenotypic data linked to genomic data
Governance: GA4GH
Use in PHKG: Structured phenotype representation for rare disease, linking patient phenotypes (HPO terms) to genomic variants
Source: https://phenopackets.org
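
A minimal sketch of a phenopacket, written as a Python dictionary in the shape of the version 2 schema. All identifiers and metadata values are hypothetical, the structure has not been validated against the official schema, and `hpo_ids` is a helper introduced only for this example.

```python
import json

phenopacket = {
    "id": "example-phenopacket-1",       # hypothetical identifier
    "subject": {"id": "patient-001"},    # hypothetical patient id
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
    ],
    "metaData": {
        "created": "2026-01-15T00:00:00Z",
        "createdBy": "example-curator",
        "phenopacketSchemaVersion": "2.0",
        "resources": [{
            "id": "hp",
            "name": "Human Phenotype Ontology",
            "url": "http://purl.obolibrary.org/obo/hp.owl",
            "version": "unspecified",  # placeholder, not a real release tag
            "namespacePrefix": "HP",
            "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
        }],
    },
}


def hpo_ids(packet: dict) -> list[str]:
    """Return the HPO term identifiers recorded in a phenopacket."""
    return [f["type"]["id"] for f in packet.get("phenotypicFeatures", [])]


print(hpo_ids(phenopacket))
print(json.dumps(phenopacket, indent=2)[:60])
```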

Domain-Specific Ontologies

Human Phenotype Ontology (HPO)

Purpose: Standard vocabulary for phenotypic abnormalities in human disease
Scale: 18,000+ terms, 300,000+ annotations to diseases
Use in PHKG: Describing patient phenotypes longitudinally — tracking symptoms and signs over time
Source: https://hpo.jax.org

Gene Ontology (GO)

Purpose: Standard representation of gene function across species
Domains: Molecular function, biological process, cellular component
Use in PHKG: Linking genomic data to functional annotations in longitudinal genomics records
Source: http://geneontology.org

Orphanet Nomenclature

Purpose: Standard terminology for rare diseases
Scale: 6,000+ rare diseases
Use in PHKG: Rare disease identification in longitudinal records, linking to Orphacodes for cross-border data exchange
Source: https://www.orpha.net

GA4GH Standards

Global Alliance for Genomics and Health

Purpose: Framework for responsible genomic data sharing
Key standards:
- Beacon API: Query whether a dataset contains a particular genomic variant
- VCF: Variant Call Format for genomic variants
- Phenopackets: Phenotype data linked to genomics (see above)
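- Passports and DUO: GA4GH Passports convey a researcher's identity and access permissions; the Data Use Ontology (DUO) encodes the permitted uses of a dataset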
Use in PHKG: Genomic data representation and sharing in longitudinal health records
Source: https://www.ga4gh.org
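
The yes/no semantics of a Beacon query can be sketched over variants parsed from VCF text. This is not the Beacon API itself: `parse_vcf` and `beacon_exists` are hypothetical helpers, the VCF rows are invented, and only the first five VCF columns are read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str


def parse_vcf(text: str) -> set[Variant]:
    """Collect one Variant per ALT allele, skipping header lines."""
    variants = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, _variant_id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):
            variants.add(Variant(chrom, int(pos), ref, alt))
    return variants


def beacon_exists(dataset: set[Variant], query: Variant) -> bool:
    """Answer only whether the variant is present, without revealing records."""
    return query in dataset


# Invented example data.
VCF_TEXT = (
    "##fileformat=VCFv4.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t12345\t.\tA\tG,T\t50\tPASS\tDP=20\n"
    "2\t67890\t.\tC\tT\t99\tPASS\tDP=35\n"
)
dataset = parse_vcf(VCF_TEXT)
print(beacon_exists(dataset, Variant("1", 12345, "A", "T")))
print(beacon_exists(dataset, Variant("1", 12345, "A", "C")))
```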

FAIRification & Semantic Web

FAIR Principles

Findable, Accessible, Interoperable, Reusable

Applied to health data through:
- Persistent identifiers (DOIs, URIs)
- Rich metadata (Dublin Core, DCAT)
- Standard vocabularies (all ontologies above)
- Open protocols (REST APIs, SPARQL)
AIDAVA connection: AIDAVA's first technology pillar is "Automation of quality enhancement and FAIRification" of collected health data

RDF / OWL / SPARQL

RDF: Resource Description Framework — graph data model for representing knowledge
OWL: Web Ontology Language — for defining ontologies with rich axioms
SPARQL: Query language for RDF databases
Use in PHKG: PHKGs are typically represented as RDF graphs, with SNOMED CT and LOINC supplying the terminology layer and FHIR supplying the structural layer
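
To make the graph model concrete, the sketch below stores a few triples as plain Python tuples, serializes them as N-Triples, and answers a single-pattern query of the kind SPARQL would express. It uses only the standard library rather than an RDF toolkit; the `http://example.org/phkg/` namespace and its property names are hypothetical, and literals are left out for brevity.

```python
from typing import Optional

Triple = tuple[str, str, str]

EX = "http://example.org/phkg/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCT = "http://snomed.info/id/"

GRAPH: list[Triple] = [
    (EX + "patient/1", RDF_TYPE, EX + "Patient"),
    (EX + "condition/1", RDF_TYPE, EX + "Condition"),
    (EX + "condition/1", EX + "subject", EX + "patient/1"),
    (EX + "condition/1", EX + "code", SCT + "73211009"),  # diabetes mellitus
]


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def match(triples: list[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Roughly: SELECT ?c WHERE { ?c ex:code <http://snomed.info/id/73211009> }
hits = [s for s, _, _ in match(GRAPH, p=EX + "code", o=SCT + "73211009")]
print(to_ntriples(GRAPH))
print(hits)
```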

BioPortal

Purpose: Repository of biomedical ontologies
Scale: 900+ ontologies, 14M+ terms
Governance: Stanford BMIR
Use in PHKG: Source for ontology mappings, concept searches, and cross-ontology alignment
Source: https://bioportal.bioontology.org

Ontology Integration Architecture (PHKG)

A typical Personal Health Knowledge Graph integrates these ontologies in layers:

Top: Patient-specific nodes (this patient, this observation, this encounter)
Middle: FHIR Resource profiles structuring the data (Observation, Condition, Medication)
Bottom: Terminology codes (SNOMED CT for concepts, LOINC for measurements, RxNorm for drugs)

Cross-cutting: ICD for classification/reporting, OMOP CDM for research analytics, HPO for phenotyping, GA4GH for genomics.

Key Research Papers

"An ontology-based rare disease common data model harmonising international registries, FHIR, and Phenopackets" — Nature (2025)
"CONNECTED: leveraging digital twins and personal knowledge graphs in healthcare digitalization" — Frontiers (2025)
"FAIRification of health-related data using semantic web technologies in the Swiss Personalized Health Network" — Nature (2024)
"A multimodal vision knowledge graph of cardiovascular disease" — Nature (2025)
"Genomics on FHIR — a feasibility study to support a National Strategy for Genomic Medicine" — Nature (2024)
"TIMER: temporal instruction modeling and evaluation for longitudinal clinical records" — npj Digital Medicine (2025)

AIDAVA Ontology Architecture

AIDAVA uses a reference knowledge graph architecture where each Personal Health Knowledge Graph (PHKG) is an instance of a common reference model based on multiple ontologies.

Reference Knowledge Graph

The AIDAVA reference knowledge graph integrates:

SNOMED CT — Primary clinical concept representation. All clinical observations, diagnoses, procedures, and findings are mapped to SNOMED CT concepts.
HL7 FHIR Resource Profiles — Structural framework. Data is organized according to FHIR resource types (Patient, Observation, Condition, MedicationRequest, Encounter, etc.) with profiles specific to each use case.
LOINC — Measurement identification. Each laboratory test, vital sign, and clinical measurement is identified by its LOINC code.
Domain-specific terminologies — Additional vocabularies for specific use cases:
- ICD-10/11 — Disease classification for reporting
- RxNorm — Medication nomenclature
- Orphanet — Rare disease coding
- HPO — Phenotype annotation (use case specific)

PHKG Instance Model

Each patient's PHKG is structured as:

Patient node — central entity with demographic and identifier data
Encounter nodes — healthcare visits, linked to temporal data
Observation nodes — clinical measurements (LOINC-coded, SNOMED-described)
Condition nodes — diagnoses and problems (SNOMED CT coded)
Procedure nodes — treatments and interventions (SNOMED CT coded)
Medication nodes — prescriptions and administrations (RxNorm coded)

Relationships between nodes encode temporal sequences, causal links, and clinical context — enabling longitudinal analysis across the patient's entire health history.
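
One way to picture the temporal relationships is to sort a patient's nodes by date and link each node to the next. This is a minimal sketch, not AIDAVA's actual model: the `Node` class, the `precedes` edge label, and all dates and identifiers are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str      # e.g. "Observation" or "Condition"
    system: str    # coding system, e.g. "LOINC" or "SNOMED CT"
    code: str
    when: date


def temporal_edges(nodes: list[Node]) -> list[tuple[str, str, str]]:
    """Link each node to its chronological successor with a 'precedes' edge."""
    ordered = sorted(nodes, key=lambda n: n.when)
    return [(a.node_id, "precedes", b.node_id)
            for a, b in zip(ordered, ordered[1:])]


# Invented example records, deliberately out of order.
nodes = [
    Node("obs-2", "Observation", "LOINC", "2345-7", date(2025, 9, 1)),
    Node("cond-1", "Condition", "SNOMED CT", "73211009", date(2025, 3, 8)),
    Node("obs-1", "Observation", "LOINC", "2345-7", date(2025, 3, 1)),
]
edges = temporal_edges(nodes)
for edge in edges:
    print(edge)
```

Causal links and clinical context would need richer edge types than a simple ordering; the sketch covers only the temporal sequence.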

Use Case Specific Extensions

Breast Cancer Registry (Use Case 1):

Extends with: TNM staging, histology codes (ICD-O-3), treatment protocols
Data sources: Structured registry data across 3 university hospitals
Languages: Dutch, German, Estonian

Cardiovascular Longitudinal Records (Use Case 2):

Extends with: Cardiac imaging codes, biomarker reference ranges, risk scores
Data sources: Heterogeneous EHR data integrated over time
Languages: Dutch, German, Estonian

FAIRification Pipeline

AIDAVA's ontology architecture enables automated FAIRification:

Findable: Each concept gets a persistent URI linked to the ontology
Accessible: FHIR API endpoints expose data in standard formats
Interoperable: SNOMED/LOINC/FHIR mappings enable cross-institutional data exchange
Reusable: Rich metadata and provenance tracking via ontology relationships
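
The Findable step can be sketched as expanding a compact identifier into a resolvable URI. The prefix table and the `concept_uri` helper are assumptions made for this example and do not describe AIDAVA's implementation; the SNOMED CT and HPO patterns follow the published URI conventions of those ontologies, while the LOINC pattern simply points at the code's web page.

```python
# Hypothetical prefix table mapping compact prefixes to URI patterns.
URI_PATTERNS = {
    "SCT": "http://snomed.info/id/{local}",
    "HP": "http://purl.obolibrary.org/obo/HP_{local}",
    "LOINC": "https://loinc.org/{local}",
}


def concept_uri(curie: str) -> str:
    """Expand a 'PREFIX:local' identifier into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or not local or prefix not in URI_PATTERNS:
        raise ValueError(f"cannot expand identifier: {curie!r}")
    return URI_PATTERNS[prefix].format(local=local)


print(concept_uri("SCT:73211009"))
print(concept_uri("HP:0001250"))
print(concept_uri("LOINC:2345-7"))
```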
