Stamen Health Executive Summary

From Research Wiki
Revision as of 16:03, 15 April 2026 by Admin 3julmthh (talk | contribs) (Major rewrite of 95% chapter: defined accuracy metrics, added compounding problem, removed unsupported 95% claim, added HITL costs, corrected MDR direction, dropped unlocks-everything table, added feasibility study milestone)

Executive summary and one-page plan for Stamen Health, the EHDS compliance and health data infrastructure startup from Oslo.

For full strategic analysis, see Stamen Health. For market data, see PHKG Business Models & Market. For AIDAVA research foundation, see AIDAVA.

What This Is

Stamen Health is a planned startup that would take the research architecture from AIDAVA (EU Horizon Europe, €7.7M, 2022-2026) and turn it into a commercial product for EU private hospitals that need to comply with EHDS (European Health Data Space).

The core product: connect to hospital systems, curate fragmented data (structured + unstructured) into structured knowledge graphs using SNOMED CT and FHIR, and produce EHDS-compliant output.

What AIDAVA actually proved: Automated curation of 45% of documents. 20 minutes per document. Tested in 3 languages (Dutch, German, Estonian) across breast cancer and cardiovascular use cases. Usability rated "good" by pilot users, but AI explanations rated "suboptimal." Ends August 2026 as a research prototype — not a commercial product.

What Stamen would need to prove: That this architecture can be production-hardened, sold to private hospitals at commercial scale, and maintained as a reliable compliance tool. None of this has been done yet.

The Opportunity — What's Real

EHDS is real, but distant. The regulation requires standardized health data availability by 2029-2030. National transposition is happening now. Hospitals will need tools — but the spending wave hasn't started yet. This is a build-now-for-2028-revenue bet, not a 2026 revenue story.

Norwegian data advantage is real, but underutilized. Health Minister Vestre said it plainly (April 2026): Norway has the data but isn't using it. Helsedataservice applications up 30% in 2025. Government wants data to flow faster. But "wanting" and "having infrastructure" are different things.

No one does the full pipeline commercially. The competitive analysis shows companies doing pieces: PicnicHealth (US, data collection), Better (FHIR), Averbis (German NLP), Healx (knowledge graphs). Nobody combines ingestion → NLP → KG → FAIRification → multi-stakeholder reuse as a commercial product. But "nobody does it" can mean either "opportunity" or "there's a reason."

Private hospitals are an underserved market. Public hospitals in the Nordics have national EHR systems (DIPS, Metavision). Private hospitals lack equivalent infrastructure and will feel EHDS pressure first — they need to comply without government IT departments.

The Hard Parts — What Will Be Difficult

Product

  • 45% automation is not enough. Hospitals won't pay for a tool that only automates half their data curation. The product needs 80%+ automation to be commercially viable. Getting from 45% to 80% is a multi-year engineering problem, not a feature request.
  • AIDAVA is a research prototype. It was built by 13-14 academic partners across 9 countries. Hardening it for production — reliability, error handling, hospital IT integration, data privacy compliance — is a completely different engineering challenge than the research.
  • Clinical NLP across languages. AIDAVA's NLP works in Dutch, German, Estonian. Norwegian clinical text has its own quirks (bokmål/nynorsk, medical abbreviations, dialect in notes). Each new language is significant work.
  • SNOMED CT coding at scale. Automated SNOMED CT coding from unstructured text is still an open research problem. AIDAVA's 45% rate reflects this.

Sales

  • Hospitals are slow buyers. Enterprise sales cycles in healthcare are 6-18 months. Private hospitals have smaller IT teams and less budget than public systems. The first 3 customers will take 12+ months to close.
  • No existing budget line. EHDS compliance tools don't have a named budget category yet. Hospitals don't have an "EHDS compliance software" line item. You're creating demand, not capturing it.
  • Competition will come. Better (Slovenia), InterSystems ($5B), IQVIA ($35B), and potentially Microsoft/Google will build EHDS compliance tools. The window for a startup is narrow — probably 2026-2028 before big players move.

Team

  • Finding a clinical NLP / knowledge graph CTO is extremely hard. The intersection of SNOMED CT expertise, production ML engineering, and health data privacy is tiny. In Norway, it's close to zero. This role likely needs to come from the AIDAVA network (Maastricht, Tartu, etc.), which means relocation or a remote co-founder.
  • COO with hospital sales experience. The plan calls for a Norwegian COO: someone who has sold to Nordic hospitals, understands procurement, and can navigate hospital politics. These people exist, but they're expensive and already employed.
  • Three co-founders with different skill sets. Clinical NLP, hospital sales, and operations. Getting three people aligned on vision, equity split, and working style is hard. Most startups fail on co-founder dynamics, not technology.

Funding

  • €25K/month founder salaries will be challenged. Innovation Norway early-stage approvals are 60-90K NOK/month. SkatteFUNN has fewer restrictions but is a tax credit, not cash. Plan for 70K NOK/month per founder in year 1.
  • Seed round timing. The realistic path: grants first (€300-500K from Innovation Norway + SkatteFUNN), then seed (€2-5M) at month 12-18 after proving the product with 2-3 pilot hospitals. Don't raise seed before having hospital traction.
  • EHDS VC interest is real but early. Nordic VCs are interested in health data platforms (Tandem Health raised €42.6M from Kinnevik). But they want product traction, not just a research prototype.

Realistic 12-Month Plan

Month | What | What This Actually Means
1-3 | Founding team + AIDAVA license | Co-founders signed, equity split agreed, AIDAVA technology transfer/licensing negotiated with Maastricht University. Innovation Norway application submitted.
4-6 | Architecture + first conversations | Take AIDAVA's research code, decide what to rebuild vs. reuse. Have conversations with 5-10 private hospitals. NOT a pilot, just listening to what they actually need.
7-9 | Prototype + pilot agreement | Working prototype that does ONE thing well: e.g., curating radiology reports into FHIR. Sign pilot agreement with 1 Norwegian private hospital.
10-12 | Pilot data + seed prep | Run the pilot. Measure: automation rate, time savings, hospital satisfaction. Use pilot data for seed deck. Apply for IPN/SkatteFUNN for year 2 R&D.

Year 1 budget: €300-500K total. Innovation Norway Innovation Contract (~€160K), SkatteFUNN (19% tax credit on R&D), founder contributions, possibly small angel round.

Year 1 team: 3 co-founders + 1 engineer (contract or part-time).

Year 1 goal: NOT revenue. Year 1 goal is: working prototype + first pilot agreement + seed deck with traction data.

Financial Model (Realistic)

Metric | Year 1 | Year 2 | Year 3
Pilots/Customers | 1 pilot | 2-3 customers | 5-8 customers
Revenue | €0 | €100-200K | €500K-1M
Team | 3-4 | 6-8 | 10-15
Funding source | Grants + angel | Seed €2-5M | Series A €5-10M
Burn rate/month | €30-40K | €80-120K | €200-300K

Revenue model: Per-patient curation fee (€5-15 per record) + platform subscription (€2-5K/month per hospital). Year 1 is zero revenue — this is a grant-funded build year.
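
The revenue model above can be sketched as simple arithmetic. This is an illustrative calculation, not part of the original plan: the fee and subscription ranges come from the text, but the record volume (ASSUMED_RECORDS_PER_YEAR) is a hypothetical placeholder that would need to be replaced with a real figure from a pilot hospital.

```python
def annual_revenue_per_hospital(records_per_year, fee_per_record_eur, subscription_per_month_eur):
    """Annual revenue from one hospital: per-record curation fees plus 12 months of subscription."""
    return records_per_year * fee_per_record_eur + 12 * subscription_per_month_eur


# HYPOTHETICAL volume: the source gives no records-per-year figure for a private hospital.
ASSUMED_RECORDS_PER_YEAR = 5_000

# Low end: EUR 5 per record, EUR 2K/month. High end: EUR 15 per record, EUR 5K/month.
low = annual_revenue_per_hospital(ASSUMED_RECORDS_PER_YEAR, 5, 2_000)
high = annual_revenue_per_hospital(ASSUMED_RECORDS_PER_YEAR, 15, 5_000)

print(f"Per hospital: EUR {low:,} to EUR {high:,} per year")
```

Under that assumed volume, one hospital yields roughly €49K to €135K per year. The result is linear in record volume, so the volume assumption dominates the per-record half of the estimate.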

Break-even estimate: Year 3-4, with 10+ paying hospitals. Earlier if there's a strong licensing/partnership deal with a hospital group.

Why It Could Work

  1. Timing is right: EHDS deadlines (2029-2030) mean hospitals need to start building infrastructure now. First movers who can sell EHDS compliance have a 2-3 year window before big tech enters.
  2. Norway is the right base: Government wants data to flow faster, has strong data infrastructure, and the trust advantage for EU expansion (Norway = GDPR-conscious, not US big tech).
  3. AIDAVA de-risks the technology: 45% automation with a research prototype across 3 languages. The architecture works — the question is engineering, not science.
  4. No full-stack competitor exists: Companies do pieces. Nobody offers the complete ingestion → curation → compliance → multi-stakeholder reuse pipeline.
  5. Non-dilutive funding is available: €300-500K in Norwegian grants is realistic for year 1. This funds the build phase without giving up equity.

Why It Could Fail

  1. EHDS gets delayed or diluted: Regulatory timelines slip. Hospitals don't feel urgency until 2028.
  2. 45% → 80% is a hard engineering problem: The automation gap closes too slowly. Product isn't good enough for commercial use by year 2.
  3. Co-founder team doesn't hold: Clinical NLP expert, hospital sales operator, and operations lead don't gel. One leaves.
  4. Big tech moves faster than expected: Microsoft, Google, or InterSystems build EHDS compliance tools. Startup window closes.
  5. Hospitals don't buy from startups: Private hospitals prefer established vendors. Trust takes too long to build.
  6. AIDAVA licensing doesn't work out: Maastricht University or other partners make licensing terms unfavorable.

What Actually Matters in Year 1

  1. Ship one thing that works. Not a full platform — one vertical (e.g., radiology reports → FHIR). Make it reliable.
  2. Get one hospital to pay (or commit to pay). Even a symbolic €5K pilot contract proves someone wants this.
  3. Don't raise too early. Grants fund year 1. Raise seed only when you have traction data.
  4. Find the CTO first. The technical co-founder is the hardest hire and the most important one. Everything else follows.
  5. Don't build for Norway only. EHDS is EU-wide. Every feature should be built for cross-border use from day one.


Curation Accuracy — The Central Technical Question

AIDAVA achieved 45% automated curation with 2022-era open-source NLP tools and early prototypes. The project's conclusion: "true potential for automation in data curation into a harmonised semantic standard, under the form of a Personal Health Knowledge Graph" (D1.7, Jan 2025). The architecture works — the tools were weak.

This chapter addresses the question on which the entire Stamen thesis depends: what curation accuracy is achievable with 2026-era tools, and what does that level enable?

What "Accuracy" Means — And Why the Definition Matters

"Accuracy" is meaningless without specifying the unit of measurement:

Metric | What It Measures | Difficulty | Relevant For
Document-level accuracy | % of documents fully correctly curated | Hardest: one error = document fails | Patient-facing trust
Fact-level accuracy | % of extracted facts (medications, dates, diagnoses) correct | Medium: most facts are straightforward | GP overview, clinical trial matching
SNOMED CT concept-level coding | % of clinical concepts correctly coded to ontology | Hard: open research problem | EHDS compliance, interoperability
Recall (no missed facts) | % of relevant facts captured | Different from precision: missing is worse than wrong | Clinical safety
Precision (no wrong facts) | % of extracted facts that are correct | Different from recall: wrong is dangerous | Clinical safety

These are wildly different problems. Best-in-class LLM clinical summarisation is cited at ~96.55% sentence-level fidelity in the best configuration. Note that this figure is 100% minus the 3.45% omission rate alone; the 1.47% hallucination rate is reported separately and is not subtracted from it. A separate evaluation found that 47% of 100 LLM-generated ED summaries omitted clinically relevant information at the document level. Same technology, very different "accuracy" depending on how you slice it.
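
To make the gap between these metrics concrete, here is a minimal sketch that scores the same extraction output three ways. The three toy documents and their facts are invented for illustration and do not come from AIDAVA or any cited study; the function name `evaluate` is likewise hypothetical.

```python
def evaluate(documents):
    """Score extraction output as pooled fact-level precision/recall and document-level accuracy."""
    true_pos = n_extracted = n_gold = fully_correct = 0
    for doc in documents:
        gold, extracted = doc["gold"], doc["extracted"]
        true_pos += len(gold & extracted)     # facts both expected and extracted
        n_extracted += len(extracted)
        n_gold += len(gold)
        fully_correct += (gold == extracted)  # one wrong or missing fact fails the document
    return {
        "precision": true_pos / n_extracted,
        "recall": true_pos / n_gold,
        "document_accuracy": fully_correct / len(documents),
    }


# INVENTED toy data: one perfect document, one omission, one spurious fact.
toy_documents = [
    {"gold": {"metformin", "type 2 diabetes", "2024-03-01"},
     "extracted": {"metformin", "type 2 diabetes", "2024-03-01"}},
    {"gold": {"warfarin", "atrial fibrillation", "penicillin allergy"},
     "extracted": {"warfarin", "atrial fibrillation"}},
    {"gold": {"tamoxifen", "breast cancer"},
     "extracted": {"tamoxifen", "breast cancer", "hypertension"}},
]

scores = evaluate(toy_documents)
print(scores)
```

On this toy data, pooled precision and recall are both 0.875, while only one document in three is fully correct. The same output reads as "88% accurate" or "33% accurate" depending on the unit chosen.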

A defensible year-1 milestone: specify which metric, on which document type, against which gold standard, and test it on real Nordic clinical data.

The Compounding Problem

Even granting 95% per-document accuracy, errors compound across a medical record. Assuming errors are independent across documents, a patient with 50 documents in their PHKG has a fully correct record with probability 0.95^50 = 7.7%. At 99% per-fact accuracy with 500 facts: 0.99^500 = 0.7%.
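
The two figures can be reproduced in a few lines. This sketch assumes that errors are independent across documents (or facts), which is the simplification behind the exponent; real errors are likely correlated, so treat the output as an order-of-magnitude guide. The function name is illustrative.

```python
def prob_fully_correct(per_unit_accuracy, n_units):
    """Probability that all n_units are correct, ASSUMING independent errors."""
    return per_unit_accuracy ** n_units


p_docs = prob_fully_correct(0.95, 50)     # 50 documents at 95% each
p_facts = prob_fully_correct(0.99, 500)   # 500 facts at 99% each

print(f"50 documents at 95%: {p_docs:.1%}")   # 7.7%
print(f"500 facts at 99%:    {p_facts:.1%}")  # 0.7%
```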

This isn't a quibble — it's the central problem. The chapter previously claimed 95% means "production-quality, queryable." For patient-level decisions (GP care, second opinions, clinical trial matching), 95% per-document accuracy means almost no patient has a fully correct record.

The real engineering question isn't "can we hit 95% per document" — it's "can we hit accuracy levels where the patient-level error rate is acceptable for the use case." Those numbers are much higher than 95% and may be infeasible with current technology.
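
Inverting the compounding formula shows how far above 95% those numbers sit. The sketch below again assumes independent errors; the patient-level targets (50%, 90%, 99% of patients with a fully correct 50-document record) are hypothetical examples, not thresholds taken from any regulation or from the source.

```python
def required_per_unit_accuracy(target_record_correctness, n_units):
    """Per-unit accuracy needed so that all n_units are correct with the target probability,
    ASSUMING independent errors: solve p ** n_units = target for p."""
    return target_record_correctness ** (1.0 / n_units)


N_DOCUMENTS = 50  # same record size as the example in the text

# HYPOTHETICAL patient-level targets, chosen only to show the shape of the curve.
for target in (0.50, 0.90, 0.99):
    needed = required_per_unit_accuracy(target, N_DOCUMENTS)
    print(f"{target:.0%} of patients fully correct -> {needed:.2%} per document")
```

For a 50-document record this gives roughly 98.62%, 99.79%, and 99.98% per document. Each additional "nine" at the patient level demands a further order-of-magnitude reduction in the per-document error rate.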

What LLMs Actually Achieve on Clinical Tasks (2026 Evidence)

Study | Task | Result | Caveats
Best-case LLM summarisation | Sentence-level fidelity | 96.55% (100% minus the 3.45% omission rate; 1.47% hallucination rate reported separately) | Best config, consultation transcripts, not longitudinal multi-doc curation
ED summary evaluation | Document-level clinical relevance | 47% had clinically relevant omissions | Same underlying tech, very different result at document level
PreA RCT (Nature Medicine 2026) | Specialist consultation duration | 28.7% reduction | Specialist care, not GP; LLM-assisted not fully automated
SNOMED CT coding from text | Concept-level accuracy | Open research problem | AIDAVA's own product section acknowledges this

The honest range for LLM-aided curation on AIDAVA's task in 2026 is probably 60–85% depending on document type, language, and metric. Treating 95% as a planning assumption when it is a stretch goal is a category error.

The "Replaces a Curation Team" Claim — Needs Verification

The previous version of this chapter claimed hospitals have €300-600K curation teams that Stamen would replace. This needs verification before going in a pitch deck:

  • Nordic hospital context: The €300-600K figure looks imported from a US billing-coding context where hospitals run dedicated coding shops for fee-for-service billing. In Nordic public hospitals (and most Nordic privates), curation isn't a separately staffed function — clinicians do it as part of documentation, and dedicated medical coders do ICD-10 coding for DRG/ISF reimbursement, but at much smaller headcount than implied.
  • Software rarely cuts FTEs in healthcare: One of the most consistently documented findings in health IT economics — promised FTE savings reallocate rather than reduce. CFOs know this and discount FTE-replacement pitches. Savings come from avoided hires and reduced overtime, not headcount cuts.
  • Verify with a Norwegian hospital CFO. If the curation team size is wrong, the entire ROI flip disappears.

HITL Costs Are Missing from Unit Economics

The previous chapter said HITL needs "trained clinical coders — expensive, hard to scale" but didn't include the cost. Quick math:

  • 100,000 documents/year/hospital
  • 5% HITL rate = 5,000 documents
  • 5 minutes per HITL review
  • €60/hr fully-loaded clinical coder cost
  • = ~€25K/year in HITL labor per hospital

Not catastrophic, but it belongs in the model and it scales with customer count. HITL becomes part of Stamen's COGS, eroding gross margin as the business grows. The "€0.50-1/doc" figure looks like marginal compute cost, not true unit cost including HITL.
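
The quick math above, as a small reusable calculation. The base-case inputs are the ones listed in the bullets; the alternative HITL rates in the loop are hypothetical values added only to show sensitivity, and the function name is illustrative.

```python
def annual_hitl_cost_eur(docs_per_year, hitl_rate, minutes_per_review, hourly_cost_eur):
    """Yearly human-in-the-loop review labor for one hospital, in EUR."""
    reviewed_docs = docs_per_year * hitl_rate
    review_hours = reviewed_docs * minutes_per_review / 60
    return review_hours * hourly_cost_eur


DOCS_PER_YEAR = 100_000

# Base case from the text: 5% HITL rate, 5 minutes per review, EUR 60/hour.
cost = annual_hitl_cost_eur(DOCS_PER_YEAR, 0.05, 5, 60)
per_doc = cost / DOCS_PER_YEAR  # HITL labor averaged over ALL documents, not only reviewed ones

print(f"Base case: EUR {cost:,.0f}/year, EUR {per_doc:.2f} per document")

# HYPOTHETICAL sensitivity: the HITL rate is the least certain input.
for rate in (0.05, 0.10, 0.20):
    print(f"HITL rate {rate:.0%}: EUR {annual_hitl_cost_eur(DOCS_PER_YEAR, rate, 5, 60):,.0f}/year")
```

The base case comes to about €25K per year, or €0.25 per document when spread over all 100,000 documents. The cost is linear in the HITL rate, so a rate four times higher means a labor bill four times higher.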

95% Does NOT Unlock Every Downstream Opportunity

The previous "95% unlocks everything" table was wrong. Our own prior analyses of opportunities #2, #4, and #5 showed binding constraints are NOT curation accuracy:

Opportunity | Claimed Constraint at 45% | Actual Binding Constraint
Pre-Consultation Triage | "History data incomplete" | Trust/adoption, MDR clearance, NHS competitive density (AccuRx, eConsult, Anima)
GP Diagnostic Overview | "Summary unreliable" | Clinical safety (omissions, automation bias), Epic/Helseplattformen, MDR/AI Act
B2B Preventive Health | "Partial health trajectory" | Null causal ROI in RCTs, BHT incumbency, GDPR Art. 9 employer-consent
Data Intermediary | "Pharma doesn't trust 45%" | EHDS secondary-use framework not finalized, hospital reputational risk, de-identification

For at least four of five downstream opportunities, jumping from 45% to 95% doesn't move the binding constraint. Accuracy is necessary but nowhere near sufficient. Claiming "every opportunity becomes viable only past 95%" inverts cause and effect.

MDR Direction Runs the Other Way

The previous version said MDR risk goes from "high" at 45% to "lower" at 95%. This is backwards:

  • Under EU MDR + AI Act, a tool that curates clinical data flowing into care decisions is likely Class IIa or higher regardless of accuracy.
  • Higher accuracy doesn't reduce regulatory burden — it makes certification more likely to succeed.
  • HITL doesn't lower the regulatory class; it changes the conformity assessment route.
  • Higher automation arguably increases regulatory scrutiny because the system does more clinical work autonomously.

MDR is a process and certification problem, not an accuracy problem.

What AIDAVA Actually Concluded — Needs Direct Verification

The "true potential for automation" quote is from D1.7 (Jan 2025, sensitive). It's the foundation of the "tools have improved → higher accuracy is achievable" argument. Pull the actual AIDAVA evaluation deliverable and quote from it directly. If AIDAVA's authors concluded "the architecture has limits beyond which more sophisticated NLP won't help," the central premise collapses.

Recommendation: contact Remzi Celebi (coordinator, Maastricht University) directly about D1.7 conclusions. This is a 30-minute conversation that could save months of wrong assumptions.

The Honest Framing

AIDAVA's 45% rate reflects 2022-era tooling. Modern LLMs plausibly improve this materially — to a range of perhaps 60–85% depending on document type and language. Whether further improvement to the 90%+ range is achievable with current technology is an open empirical question that needs to be tested in year 1 with a working prototype on real Nordic clinical data.

The economics of the business model are highly sensitive to this number. A feasibility study to estimate the achievable accuracy ceiling is the single most important year-1 technical milestone. The seed-round narrative depends on the answer.

This is not "if we hit 95%, everything 10x's." This is: "we need to find out what's achievable, because everything downstream depends on the answer."

Year-1 Milestone: Accuracy Feasibility Study

Task | What | Why
1. Define accuracy metric | Per-fact recall on medication/diagnosis extraction from discharge summaries | Most relevant for GP overview and EHDS compliance
2. Gold standard | 200 Norwegian discharge summaries manually coded by clinical coder | Representative of real-world complexity
3. Test LLM pipeline | Run through best-available NLP pipeline (GPT-4 class + SNOMED CT mapping) | Benchmark achievable accuracy with 2026 tools
4. Report results | Per-fact recall, precision, document-level accuracy, SNOMED CT coding rate | This number is the company's most important asset

If the feasibility study shows 70%+ per-fact recall: good enough for GP overview, patient apps, and EHDS compliance use cases with HITL. The business case works at this level — not as a "replaces curation team" pitch, but as a "makes curation 3x faster with 0.5 FTE oversight" pitch.

If the study shows <60%: the thesis doesn't work with current technology. Pivot to narrower use cases (e.g., structured document matching, not full curation) or wait for tooling to improve.
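
The two decision rules above can be written down as an explicit gate, which makes one gap visible: the text defines outcomes for 70%+ and for below 60%, but not for the band in between. In this sketch the "inconclusive" label for that band is an assumption added for illustration, not a verdict stated in the plan.

```python
def feasibility_verdict(per_fact_recall):
    """Map measured per-fact recall (0.0 to 1.0) to the year-1 decision described in the text."""
    if per_fact_recall >= 0.70:
        return "proceed: viable with HITL for GP overview, patient apps, EHDS compliance"
    if per_fact_recall < 0.60:
        return "pivot to narrower use cases or wait for tooling to improve"
    # ASSUMPTION: the source does not say what happens between 60% and 70%.
    return "inconclusive: outcome not defined in the plan"


for recall in (0.75, 0.65, 0.55):
    print(f"{recall:.0%} -> {feasibility_verdict(recall)}")
```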

The right framing is not "95% or bust." It's: "what can we achieve, and what does that level enable?" That's an honest, testable, fundable proposition.

Further Opportunities — Beyond EHDS Compliance

The core EHDS compliance product is the foundation, but the same PHKG technology enables five additional revenue streams. Detailed analysis: PHKG Business Opportunities.

Opportunity | Customer | Value Prop | Timeline
Pre-Consultation Triage | GP practices | LLM-based structured history-taking for GP EHR. Assistant, not replacement. Strong RCT evidence (PreA 2026, 28.7% reduction in consultation duration). Static questionnaires have 25 years of null results. | Year 1-2
Patient Group Apps | Patients (rare/chronic) | Consolidate scattered data into one PHKG, AI explains and curates it, bring for second opinions. Proves the tech with real users. | Year 1-2
GP Diagnostic Overview | GP practices | Full patient history structured and summarized in PHKG. GP sees the whole patient, orders fewer redundant tests. | Year 2-3
B2B Preventive Health | Employers | Employee health screening data tracked in PHKG over years. Personalized health trajectories. Partner with screening providers (Neko, Nightingale). | Year 3+
Data Intermediary | Pharma/CROs | Hospitals sell curated patient cohorts for clinical trials via PHKG. High revenue per transaction but complex regulation. | Year 3+

Recommended sequence: Start with pre-consultation triage (simplest, fastest) and patient group apps (proves PHKG). Build toward core EHDS compliance. Add data monetization only after hospital data is flowing.

For full analysis of each opportunity — market context, competitors, revenue potential, challenges, and verdicts — see PHKG Business Opportunities.

See Also