# Implementation plan — Monitoring setup primitives
- Status: Ready for handoff (fresh Claude Code session)
- Driver ADR: docs/adr/0001-monitoring-setup-inputs.md
- Related PRs: #48 (demo Make target), #65 (seed workaround), #67 (this ADR)
This plan is self-contained. A fresh Claude Code session should be able to pick it up cold, in order.
## 1. Required reading (before any code)
- `docs/adr/0001-monitoring-setup-inputs.md` — the why. Read this fully.
- `scripts/seed_immunology.py` — current seed entrypoint; the conditions list lives here.
- `ogur/sources/clinicaltrials.py` — `_search_studies` is the function whose 500-result cap drove the failure.
- `ogur/sources/opentargets.py` — the GraphQL client to extend for discovery.
- `ogur/engine/evidence_pipeline.py`, specifically `_select_top_competitors` at line 608 — the second place that needs fixing (selection by signal volume, not configured competitors).
- `ogur/api/routes/evidence.py` — `_LANDSCAPE_CONFIG` is the hardcoded competitor + headline-endpoint config that needs to move into `Landscape`.
- `frontend/src/components/asset/LandscapeTab.tsx` — the indication-filter UI that motivates multi-indication scope.
## 2. Demo scope decision
After feedback: keep multiple indications to exercise the Landscape tab's indication selector (`activeIndication` state, line 112). Rationale:
| Indication | Why include | Headline endpoints |
|---|---|---|
| Atopic Dermatitis | Lead indication, richest data | EASI-75, IGA 0/1, Pruritus NRS |
| Asthma | Type 2 inflammation, distinct competitor set (tezepelumab, mepolizumab, benralizumab) | ACQ-6, FEV1, exacerbation rate |
| CRSwNP | Type 2 inflammation, smaller competitor set, exercises filter without bloat | NPS, SNOT-22 |
Excluded for the demo: EoE, Prurigo Nodularis, COPD. Less data, less mature comparative endpoints, would dilute the head-to-head story.
## 3. Current state baseline (ogur.db, post-PR #65)
Captured on the post-make demo-immunology DB so the next session has a real starting point.
### 3.1 Headline counts
| Metric | Value |
|---|---|
| Total signals (immunology-001) | 3,883 |
| Distinct drug names | 1,360 |
| Distinct indications (from `Signal.indication`) | 1 (degenerate: all tagged "Atopic Dermatitis") |
| Distinct companies | 941 |
| Distinct CT.gov NCTs | 1,237 |
| Drug profiles | 169 |
| Protocol profiles (parsed CT.gov v2) | 24 |
| Evidence records | 685 |
| Evidence records (paired, comparator_value IS NOT NULL) | 422 |
### 3.2 Per-source signal distribution

```
clinicaltrials  2435   ← dominant; the source the bug fix targeted
sec              665   ← high but not drug-keyed (company filings)
openalex         342
opentargets      169
openfda          100
lens             100
pubmed            71
conference         1   ← ⚠️ effectively broken; flag for separate fix
```
The conference source returning 1 row is a known gap to flag in this plan — the holo_conference / conference path is the only visual-extraction lane and should not be silent.
### 3.3 Configured-competitor coverage (post-PR #65)
PR #65 succeeded at getting all 7 competitors into the DB:
```
drug           signals  sources  nct_trials
dupilumab          215        4         160
abrocitinib         50        4           7
baricitinib         34        4           5
upadacitinib        16        2          13
lebrikizumab        11        2          10
tralokinumab        10        2           9
nemolizumab          7        2           6
```
But the evidence pipeline still picks the wrong drugs because _select_top_competitors orders by signal volume:
```
top-N actually picked:  placebo, dupilumab, abrocitinib, cyclosporine,
                        baricitinib, apremilast, crisaborole, cetirizine,
                        aspirin, betamethasone

what we want picked:    dupilumab, lebrikizumab, tralokinumab, nemolizumab,
                        baricitinib, abrocitinib, upadacitinib
```
This is the second bug behind the "only 2 cards in Clinical Evidence" symptom and is solved by the candidate_drugs field from ADR-0001 — see Phase 2.
### 3.4 Indication coverage (this is the key gap)

```sql
SELECT indication, COUNT(*) FROM signal
WHERE landscape_id = 'immunology-001'
GROUP BY indication;
-- → Atopic Dermatitis: 3883
```
All 3,883 signals are tagged with the landscape's singular indication field, so the Landscape tab's indication selector is a no-op in the demo. Multi-indication seeding is needed to exercise it.
## 4. Target state (after this plan lands)
### 4.1 Headline counts (expected order of magnitude)
| Metric | Current | Target | Notes |
|---|---|---|---|
| Total signals | 3,883 | 6,000–10,000 | +Asthma, +CRSwNP queries |
| Distinct CT.gov NCTs | 1,237 | 3,000+ | Each indication adds its own ~1,000 |
| Distinct indications in `Signal.indication` | 1 | 3 | AD, Asthma, CRSwNP |
| `candidate_drugs` length | n/a (new field) | 12–25 | per-indication union, deduplicated |
| Evidence records (paired) for headline endpoints | 422 (mostly placebo/dupilumab) | 200+ across 5+ targeted competitors | Signal-count selection replaced |
| Clinical Evidence section cards rendered | 2 | ≥ 5 | dupilumab, lebrikizumab, tralokinumab, nemolizumab, baricitinib at minimum |
### 4.2 Acceptance criteria (objective)

- `make demo-immunology` produces a Clinical Evidence section with ≥ 5 competitor cards.
- The Landscape tab's indication selector shows 3 tabs (AD, Asthma, CRSwNP) in addition to "All".
- `discover_competitors.py` is idempotent: running it twice on the same DB produces the same `candidate_drugs` list.
- `Landscape.conditions` is removed; nothing in the codebase references it (`grep -rn 'landscape.conditions'` returns 0 matches outside migrations / the field's definition).
- All tests pass: `make check`.
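The idempotency criterion is easiest to satisfy if discovery canonicalizes the list before writing it. A minimal sketch (the function name is hypothetical, not from the codebase):

```python
import json

def canonical_candidate_drugs(drugs: list[str]) -> str:
    """Dedupe case-insensitively and sort before serializing, so two
    discovery runs over the same data write byte-identical JSON."""
    unique = {d.strip().lower() for d in drugs if d.strip()}
    return json.dumps(sorted(unique))
```

Writing `candidate_drugs` through one canonicalizer also makes the grep/diff checks above deterministic.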
## 5. Eval framework — AD use case
This is what we measure to call the system "good enough" on the AD slice. Numbers go into a markdown report under evals/monitoring_setup/ad_eval.md.
### 5.1 Recall (against canonical AD competitor set)
The canonical set, agreed up-front:
```python
canonical_ad_competitors = {
    "dupilumab",      # IL-4Rα mAb — anchor
    "lebrikizumab",   # IL-13 mAb
    "tralokinumab",   # IL-13 mAb
    "nemolizumab",    # IL-31Rα mAb
    "baricitinib",    # JAK1/2 inhibitor
    "abrocitinib",    # JAK1 inhibitor
    "upadacitinib",   # JAK1 inhibitor
    "ruxolitinib",    # topical JAK1/2 (Opzelura) — currently missing
    "eblasakimab",    # IL-13Rα1 mAb (ASLAN004) — early stage, may miss
}
```
| Metric | Target | How to measure |
|---|---|---|
| Discovery recall | ≥ 8/9 | set(discover_competitors output) ∩ canonical_ad_competitors |
| CT.gov trial coverage per drug | ≥ 80% of CT.gov UI count | Manual: open clinicaltrials.gov, search "drug + atopic dermatitis", record N. Compare to SELECT COUNT(DISTINCT source_id) FROM signal WHERE drug_name=X AND indication LIKE '%Atopic%'. |
| PubMed paper coverage | Spot check 3 drugs | Compare SELECT COUNT(*) FROM signal WHERE source='pubmed' AND drug_name=X to PubMed UI count for "drug + atopic dermatitis". |
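The discovery-recall row reduces to a set intersection; a sketch (the `discovered` argument stands in for the `discover_competitors` output, which is an assumption about its shape):

```python
def discovery_recall(discovered: list[str], canonical: set[str]) -> float:
    """Fraction of the canonical competitor set found by discovery."""
    hits = {d.strip().lower() for d in discovered} & canonical
    return len(hits) / len(canonical)
```

With the 9-drug canonical set, a run that misses only eblasakimab scores 8/9 ≈ 0.89, meeting the ≥ 8/9 target.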
### 5.2 Evidence completeness (per competitor, per headline endpoint)
For each competitor in the configured list, count paired evidence rows per headline endpoint:
```sql
SELECT drug_name, endpoint, COUNT(*) AS n
FROM evidencerecord
WHERE drug_name IN (... canonical set ...)
  AND comparator_value IS NOT NULL
  AND (endpoint LIKE '%EASI%' OR endpoint LIKE '%IGA%' OR endpoint LIKE '%Pruritus%')
GROUP BY drug_name, endpoint;
```
Targets:

- ≥ 5 drugs with ≥ 1 paired EASI-75 row
- ≥ 4 drugs with ≥ 1 paired IGA 0/1 row
- ≥ 3 drugs with ≥ 1 paired Pruritus NRS row
If any drug has 0 paired rows on any endpoint, log it in the eval report — it's either a real data gap (drug never reported on that endpoint) or an extractor miss.
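The three targets can be checked mechanically from the query output; a sketch, assuming the rows are `(drug_name, endpoint)` pairs that already passed the paired filter:

```python
def check_targets(rows: list[tuple[str, str]]) -> dict[str, bool]:
    """Count distinct drugs per headline-endpoint family and compare
    against the plan's thresholds."""
    families: dict[str, set[str]] = {"EASI": set(), "IGA": set(), "Pruritus": set()}
    for drug, endpoint in rows:
        for family, drugs in families.items():
            if family.lower() in endpoint.lower():
                drugs.add(drug)
    return {
        "≥5 drugs with paired EASI-75": len(families["EASI"]) >= 5,
        "≥4 drugs with paired IGA 0/1": len(families["IGA"]) >= 4,
        "≥3 drugs with paired Pruritus NRS": len(families["Pruritus"]) >= 3,
    }
```

Any `False` entry maps directly to the "log it in the eval report" step below.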
### 5.3 UI render check (manual)
Boot make demo-immunology && make dev && make frontend, navigate to /asset/dupilumab, click the Landscape tab, and screenshot:
- Indication selector shows 3 tabs (AD / Asthma / CRSwNP) + "All".
- Clicking "Atopic Dermatitis" filters the comparator grid to AD competitors.
- Each card shows ≥ 1 EASI-75 row and ≥ 1 IGA 0/1 row.
- Clicking "Asthma" reveals tezepelumab, mepolizumab, benralizumab cards (or whichever are returned by discovery).
### 5.4 Quality (manual, one-time)
Pull 20 random EvidenceRecord rows for the canonical set. For each, verify:
- `endpoint` string matches a known headline endpoint (no "primary endpoint" stubs).
- `value` and `comparator_value` are sensible (no transposed arms, no negative percentages, etc.).
- `source_url` actually opens to the trial / paper that supports the row.
Pass criterion: ≥ 17/20 (85%).
### 5.5 Eval automation script
scripts/eval_ad_use_case.py should run the above SQL and print a markdown table to evals/monitoring_setup/ad_eval.md. Wire it into the Make target chain:
```make
demo-immunology: seed-immunology-full evidence-pilot eval-ad ## ...

eval-ad: ## AD use-case eval report
	uv run python scripts/eval_ad_use_case.py
```
## 6. Implementation phases
### Phase 1 — Schema migration (1 day)
Files: ogur/models/landscape.py, new alembic migration (or in-place SQLModel migration since dev DB is local).
Add to Landscape:
```python
indications: str                     # JSON list[str]; replaces `indication` (singular)
moa: str | None                      # JSON list[str]; nullable
horizon: str                         # "phase_1+" | "phase_2+" | "phase_3+" | "approved"
candidate_drugs: str                 # JSON list[str]; populated by discover_competitors
last_discovered_at: datetime | None
```
Keep Landscape.conditions until Phase 4 — old code still reads it. Add a deprecation comment.
For the local SQLite path, the simplest migration is rm -f ogur.db && make demo-immunology. Document this explicitly in the PR.
### Phase 2 — discover_competitors.py + evidence-pipeline rewire (2 days)
New file: scripts/discover_competitors.py.
Skeleton:
"""Populate Landscape.candidate_drugs from Open Targets graph queries.
Inputs (read from Landscape row):
- indications: list[str]
- targets: list[str]
- moa: list[str] | None
- horizon: str
Output: Landscape.candidate_drugs JSON list, Landscape.last_discovered_at datetime.
Idempotent: re-running produces the same list (modulo Open Targets data updates).
"""
Open Targets GraphQL fragment (knownDrugs field on the disease + target intersection):
```graphql
query CandidateDrugs($efoId: String!, $targetSymbol: String!) {
  disease(efoId: $efoId) {
    knownDrugs(size: 100) {
      rows {
        drug { id name }
        targetClass: target { approvedSymbol }
        phase
        mechanismOfAction
      }
    }
  }
}
```
The function _filter_by_horizon(rows, horizon) keeps rows where phase >= horizon_threshold.
Edit: ogur/engine/evidence_pipeline.py line 608, replace _select_top_competitors:
```python
def _select_top_competitors(landscape_id, top_n, source_types) -> list[str]:
    """Read candidate_drugs from the landscape; fall back to signal-count for legacy."""
    with get_session() as session:
        landscape = session.exec(
            select(Landscape).where(Landscape.id == landscape_id)
        ).first()
        if landscape and landscape.candidate_drugs:
            drugs = json.loads(landscape.candidate_drugs)
            return drugs[:top_n]
    # Legacy fallback (delete in Phase 4)
    return _select_top_competitors_by_signal_count(...)
```
### Phase 3 — Source adoption (1 day)
Edit: ogur/sources/clinicaltrials.py. fetch() should iterate over both landscape.indications (use query.cond) AND landscape.candidate_drugs (use query.intr). Dedup at the NCT level via seen_hashes.
```python
async def fetch(self, landscape: Landscape) -> list[Signal]:
    signals = []
    seen_hashes = set()
    for indication in json.loads(landscape.indications):
        studies = await self._search_by_condition(indication)
        # ... extract & dedup ...
    for drug in json.loads(landscape.candidate_drugs or "[]"):
        studies = await self._search_by_intervention(drug)
        # ... extract & dedup ...
    return signals
```
Edit similarly: ogur/sources/pubmed.py, ogur/sources/openalex.py, ogur/sources/patents.py. Each gets a per-drug query loop in addition to the per-indication broad sweep.
### Phase 4 — seed_immunology.py cleanup + Make integration (15 min)
Edit: scripts/seed_immunology.py. Replace _CONDITIONS with explicit field assignments:
```python
indications = ["Atopic Dermatitis", "Asthma", "Chronic Rhinosinusitis with Nasal Polyposis"]
targets = ["IL-4Rα", "IL-4", "IL-13", "TSLP", "IL-33", "OX40L", "JAK1", "JAK2", "IL-31Rα", "IgE"]
moa = None  # leave empty for immunology; oncology landscapes will set this
horizon = "phase_2+"
```
The seed script then runs discover_competitors.py first (synchronously), then the source loop reads the populated candidate_drugs.
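The ordering matters: discovery must commit `candidate_drugs` before any source reads it. A toy sketch of the sequence (all names here are stand-ins for the real scripts, not the actual entrypoints):

```python
import json

def discover_competitors(landscape: dict) -> list[str]:
    """Stand-in for scripts/discover_competitors.py."""
    return sorted({"dupilumab", "lebrikizumab"})

def fetch_sources(landscape: dict) -> list[str]:
    """Stand-in for the source loop; reads the field discovery populated."""
    drugs = json.loads(landscape["candidate_drugs"])
    return [f"signal:{d}" for d in drugs]

def seed(landscape: dict) -> dict:
    # discovery first, then the source loop sees a populated candidate_drugs
    landscape["candidate_drugs"] = json.dumps(discover_competitors(landscape))
    landscape["signals"] = fetch_sources(landscape)
    return landscape
```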
Edit: Makefile, update seed-immunology-full:
```make
seed-immunology-full: ## Reproducible immunology DB
	rm -f ogur.db
	uv run python scripts/seed_immunology.py                      # creates landscape
	uv run python scripts/discover_competitors.py immunology-001  # populates candidate_drugs
	uv run python scripts/seed_immunology_sources.py              # NEW: split out source-fetching loop
	uv run python scripts/build_drug_profiles.py
	uv run python scripts/build_target_graph.py
	uv run python scripts/seed_kiqs.py
	uv run python scripts/generate_briefing.py --landscape immunology-001
	uv run python scripts/analyze_drug.py dupilumab immunology-001
```
(Or keep seed_immunology.py as the single entrypoint and have it orchestrate the discover→fetch sequence internally — author's choice.)
### Phase 5 — Landscape.conditions removal (1 hour)
After Phase 4 verifies green: `git grep -n 'landscape\.conditions'` should return only the model definition + deprecation comment. Delete the column, ship the drop migration.
## 7. Comparative statistics (deliverable)
Generate before/after numbers. Run on:
- Before: the post-PR #65 DB (numbers in §3 above).
- After: the DB produced by make demo-immunology once this plan lands.
Required tables (auto-generated by scripts/eval_ad_use_case.py):
- Headline counts diff — same shape as §3.1, side by side.
- Per-source signal counts diff — flag the conference=1 issue if it persists.
- Per-drug coverage diff — for each canonical AD competitor, before / after (signals, sources, NCTs).
- Indication distribution diff — should go from 1 row (AD: 3883) to 3 rows (AD / Asthma / CRSwNP, each non-zero).
- Top-N selection diff — what `_select_top_competitors` returns before vs. after.
- Evidence pipeline outputs diff — `evidence records` and `paired records` counts per drug, ranked.
- Clinical Evidence card count — manual screenshot count: 2 (current) vs. ≥ 5 (target).
Output format: persist the diff as evals/monitoring_setup/before_after.md so future regressions can be A/B'd against it.
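A sketch of the diff-table renderer for `eval_ad_use_case.py` (function and metric names are illustrative, not existing code):

```python
def diff_table(before: dict[str, int], after: dict[str, int]) -> str:
    """Render a before/after markdown table; metrics missing on one
    side show n/a, so new fields like candidate_drugs still appear."""
    lines = ["| Metric | Before | After |", "|---|---|---|"]
    for metric in sorted(set(before) | set(after)):
        lines.append(
            f"| {metric} | {before.get(metric, 'n/a')} | {after.get(metric, 'n/a')} |"
        )
    return "\n".join(lines)
```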
## 8. Risks & open issues to resolve in flight
- Open Targets coverage of early-stage drugs. Eblasakimab (ASLAN004) and similar Phase 1 / Phase 2 compounds may not be in the Open Targets graph yet. Mitigation: keep the broad indication sweep as a safety net (already in Phase 3); flag any canonical-set members missing from `candidate_drugs` in the eval report.
- Open Targets rate limits. The discovery query may hit GraphQL throttling on first run. Mitigation: implement with the existing `tenacity` retry pattern from `ogur/sources/opentargets.py`; cache to a local JSON next to the DB.
- Conference source = 1 signal. Not in scope here, but the eval will surface it; flag in the implementation PR description so it gets a follow-up issue.
- Phase mapping. Open Targets uses ints (1, 2, 3, 4); our `SignalType.PHASE_TRANSITION` enum uses strings ("Phase 1", "Phase II"). The mapping table for `horizon` needs to be defined in `discover_competitors.py` and unit tested.
- MoA taxonomy alignment. ADR-0001 open question #1 — adopt Open Targets vocabulary, alias on read. Document the canonical MoA values in `docs/data-sources.md`.
- The `Signal.indication` column inheritance. Currently every signal inherits `landscape.indication` (singular). When `indications` becomes a list, signals must be tagged with their origin indication (whichever `query.cond` produced them), not the landscape's name. This requires plumbing through `_search_studies` → `_extract_signals`. Easy to miss; tests should assert `Signal.indication` distribution per indication.
## 9. PR strategy
- PR-A (this plan + ADR): docs only, ready for review now (#67).
- PR-B (Phase 1): schema changes + tests. Standalone, mergeable.
- PR-C (Phase 2): `discover_competitors.py` + evidence-pipeline rewire. Depends on PR-B.
- PR-D (Phase 3 + 4 + 5): source adoption, seed integration, conditions removal. Single PR because they all touch the same flow.
- PR-E (eval): `scripts/eval_ad_use_case.py` + before/after report. Can run in parallel with PR-D.
Total estimated effort: 4–5 person-days, plus eval review. The eval is what gives confidence to delete Landscape.conditions.