
Implementation plan — Monitoring setup primitives

Status: Ready for handoff (fresh Claude Code session)
Driver ADR: docs/adr/0001-monitoring-setup-inputs.md
Related PRs: #48 (demo Make target), #65 (seed workaround), #67 (this ADR)

This plan is self-contained. A fresh Claude Code session should be able to pick it up cold, in order.


1. Required reading (before any code)

  1. docs/adr/0001-monitoring-setup-inputs.md — the why. Read this fully.
  2. scripts/seed_immunology.py — current seed entrypoint; the conditions list lives here.
  3. ogur/sources/clinicaltrials.py — _search_studies is the function whose 500-result cap drove the failure.
  4. ogur/sources/opentargets.py — the GraphQL client to extend for discovery.
  5. ogur/engine/evidence_pipeline.py, specifically _select_top_competitors at line 608 — the second place that needs fixing (selection by signal volume, not configured competitors).
  6. ogur/api/routes/evidence.py — _LANDSCAPE_CONFIG is the hardcoded competitor + headline-endpoint config that needs to move into Landscape.
  7. frontend/src/components/asset/LandscapeTab.tsx — the indication-filter UI that motivates multi-indication scope.

2. Demo scope decision

After feedback: keep multiple indications to exercise the Landscape tab's indication selector (activeIndication state, line 112). Concretely:

```python
indications = ["Atopic Dermatitis", "Asthma", "Chronic Rhinosinusitis with Nasal Polyposis"]
```

Rationale:

| Indication | Why include | Headline endpoints |
|---|---|---|
| Atopic Dermatitis | Lead indication, richest data | EASI-75, IGA 0/1, Pruritus NRS |
| Asthma | Type 2 inflammation, distinct competitor set (tezepelumab, mepolizumab, benralizumab) | ACQ-6, FEV1, exacerbation rate |
| CRSwNP | Type 2 inflammation, smaller competitor set, exercises filter without bloat | NPS, SNOT-22 |

Excluded for the demo: EoE, Prurigo Nodularis, COPD. Less data, less mature comparative endpoints, would dilute the head-to-head story.


3. Current state baseline (ogur.db, post-PR #65)

Captured from the DB produced by make demo-immunology, so the next session has a real starting point.

3.1 Headline counts

| Metric | Value |
|---|---:|
| Total signals (immunology-001) | 3,883 |
| Distinct drug names | 1,360 |
| Distinct indications (from Signal.indication) | 1 (degenerate: all tagged "Atopic Dermatitis") |
| Distinct companies | 941 |
| Distinct CT.gov NCTs | 1,237 |
| Drug profiles | 169 |
| Protocol profiles (parsed CT.gov v2) | 24 |
| Evidence records | 685 |
| Evidence records (paired, comparator_value IS NOT NULL) | 422 |

3.2 Per-source signal distribution

```text
clinicaltrials  2435   ← dominant; the source the bug fix targeted
sec              665   ← high but not drug-keyed (company filings)
openalex         342
opentargets      169
openfda          100
lens             100
pubmed            71
conference         1   ← ⚠️ effectively broken; flag for separate fix
```

The conference source returning 1 row is a known gap to flag in this plan — the holo_conference / conference path is the only visual-extraction lane and should not fail silently.

3.3 Configured-competitor coverage (post-PR #65)

PR #65 succeeded at getting all 7 competitors into the DB:

```text
drug          signals  sources  nct_trials
dupilumab     215      4        160
abrocitinib    50      4          7
baricitinib    34      4          5
upadacitinib   16      2         13
lebrikizumab   11      2         10
tralokinumab   10      2          9
nemolizumab     7      2          6
```

But the evidence pipeline still picks the wrong drugs because _select_top_competitors orders by signal volume:

```text
top-N actually picked:  placebo, dupilumab, abrocitinib, cyclosporine,
                        baricitinib, apremilast, crisaborole, cetirizine,
                        aspirin, betamethasone

what we want picked:    dupilumab, lebrikizumab, tralokinumab, nemolizumab,
                        baricitinib, abrocitinib, upadacitinib
```

This is the second bug behind the "only 2 cards in Clinical Evidence" symptom and is solved by the candidate_drugs field from ADR-0001 — see Phase 2.

3.4 Indication coverage (this is the key gap)

```sql
SELECT indication, COUNT(*) FROM signal WHERE landscape_id='immunology-001' GROUP BY indication;
-- → Atopic Dermatitis: 3883
```

All 3,883 signals are tagged with the landscape's singular indication field, so the Landscape tab's indication selector is a no-op in the demo. Multi-indication seeding is needed to exercise it.


4. Target state (after this plan lands)

4.1 Headline counts (expected order of magnitude)

| Metric | Current | Target | Notes |
|---|---|---|---|
| Total signals | 3,883 | 6,000–10,000 | +Asthma, +CRSwNP queries |
| Distinct CT.gov NCTs | 1,237 | 3,000+ | Each indication adds its own ~1,000 |
| Distinct indications in Signal.indication | 1 | 3 | AD, Asthma, CRSwNP |
| candidate_drugs length | n/a (new field) | 12–25 | per-indication union, deduplicated |
| Evidence records (paired) for headline endpoints | 422 (mostly placebo/dupilumab) | 200+ across 5+ targeted competitors | Signal-count selection replaced |
| Clinical Evidence section cards rendered | 2 | ≥ 5 | dupilumab, lebrikizumab, tralokinumab, nemolizumab, baricitinib at minimum |

4.2 Acceptance criteria (objective)

  • make demo-immunology produces a Clinical Evidence section with ≥ 5 competitor cards.
  • The Landscape tab's indication selector shows 3 tabs (AD, Asthma, CRSwNP) in addition to "All".
  • discover_competitors.py is idempotent: running it twice on the same DB produces the same candidate_drugs list (a test sketch follows this list).
  • Landscape.conditions is removed; nothing in the codebase references it (grep -rn 'landscape.conditions' returns 0 matches outside migrations / the field's definition).
  • All tests pass: make check.
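
A test sketch for the idempotency criterion; the import path and the discover() entrypoint are assumptions about how scripts/discover_competitors.py will expose its logic:

```python
# tests/test_discover_idempotent.py (sketch)
from scripts.discover_competitors import discover  # hypothetical entrypoint

def test_discover_competitors_is_idempotent():
    first = discover("immunology-001")
    second = discover("immunology-001")  # same DB, no external changes
    assert first == second  # same drugs, same order
```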

5. Eval framework — AD use case

This is what we measure to call the system "good enough" on the AD slice. Numbers go into a markdown report under evals/monitoring_setup/ad_eval.md.

5.1 Recall (against canonical AD competitor set)

The canonical set, agreed up-front:

```python
canonical_ad_competitors = {
    "dupilumab",        # IL-4Rα mAb — anchor
    "lebrikizumab",     # IL-13 mAb
    "tralokinumab",     # IL-13 mAb
    "nemolizumab",      # IL-31Rα mAb
    "baricitinib",      # JAK1/2 inhibitor
    "abrocitinib",      # JAK1 inhibitor
    "upadacitinib",     # JAK1 inhibitor
    "ruxolitinib",      # topical JAK1/2 (Opzelura) — currently missing
    "eblasakimab",      # IL-13Rα1 mAb (ASLAN004) — early stage, may miss
}
```

| Metric | Target | How to measure |
|---|---|---|
| Discovery recall | ≥ 8/9 | set(discover_competitors output) ∩ canonical_ad_competitors |
| CT.gov trial coverage per drug | ≥ 80% of CT.gov UI count | Manual: open clinicaltrials.gov, search "drug + atopic dermatitis", record N. Compare to SELECT COUNT(DISTINCT source_id) FROM signal WHERE drug_name=X AND indication LIKE '%Atopic%'. |
| PubMed paper coverage | Spot check 3 drugs | Compare SELECT COUNT(*) FROM signal WHERE source='pubmed' AND drug_name=X to PubMed UI count for "drug + atopic dermatitis". |
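
For the discovery-recall row, a minimal sketch of the measurement, assuming candidate_drugs lands on the landscape row as a JSON string (per Phase 1):

```python
import json
import sqlite3

canonical_ad_competitors = {
    "dupilumab", "lebrikizumab", "tralokinumab", "nemolizumab",
    "baricitinib", "abrocitinib", "upadacitinib", "ruxolitinib", "eblasakimab",
}

con = sqlite3.connect("ogur.db")
(raw,) = con.execute(
    "SELECT candidate_drugs FROM landscape WHERE id = ?", ("immunology-001",)
).fetchone()
discovered = {d.lower() for d in json.loads(raw or "[]")}

hits = discovered & canonical_ad_competitors
missing = sorted(canonical_ad_competitors - discovered)
print(f"recall: {len(hits)}/{len(canonical_ad_competitors)}; missing: {missing}")
```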

5.2 Evidence completeness (per competitor, per headline endpoint)

For each competitor in the configured list, count paired evidence rows per headline endpoint:

```sql
SELECT drug_name, endpoint, COUNT(*) AS n
FROM evidencerecord
WHERE drug_name IN (... canonical set ...)
  AND comparator_value IS NOT NULL
  AND (endpoint LIKE '%EASI%' OR endpoint LIKE '%IGA%' OR endpoint LIKE '%Pruritus%')
GROUP BY drug_name, endpoint;
```

Targets:

  • ≥ 5 drugs with ≥ 1 paired EASI-75 row
  • ≥ 4 drugs with ≥ 1 paired IGA 0/1 row
  • ≥ 3 drugs with ≥ 1 paired Pruritus NRS row

If any drug has 0 paired rows on any endpoint, log it in the eval report — it's either a real data gap (drug never reported on that endpoint) or an extractor miss.

5.3 UI render check (manual)

Boot make demo-immunology && make dev && make frontend, navigate to /asset/dupilumab, click the Landscape tab, and screenshot:

  1. Indication selector shows 3 tabs (AD / Asthma / CRSwNP) + "All".
  2. Clicking "Atopic Dermatitis" filters the comparator grid to AD competitors.
  3. Each card shows ≥ 1 EASI-75 row and ≥ 1 IGA 0/1 row.
  4. Clicking "Asthma" reveals tezepelumab, mepolizumab, benralizumab cards (or whichever are returned by discovery).

5.4 Quality (manual, one-time)

Pull 20 random EvidenceRecord rows for the canonical set. For each, verify:

  • endpoint string matches a known headline endpoint (no "primary endpoint" stubs).
  • value and comparator_value are sensible (no transposed arms, no negative percentages, etc.).
  • source_url actually opens to the trial / paper that supports the row.

Pass criterion: ≥ 17/20 (85%).
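
One way to pull the 20-row sample, assuming direct sqlite3 access and the evidencerecord columns referenced elsewhere in this plan:

```python
import sqlite3

con = sqlite3.connect("ogur.db")
con.row_factory = sqlite3.Row
rows = con.execute(
    """
    SELECT drug_name, endpoint, value, comparator_value, source_url
    FROM evidencerecord
    WHERE drug_name IN ('dupilumab','lebrikizumab','tralokinumab','nemolizumab',
                        'baricitinib','abrocitinib','upadacitinib','ruxolitinib','eblasakimab')
    ORDER BY RANDOM()
    LIMIT 20
    """
).fetchall()
for row in rows:  # eyeball each against the three checks above
    print(dict(row))
```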

5.5 Eval automation script

scripts/eval_ad_use_case.py should run the above SQL and print a markdown table to evals/monitoring_setup/ad_eval.md. Wire it into the Make target chain:

```makefile
demo-immunology: seed-immunology-full evidence-pilot eval-ad ## ...
eval-ad: ## AD use-case eval report
	uv run python scripts/eval_ad_use_case.py
```
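
A possible skeleton for the script, assuming direct sqlite3 access to ogur.db; it covers only the §5.2 paired-evidence table, with the §5.1 recall check and §7 diffs to be added alongside:

```python
#!/usr/bin/env python
"""Sketch: run the §5 SQL and emit evals/monitoring_setup/ad_eval.md."""
import sqlite3
from pathlib import Path

PAIRED_SQL = """
SELECT drug_name, endpoint, COUNT(*) AS n
FROM evidencerecord
WHERE comparator_value IS NOT NULL
  AND (endpoint LIKE '%EASI%' OR endpoint LIKE '%IGA%' OR endpoint LIKE '%Pruritus%')
GROUP BY drug_name, endpoint
ORDER BY n DESC
"""

def main() -> None:
    con = sqlite3.connect("ogur.db")
    lines = ["| drug | endpoint | paired rows |", "|---|---|---:|"]
    for drug, endpoint, n in con.execute(PAIRED_SQL):
        lines.append(f"| {drug} | {endpoint} | {n} |")
    out = Path("evals/monitoring_setup/ad_eval.md")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    main()
```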

6. Implementation phases

Phase 1 — Schema migration (1 day)

Files: ogur/models/landscape.py, new alembic migration (or in-place SQLModel migration since dev DB is local).

Add to Landscape:

```python
indications: str       # JSON list[str]; replaces `indication` (singular)
moa: str | None        # JSON list[str]; nullable
horizon: str           # "phase_1+" | "phase_2+" | "phase_3+" | "approved"
candidate_drugs: str   # JSON list[str]; populated by discover_competitors
last_discovered_at: datetime | None
```

Keep Landscape.conditions until Phase 5 removes it — old code still reads it. Add a deprecation comment.
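
For orientation, a sketch of the resulting model, assuming the SQLModel conventions already in ogur/models/landscape.py (the defaults shown here are assumptions):

```python
from datetime import datetime
from sqlmodel import Field, SQLModel

class Landscape(SQLModel, table=True):
    id: str = Field(primary_key=True)
    # ... existing fields ...
    conditions: str | None = None  # DEPRECATED: superseded by indications; removed in Phase 5
    indications: str = "[]"        # JSON list[str]
    moa: str | None = None         # JSON list[str]
    horizon: str = "phase_2+"
    candidate_drugs: str = "[]"    # JSON list[str]; populated by discover_competitors
    last_discovered_at: datetime | None = None
```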

For the local SQLite path, the simplest migration is rm -f ogur.db && make demo-immunology. Document this explicitly in the PR.

Phase 2 — discover_competitors.py + evidence-pipeline rewire (2 days)

New file: scripts/discover_competitors.py.

Skeleton:

"""Populate Landscape.candidate_drugs from Open Targets graph queries.

Inputs (read from Landscape row):
  - indications: list[str]
  - targets: list[str]
  - moa: list[str] | None
  - horizon: str

Output: Landscape.candidate_drugs JSON list, Landscape.last_discovered_at datetime.

Idempotent: re-running produces the same list (modulo Open Targets data updates).
"""

Open Targets GraphQL fragment (knownDrugs on the disease; the target intersection is applied client-side against target.approvedSymbol, so the query needs no target variable):

```graphql
query CandidateDrugs($efoId: String!) {
  disease(efoId: $efoId) {
    knownDrugs(size: 100) {
      rows {
        drug { id name }
        target { approvedSymbol }
        phase
        mechanismOfAction
      }
    }
  }
}
```

The function _filter_by_horizon(rows, horizon) keeps rows where phase >= horizon_threshold.
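
A sketch of that filter, with the int-to-horizon mapping that risk #4 (§8) says must be defined and unit tested; the threshold values assume Open Targets' 1–4 phase encoding:

```python
# Assumed mapping from the landscape's horizon string to the minimum
# Open Targets phase int (4 = approved).
_HORIZON_MIN_PHASE = {
    "phase_1+": 1,
    "phase_2+": 2,
    "phase_3+": 3,
    "approved": 4,
}

def _filter_by_horizon(rows: list[dict], horizon: str) -> list[dict]:
    """Keep knownDrugs rows whose phase is at or beyond the horizon."""
    threshold = _HORIZON_MIN_PHASE[horizon]
    return [row for row in rows if (row.get("phase") or 0) >= threshold]
```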

Edit: ogur/engine/evidence_pipeline.py line 608, replace _select_top_competitors:

```python
# json, select, Landscape, and get_session are assumed to already be
# imported at the top of evidence_pipeline.py; add them if missing.
def _select_top_competitors(landscape_id: str, top_n: int, source_types: list[str]) -> list[str]:
    """Read candidate_drugs from the landscape; fall back to signal-count for legacy."""
    with get_session() as session:
        landscape = session.exec(
            select(Landscape).where(Landscape.id == landscape_id)
        ).first()
        if landscape and landscape.candidate_drugs:
            drugs = json.loads(landscape.candidate_drugs)
            return drugs[:top_n]
        # Legacy fallback (delete in Phase 4)
        return _select_top_competitors_by_signal_count(...)
```

Phase 3 — Source adoption (1 day)

Edit: ogur/sources/clinicaltrials.py. fetch() should iterate over both landscape.indications (use query.cond) AND landscape.candidate_drugs (use query.intr). Dedup at the NCT level via seen_hashes.

```python
async def fetch(self, landscape: Landscape) -> list[Signal]:
    signals: list[Signal] = []
    seen_hashes: set[str] = set()
    for indication in json.loads(landscape.indications):
        studies = await self._search_by_condition(indication)
        # ... extract & dedup (see the sketch below) ...
    for drug in json.loads(landscape.candidate_drugs or "[]"):
        studies = await self._search_by_intervention(drug)
        # ... extract & dedup ...
    return signals
```
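
The elided extract-and-dedup step is where risk #6 (§8) bites: each signal should be tagged with the indication of the query that produced it, not the landscape's name. A sketch, where _study_hash, _study_to_signal, and _primary_condition are hypothetical helpers:

```python
def _extract_and_dedup(
    self,
    studies: list[dict],
    indication: str | None,
    signals: list[Signal],
    seen_hashes: set[str],
) -> None:
    """Append new signals, deduping at the NCT level."""
    for study in studies:
        h = _study_hash(study)  # e.g. the NCT ID
        if h in seen_hashes:
            continue
        seen_hashes.add(h)
        signal = _study_to_signal(study)
        # Tag with the query.cond that produced the hit; for intervention-driven
        # queries, fall back to the study's own condition list.
        signal.indication = indication or _primary_condition(study)
        signals.append(signal)
```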

Edit similarly: ogur/sources/pubmed.py, ogur/sources/openalex.py, ogur/sources/patents.py. Each gets a per-drug query loop in addition to the per-indication broad sweep.

Phase 4 — seed_immunology.py cleanup + Make integration (15 min)

Edit: scripts/seed_immunology.py. Replace _CONDITIONS with explicit field assignments:

```python
indications = ["Atopic Dermatitis", "Asthma", "Chronic Rhinosinusitis with Nasal Polyposis"]
targets = ["IL-4Rα", "IL-4", "IL-13", "TSLP", "IL-33", "OX40L", "JAK1", "JAK2", "IL-31Rα", "IgE"]
moa = None  # leave empty for immunology; oncology landscapes will set this
horizon = "phase_2+"
```

The seed script then runs discover_competitors.py first (synchronously), then the source loop reads the populated candidate_drugs.

Edit: Makefile, update seed-immunology-full:

```makefile
seed-immunology-full: ## Reproducible immunology DB
	rm -f ogur.db
	uv run python scripts/seed_immunology.py             # creates landscape
	uv run python scripts/discover_competitors.py immunology-001  # populates candidate_drugs
	uv run python scripts/seed_immunology_sources.py     # NEW: split out source-fetching loop
	uv run python scripts/build_drug_profiles.py
	uv run python scripts/build_target_graph.py
	uv run python scripts/seed_kiqs.py
	uv run python scripts/generate_briefing.py --landscape immunology-001
	uv run python scripts/analyze_drug.py dupilumab immunology-001
```

(Or keep seed_immunology.py as the single entrypoint and have it orchestrate the discover→fetch sequence internally — author's choice.)

Phase 5 — Landscape.conditions removal (1 hour)

After Phase 4 verifies green: git grep -n 'landscape\.conditions' should return only the model definition + deprecation comment. Then delete the column and add the drop migration.


7. Comparative statistics (deliverable)

Generate before/after numbers. Run on:

  • Before: the post-PR #65 DB (numbers in §3 above).
  • After: the DB produced by make demo-immunology once this plan lands.

Required tables (auto-generated by scripts/eval_ad_use_case.py):

  1. Headline counts diff — same shape as §3.1, side by side.
  2. Per-source signal counts diff — flag the conference=1 issue if it persists.
  3. Per-drug coverage diff — for each canonical AD competitor, before / after (signals, sources, NCTs).
  4. Indication distribution diff — should go from 1 row (AD: 3883) to 3 rows (AD / Asthma / CRSwNP, each non-zero).
  5. Top-N selection diff — what _select_top_competitors returns before vs. after.
  6. Evidence pipeline outputs diff — evidence records and paired-records counts per drug, ranked.
  7. Clinical Evidence card count — manual screenshot count: 2 (current) vs. ≥ 5 (target).

Output format:

| Metric | Before (PR #65) | After (this plan) | Δ | Pass? |
|---|---:|---:|---:|---|
| ...

Persist this diff as evals/monitoring_setup/before_after.md so future regressions can be A/B'd against it.


8. Risks & open issues to resolve in flight

  1. Open Targets coverage of early-stage drugs. Eblasakimab (ASLAN004) and similar Phase 1 / Phase 2 compounds may not be in the Open Targets graph yet. Mitigation: keep the broad indication sweep as a safety net (already in Phase 3); flag any canonical-set members missing from candidate_drugs in the eval report.
  2. Open Targets rate limits. Discovery query may hit GraphQL throttling on first run. Mitigation: implement with the existing tenacity retry pattern from ogur/sources/opentargets.py; cache to a local JSON next to the DB.
  3. Conference source = 1 signal. This is not in scope here but the eval will surface it; flag in the implementation PR description so it gets a follow-up issue.
  4. Phase mapping. Open Targets uses ints (1, 2, 3, 4); our SignalType.PHASE_TRANSITION enum uses strings ("Phase 1", "Phase II"). The mapping table for horizon needs to be defined in discover_competitors.py and unit tested.
  5. MoA taxonomy alignment. ADR-0001 open question #1 — adopt Open Targets vocabulary, alias on read. Document the canonical MoA values in docs/data-sources.md.
  6. The Signal.indication column inheritance. Currently every signal inherits landscape.indication (singular). When indications becomes a list, signals must be tagged with their origin indication (whichever query.cond produced them), not the landscape's name. This requires plumbing through _search_studies → _extract_signals. Easy to miss; tests should assert the Signal.indication distribution per indication (see the sketch after this list).
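
Regression-test sketch for that last risk; the fixture name and import path are assumptions:

```python
from sqlmodel import select

from ogur.models import Signal  # import path assumed

def test_signals_tagged_with_origin_indication(seeded_session):
    configured = {
        "Atopic Dermatitis",
        "Asthma",
        "Chronic Rhinosinusitis with Nasal Polyposis",
    }
    tagged = set(seeded_session.exec(select(Signal.indication).distinct()).all())
    # Every configured indication must appear; no signal should silently
    # inherit only the landscape's singular name.
    assert configured <= tagged
```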

9. PR strategy

  1. PR-A (this plan + ADR): docs only, ready for review now (#67).
  2. PR-B (Phase 1): schema changes + tests. Standalone, mergeable.
  3. PR-C (Phase 2): discover_competitors.py + evidence-pipeline rewire. Depends on PR-B.
  4. PR-D (Phase 3 + 4 + 5): source adoption, seed integration, conditions removal. Single PR because they all touch the same flow.
  5. PR-E (eval): scripts/eval_ad_use_case.py + before/after report. Can run in parallel with PR-D.

Total estimated effort: 4–5 person-days, plus eval review. The eval is what gives confidence to delete Landscape.conditions.