ADR-0001: Inputs and Primitives for Landscape Monitoring Setup
Status: Proposed
Date: 2026-04-27
Driver: Khalil
Related: PR #65 (workaround), PR #48 (immunology demo target)
Context
seed_immunology.py currently hardcodes a list of competitor drug names into Landscape.conditions so that each one becomes its own CT.gov query.cond pass. This was a fix for a real failure — CT.gov pagination growth had pushed lebrikizumab / tralokinumab / nemolizumab past the 500-result cap of the broad "Atopic Dermatitis" query — but it is the wrong long-term primitive:
- Curation rot. Every landscape needs a complete drug roster maintained forever. A new IL-13 startup files an IND and we are blind until someone updates JSON. That defeats the monitoring premise.
- Per-source asymmetry. Even with a perfect drug list, each source has different blind spots (CT.gov pagination, PubMed for pre-clinical, SEC for company-keyed mentions, patents for code-numbered compounds).
- Wrong altitude of expert input. Analysts should describe the competitive boundary, not maintain a drug roster. Asking a BD analyst for a 50-row spreadsheet is a curation task; asking for indication + target + horizon is a 5-minute conversation.
We need a primitive set that scales across landscapes (immunology, oncology, etc.) and degrades gracefully when sources lose entries.
Decision
A landscape monitoring setup takes three inputs from the analyst:
- Indication boundary — a list of one or more indications that define the disease scope.
- Target and/or Mechanism of Action (MoA) — the molecular handle. Target is the molecule; MoA is what the drug does to it. Some indications are well-characterized by target alone; others require MoA to keep competitive views coherent.
- Competitive horizon: the development-stage cutoff (e.g. phase_1+, phase_2+, approved).
The drug list is derived, not authored. It comes from an Open Targets graph query keyed on (indication × target × MoA × phase ≥ horizon), refreshed on every reseed, with an audit trail.
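A minimal sketch of how the three analyst inputs could be captured and validated before discovery runs. The `LandscapeInputs` class and `ALLOWED_HORIZONS` constant are hypothetical names, not existing code; the horizon values are the ones listed for the `horizon` field of `Landscape`.

```python
from dataclasses import dataclass

# Horizon values as listed for Landscape.horizon in this ADR.
ALLOWED_HORIZONS = ("phase_1+", "phase_2+", "phase_3+", "approved")


@dataclass(frozen=True)
class LandscapeInputs:
    """Hypothetical container for the analyst-authored inputs.

    The drug list is deliberately absent: it is derived, not authored.
    """

    indications: tuple[str, ...]
    targets: tuple[str, ...]
    horizon: str
    moa: tuple[str, ...] = ()  # optional refinement filter

    def __post_init__(self) -> None:
        if not self.indications:
            raise ValueError("at least one indication is required")
        if not self.targets:
            raise ValueError("at least one target is required")
        if self.horizon not in ALLOWED_HORIZONS:
            raise ValueError(f"unknown horizon: {self.horizon!r}")


inputs = LandscapeInputs(
    indications=("Atopic Dermatitis",),
    targets=("IL-13",),
    horizon="phase_2+",
)
```

Leaving `moa` empty, as in this immunology example, means no mechanism filter is applied.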
Why target and/or MoA, not target alone
In some indications, target is enough. In others, MoA is the meaningful axis for what counts as a true competitor:
| Indication | Target | MoA differentiation matters? | Example |
|---|---|---|---|
| Atopic Dermatitis | IL-13 | No — lebrikizumab and tralokinumab compete directly | — |
| Atopic Dermatitis | JAK | Slightly: JAK1-selective vs. dual matters for safety positioning | abrocitinib (JAK1) vs. baricitinib (JAK1/2) |
| NSCLC | KRAS-G12C | Yes — covalent inhibitors compete differently than pan-RAS | sotorasib (covalent) vs. RMC-6236 (multi-RAS) |
| HER2+ breast | HER2 | Yes — mAb / ADC / TKI are distinct competitive sub-fields | trastuzumab vs. T-DXd vs. tucatinib |
| NSCLC | EGFR | Yes — generation + format are everything | osimertinib (3rd-gen TKI) vs. amivantamab (EGFR×MET bispecific) vs. patritumab deruxtecan (HER3-ADC) |
So target is required and moa is an optional refinement filter. In immunology landscapes the analyst will often leave MoA empty; in oncology they will almost always set it.
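The "target required, MoA optional" rule could look like the filter below. This is a sketch under assumptions: `DrugRow` and `filter_candidates` are hypothetical, and the MoA labels (mAb, ADC, TKI) are the informal ones from the HER2 row of the table, not Open Targets vocabulary.

```python
from typing import Iterable, TypedDict


class DrugRow(TypedDict):
    """Hypothetical shape of one discovered drug record."""

    name: str
    target: str
    moa: str


def filter_candidates(
    rows: Iterable[DrugRow],
    targets: Iterable[str],
    moa: Iterable[str] = (),
) -> list[str]:
    """Keep rows on a wanted target; apply the MoA filter only if one is given."""
    wanted_targets = {t.casefold() for t in targets}
    wanted_moa = {m.casefold() for m in moa}
    return [
        row["name"]
        for row in rows
        if row["target"].casefold() in wanted_targets
        and (not wanted_moa or row["moa"].casefold() in wanted_moa)
    ]


# Illustrative rows taken from the HER2 example in the table above.
ROWS: list[DrugRow] = [
    {"name": "trastuzumab", "target": "HER2", "moa": "mAb"},
    {"name": "T-DXd", "target": "HER2", "moa": "ADC"},
    {"name": "tucatinib", "target": "HER2", "moa": "TKI"},
]

all_her2 = filter_candidates(ROWS, ["HER2"])
adc_only = filter_candidates(ROWS, ["HER2"], moa=["ADC"])
```

With no MoA supplied, all three HER2 drugs are kept; with `moa=["ADC"]`, only the ADC sub-field remains.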
Why a list of indications
The Dupixent franchise spans Atopic Dermatitis + Asthma + CRSwNP + EoE + Prurigo Nodularis + COPD. Monitoring at the franchise level needs all of them. But comparative evidence views — EASI-75, IGA 0/1 — are per-indication: endpoints don't carry across.
So:
- Landscape.indications is a list.
- Comparative evidence cards are scoped to one indication at a time (UI selector or per-indication routes).
- Cross-indication views (FranchisePortfolio) read the union.
- For the demo, narrow immunology-001 to ["Atopic Dermatitis"] only. The head-to-head story stays clean and EASI-75 / IGA 0/1 are the right endpoints. Multi-indication support is a framework capability we don't need to exercise yet.
Why a competitive horizon
Without a phase cutoff, Open Targets returns 200+ molecules per target — including pre-clinical compounds with no signals in any of our sources. That noise dilutes the top-N evidence pipeline (the bug PR #65 patched). A horizon expressed as phase_1+ or phase_2+ keeps the candidate list at a useful 10–30 drugs.
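One way the horizon cutoff could be applied is sketched below. The numeric mapping is an assumption (pre-clinical as 0, approved as 4); how Open Targets phase values map onto it is one of the open questions and is not settled here. The drug names are placeholders.

```python
# Assumed mapping from horizon string to minimum development phase.
# "approved" is treated as phase 4 and pre-clinical as phase 0; both are
# assumptions for illustration, not a confirmed Open Targets convention.
HORIZON_MIN_PHASE = {"phase_1+": 1, "phase_2+": 2, "phase_3+": 3, "approved": 4}


def within_horizon(max_phase: int, horizon: str) -> bool:
    """True if a drug's most advanced phase meets the horizon cutoff."""
    return max_phase >= HORIZON_MIN_PHASE[horizon]


# Placeholder (name, max_phase) pairs.
discovered = [("drug_a", 0), ("drug_b", 1), ("drug_c", 2), ("drug_d", 4)]
kept = [name for name, phase in discovered if within_horizon(phase, "phase_2+")]
```

Here the pre-clinical and phase 1 placeholders are dropped, which is the noise reduction the horizon is meant to provide.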
Architecture
         ┌─────────────────────────────┐
         │  Analyst input (one-time)   │
         │  • indications: [...]       │
         │  • targets: [...]           │
         │  • moa: [...] (optional)    │
         │  • horizon: phase ≥ X       │
         └──────────────┬──────────────┘
                        │
                        ▼
         ┌──────────────────────────┐
         │ discover_competitors.py  │
         │ (Open Targets graph)     │
         └──────────────┬───────────┘
                        │
                        ▼
       candidate_drugs: list[DrugProfile]
                        │
 ┌──────────────────────┼──────────────────────┐
 ▼                      ▼                      ▼
CT.gov              PubMed              Patents
query.cond +        term + author       assignee +
query.intr per      per drug            target keyword
drug + indication
 │                      │                      │
 └──────────────────────┴──────────────────────┘
                        │
                        ▼
         broad indication / target sweep
         (gap-catcher for novel entrants
          Open Targets has not indexed yet)
The disease/target sweep stays — it's the safety net for unknown-unknowns. But it stops being load-bearing for the known competitors.
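The CT.gov leg of the diagram could be planned roughly as follows: one dedicated `query.intr` + `query.cond` pass per drug and indication, followed by the broad per-indication sweep. `ctgov_query_plan` is a hypothetical helper that only builds query parameters; it makes no network calls and assumes nothing about paging.

```python
from urllib.parse import urlencode


def ctgov_query_plan(
    indications: list[str],
    candidate_drugs: list[str],
) -> list[dict[str, str]]:
    """Build one parameter dict per planned CT.gov request."""
    # Dedicated pass per known drug, so it cannot be paged out of a broad query.
    targeted = [
        {"query.cond": indication, "query.intr": drug}
        for drug in candidate_drugs
        for indication in indications
    ]
    # Broad sweep per indication: the safety net for unknown entrants.
    sweep = [{"query.cond": indication} for indication in indications]
    return targeted + sweep


plan = ctgov_query_plan(["Atopic Dermatitis"], ["lebrikizumab", "tralokinumab"])
for params in plan:
    print(urlencode(params))
```

The plan grows as drugs × indications plus one sweep per indication, so the demo landscape with a single indication stays small.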
Schema changes
from datetime import datetime

from sqlmodel import Field, SQLModel


class Landscape(SQLModel, table=True):
    id: str = Field(primary_key=True)  # a table model needs a primary key
    name: str
    # NEW: replaces the ad-hoc conditions string
    indications: str  # JSON list[str]
    targets: str  # JSON list[str]; already exists, kept
    moa: str | None = None  # JSON list[str]; NEW, optional
    horizon: str  # "phase_1+" | "phase_2+" | "phase_3+" | "approved"
    # NEW: derived, not authored
    candidate_drugs: str  # JSON list[str], populated by discover_competitors.py
    last_discovered_at: datetime | None = None
    # DEPRECATED
    conditions: str  # remove after migration; was the bag-of-keywords
Consequences
Easier:
- Adding a new landscape becomes "fill in 3 fields"; the drug list falls out.
- The CT.gov pagination bug we just hit cannot recur — every drug we know about gets its own dedicated query.
- Audit trail: last_discovered_at plus a diff against the previous candidate_drugs shows when a new competitor entered the field.
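The audit-trail diff mentioned in the last bullet could be as simple as a set difference between two discovery runs. `diff_candidates` is a hypothetical helper, and the before/after lists are illustrative only.

```python
def diff_candidates(previous: list[str], current: list[str]) -> dict[str, list[str]]:
    """Report which drugs entered or left the candidate list between runs."""
    prev, curr = set(previous), set(current)
    return {"added": sorted(curr - prev), "removed": sorted(prev - curr)}


# Illustrative runs: a third drug appears in the later discovery.
change = diff_candidates(
    previous=["lebrikizumab", "tralokinumab"],
    current=["lebrikizumab", "nemolizumab", "tralokinumab"],
)
```

Storing this diff alongside last_discovered_at would record when each new competitor first appeared.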
Harder:
- Open Targets becomes a hard dependency for setup. Currently it's one source among ten; it would become load-bearing for landscape configuration. Mitigation: cache the discovery output, fail open (use the last known list) on Open Targets outage.
- Cold-start latency: setting up a landscape requires a synchronous Open Targets query before any other source can run.
- MoA taxonomy alignment: the analyst input vocabulary must match Open Targets' (or be aliased on read).
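The fail-open mitigation could be sketched as below. `discover_fail_open` is hypothetical, and the choice to catch `OSError` assumes outages surface as connection or timeout errors; the exception type would need adapting to whichever HTTP client the discovery script uses.

```python
from typing import Callable


def discover_fail_open(
    fetch: Callable[[], list[str]],
    last_known: list[str],
) -> tuple[list[str], bool]:
    """Return (drug list, is_fresh). On an outage, reuse the cached list."""
    try:
        return fetch(), True
    except OSError:
        # Assumption: outages raise OSError subclasses (ConnectionError,
        # TimeoutError). Other failures are left to propagate.
        return list(last_known), False


def _simulated_outage() -> list[str]:
    raise ConnectionError("Open Targets unreachable")


drugs, fresh = discover_fail_open(_simulated_outage, ["lebrikizumab", "tralokinumab"])
```

Returning the freshness flag lets the caller leave last_discovered_at unchanged when the cached list was used.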
Still unsolved (out of scope here):
- Novel-mechanism competitors. A startup with a brand-new target that Open Targets has not indexed. This is a separate problem solved by the temporal-signal layer: cluster anomalies in patent filings, conference abstracts mentioning unfamiliar code numbers, SEC 8-Ks with unrecognized drug names. The human review loop earns its keep there, not in roster maintenance.
Implementation phases
Phase 1: schema + discovery (1–2 days)
- Add indications, moa, horizon, candidate_drugs, last_discovered_at to Landscape.
- Migration: copy Landscape.indication (singular) into indications (list of one).
- Write scripts/discover_competitors.py: Open Targets GraphQL query + DB upsert.
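The migration step might look like the following per-row transform. It is a sketch that treats a row as a plain dict; `migrate_row` is a hypothetical name, and the real migration would run against the database through whatever migration tooling the project uses.

```python
import json


def migrate_row(row: dict) -> dict:
    """Return a copy of a legacy row with `indications` filled in."""
    migrated = dict(row)
    if migrated.get("indications"):
        return migrated  # already migrated; keeps the step idempotent
    single = (migrated.get("indication") or "").strip()
    # A list of one, or an empty list if the legacy field was blank.
    migrated["indications"] = json.dumps([single] if single else [])
    return migrated


legacy = {"id": "immunology-001", "indication": "Atopic Dermatitis"}
migrated = migrate_row(legacy)
```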
Phase 2: source-side adoption (1 day)
- ClinicalTrialsSource reads both indications and candidate_drugs, runs query.cond per indication and query.intr per drug.
- PubMedSource, OpenAlexSource, PatentsSource extended similarly.
Phase 3: landscape narrowing for the demo (15 min)
- Update seed_immunology.py to emit indications=["Atopic Dermatitis"].
- Drop the hardcoded competitor names from _CONDITIONS. Run discover_competitors.py.
- Verify CT.gov retrieval matches the archived DB shape.
Phase 4: cleanup (1 hour)
- Remove Landscape.conditions after grep confirms nothing reads it.
- Update docs/data-sources.md to describe the discovery layer.
Open questions
- MoA taxonomy. Open Targets has its own MoA vocabulary (agonist, antagonist, covalent inhibitor, etc.). Do we adopt theirs or define our own? Recommend: adopt theirs, alias on read.
- Drug deduplication across indications. A drug studied in 4 Dupixent-franchise indications shouldn't appear 4× in candidate_drugs. The DrugSynonym table already handles name-level dedup; we need indication-level set semantics.
- Refresh cadence. When does discover_competitors.py re-run? Per seed? Weekly cron? Manual? Recommend: per seed for now (free, fast); revisit when Phase 3 watcher lands.
- Phase as the horizon. Open Targets phase strings vs. our SignalType.PHASE_TRANSITION enum need a mapping table.
- Multi-target drugs. A bispecific like amivantamab (EGFR×MET) belongs to two target buckets. Do we surface it once or twice in candidate_drugs? Probably once, with both targets recorded, but UI implications need a separate decision.
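For the deduplication and multi-target questions, the "once, with both targets recorded" option could be sketched as a merge keyed on drug name. `Candidate` and `merge_rows` are hypothetical, and the sketch assumes names are already canonical (name-level synonyms resolved upstream, for example via DrugSynonym).

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical merged record: one entry per drug."""

    name: str
    indications: set[str] = field(default_factory=set)
    targets: set[str] = field(default_factory=set)


def merge_rows(rows: list[tuple[str, str, str]]) -> list[Candidate]:
    """Collapse (drug, indication, target) rows into one Candidate per drug."""
    merged: dict[str, Candidate] = {}
    for name, indication, target in rows:
        entry = merged.setdefault(name.casefold(), Candidate(name=name))
        entry.indications.add(indication)
        entry.targets.add(target)
    return list(merged.values())


# Illustrative rows based on the EGFR example earlier in this ADR.
candidates = merge_rows(
    [
        ("amivantamab", "NSCLC", "EGFR"),
        ("amivantamab", "NSCLC", "MET"),
        ("osimertinib", "NSCLC", "EGFR"),
    ]
)
```

Because indications and targets are sets, a drug discovered through several indications or target buckets still yields a single entry.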