ADR-0001: Inputs and Primitives for Landscape Monitoring Setup

Status: Proposed
Date: 2026-04-27
Driver: Khalil
Related: PR #65 (workaround), PR #48 (immunology demo target)

Context

seed_immunology.py currently hardcodes a list of competitor drug names into Landscape.conditions so that each one becomes its own CT.gov query.cond pass. This was a fix for a real failure — CT.gov pagination growth had pushed lebrikizumab / tralokinumab / nemolizumab past the 500-result cap of the broad "Atopic Dermatitis" query — but it is the wrong long-term primitive:

  1. Curation rot. Every landscape needs a complete drug roster maintained forever. A new IL-13 startup files an IND and we are blind until someone updates JSON. That defeats the monitoring premise.
  2. Per-source asymmetry. Even with a perfect drug list, each source has different blind spots (CT.gov pagination, PubMed for pre-clinical, SEC for company-keyed mentions, patents for code-numbered compounds).
  3. Wrong altitude of expert input. Analysts should describe the competitive boundary, not maintain a drug roster. Asking a BD analyst for a 50-row spreadsheet is a curation task; asking for indication + target + horizon is a 5-minute conversation.

We need a primitive set that scales across landscapes (immunology, oncology, etc.) and degrades gracefully when sources lose entries.

Decision

A landscape monitoring setup takes three inputs from the analyst:

  1. Indication boundary — a list of one or more indications that define the disease scope.
  2. Target and/or Mechanism of Action (MoA) — the molecular handle. Target is the molecule; MoA is what the drug does to it. Some indications are well-characterized by target alone; others require MoA to keep competitive views coherent.
  3. Competitive horizon — the development-stage cutoff (e.g. phase_1+, phase_2+, approved).

The drug list is derived, not authored. It comes from an Open Targets graph query keyed on (indication × target × MoA × phase ≥ horizon), refreshed on every reseed, with an audit trail.
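The three analyst inputs can be sketched as a small input model. This is a sketch only: `LandscapeSpec` is a hypothetical name, and the field names simply mirror the schema proposed later in this ADR.

```python
from dataclasses import dataclass, field

# Hypothetical analyst-input model; field names mirror the proposed
# Landscape schema (indications, targets, moa, horizon).
@dataclass
class LandscapeSpec:
    indications: list[str]                         # disease scope, e.g. ["Atopic Dermatitis"]
    targets: list[str]                             # molecular handle, e.g. ["IL-13"]
    moa: list[str] = field(default_factory=list)   # optional refinement filter
    horizon: str = "phase_2+"                      # development-stage cutoff

spec = LandscapeSpec(
    indications=["Atopic Dermatitis"],
    targets=["IL-13"],
    horizon="phase_1+",
)
```

Note that `moa` defaults to empty, matching the decision that MoA is optional while target is required.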

Why target and/or MoA, not target alone

In some indications, target is enough. In others, MoA is the meaningful axis for what counts as a true competitor:

| Indication | Target | MoA differentiation matters? | Example |
| --- | --- | --- | --- |
| Atopic Dermatitis | IL-13 | No | lebrikizumab and tralokinumab compete directly |
| Atopic Dermatitis | JAK | Slightly — JAK1-selective vs. dual matters for safety positioning | abrocitinib (JAK1) vs. baricitinib (JAK1/2) |
| NSCLC | KRAS-G12C | Yes — covalent inhibitors compete differently than pan-RAS | sotorasib (covalent) vs. RMC-6236 (multi-RAS) |
| HER2+ breast | HER2 | Yes — mAb / ADC / TKI are distinct competitive sub-fields | trastuzumab vs. T-DXd vs. tucatinib |
| NSCLC | EGFR | Yes — generation + format are everything | osimertinib (3rd-gen TKI) vs. amivantamab (EGFR×MET bispecific) vs. patritumab deruxtecan (HER3-ADC) |

So target is required and moa is an optional refinement filter. In immunology landscapes the analyst will often leave MoA empty; in oncology they will almost always set it.
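The "target required, MoA optional" rule reduces to a simple predicate over discovered candidates. A minimal sketch, assuming a flat candidate shape (`Candidate` and its fields are illustrative, not the real Open Targets payload):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    target: str
    moa: str

def is_competitor(c: Candidate, targets: list[str], moa: list[str]) -> bool:
    """Target must match; MoA filters only when the analyst supplied one."""
    if c.target not in targets:
        return False
    return not moa or c.moa in moa

pool = [
    Candidate("lebrikizumab", "IL-13", "antibody"),
    Candidate("sotorasib", "KRAS-G12C", "covalent inhibitor"),
    Candidate("RMC-6236", "KRAS-G12C", "multi-RAS inhibitor"),
]

# Immunology case: target alone is enough, MoA left empty
il13 = [c.name for c in pool if is_competitor(c, ["IL-13"], [])]

# Oncology case: MoA narrows the field to the true sub-competition
covalent = [c.name for c in pool if is_competitor(c, ["KRAS-G12C"], ["covalent inhibitor"])]
```

The empty-MoA branch is what lets immunology landscapes skip the field entirely without a special case.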

Why a list of indications

The Dupixent franchise spans Atopic Dermatitis + Asthma + CRSwNP + EoE + Prurigo Nodularis + COPD. Monitoring at the franchise level needs all of them. But comparative evidence views — EASI-75, IGA 0/1 — are per-indication: endpoints don't carry across.

So:

  • Landscape.indications is a list.
  • Comparative evidence cards are scoped to one indication at a time (UI selector or per-indication routes).
  • Cross-indication views (FranchisePortfolio) read the union.
  • For the demo, narrow immunology-001 to ["Atopic Dermatitis"] only. The head-to-head story stays clean and EASI-75 / IGA 0/1 are the right endpoints. Multi-indication support is a framework capability we don't need to exercise yet.

Why a competitive horizon

Without a phase cutoff, Open Targets returns 200+ molecules per target — including pre-clinical compounds with no signals in any of our sources. That noise dilutes the top-N evidence pipeline (the bug PR #65 patched). A horizon expressed as phase_1+ or phase_2+ keeps the candidate list at a useful 10–30 drugs.
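The horizon strings map onto a phase floor. A sketch of the cutoff check, assuming the Open Targets convention of reporting approved drugs as phase 4 (that convention is an assumption to verify):

```python
# Horizon string -> minimum development phase. Phase 4 stands in for
# "approved" (assumed to match how Open Targets reports max phase).
_PHASE_FLOOR = {"phase_1+": 1, "phase_2+": 2, "phase_3+": 3, "approved": 4}

def meets_horizon(max_phase: int, horizon: str) -> bool:
    """True when a drug's highest development phase clears the cutoff."""
    return max_phase >= _PHASE_FLOOR[horizon]

candidates = {"preclinical-X": 0, "nemolizumab": 3, "dupilumab": 4}
kept = [name for name, phase in candidates.items() if meets_horizon(phase, "phase_2+")]
```

This is the filter that trims the 200+ raw molecules down to the 10–30 the evidence pipeline can actually rank.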

Architecture

              ┌─────────────────────────────┐
              │  Analyst input (one-time)   │
              │  • indications: [...]       │
              │  • targets: [...]           │
              │  • moa: [...] (optional)    │
              │  • horizon: phase ≥ X       │
              └──────────────┬──────────────┘
                             ▼
                ┌──────────────────────────┐
                │  discover_competitors.py │
                │  (Open Targets graph)    │
                └──────────────┬───────────┘
              candidate_drugs: list[DrugProfile]
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
   CT.gov                  PubMed                  Patents
   query.cond +            term + author           assignee +
   query.intr per          per drug                target keyword
   drug + indication
        │                      │                      │
        └──────────────────────┴──────────────────────┘
                broad indication / target sweep
                (gap-catcher for novel entrants
                 Open Targets has not indexed yet)

The disease/target sweep stays — it's the safety net for unknown-unknowns. But it stops being load-bearing for the known competitors.
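A sketch of the discovery step. The query shape follows the public Open Targets Platform GraphQL schema (`target.knownDrugs`) as best understood; the exact field names are an assumption to verify against the live schema before relying on them, so the example processes a canned response rather than calling the API.

```python
# Query shape assumed from the Open Targets Platform GraphQL schema;
# verify field names against the live API before depending on this.
KNOWN_DRUGS_QUERY = """
query ($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    knownDrugs {
      rows {
        drug { id name }
        disease { name }
        phase
        mechanismOfAction
      }
    }
  }
}
"""

def extract_candidates(response: dict, indications: set[str], min_phase: int) -> list[str]:
    """Flatten a knownDrugs response into a deduplicated, ordered candidate list."""
    rows = response["data"]["target"]["knownDrugs"]["rows"]
    seen: dict[str, None] = {}  # insertion-ordered set
    for r in rows:
        if r["disease"]["name"] in indications and r["phase"] >= min_phase:
            seen[r["drug"]["name"]] = None
    return list(seen)

# Canned response standing in for a live API call
sample = {"data": {"target": {"knownDrugs": {"rows": [
    {"drug": {"id": "CHEMBL1", "name": "lebrikizumab"},
     "disease": {"name": "Atopic Dermatitis"}, "phase": 3,
     "mechanismOfAction": "IL-13 inhibitor"},
    {"drug": {"id": "CHEMBL2", "name": "early-compound"},
     "disease": {"name": "Atopic Dermatitis"}, "phase": 0,
     "mechanismOfAction": "IL-13 inhibitor"},
]}}}}

names = extract_candidates(sample, {"Atopic Dermatitis"}, min_phase=1)
```

The indication and phase filters here are the programmatic form of the (indication × target × MoA × phase ≥ horizon) key from the Decision section.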

Schema changes

from datetime import datetime

from sqlmodel import Field, SQLModel


class Landscape(SQLModel, table=True):
    id: str = Field(primary_key=True)
    name: str

    # NEW — replaces ad-hoc conditions string
    indications: str                # JSON list[str]
    targets: str                    # JSON list[str]   already exists, kept
    moa: str | None = None          # JSON list[str]   NEW, optional
    horizon: str                    # "phase_1+" | "phase_2+" | "phase_3+" | "approved"

    # NEW — derived, not authored
    candidate_drugs: str = "[]"     # JSON list[str], populated by discover_competitors.py
    last_discovered_at: datetime | None = None

    # DEPRECATED
    conditions: str                 # remove after migration; was the bag-of-keywords

Consequences

Easier:

  • Adding a new landscape becomes "fill in 3 fields"; the drug list falls out.
  • The CT.gov pagination bug we just hit cannot recur — every drug we know about gets its own dedicated query.
  • Audit trail: last_discovered_at plus a diff against the previous candidate_drugs shows when a new competitor entered the field.

Harder:

  • Open Targets becomes a hard dependency for setup. Currently it's one source among ten; it would become load-bearing for landscape configuration. Mitigation: cache the discovery output, fail open (use the last known list) on Open Targets outage.
  • Cold-start latency: setting up a landscape requires a synchronous Open Targets query before any other source can run.
  • MoA taxonomy alignment — the analyst input vocabulary must match Open Targets' (or be aliased on read).

Still unsolved (out of scope here):

  • Novel-mechanism competitors. A startup with a brand-new target Open Targets has not indexed. This is a separate problem solved by the temporal-signal layer: cluster anomalies in patent filings, conference abstracts mentioning unfamiliar code numbers, SEC 8-Ks with unrecognized drug names. The human review loop earns its keep there — not in roster maintenance.

Implementation phases

Phase 1 — schema + discovery (1–2 days)

  • Add indications, moa, horizon, candidate_drugs, last_discovered_at to Landscape.
  • Migration: copy Landscape.indication (singular) into indications (list of one).
  • Write scripts/discover_competitors.py — Open Targets GraphQL query + DB upsert.
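The migration bullet above can be sketched as a row-level transform. This is illustrative only: the row is shown as a plain dict, and the real migration would go through whatever migration tooling the project already uses.

```python
import json

def migrate_indication(row: dict) -> dict:
    """Copy the legacy singular `indication` into the new JSON-list
    `indications` column, leaving already-migrated rows untouched."""
    if not row.get("indications"):
        legacy = row.get("indication")
        row["indications"] = json.dumps([legacy] if legacy else [])
    return row

migrated = migrate_indication({"id": "immunology-001", "indication": "Atopic Dermatitis"})
```

Guarding on an existing `indications` value makes the transform safe to re-run.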

Phase 2 — source-side adoption (1 day)

  • ClinicalTrialsSource reads both indications and candidate_drugs, runs query.cond per indication and query.intr per drug.
  • PubMedSource, OpenAlexSource, PatentsSource extended similarly.
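The ClinicalTrialsSource fan-out above can be sketched as a parameter builder: one broad `query.cond` sweep per indication plus one dedicated `query.intr` query per (indication, drug) pair. A minimal sketch; the helper name and dict-of-params shape are assumptions, though `query.cond` and `query.intr` are real ClinicalTrials.gov API v2 parameters.

```python
def build_ctgov_queries(indications: list[str], candidate_drugs: list[str]) -> list[dict]:
    """Broad sweep per indication, then a dedicated per-drug query so no
    competitor can be pushed past the pagination cap by result growth."""
    queries = [{"query.cond": ind} for ind in indications]
    queries += [
        {"query.cond": ind, "query.intr": drug}
        for ind in indications
        for drug in candidate_drugs
    ]
    return queries

qs = build_ctgov_queries(["Atopic Dermatitis"], ["lebrikizumab", "tralokinumab"])
```

The per-drug queries are what make the PR #65 failure mode structurally impossible: each known competitor has its own result window regardless of how large the broad sweep grows.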

Phase 3 — landscape narrowing for the demo (15 min)

  • Update seed_immunology.py to emit indications=["Atopic Dermatitis"].
  • Drop the hardcoded competitor names from _CONDITIONS. Run discover_competitors.py.
  • Verify CT.gov retrieval matches the archived DB shape.

Phase 4 — cleanup (1 hour)

  • Remove Landscape.conditions after grep confirms nothing reads it.
  • Update docs/data-sources.md to describe the discovery layer.

Open questions

  1. MoA taxonomy. Open Targets has its own MoA vocabulary (agonist, antagonist, covalent inhibitor, etc.). Do we adopt theirs or define our own? Recommend: adopt theirs, alias on read.
  2. Drug deduplication across indications. A drug studied in 4 Dupixent-franchise indications shouldn't appear 4× in candidate_drugs. The DrugSynonym table already handles name-level dedup; we need indication-level set semantics.
  3. Refresh cadence. When does discover_competitors.py re-run? Per seed? Weekly cron? Manual? Recommend: per seed for now (free, fast); revisit when Phase 3 watcher lands.
  4. Phase as the horizon. Open Targets phase strings vs. our SignalType.PHASE_TRANSITION enum need a mapping table.
  5. Multi-target drugs. A bispecific like amivantamab (EGFR×MET) belongs to two target buckets. Do we surface it once or twice in candidate_drugs? Probably once, with both targets recorded — but UI implications need a separate decision.