Architecture¶
Full system walkthrough. If a code reference below conflicts with what's in the repo, trust the code — ping the doc owner to fix.
§1 Ten-thousand-foot view¶
┌──────────────────────────────────────────────────────────────────────────┐
│ INGESTION (seed scripts) │
│ │
│ ClinicalTrials.gov ─┐ │
│ PubMed ─┤ │
│ OpenFDA ─┤ │
│ Open Targets ─┼──► Source adapters ──► normalize to Signal ──► │
│ Europe PMC ─┤ (ogur/sources/) content_hash dedup │
│ OpenAlex ─┤ │
│ SEC EDGAR ─┤ │
│ Lens / EPO OPS ─┤ │
│ Holo3 (Playwright) ─┘ │
│ │
└──────────────────────────────────┬───────────────────────────────────────┘
│
▼ SQLite (ogur.db) — Signals + Profiles + Targets
│
┌──────────────────────────────────┼───────────────────────────────────────┐
│ INTELLIGENCE ENGINE (ogur/engine/) │
│ │
│ ChangeDetector ──► AgentOrchestrator ──► Enricher ──► Synthesizer │
│ (pure Python) (5 DomainAgents, (pure (Sonnet — │
│ Haiku scoring) Python) streaming, │
│ KIQ-aware) │
│ │ │
│ KIQs + ENTITY CATALOG ─────────────┤ │
│ ▼ │
│ Verification gate (KIQ shape + │
│ entity-ref check) │
│ │
│ Evidence pipeline (CLI-only) ──► extractor/ ──► EvidenceRecord + │
│ (Haiku + GLiNER) ProtocolProfile │
│ │
│ QueryEngine (Haiku, ad-hoc Q&A) │
└──────────────────────────────────┬───────────────────────────────────────┘
│
▼ Briefing rows, per-tab analyses, ask responses
│
┌──────────────────────────────────┼───────────────────────────────────────┐
│ API (ogur/api/ — FastAPI) │
│ /health · /api/signals · /api/briefing/… · /api/ask · │
│ /api/landscapes/{id}/evidence/comparative │
└──────────────────────────────────┬───────────────────────────────────────┘
│
▼ HTTP/JSON
│
┌──────────────────────────────────┼───────────────────────────────────────┐
│ FRONTEND (frontend/ — Vite + React) │
│ Portfolio · Asset Detail (6 tabs) · Global Signals · Global Ask │
└──────────────────────────────────────────────────────────────────────────┘
Three hard boundaries:
1. Ingestion is separate from synthesis. scripts/seed_* scrapes; scripts/generate_briefing reads. The pipeline never re-scrapes.
2. DetectedChange is a value object. It references a Signal row — it never creates one. The dedup invariant (SHA-256 content hash, unique in DB) is preserved through the pipeline.
3. Engine writes to DB only through store/. FastAPI routes call engine functions, not DB queries, except for read-only list endpoints.
§2 Data model¶
SQLModel tables in ogur/models/:
Signal (signal.py)¶
The atomic unit of intelligence. Every source normalizes to this.
id UUID, primary key
source "clinicaltrials" | "pubmed" | "openfda" | "opentargets" |
"conferences" | "openalex" | "sec" | "lens" | "epo" |
"holo_conference" | "holo_pipeline"
source_id Original source ID (NCT, PMID, accession, etc.)
signal_type Enum — 20+ types (see below)
severity "high" | "medium" | "low"
drug_name Normalized generic (e.g. "pembrolizumab")
drug_brand_name "Keytruda"
company "Merck"
indication "Non-Small Cell Lung Cancer"
target "PD-1"
moa One-sentence mechanism of action
phase "Phase 3", "Approved", etc.
title str
summary str
raw_data JSON string — full source payload (re-parse if normalization changes)
detected_at timestamp ingestion saw it
event_date timestamp the underlying event happened (if known)
landscape_id FK to Landscape
content_hash SHA-256 truncated to 16 hex — UNIQUE constraint
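The field inventory above can be sketched as a plain dataclass — a stand-in for the real SQLModel class in `signal.py` (field names follow the inventory; types, defaults, and ordering are assumptions):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from uuid import uuid4


@dataclass
class Signal:
    source: str                      # "clinicaltrials", "pubmed", ...
    source_id: str                   # original source ID (NCT, PMID, accession)
    signal_type: str                 # one of the 20+ SignalType values
    severity: str                    # "high" | "medium" | "low"
    landscape_id: str                # FK to Landscape
    content_hash: str                # sha256(...)[:16] — UNIQUE in the DB
    drug_name: Optional[str] = None  # normalized generic, e.g. "pembrolizumab"
    title: str = ""
    summary: str = ""
    raw_data: str = "{}"             # full source payload as a JSON string
    detected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    id: str = field(default_factory=lambda: str(uuid4()))


sig = Signal(
    source="clinicaltrials",
    source_id="NCT01234567",
    signal_type="phase_transition",
    severity="high",
    landscape_id="nsclc-001",
    content_hash="ab12cd34ef56ab78",
)
```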
SignalType enum (signal.py:7):
| Category | Types |
|---|---|
| Trial lifecycle | phase_transition, trial_registered, trial_amendment, trial_status_change, trial_enrollment, protocol_amendment |
| Literature | publication, conference_abstract |
| Regulatory | fda_approval, label_change, safety_signal, regulatory_event |
| Pipeline | pipeline_update, early_pipeline |
| Corporate | press_release, ma_announcement, licensing_deal, investment_round, leadership_change |
| IP | patent_filing |
| Visual intelligence | earnings_narrative, job_posting, kol_activity |
DrugProfile + DrugSynonym (drug.py)¶
Assembled from signals via scripts/build_drug_profiles.py. DrugSynonym maps every known alias ("Keytruda", "MK-3475", "lambrolizumab", "pembro") to the canonical generic name, so signals from different sources coalesce.
CompanyProfile (company.py)¶
Primary-keyed by normalized_name (no UUID — company name is the natural identifier). Aggregates recent_deals JSON list capped at 20.
Target + DrugTarget (target.py)¶
Bipartite graph: Target is a gene/protein node (HGNC symbol PK), DrugTarget is a weighted edge where evidence_count increments when a new source confirms the same pair. Built by scripts/build_target_graph.py from DrugProfile data.
Briefing (briefing.py)¶
Stores the synthesizer's structured JSON as columns. landscape_id is overloaded to encode composite identifiers:
| landscape_id format | Produced by |
|---|---|
| `nsclc-001` | Landscape-level briefing (`generate_briefing.py`) |
| `nsclc-001-pembrolizumab` | Drug-level briefing (`generate_drug_briefing.py`) |
| `nsclc-001-pembrolizumab-overview` | Per-tab analyzer (overview) |
| `nsclc-001-pembrolizumab-trials` | Per-tab analyzer (trials) |
| `nsclc-001-pembrolizumab-competitive` | Per-tab analyzer (competitive) |
Full column inventory:
- Synthesis fields — `executive_summary`, `signal_analyses` (JSON list), `strategic_implications`, `watchlist` (JSON list), `predictions` (JSON list). These mirror the synthesizer's structured output.
- Harness fields (added with the verification gate):
  - `kiq_answers` — JSON list of structured KIQ responses (one per active KIQ; see §4.8).
  - `schema_valid` — tri-state (None/True/False). `None` means no validator ran (e.g. a briefing for a landscape with no KIQs); `True`/`False` mean the verification gate passed/failed after synthesis retries.
  - `schema_errors` — JSON list of error strings emitted by the verification gate (see §4.8).
- Window + metadata — `period_start`, `period_end` (the briefing's lookback window), `signals_count` (post-classification count), `model_used` (the synthesizer model ID at generation time, e.g. `claude-sonnet-4-6`), `generated_at` (default `datetime.utcnow()` at insert).
- Identity — `id` (UUID PK), `landscape_id` (indexed; overloaded for composite keys per the table above).
KIQ (kiq.py)¶
A Key Intelligence Question — the intent-capture layer. One row per question per scope: question text, time_horizon enum (TACTICAL / OPERATIONAL / STRATEGIC), priority int, active bool. Seeded by scripts/seed_kiqs.py with stable IDs, so re-runs idempotently merge instead of duplicating. Every briefing is generated against the active KIQs for its scope, and the synthesizer produces one structured answer block per active KIQ.
Path C — landscape-level vs. drug-specific KIQs. KIQ scoping mirrors the Briefing landscape_id overload. Class-level questions ("How is the JAK inhibitor class evolving?") attach to the parent landscape (immunology-001). Drug-specific questions ("How is dupilumab differentiating against lebrikizumab?") attach to the drug-composite ID (immunology-001-dupilumab). When generate_drug_briefing.py runs for dupilumab, it loads KIQs keyed on immunology-001-dupilumab only — class-level KIQs do not bleed into individual drug briefings. The one-time migration that moved legacy parent-only KIQs into the new scopes is scripts/migrate_immunology_kiqs.py.
EvidenceRecord + ProtocolProfile (evidence.py)¶
Stored separately from Briefing — these are the durable "structured trial outcomes" tables.
- `ProtocolProfile` — one row per trial (PK `trial_id`). Holds parsed protocol fields: population, line of therapy, primary endpoint, biomarker selection, blinding, etc.
- `EvidenceRecord` — N rows per trial. Each is a single (arm, endpoint, value, unit, CI, HR, p, comparator) outcome with a `raw_excerpt` provenance string so the UI can show "this number came from here." Content-hashed on `(trial_id, drug, endpoint, arm, subgroup, source_id)` for dedup.
Briefing references trials by NCT ID inside signal_analyses / predictions prose; it does not hold foreign keys into Evidence. The relationship is intentionally one-way — Evidence is a query target, not a Briefing dependency.
Landscape (landscape.py)¶
Scope definition — indication, therapeutic area, tracked conditions/targets/companies (all stored as JSON strings). Currently two are seeded: nsclc-001 (oncology) and immunology-001 (dupilumab / atopic dermatitis).
§3 Ingestion¶
The Source base class (ogur/sources/base.py)¶
Abstract fetch(landscape) → list[Signal] contract. Each source:
- Gets a shared httpx.AsyncClient with 30 s timeout
- Uses tenacity retry: 3 attempts, exponential backoff 2→10 s, does not retry on 4xx client errors except as noted per source
- Computes content_hash via Source.compute_hash(source, source_id, signal_type) — deterministic, 16 hex chars from SHA-256
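A minimal sketch of that hash contract (the exact joining of the three strings inside `Source.compute_hash` — separators, casing — is an assumption; the triple, the SHA-256, and the 16-hex truncation are from the invariants in §8):

```python
import hashlib


def compute_hash(source: str, source_id: str, signal_type: str) -> str:
    """Deterministic 16-hex-char dedup key from the identity triple."""
    payload = f"{source}{source_id}{signal_type}".encode()
    return hashlib.sha256(payload).hexdigest()[:16]


# Same triple ⇒ same key, so re-running a source collapses to a dedup hit;
# a different signal_type for the same source row produces a new key.
h1 = compute_hash("pubmed", "PMID12345", "publication")
h2 = compute_hash("pubmed", "PMID12345", "publication")
assert h1 == h2 and len(h1) == 16
```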
Source catalog¶
Ten production sources implemented in ogur/sources/, one file per source — except patents.py, which is a single adapter producing two Signal.source values (lens and epo) depending on which upstream returned the row. The base.py and visual_base.py files are abstract bases, not sources themselves.
See data-sources.md for per-source authentication, rate limits, and external documentation links.
Seed scripts¶
- `scripts/seed_nsclc.py` — runs every source concurrently for the `nsclc-001` landscape, logs counts per source, writes rows via `upsert_signal` (ignores duplicates on `content_hash`).
- `scripts/seed_immunology.py` — same pattern for `immunology-001`.
- Source failures are caught and logged — one failing API never crashes the whole run.
DrugProfile assembly¶
scripts/build_drug_profiles.py reads the signals already in the DB and aggregates one DrugProfile per distinct (normalized_name, landscape) — picking the most advanced phase seen, the most frequent company attribution, and the first target hit. It never hits the network.
§4 Intelligence engine¶
The core pipeline lives in ogur/engine/pipeline.py and chains four stages.
§4.1 Detect (detector.py)¶
Pure Python, no LLM calls. ChangeDetector.detect(since):
- Reads all Signals + DrugProfiles from the DB (one transaction each).
- Runs seven detectors producing `DetectedChange` value objects:
  - `_detect_new_drugs` — drugs with signals but no DrugProfile row
  - `_detect_phase_changes` — signal phase > profile phase (uses `_PHASE_RANK` table)
  - `_detect_trial_status_changes` — `TRIAL_STATUS_CHANGE` signals within window
  - `_detect_regulatory_events` — approvals, label changes, safety signals
  - `_detect_new_publications` — publications tagged to tracked drugs
  - `_detect_corporate_events` — SEC filings, M&A, licensing deals, leadership changes
  - `_detect_patent_filings` — Lens / EPO patent rows within window
- Sorts by severity (`high` → `medium` → `low`).
- Returns `list[DetectedChange]` — each holds a reference to the existing Signal row.
Why a value object and not a DB table? Because the same underlying Signal can be a "change" multiple times (e.g., phase transition + regulatory event). A transient in-memory object keeps the dedup invariant simple.
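The shape of that value object, as a hedged sketch (field names are guesses from the prose; the real class lives in `detector.py`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedChange:
    change_type: str   # e.g. "phase_change", "regulatory_event"
    severity: str      # "high" | "medium" | "low"
    signal_id: str     # reference to the existing Signal row — never a new row
    description: str = ""


# The same Signal can back multiple distinct changes without touching
# the dedup key, because nothing here is persisted:
c1 = DetectedChange("phase_change", "high", "sig-123")
c2 = DetectedChange("regulatory_event", "high", "sig-123")
```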
§4.2 Classify (AgentOrchestrator)¶
Routes each DetectedChange to a DomainAgent based on signal.source, then each agent scores its own batch with a Haiku-class LLM using a domain-specific prompt suffix.
| Domain | Sources | Agent class |
|---|---|---|
| clinical | clinicaltrials | ClinicalAgent |
| regulatory | openfda | RegulatoryAgent |
| scientific | pubmed, conferences | ScientificAgent |
| biological | opentargets | BiologicalAgent |
| company | sec | CompanyAgent |
Scoring: 1 = noise, 10 = critical. Threshold: ≥ 5. After scoring, the orchestrator rebalances by source quota (configured in settings.min_signals_per_source) so e.g. PubMed always has ≥ 2 slots in the final cut, then caps at max_signals_for_synthesis (default 20).
Concurrent chunked dispatch. SignalClassifier splits each agent's batch into 50-entry chunks and dispatches them in parallel through a ThreadPoolExecutor (5 workers). One Haiku timeout no longer tanks the whole run — only the affected chunk falls back to neutral score 5, and the classifier only fails outright if every chunk fails. This was added when batch sizes started bumping into the SDK's non-streaming token ceiling.
Fallback: If the Haiku call throws for a whole agent, each agent drops to severity-ordering so the pipeline completes. See DomainAgent.classify.
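The threshold → quota → cap sequence can be sketched roughly like this (the helper name, tuple shape, and the two-pass quota handling are assumptions — the real logic lives in the orchestrator and reads `settings.min_signals_per_source` / `max_signals_for_synthesis`):

```python
def select_for_synthesis(scored, min_per_source=2, cap=20):
    """scored: list of (source, score, change) tuples, already Haiku-scored."""
    # Keep only scores at or above the noise threshold, best first.
    kept = sorted((s for s in scored if s[1] >= 5),
                  key=lambda s: s[1], reverse=True)
    final, per_source = [], {}
    # Pass 1: honor each source's quota off the top of its own ranking.
    for item in kept:
        if per_source.get(item[0], 0) < min_per_source:
            final.append(item)
            per_source[item[0]] = per_source.get(item[0], 0) + 1
    # Pass 2: fill remaining slots by score until the cap.
    for item in kept:
        if len(final) >= cap:
            break
        if item not in final:
            final.append(item)
    return final[:cap]
```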
§4.3 Enrich (enricher.py)¶
Pure Python, DB reads only. For each scored DetectedChange, produces an EnrichedChange with:
- `related_signals` — other signals on the same drug within the window
- `drug_profile` — the cached DrugProfile row
- `company_profile` — the cached CompanyProfile row (if any)
- `competitor_context` — other drugs in the same landscape at comparable phase
- `enrichment_sources` — names of the sources that contributed to this change
§4.4 Synthesize (synthesizer.py)¶
One streamed Sonnet call. Takes enriched changes + drug profiles + active KIQs + a KnownIds catalog → structured JSON with:
executive_summary
signal_analyses[] { signal_id, drug, headline, what_happened, why_it_matters,
cross_source_connections, confidence, severity }
strategic_implications
watchlist[]
predictions[]
kiq_answers[] { kiq_id, finding, evidence, uncertainty, implication, confidence }
metadata { signals_count, model, generated_at }
The prompt (see synthesizer.py:19) emphasizes: lead with what changed, connect across sources, be specific (drug names, NCT, dates), assess confidence, flag what to watch next.
ENTITY CATALOG. The pipeline pre-loads canonical drug, company, and target IDs from the DB (see §4.8) and injects them as a structured block in the system prompt. The synthesizer is told to reference entities only by these IDs, which keeps EntityChips on the frontend resolvable without fuzzy matching.
Streaming transport. Synthesis switched to streaming after intermittent pre-response timeouts on long contexts (issue #32). Streaming also keeps the --mock-llm / --capture-fixture dev-loop short-circuits cheap — fixtures replay token-by-token in test mode.
Schema-retry loop. The synthesizer wraps generation in a verification gate (see §4.8). If validate_kiq_answers rejects the output, the synthesizer retries up to MAX_SCHEMA_RETRIES (= 2) with the validator's error messages appended to the prompt. Entity-reference mismatches are logged but no longer drive retries — they were too aggressive a gate, since prose can legitimately mention an entity not in the catalog.
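The retry loop reduces to a few lines — here `generate` and `validate` stand in for the real synthesizer call and `validate_kiq_answers`; the tri-state return mirrors the `schema_valid` / `schema_errors` columns:

```python
MAX_SCHEMA_RETRIES = 2  # per the prose above


def synthesize_with_gate(generate, validate, prompt):
    """Retry generation with validator errors appended to the prompt."""
    errors: list = []
    output = None
    for _attempt in range(1 + MAX_SCHEMA_RETRIES):
        output = generate(prompt, errors)  # errors folded into the re-prompt
        errors = validate(output)
        if not errors:
            return output, True, []        # persisted with schema_valid=True
    return output, False, errors           # persisted with schema_valid=False
```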
Upsert to DB via briefings store — the primary key is landscape_id, so the latest briefing replaces the previous one. schema_valid and schema_errors are persisted alongside so the API can surface gate state.
§4.5 Query (query.py)¶
Ad-hoc Q&A endpoint. Given a question + landscape_id:
- Extract keywords (stopword-stripped).
- Pre-filter signals by `DrugSynonym` lookup + landscape target list (this is the vector-search seam — when we swap to embeddings, only this step changes).
- Score remaining signals by keyword overlap + recency decay.
- Stuff top-N signals into a Haiku prompt.
- Return `{answer, key_signals[], sources_used}`.
Also pulls the latest landscape Briefing into context so answers inherit recent synthesis.
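Step 3's score can be sketched as keyword overlap multiplied by an exponential recency decay — the half-life and the bare product are assumptions; the real formula lives in `query.py`:

```python
import math
from datetime import datetime, timezone


def score_signal(keywords, text, detected_at, half_life_days=30.0):
    """Keyword-overlap relevance, discounted by signal age."""
    words = set(text.lower().split())
    overlap = sum(1 for k in keywords if k.lower() in words)
    age_days = (datetime.now(timezone.utc) - detected_at).days
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return overlap * decay
```

A 90-day-old signal with the same keyword overlap scores one eighth of a fresh one under a 30-day half-life, which is the seam the embedding swap would replace.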
§4.6 Per-tab analyzers (ogur/engine/analyzers/)¶
For each drug, three parallel analyzers produce AssetDetail tab content — each uses Haiku, because this is structured extraction, not cross-drug synthesis:
| Analyzer | Output shape (schemas.py) |
|---|---|
| OverviewAnalyzer | OverviewOut — MoA, drug class, key differentiators, safety signals, data gaps |
| TrialsAnalyzer | TrialsOut — list of TrialRecord (NCT, phase, status, primary endpoint) |
| CompetitiveAnalyzer | CompetitiveOut — competitors, route_matrix, threat_register, white_space |
Results are cached as Briefing rows with composite landscape_id (see §2 data model).
§4.7 Evidence pipeline (evidence_pipeline.py)¶
A separate, CLI-only orchestrator that builds the durable structured-trial-outcomes layer. Run via scripts/run_evidence_pipeline.py; not invoked during a normal briefing.
For each landscape, it:
- Reads top-N drugs by signal volume from the DB.
- Pre-filters signals before any LLM call via the
_OUTCOME_BEARING_SIGNAL_TYPESfrozenset. Currently narrow: onlyPUBLICATIONandCONFERENCE_ABSTRACTpass through to the outcomes extractor. Press releases, licensing deals, patent filings, label changes, regulatory events, and trial-registration / status-change rows are all skipped here — they either yielded zero-outcome payloads or are handled by the CT.gov protocol-parser path below. The narrow allow-list dominated ~60% of immunology-001 spend before it was added; expanding it (to e.g.LABEL_CHANGE,FDA_APPROVAL) is intentional future work, not an oversight. The eligibility split (signals_processed/signals_skipped_ineligible) is surfaced in the per-drug pilot report so the filter doesn't silently swallow signal volume. - For ClinicalTrials.gov v2 signals, parses the protocol JSON via
parsers/ct_gov_v2.pyand upserts aProtocolProfile. - For abstract / label / conference text, calls
OutcomesExtractor(Haiku tool-use) and upsertsEvidenceRecordrows. - Idempotent: dedup is via
compute_evidence_content_hash(...).--force-replaceoverwrites existing rows for prompt-iteration runs. - Emits a per-drug + aggregate markdown report (confidence distribution, null-field rate, error breakdown) — written to
evals/pilot_immunology/report.mdby default.
Budget gates: --budget-usd halts the run if the live API spend crosses a threshold, and --dry-run calls the extractors without DB writes for cost estimation.
Why a separate pipeline? Evidence is expensive to build (Haiku per outcome) and rarely changes once written. Coupling it to the briefing pipeline would re-pay that cost every cycle. Evidence accumulates in its own tables and is queried on demand by /api/assets/{drug}/competitors/evidence and the matcher (which scores trial-protocol similarity for competitive-context callouts).
§4.8 KIQs + verification gate¶
KIQs (kiq.py, kiqs store) are the intent-capture surface for the briefing — see §2 data model. Loaded by the pipeline at the start of each run via list_kiqs(landscape_id, active_only=True).
Verification (verification.py) is two pure-Python validators that run between the synthesizer's response and persistence:
- `validate_kiq_answers(kiq_answers, expected_kiqs)` — checks every active KIQ has an answer with required keys (`kiq_id`, `finding`, `evidence`, `uncertainty`, `implication`, `confidence`), that confidence is in `{high, medium, low}`, and that `finding` / `evidence` clear minimum lengths. Drives retries in the synthesizer.
- `validate_entity_references(text_blocks, known_ids)` — ensures drug / company / target IDs in synthesizer prose resolve in the catalog and any NCT IDs match the regex. Warn-only post-issue-#32 — logs to `schema_errors` but doesn't retry.
KnownIds is a TypedDict prepared once by the pipeline (pipeline.py) using list_drugs / list_companies / list_targets from the stores, then handed to both the synthesizer (for ENTITY CATALOG injection) and the verification gate. Single source of truth ⇒ no drift between what the prompt advertises and what the validator accepts.
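An assumed shape for that shared catalog, plus the NCT-pattern check the entity validator performs (key names and the warn-only helper are guesses consistent with the prose; NCT IDs are "NCT" followed by 8 digits):

```python
import re
from typing import TypedDict


class KnownIds(TypedDict):
    drugs: list      # canonical generic names
    companies: list  # normalized company names
    targets: list    # HGNC symbols


NCT_RE = re.compile(r"NCT\d{8}")  # ClinicalTrials.gov registry-ID shape


def check_nct_ids(text: str) -> list:
    """Warn-only: return NCT-like tokens that don't match the pattern."""
    return [t for t in re.findall(r"NCT\w+", text) if not NCT_RE.fullmatch(t)]


known: KnownIds = {
    "drugs": ["pembrolizumab"],
    "companies": ["merck"],
    "targets": ["PDCD1"],
}
```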
§4.9 Entity & outcomes extractors (ogur/engine/extractor/)¶
Sub-package owned by the evidence pipeline. None of these run during a normal briefing.
| Module | Role |
|---|---|
| entity_extractor.py | Biomarker / mutation / drug-target / indication NER. Dual backend: GLiNER local model (F1 ≈ 0.88 on the BIOPSY-derived gold set) or a Claude fallback when ML deps aren't installed. |
| compound_span_postprocessor.py | Splits gene+mutation compounds (KRAS G12C-mutated → adds standalone KRAS biomarker span). Curated oncology gene list; never replaces original spans. |
| outcomes_extractor.py | Structures (endpoint, arm, value, unit, CI, HR, p, comparator) tuples from abstract / label / conference text via Haiku tool-use. Preserves raw_excerpt for UI provenance. Handles non-efficacy categories (safety, PK, PRO) and "not reached" time-to-event patterns. |
| parsers/ct_gov_v2.py | Pure-Python parser over ClinicalTrials.gov v2 JSON → ProtocolProfile fields. No LLM. |
| matcher.py | Compute-on-read similarity score between two trial protocols, with weighted field comparisons (endpoint > comparator > population > biomarker > histology > LoT > blinding). |
| eval.py, outcomes_eval.py | Eval harnesses. See evals.md. |
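The matcher's weighted comparison can be sketched as a weighted fraction of matching protocol fields — the weights below preserve the stated ordering (endpoint > comparator > population > biomarker > histology > LoT > blinding) but their exact values are assumptions:

```python
_WEIGHTS = {
    "primary_endpoint": 7, "comparator": 6, "population": 5,
    "biomarker": 4, "histology": 3, "line_of_therapy": 2, "blinding": 1,
}


def protocol_similarity(a: dict, b: dict) -> float:
    """0–1 score: weighted share of fields where both protocols agree."""
    total = sum(_WEIGHTS.values())
    hit = sum(w for f, w in _WEIGHTS.items()
              if a.get(f) and a.get(f) == b.get(f))
    return hit / total
```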
§5 API layer (ogur/api/)¶
Single FastAPI app constructed in app.py. create_tables() on startup creates the SQLite schema if missing. Four routers under /api: signals, briefings, query, and evidence (head-to-head card route — see below). See api-reference.md for every endpoint.
Background tasks. POST endpoints that trigger generation (briefing, drug briefing, per-tab analyzer) return 202 Accepted and enqueue the work via FastAPI BackgroundTasks — synchronous for now; APScheduler is Phase 3 full.
Comparative evidence (routes/evidence.py). GET /api/landscapes/{landscape_id}/evidence/comparative produces head-to-head cards from EvidenceRecord rows. The route is deterministic, no LLM — for each (drug, headline endpoint) pair it picks the highest-confidence row, then most-recent on tie. Drugs without paired evidence are omitted entirely; rows without a comparator_value are filtered out (single-arm rows would render misleading H2H cards). Endpoint-name fuzzing handles trial-arm variations ("EASI-75 at week 16" → "EASI-75"). Per-landscape config (headline endpoints, included drugs) is currently a v1 hardcoded dict in the route module — keyed by landscape_id. Move to DB-stored config when more than two landscapes need cards.
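The deterministic row-pick reduces to a filter plus a two-key `max` — row keys here are assumptions standing in for `EvidenceRecord` columns:

```python
_CONF_RANK = {"high": 2, "medium": 1, "low": 0}


def pick_h2h_row(rows):
    """Best row for one (drug, headline endpoint) pair, or None to omit."""
    # Single-arm rows (no comparator value) would render misleading cards.
    eligible = [r for r in rows if r.get("comparator_value") is not None]
    if not eligible:
        return None  # drug omitted from the card entirely
    # Highest confidence wins; most recent event_date breaks ties.
    return max(eligible,
               key=lambda r: (_CONF_RANK[r["confidence"]], r["event_date"]))
```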
§6 Frontend (frontend/)¶
Vite + React 18 + TypeScript. State via Zustand + TanStack Query. Four top-level views, one Inspector panel. See frontend.md and design/ux-spec.md.
The frontend talks to the backend via the Vite dev-server proxy (/api/* → http://localhost:8000). The universal Inspector panel is a 380 px right rail with a Zustand-driven content slot that context-switches on object type (drug, trial, signal, company).
§7 Model tiering & cost¶
| Stage | Model | Cost per run (approx) | Why this tier |
|---|---|---|---|
| Classifier + DomainAgents | claude-haiku-4-5-20251001 | ~$0.002 | Batch scoring; each signal scored once |
| Per-tab analyzers | Haiku | ~$0.01 per (drug, tab) | Structured extraction, not cross-drug reasoning |
| QueryEngine | Haiku | ~$0.001 per question | Interactive, needs to feel fast |
| Synthesizer | claude-sonnet-4-6 | ~$0.05–0.20 | Cross-source synthesis, long context |
| Holo3 visual extraction | holo3-122b-a10b | Varies by screenshot count | Vision over IR pages + congress portals |
Model IDs live in config.py — swapping any tier's model is a one-line change.
§8 Dedup invariants¶
The dedup contract is load-bearing. Three invariants:
1. Every Signal has a `content_hash` and it's UNIQUE at the DB level. Enforced by the SQLModel `Field(unique=True)` constraint. Violation ⇒ IntegrityError on insert, which `upsert_signal` catches and treats as a dedup hit.
2. `content_hash = sha256(source + source_id + signal_type)[:16]`. Deterministic. Two runs of the same source producing the same signal collapse. A signal that genuinely changes (phase transition, new abstract) produces a new hash because `source_id` or `signal_type` differs.
3. The engine never inserts Signal rows. It only reads them and wraps them in `DetectedChange` / `EnrichedChange`. If this ever changes, the invariant breaks and the dedup contract must be re-derived.
§9 Testing architecture¶
In-memory SQLite + StaticPool so every Session shares one connection — required because get_session() is called in multiple places (detector, enricher, query, stores) and each call gets its own Session.
Fixtures in tests/conftest.py:
| Fixture | Provides |
|---|---|
| db_engine | Fresh in-memory engine per test, all tables created |
| db_session | A Session bound to db_engine |
| mock_get_session | A context manager that wraps db_engine — injected via patch |
| patch_db | Patches get_session in every engine + store module |
| patch_sources_db | Patches get_session in source modules that call the DB |
| api_client | FastAPI TestClient with get_db overridden to use the test DB |
Golden rule: patch where the symbol is used (e.g. ogur.engine.detector.get_session), not where it's defined (ogur.store.database.get_session). Otherwise imports happen before the patch takes effect.
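A self-contained demonstration of why the golden rule holds, using two stand-in modules built at runtime (the `*_demo` names are illustrative — the real pair is `ogur.store.database` and `ogur.engine.detector`):

```python
import sys
import types
from unittest.mock import patch

# "store_demo" defines get_session; "detector_demo" imported it at import
# time, the way `from ogur.store.database import get_session` does.
store = types.ModuleType("store_demo")
store.get_session = lambda: "real-db"
sys.modules["store_demo"] = store

detector = types.ModuleType("detector_demo")
detector.get_session = store.get_session
sys.modules["detector_demo"] = detector

# Patching the definition site does NOT touch the detector's reference:
with patch("store_demo.get_session", lambda: "fake-db"):
    assert detector.get_session() == "real-db"

# Patching where the symbol is used does:
with patch("detector_demo.get_session", lambda: "fake-db"):
    assert detector.get_session() == "fake-db"
```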
See testing.md for usage patterns.
§10 What's deliberately not built¶
- No node-edge graph visualization in UI. The knowledge graph (Target ↔ Drug ↔ Signal) is infrastructure, used for retrieval + classification. Users see typed cards and EntityChips. See design/ux-spec.md §6.4.
- No COGS, KOL sentiment, or internal clinical data. Public sources only.
- No chatbot. Synthesis is structured JSON → rendered cards. QueryEngine answers ad-hoc questions but doesn't hold state.
- No scheduled ingestion yet. Cron-triggered seed + briefing is Phase 3 full.