ADR-0075 — Document-content adverse analysis (uploaded documents can RAISE risk)

Status: Accepted
Date: 2026-06-28
Deciders: Compliance engineering (with the OB-Holding live-case oracle)

Context

The post-portal document path validated each uploaded document's type and extracted structured data (UBOs), emitting only "verified" findings. It never read the document content for adverse disclosures. Combined with the OSINT path being unable to reach some primary sources (e.g. Estonian/Lithuanian court and news sites are bot-blocked), this produced a dangerous asymmetry:

A recertification pack that explicitly disclosed "the EPPO has opened a criminal investigation … €31.6m frozen … €8.4m AML fine … funds paid into the company's own out-of-licence payment account" produced no risk signal. The documents were marked verified and the case stayed at MEDIUM/CDD: the document path could only ever lower risk, never raise it.

This was found while validating the OB Holding 1 OÜ case against a human analyst's primary-source-corroborated REJECT pack (research doc 2026-06-28-ob-holding-live-case-data-audit.md; oracle items R1/R2/R3/R12). The correct verdict is CRITICAL → REJECT; the system reached only MEDIUM because the decisive facts had no route into the engine.

The escalation machinery already existed and was unused for documents: escalate_entity_criminal_investigation (→ CRITICAL entity_criminal_investigation, a hard blocker) and red_flag:payment_account_outside_licence (→ EBA PAYMENT_ACCOUNT_OUTSIDE_LICENCE_CRITICAL floor). Both are classified DETERMINISTIC in finding_determinism.py.

Decision

Add a deterministic document-content adverse analyzer (app/services/document_adverse_analysis.py) that scans the Docling-extracted markdown of every validated document and surfaces adverse disclosures as findings, then feeds them to the existing escalators — it does not re-implement classification.

analyze_adverse_content(markdown) splits the text into sentences and flags those containing criminal-LE signals (reusing the escalator's exact _CRIMINAL_LE_SIGNALS vocabulary), AML-enforcement signals, asset-freeze signals, or source-of-funds signals. Each match becomes a finding that quotes the source sentence, so the result is auditable. Matches are negation-guarded: phrases such as "no criminal investigation" or "cleared of" are skipped.
detect_payment_account_outside_licence(markdown) extracts the structured (licence_countries, payment_account_country) pair that the existing red_flag_engine.evaluate_payment_account_jurisdiction needs.
extract_document_data runs both over all validated documents and returns adverse_findings + payment_account_signal.
refresh_synthesis_after_documents adds them, runs the existing escalate_entity_criminal_investigation (entity-named criminal disclosure → CRITICAL hard block), and builds the payment-account red flag.
The workflow runs a new risk-reassessment checkpoint 3 (post-document) that re-scores the full EBA matrix from the refreshed findings.
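
The sentence-level scan described above can be sketched roughly as follows. This is a minimal illustration, not the contents of document_adverse_analysis.py: the SIGNALS and NEGATIONS vocabularies, the Finding shape, and the split_sentences helper are all assumptions made for the example (the real analyzer reuses the escalator's _CRIMINAL_LE_SIGNALS, which this ADR does not reproduce), and the sketch applies the negation guard to the whole sentence.

```python
import re
from typing import TypedDict

# Hypothetical vocabularies, for illustration only. The real analyzer reuses
# the escalator's _CRIMINAL_LE_SIGNALS list, which is not shown in this ADR.
SIGNALS: dict[str, tuple[str, ...]] = {
    "criminal_le": ("criminal investigation", "eppo", "prosecutor"),
    "aml_enforcement": ("aml fine", "anti-money laundering fine"),
    "asset_freeze": ("frozen", "asset freeze"),
}

# Assumed negation guard: a sentence containing one of these is skipped.
NEGATIONS = ("no criminal investigation", "cleared of")


class Finding(TypedDict):
    category: str
    quote: str  # the verbatim source sentence, kept for auditability


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def analyze_adverse_content(markdown: str) -> list[Finding]:
    findings: list[Finding] = []
    for sentence in split_sentences(markdown):
        lowered = sentence.lower()
        if any(neg in lowered for neg in NEGATIONS):
            continue  # negated disclosure: emit nothing for this sentence
        for category, phrases in SIGNALS.items():
            if any(phrase in lowered for phrase in phrases):
                findings.append({"category": category, "quote": sentence})
    return findings
```

Because each finding carries the matched sentence verbatim, a reviewer can trace any escalation back to the exact text in the uploaded document. Plain substring matching is deliberately simple here; a real vocabulary would likely need word boundaries to avoid over-matching short tokens.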

Why deterministic, not an LLM agent

Consistent with finding_determinism.py, which already treats these signals as DETERMINISTIC. For a hard REJECT blocker, a transparent, auditable, fail-closed rule that quotes its evidence is preferable to an opaque model that could silently miss a criminal-investigation disclosure. LLM enrichment (nuance, multi-language) is a documented follow-up, not a blocker.

Consequences

Uploaded documents can now raise risk. The OB-Holding text drives the EBA matrix to CRITICAL (score 90, PAYMENT_ACCOUNT_OUTSIDE_LICENCE_CRITICAL) plus a CRITICAL entity_criminal_investigation hard-block finding — the gold-standard REJECT verdict (proved end-to-end in test_document_adverse_analysis.py::test_end_to_end_document_drives_eba_critical).
Severity is only ever raised, never lowered — consistent with the standing "ADD scrutiny, never suppress" principle.
The analyzer is English-phrase-based today. Non-English source documents and paraphrased disclosures are a known recall gap → LLM-enrichment follow-up.
No new workflow-sandbox imports; the analyzer is pure (re + typing) and reuses the already-imported escalator.
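
The "only ever raised, never lowered" property amounts to a monotonic merge of severities. A minimal sketch, assuming an ordered severity scale: MEDIUM and CRITICAL appear in this ADR, while LOW, HIGH, the numeric ordering, and the merge_severity helper are hypothetical names introduced for the example.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering; only MEDIUM and CRITICAL are named in this ADR.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def merge_severity(current: Severity, proposed: Severity) -> Severity:
    # Raise-only merge: a new finding can lift the case severity but a
    # weaker finding can never pull it back down.
    return max(current, proposed)
```

Under this sketch, a MEDIUM case that receives a CRITICAL document finding becomes CRITICAL, and a later LOW finding leaves it at CRITICAL.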

References

ADR-0020 (EBA matrix), ADR-0073 (R12 payment-account-outside-licence), ADR-0019 (OSINT pipeline), finding_determinism.py, research doc docs/research/2026-06-28-ob-holding-live-case-data-audit.md.
