ADR-0075 — Document-content adverse analysis (uploaded documents can RAISE risk)
Status: Accepted
Date: 2026-06-28
Deciders: Compliance engineering (with the OB-Holding live-case oracle)
Context
The post-portal document path validated each uploaded document's type and extracted structured data (UBOs), emitting only "verified" findings. It never read the document content for adverse disclosures. Combined with the OSINT path being unable to reach some primary sources (e.g. Estonian/Lithuanian court and news sites are bot-blocked), this produced a dangerous asymmetry:
A recertification pack that explicitly disclosed "the EPPO has opened a criminal investigation … €31.6m frozen … €8.4m AML fine … funds paid into the company's own out-of-licence payment account" produced no risk signal. The documents were marked verified and the case stayed at MEDIUM/CDD: the document path could only ever lower risk, never raise it.
This was found while validating the OB Holding 1 OÜ case against a human
analyst's primary-source-corroborated REJECT pack (research doc
2026-06-28-ob-holding-live-case-data-audit.md; oracle items R1/R2/R3/R12). The
correct verdict is CRITICAL → REJECT; the system reached only MEDIUM because
the decisive facts had no route into the engine.
The escalation machinery already existed and was unused for documents:
escalate_entity_criminal_investigation (→ CRITICAL entity_criminal_investigation,
a hard blocker) and red_flag:payment_account_outside_licence (→ EBA
PAYMENT_ACCOUNT_OUTSIDE_LICENCE_CRITICAL floor). Both are classified
DETERMINISTIC in finding_determinism.py.
Decision
Add a deterministic document-content adverse analyzer
(app/services/document_adverse_analysis.py) that scans the Docling-extracted
markdown of every validated document and surfaces adverse disclosures as
findings, then feeds them to the existing escalators — it does not
re-implement classification.
- analyze_adverse_content(markdown) splits the text into sentences and flags those containing criminal-LE signals (reusing the escalator's exact _CRIMINAL_LE_SIGNALS vocabulary), AML-enforcement signals, asset-freeze signals, or source-of-funds signals, each as a finding that quotes the source sentence (auditable). Negation-guarded ("no criminal investigation", "cleared of" → skipped).
- detect_payment_account_outside_licence(markdown) extracts the structured (licence_countries, payment_account_country) pair that the existing red_flag_engine.evaluate_payment_account_jurisdiction needs.
- extract_document_data runs both over all validated documents and returns adverse_findings + payment_account_signal.
- refresh_synthesis_after_documents adds them, runs the existing escalate_entity_criminal_investigation (entity-named criminal disclosure → CRITICAL hard block), and builds the payment-account red flag.
- The workflow runs a new risk-reassessment checkpoint 3 (post-document) that re-scores the full EBA matrix from the refreshed findings.
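The sentence-level scan described above can be illustrated with a minimal sketch. This is not the contents of app/services/document_adverse_analysis.py: the signal phrases, the AdverseFinding type, and the sentence-splitting regex are assumptions made for illustration, and the real analyzer reuses the escalator's vocabulary, which this ADR does not list. Only the two negation phrases are taken from the text above.

```python
import re
from typing import NamedTuple

# Hypothetical signal vocabularies, for illustration only. The real analyzer
# reuses the escalator's _CRIMINAL_LE_SIGNALS list (contents not shown here).
SIGNALS: dict[str, tuple[str, ...]] = {
    "criminal_le": ("criminal investigation",),
    "aml_enforcement": ("aml fine",),
    "asset_freeze": ("frozen",),
    "source_of_funds": ("source of funds",),
}

# Negation phrases quoted in this ADR; a sentence containing one is skipped.
_NEGATION = re.compile(r"\b(?:no criminal investigation|cleared of)\b")

# Assumed splitter: break after ., ! or ? when followed by whitespace,
# so a decimal such as "31.6m" does not end a sentence.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class AdverseFinding(NamedTuple):
    """Hypothetical finding shape: a category plus the quoted source sentence."""
    category: str
    quote: str


def analyze_adverse_content(markdown: str) -> list[AdverseFinding]:
    """Flag sentences containing an adverse signal, quoting each verbatim."""
    findings: list[AdverseFinding] = []
    for sentence in _SENTENCE_END.split(markdown):
        text = sentence.strip()
        lowered = text.lower()
        if not text or _NEGATION.search(lowered):
            continue  # empty fragment or negated disclosure
        for category, phrases in SIGNALS.items():
            if any(phrase in lowered for phrase in phrases):
                findings.append(AdverseFinding(category, text))
    return findings
```

In this sketch a negation phrase suppresses the whole sentence, which is a simplification: a sentence that both denies one matter and discloses another would be skipped, so a production guard would likely need to be narrower.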
Why deterministic, not an LLM agent
Consistent with finding_determinism.py, which already treats these signals as
DETERMINISTIC. For a hard REJECT blocker, a transparent, auditable, fail-closed
rule that quotes its evidence is preferable to an opaque model that could
silently miss a criminal-investigation disclosure. LLM enrichment (nuance,
multi-language) is a documented follow-up, not a blocker.
Consequences
- Uploaded documents can now raise risk. The OB-Holding text drives the EBA
matrix to CRITICAL (score 90, PAYMENT_ACCOUNT_OUTSIDE_LICENCE_CRITICAL) plus a
CRITICAL entity_criminal_investigation hard-block finding, the gold-standard
REJECT verdict (proved end-to-end in
test_document_adverse_analysis.py::test_end_to_end_document_drives_eba_critical).
- Severity is only ever raised, never lowered, consistent with the standing
"ADD scrutiny, never suppress" principle.
- The analyzer is English-phrase-based today. Non-English source documents and paraphrased disclosures are a known recall gap → LLM-enrichment follow-up.
- No new workflow-sandbox imports; the analyzer is pure (re + typing) and reuses the already-imported escalator.
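The "only ever raised, never lowered" rule above amounts to taking a maximum over the pre-document severity and any document-derived floors. The sketch below is illustrative only: the Severity enum, its LOW and HIGH levels, and the reassess function are hypothetical names, not the actual EBA matrix code (MEDIUM and CRITICAL are the only levels this ADR names).

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordered risk levels; LOW and HIGH are assumed."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def reassess(current: Severity, document_floors: list[Severity]) -> Severity:
    """Return the post-document severity.

    Document findings act as floors: they can lift the result above
    `current`, but nothing in this function can push it below `current`.
    """
    return max([current, *document_floors])
```

Under these assumptions, `reassess(Severity.MEDIUM, [Severity.CRITICAL])` yields `Severity.CRITICAL`, while a document that produces no adverse floor leaves the prior severity unchanged.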
References
ADR-0020 (EBA matrix), ADR-0073 (R12 payment-account-outside-licence),
ADR-0019 (OSINT pipeline), finding_determinism.py, research doc
docs/research/2026-06-28-ob-holding-live-case-data-audit.md.