
ADR-0078 — Alias / brand / group expansion of the screened entity set

Status: Accepted
Date: 2026-06-28
Deciders: Compliance engineering (with the OB-Holding reachability investigation oracle)
Context source: docs/research/2026-06-28-ob-holding-reachability-investigation.md

Context

This ADR records the keystone finding of the 2026-06-28 reachability investigation and its single highest-leverage fix.

osint_agent.py:_safe_adverse_media (~line 1362) passes only company_name + unique_directors to the adverse-media agent. For OB Holding 1 OÜ that means the only entities ever screened are the string "OB Holding 1 OÜ" and its two board members. But the adverse facts legally attach to the brand and operating group, not the holding's legal name:

  • the €31.6m EPPO freeze is on "OlyBet Eesti";
  • the €8.4m fine is on "Olympic Casino Group Baltija" (Lithuania);
  • the group brand is "OlyBet" / "Olympic Entertainment Group AS".

None of these strings is ever searched. No provider upgrade (ADR-0077) can surface or attribute these facts without first widening the screened entity set to the brand/group/subsidiaries. Provider work without alias expansion is wasted.

Widening the screened set runs directly up against an existing guard, ADR-0073 R9 (name-collision guard): "No parent / UBO / related-entity attribution may be resolved on name similarity alone. Resolution requires a corroborating identifier — registration number, jurisdiction match, or LEI. A name-only candidate is surfaced as an unverified candidate, never folded into the resolved chain." Naïvely screening a bare brand string ("Olympic Casino") risks pulling other entities' adverse media into OB Holding's bucket, which is precisely the harm R9 forbids. The adversarial reviewer flagged exactly this: an "LLM-guessed alias" would be the "present unverified as pipeline-found" failure that the project's honesty principle prohibits.

Decision

Expand the screened entity set from the legal name to its brand + operating subsidiaries + group, with provenance and a two-lane attribution model that reconciles fully with ADR-0073 R9. The cardinal split: recall (we search the brand) is separated from attribution (whether a hit counts as the subject's risk).

  1. aliases parameter + two lanes. Add an aliases: list[Alias] parameter to run_adverse_media_agent, where each Alias carries name, relation (brand / subsidiary / parent / former-name), source, and corroborating_identifier (the reg-no / jurisdiction / LEI linking it to the subject, or None). Every alias is searched (recall); hits are then routed by lane:

    • Verified lane — the alias→subject link carries a corroborating identifier → hits may be attributed to the subject via the shown group/brand chain (see #3).
    • Unverified-candidate lane — no corroborating identifier (e.g. a bare brand string from NorthData/website extraction) → hits are surfaced as labelled unverified candidates only: they never enter _subject_identifiers, never merge into the subject's resolved bucket, and never drive the subject's verdict. This closes the loophole where deduping every alias hit "into the subject's bucket" would let a bare brand name silently escalate the holding.
  2. Aliases come only from evidenced sources — never an LLM judgement.

    • Admissible verified-lane sources: (a) the subject's own registry identity, meaning registered_name plus registered former names, and (b) bitemporal-graph IS_SUBSIDIARY_OF/parent edges (graph_service) that carry reg-no/jurisdiction/LEI provenance. The registry identity is computed inside the per-country registry service (e.g. ee_ariregister_service.py) and is not today a uniform registry_data dict key, so surfacing it into the investigation data structure is part of this work.
    • NorthData / website_validation brand strings are unverified-lane by default unless an independent corroborating identifier is attached.
    • No "brand-resolution LLM call." Where an alias string is extracted from a NorthData/website page, extraction is restricted to verbatim strings present in the source, and the relation to the subject must still carry an independent corroborating identifier before it enters the verified lane — an LLM may neither invent the string nor assert the relationship.
  3. Provenance-gated attribution via a new escalator tier — operationalised, not just asserted. osint_post_processing.py:escalate_entity_criminal_investigation gains a third attribution tier, subject-via-verified-group-chain, distinct from its existing text-identifier-CRITICAL (subject name/reg-no present in the article) and unattributed-HIGH (related/unattributed) tiers. A verified-lane alias hit escalates the subject only when the alias→subject link carries a corroborating identifier, and the finding carries the shown evidence chain — e.g. "criminal-investigation article names OlyBet Eesti (reg-no <X>); OlyBet Eesti linked to subject OB Holding 1 OÜ (reg-no 14975047) via graph edge IS_SUBSIDIARY_OF." Note the chain corroborates across two distinct identifiers; the subject's own reg-no alone is not the link. Unverified-lane hits never reach this tier, alias strings are never added to _subject_identifiers, and we never silently relabel another entity's enforcement as the holding's.

  4. No-edge case is an honest gap, not a guess. If the graph exposes no provenanced parent/subsidiary edge and the registry exposes no group identity, the verified lane yields nothing. That gap is surfaced as a coverage note (ADR-0067) and is never patched with an LLM guess. A dedicated graph brand/"trades-as" edge type does not exist today (only IS_SUBSIDIARY_OF); strengthening graph brand/subsidiary edges for gambling-group structures is a related follow-up. Until it lands, brand aliases lacking a reg-no/jurisdiction/LEI stay in the unverified lane.

  5. Person aliases inherit a person-identifier guard (GDPR). Director/person adverse queries (ADR-0077) screen natural persons by name; because this is criminal-offence data (GDPR Art. 10) feeding an Art. 22 automated decision, a person hit is bound to the subject's persons only with identifier-level corroboration (DOB / role-in-entity), mirroring R9 for persons. Without it the hit is an unverified candidate for officer review (ADR-0014 human oversight). All resulting findings inherit the ADR-0071 default-deny customer surface, and alias-sourced hits + their evidence chains persist as append-only ScreeningResult records (ADR-0063/0064, 5-yr retention).
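
The alias record and lane routing from items 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the Relation and Lane enums and the lane_for and search_terms helpers are hypothetical names introduced here, and the only fields taken from this ADR are name, relation, source, and corroborating_identifier. The sketch keeps recall and attribution apart: every alias contributes a search term, while the lane depends solely on whether a corroborating identifier is present.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    BRAND = "brand"
    SUBSIDIARY = "subsidiary"
    PARENT = "parent"
    FORMER_NAME = "former-name"


class Lane(Enum):
    VERIFIED = "verified"
    UNVERIFIED_CANDIDATE = "unverified-candidate"


@dataclass(frozen=True)
class Alias:
    name: str
    relation: Relation
    source: str  # where the string came from, e.g. "graph_service" or "website_validation"
    corroborating_identifier: Optional[str] = None  # reg-no / jurisdiction / LEI, or None


def lane_for(alias: Alias) -> Lane:
    """Route by provenance only: no identifier means unverified, whatever the source."""
    ident = alias.corroborating_identifier
    if ident is not None and ident.strip():
        return Lane.VERIFIED
    return Lane.UNVERIFIED_CANDIDATE


def search_terms(company_name: str, aliases: list[Alias]) -> list[str]:
    """Recall: every alias is searched, in both lanes, de-duplicated case-insensitively."""
    seen: set[str] = set()
    terms: list[str] = []
    for term in [company_name, *(a.name for a in aliases)]:
        key = term.casefold()
        if key not in seen:
            seen.add(key)
            terms.append(term)
    return terms
```

In this sketch a bare brand string such as "OlyBet" taken from website extraction still widens the search, but lane_for keeps it in the unverified-candidate lane until an independent identifier is attached.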

Consequences

  • Unlocks the value of ADR-0077: with OlyBet/Olympic in the screened set, the €31.6m freeze and €8.4m fine become findable (recall), and attributable to OB Holding only via the verified-group-chain tier when a corroborated IS_SUBSIDIARY_OF edge exists. Honest dependency: if that edge is not yet in the graph for this entity, the hit correctly stays an unverified candidate / unattributed-HIGH rather than a false-certain subject-CRITICAL — surfacing the OlyBet freeze for the officer without over-asserting it as the holding's enforcement.
  • Fully consistent with ADR-0073/0024/0049: attribution still requires a corroborating identifier; this ADR reuses the R9 guard rather than bypassing it. Bare-name brand screening without an identifier produces only labelled unverified candidates.
  • Recall vs. attribution are kept distinct: alias expansion fixes recall (we now find OlyBet's freeze); R9 governs attribution (whether it counts as OB Holding's risk).
  • Honest residual: the €31.6m freeze remains legally on OlyBet Eesti and the €8.4m fine on Olympic Casino Group Baltija — both group-adjacent to, but legally distinct from, the holding. The system presents them as group-chain evidence, never as enforcement against OB Holding itself.
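
The tier decision behind these consequences might look like the sketch below. It is an assumption-laden illustration rather than the real escalate_entity_criminal_investigation: the AliasHit type, the attribution_tier function, the tier constants, and the sample article strings are all hypothetical, and the real escalator's signature is not shown in this ADR. The point it demonstrates is that a hit reaches the group-chain tier only when the alias link carries both an identifier and link evidence; otherwise it falls back to the unattributed tier.

```python
from dataclasses import dataclass
from typing import Optional

# Tier labels as named in this ADR; the constant names are illustrative.
TIER_TEXT_IDENTIFIER_CRITICAL = "text-identifier-CRITICAL"
TIER_VERIFIED_GROUP_CHAIN = "subject-via-verified-group-chain"
TIER_UNATTRIBUTED_HIGH = "unattributed-HIGH"


@dataclass(frozen=True)
class AliasHit:
    article_text: str
    alias_name: str
    alias_identifier: Optional[str]  # corroborating identifier on the alias-to-subject link
    link_evidence: Optional[str]     # e.g. "graph edge IS_SUBSIDIARY_OF"


def attribution_tier(
    hit: AliasHit, subject_identifiers: set[str]
) -> tuple[str, Optional[str]]:
    """Return (tier, evidence_chain); the chain is set only for the group-chain tier."""
    text = hit.article_text.casefold()
    # Existing tier: the subject's own name or reg-no appears in the article.
    if any(ident.casefold() in text for ident in subject_identifiers):
        return TIER_TEXT_IDENTIFIER_CRITICAL, None
    # New tier: verified-lane alias with a shown evidence chain.
    if hit.alias_identifier and hit.link_evidence:
        chain = (
            f"article names {hit.alias_name} ({hit.alias_identifier}); "
            f"linked to subject via {hit.link_evidence}"
        )
        return TIER_VERIFIED_GROUP_CHAIN, chain
    # No corroborated link: surfaced for the officer, never attributed to the subject.
    return TIER_UNATTRIBUTED_HIGH, None
```

Note that alias names are never added to subject_identifiers in this sketch, so a bare brand match cannot reach the text-identifier tier by itself.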

Alternatives considered

  • LLM brand-resolution call to infer aliases. Rejected — violates the honesty principle and ADR-0073 R9 (name/association asserted without a corroborating identifier).
  • Screen a fixed hardcoded alias list per known case. Rejected — not general, not auditable, and stale by construction. Aliases must derive from live registry/graph provenance.

References

  • ADR-0073 (R9 name-collision guard; the guard this ADR reconciles with)
  • ADR-0024 (entity matching / blocking keys)
  • ADR-0049 (registration-number identity mismatch)
  • ADR-0022 (Neo4j knowledge graph; alias edge source)
  • ADR-0077 (multi-provider retrieval; depends on this keystone)
  • ADR-0067 (fail-closed honesty)
  • ADR-0014 (human oversight)
  • ADR-0071 (tipping-off default-deny surface)
  • ADR-0063 (append-only ScreeningResult) / ADR-0064 (audit immutability; provenance retention)
  • Code: osint_agent.py:_safe_adverse_media, osint_post_processing.py:escalate_entity_criminal_investigation, adverse_media_agent.py
  • Research doc: docs/research/2026-06-28-ob-holding-reachability-investigation.md