
ADR-0049: Cross-reference registration-number identity-mismatch detection

Status: Accepted
Date: 2026-04-22
Supersedes: none
Superseded by: none

Context

During a demo rehearsal on 2026-04-22 for the Czech company EP Infrastructure, a.s., the case was submitted with the wrong IČO: 04106864 instead of the real 02413507. The platform's reaction exposed a sharp gap in the cross-reference layer:

  • ARES returned NOT_FOUND for 04106864 → critical finding.
  • NorthData and GLEIF name-matched the entity independently and returned registration number 02413507 → both labelled "verified".
  • The platform surfaced both findings without ever comparing the two IDs or telling the officer the submitted identity was wrong.
  • Downstream findings cascaded as misleading consequences: no Sbírka listin filings (symptom, not gap), VIES invalid (symptom, not compliance risk), company_status=not_found, null legal_form, null NACE.

The officer saw seven findings and had to reverse-engineer the root cause (mistyped IČO) from contradictions between them. That is exactly the anti-pattern a KYB/KYC automation layer is supposed to eliminate — "here are seven symptoms, you figure out which is primary" is worse than no automation.

Decision

Extend the cross_reference_service with a registration-number comparator that runs before all other field comparisons. Its output is a dedicated identity_mismatch finding — not mixed into the generic field-discrepancy stream — because the identity layer is causally upstream of every other comparison: if the registration numbers disagree, every other field comparison becomes meaningless symptom-noise.

Severity rules

| State | Severity | Rationale |
|---|---|---|
| All sources agree on the same registration number | corroborated | The happy path. |
| Submitted value differs from ≥2 authoritative secondaries that agree with each other | HIGH (identity_mismatch) | Near-certain input error. Surface the alternative value as a suggested correction so the officer can act in one click. |
| Secondaries disagree internally, no submitted value | MEDIUM | Data-quality signal; not necessarily an error. |
| Only one source has a value | silent | No cross-reference possible. |
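The severity rules above can be sketched as a small classifier. This is a minimal illustration, not the service's actual code: the names `IdentityResult` and `classify` are hypothetical, and the fallback to MEDIUM for states the rules do not list (for example, a single secondary disagreeing with the submitted value) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdentityResult:
    severity: str                           # "corroborated", "HIGH", "MEDIUM" or "silent"
    alternative_value: Optional[str] = None  # suggested correction, HIGH only


def classify(submitted: Optional[str], secondaries: dict[str, str]) -> IdentityResult:
    """Map the submitted number and the secondary-source numbers to a severity."""
    values = list(secondaries.values())
    if submitted is not None:
        values.append(submitted)

    if len(values) < 2:
        return IdentityResult("silent")        # only one source has a value
    if len(set(values)) == 1:
        return IdentityResult("corroborated")  # every source agrees

    # Most common registration number among the secondaries only.
    top_value, top_count = Counter(secondaries.values()).most_common(1)[0]
    if submitted is not None and top_value != submitted and top_count >= 2:
        return IdentityResult("HIGH", alternative_value=top_value)

    # Assumption: every remaining disagreement is treated as MEDIUM.
    return IdentityResult("MEDIUM")
```

With the values from the EP Infrastructure incident, `classify("04106864", {"NorthData": "02413507", "GLEIF": "02413507"})` takes the HIGH branch and carries 02413507 as the alternative value.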

Authoritativeness ordering

Primary: the country-specific registry (ARES for CZ, KBO for BE, INPI for FR, CVR for DK, …). Secondaries: NorthData, GLEIF, VIES, regional gazettes — any source that returns a canonical registration number.

The comparator uses two agreeing secondaries as the threshold because a single secondary disagreeing with the primary could be a secondary-source stale-cache artefact; two agreeing against the primary is a stronger signal of input error.
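One way to express the primary/secondary split is a per-country lookup plus a helper that separates the results by role. The sketch below is illustrative only: `PRIMARY_REGISTRY` and `split_by_role` are hypothetical names, the mapping contains just the four registries named above, and representing a NOT_FOUND lookup as `None` is an assumption.

```python
from typing import Optional

# Primary registry per country, limited to the examples named in this ADR.
PRIMARY_REGISTRY = {"CZ": "ARES", "BE": "KBO", "FR": "INPI", "DK": "CVR"}


def split_by_role(
    country: str, results: dict[str, Optional[str]]
) -> tuple[Optional[str], dict[str, str]]:
    """Split per-source registration numbers into (primary value, secondaries).

    `results` maps a source name to the registration number it returned,
    or None when the source had no value (assumed encoding of NOT_FOUND).
    """
    primary_name = PRIMARY_REGISTRY[country]
    primary_value = results.get(primary_name)
    secondaries = {
        name: value
        for name, value in results.items()
        if name != primary_name and value is not None
    }
    return primary_value, secondaries
```

For the incident case, `split_by_role("CZ", {"ARES": None, "NorthData": "02413507", "GLEIF": "02413507"})` yields no primary value and two agreeing secondaries, which is the situation the two-secondary threshold is meant to catch.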

Architectural constraints

Runs first. Placed at the very start of the cross-reference pipeline so that downstream field comparisons inherit the "we already know the identity is suspect" context. Without this ordering, the symptomatic findings fire before the root-cause finding — the exact inverted ordering that confused the EP Infrastructure officer.
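The ordering constraint can be illustrated with a small pipeline in which the identity check always runs before the field comparators and leaves a flag in a shared context. All names here (`compare_identity`, `compare_status`, `run_cross_reference`, the `identity_suspect` key, the shape of `case`) are hypothetical and chosen for the sketch, not taken from cross_reference_service.

```python
def compare_identity(case: dict, context: dict) -> list:
    """Root-cause check: >=2 agreeing secondaries that differ from the submitted ID."""
    secondary = case["secondary_numbers"]
    agreed = len(secondary) >= 2 and len(set(secondary)) == 1
    if agreed and secondary[0] != case["submitted"]:
        context["identity_suspect"] = True
        return [{
            "type": "identity_mismatch",
            "severity": "HIGH",
            "alternative_value": secondary[0],
        }]
    return []


def compare_status(case: dict, context: dict) -> list:
    """Example field comparator that inherits the identity context."""
    finding = {"type": "company_status", "value": case["status"]}
    if context.get("identity_suspect"):
        finding["note"] = "possible symptom of identity_mismatch"
    return [finding]


def run_cross_reference(case: dict, field_comparators: list) -> list:
    """Run the identity comparator first, then every field comparator in order."""
    context: dict = {}
    findings = compare_identity(case, context)  # always runs first
    for comparator in field_comparators:
        findings.extend(comparator(case, context))
    return findings
```

Because the context is populated before any field comparator runs, a symptomatic finding such as company_status=not_found can be annotated at creation time rather than reinterpreted afterwards.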

Auto-suggested correction. When the HIGH branch fires, the finding includes an alternative_value field that the dashboard renders as a one-click "accept suggested correction" button. Once the officer accepts, the pipeline re-runs the investigation with the corrected ID.

Does not auto-correct. Even with two agreeing secondaries, the platform does not silently overwrite the submitted value. An officer makes the call and owns the audit trail — EU AI Act Art. 14 (human oversight).
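A sketch of how the two constraints above could fit together: the correction is applied only through an explicit officer action, the submitted case is never mutated, and each acceptance leaves an audit entry. The types `Case` and `IdentityMismatch`, the function `accept_suggested_correction`, and the audit-entry fields are all hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Case:
    registration_number: str


@dataclass(frozen=True)
class IdentityMismatch:
    submitted_value: str
    alternative_value: str


def accept_suggested_correction(
    case: Case, finding: IdentityMismatch, officer_id: str, audit_log: list
) -> Case:
    """Return a corrected copy of the case; requires a named officer."""
    if not officer_id:
        raise PermissionError("a named officer must accept the correction")
    audit_log.append({
        "officer": officer_id,
        "from": case.registration_number,
        "to": finding.alternative_value,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    # The original case object stays untouched; the caller re-runs the
    # investigation on the returned copy.
    return replace(case, registration_number=finding.alternative_value)
```

Keeping `Case` frozen makes a silent overwrite impossible by construction: the only way to obtain a case with the alternative value is to go through the function that records who accepted it.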

Consequences

  • Root-cause-first finding order. Identity mismatch fires as a critical-tier finding before symptomatic findings so the officer sees cause before effects.
  • Downstream suppression (optional future work). When identity_mismatch fires, symptomatic findings could be grouped or deprioritized to reduce noise. Deferred — current scope is surfacing the root cause, not hiding the symptoms.
  • Cross-country applicability. The comparator is country-agnostic — registration numbers are typed strings across all 12+ supported jurisdictions; the comparator only needs to know which sources are primary vs. secondary.
  • Integrates with the chatbot. The chatbot can now surface "did you mean IČO 02413507?" directly in the conversation when an officer asks about a NOT_FOUND case.
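The root-cause-first ordering in the first consequence can be illustrated at the presentation layer with a stable sort. This is only a sketch: the function name and the dictionary shape of a finding are assumptions, and it reorders findings without grouping or suppressing any of them, in line with the deferred scope above.

```python
def order_root_cause_first(findings: list) -> list:
    """Put identity_mismatch findings first; keep all others in their original order.

    sorted() is stable, and False sorts before True, so findings whose type is
    identity_mismatch move to the front while the rest keep their relative order.
    """
    return sorted(findings, key=lambda f: f["type"] != "identity_mismatch")
```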

Alternatives considered

Auto-correct silently when ≥2 secondaries agree. Rejected. Violates EU AI Act Art. 14 human oversight. Also silently masks the possibility that the officer intentionally submitted a specific ID (e.g., a known-wrong ID they want to investigate).

Treat registration-number differences as ordinary field discrepancies. Rejected — that's the current behaviour and it's exactly what broke on EP Infrastructure. Identity is structurally upstream of field-level comparisons.

Only compare against the primary registry. Rejected. If the primary returns NOT_FOUND (as it did here), there's no canonical value to compare against. Secondary-pair consensus is what salvages this case.

References

  • EU AI Act Art. 14 — human oversight
  • AMLR Art. 11 — customer identification & verification obligation
  • Related: ADR-0024 (entity matching / blocking keys), ADR-0043 (cross-country registry parity), ADR-0045 (sanctions FP suppression — complementary "symptom-from-cause" separation)
  • Incident-driven: EP Infrastructure, a.s. demo rehearsal 2026-04-22