ADR-0020: EBA Risk Matrix with Weighted-Max Aggregation

Date: 2026-03-31 (date decision was originally made)
Status: Accepted
Deciders: Adrian Birlogeanu (Soft4U BV), Claude Code
Documented retroactively: 2026-04-03

Context

EU AML/CFT rules (EBA GL/2021/02, 6AMLD, AMLR) require financial institutions to perform structured AML/CFT risk assessments using a risk-based approach. The assessment must be systematic, documented, and auditable. Trust Relay's initial risk scoring used a custom ARIA (Automated Risk and Intelligence Assessment) scorer that lacked regulatory alignment -- it did not map to the EBA's prescribed risk dimensions and could not demonstrate compliance during a regulatory examination.

A critical design challenge is the aggregation function. A simple weighted average of risk dimensions dilutes critical signals: a company with directors on a PEP list operating from a FATF blacklist country could still score "medium" risk if other dimensions (product type, delivery channel) are clean. Regulators expect that a single critical risk factor elevates the overall assessment, not that it gets averaged away.

Additionally, different financial institutions have different risk appetites and regulatory contexts. A Belgian bank's geographic risk thresholds differ from a Dutch payment service provider's. The scoring system must be configurable per tenant without requiring code changes.

Decision

We adopt the EBA GL/2021/02 standard as the risk assessment framework with the following structure:

5 Risk Dimensions with default weights (customizable per tenant, as noted under config-driven thresholds below):

Customer Risk (30%) -- 3 factors: entity type, ownership complexity, PEP exposure
Geographic Risk (25%) -- 3 factors: country of incorporation, operational jurisdictions, counterparty locations
Product/Service Risk (20%) -- 3 factors: product complexity, cash intensity, anonymity potential
Transaction Risk (15%) -- 3 factors: volume patterns, cross-border activity, unusual structures
Delivery Channel Risk (10%) -- 3 factors: face-to-face vs. remote, intermediary involvement, technology risk

Weighted-max aggregation with critical dimension floor boost:

Each dimension produces a score (0-100) from its 3 constituent factors
The overall score is computed as: max(weighted_average, highest_dimension_score * floor_boost_factor)
If any single dimension scores CRITICAL (>85), the overall score cannot fall below HIGH (70) regardless of other dimensions
This ensures that a FATF blacklist country (Geographic = CRITICAL) always produces at least a HIGH overall risk, even if all other dimensions are LOW
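The aggregation rule above can be sketched as follows. This is a minimal illustration, not the production scorer: the function and constant names are hypothetical, and the floor boost factor of 0.85 is an assumption taken from the worked example under Alternative 1.

```python
from typing import Mapping

# Default weights from the dimension list above (keys are illustrative names).
DEFAULT_WEIGHTS = {
    "customer": 0.30,
    "geographic": 0.25,
    "product_service": 0.20,
    "transaction": 0.15,
    "delivery_channel": 0.10,
}

CRITICAL_THRESHOLD = 85.0  # a dimension is CRITICAL when its score is > 85
HIGH_FLOOR = 70.0          # lower bound of the HIGH band


def overall_score(
    scores: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
    floor_boost_factor: float = 0.85,  # assumed value, not confirmed by the ADR
) -> float:
    """Weighted-max aggregation with a critical-dimension floor."""
    weighted_average = sum(scores[dim] * w for dim, w in weights.items())
    highest = max(scores[dim] for dim in weights)
    overall = max(weighted_average, highest * floor_boost_factor)
    # Any CRITICAL dimension pins the overall score to at least HIGH.
    if highest > CRITICAL_THRESHOLD:
        overall = max(overall, HIGH_FLOOR)
    return overall


if __name__ == "__main__":
    # Geographic is CRITICAL, every other dimension is LOW.
    example = {
        "customer": 10,
        "geographic": 95,
        "product_service": 10,
        "transaction": 10,
        "delivery_channel": 10,
    }
    print(round(overall_score(example), 2))
```

In this example the weighted average alone is 31.25, while the boosted maximum lifts the result to about 80.75. With a boost factor of 0.85 the explicit floor is redundant, because any dimension score above 85 already yields more than 72 after boosting; the floor only takes effect if a tenant configures a lower factor.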

Config-driven thresholds stored in risk_configurations table:

Each tenant can customize dimension weights, factor scoring rules, and threshold boundaries
Default configuration ships as JSON reference data files
Changes are audited in risk_config_audit table with before/after snapshots
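As an illustration of what a tenant configuration and its audit record might look like, the sketch below validates a candidate configuration and builds a before/after snapshot. The field names, the validation rules, and the helper functions are assumptions for the example; the actual risk_configurations and risk_config_audit schemas are not specified in this ADR.

```python
import copy
import json
from datetime import datetime, timezone

# Hypothetical shape of the default configuration shipped as JSON reference data.
DEFAULT_CONFIG = {
    "weights": {
        "customer": 0.30,
        "geographic": 0.25,
        "product_service": 0.20,
        "transaction": 0.15,
        "delivery_channel": 0.10,
    },
    "thresholds": {"high": 70, "critical": 85},
}


def validate_config(config: dict) -> None:
    """Reject configurations that would make scores meaningless (assumed rules)."""
    weights = config["weights"]
    if any(w < 0 for w in weights.values()):
        raise ValueError("dimension weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("dimension weights must sum to 1.0")
    thresholds = config["thresholds"]
    if not 0 < thresholds["high"] < thresholds["critical"] < 100:
        raise ValueError("thresholds must satisfy 0 < high < critical < 100")


def apply_change(current: dict, updated: dict, changed_by: str) -> tuple[dict, dict]:
    """Return the new configuration plus an audit row with before/after snapshots."""
    validate_config(updated)
    audit_row = {
        "changed_by": changed_by,
        "changed_at": datetime.now(timezone.utc).isoformat(),
        "before": json.dumps(current, sort_keys=True),
        "after": json.dumps(updated, sort_keys=True),
    }
    return copy.deepcopy(updated), audit_row
```

Validating before persisting is one way to give tenants a guard rail: a configuration whose weights do not sum to 1.0 is rejected before it can affect any assessment.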

EU AI Act Art. 12 compliance:

SHA-256 hashes computed on both input data and output scores
Hash pair stored alongside the risk assessment for reproducibility proof
Given the same input hash, the system must produce the same output hash (determinism guarantee)
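One way to compute such a hash pair is sketched below. It assumes the input data and output scores are JSON-serializable dictionaries and that canonical JSON (sorted keys, fixed separators) is an acceptable serialization; the function names are hypothetical and the ADR does not specify the actual canonicalization.

```python
import hashlib
import json


def canonical_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hash_pair(input_data: dict, output_scores: dict) -> dict:
    """Hash pair to store alongside a risk assessment."""
    return {
        "input_hash": canonical_hash(input_data),
        "output_hash": canonical_hash(output_scores),
    }


if __name__ == "__main__":
    pair = hash_pair(
        {"country": "XX", "pep_exposure": True},
        {"overall": 80.75, "geographic": 95},
    )
    print(pair["input_hash"], pair["output_hash"])
```

Canonicalizing before hashing matters for the determinism guarantee: two logically identical inputs that differ only in key order would otherwise produce different input hashes.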

Consequences

Positive

Direct regulatory alignment with EBA GL/2021/02 -- assessment structure maps 1:1 to the guideline's risk categories
Weighted-max aggregation prevents critical risk signals from being diluted by clean dimensions
Config-driven design allows tenant customization without code changes -- a bank can adjust geographic risk weights for their specific corridor exposure
SHA-256 determinism proof satisfies EU AI Act Art. 12 automatic logging requirements
Replaced ARIA scorer with a single, standards-based implementation -- reduced scoring code complexity

Negative

Weighted-max aggregation is more conservative than pure weighted average -- some cases will score higher than they would under a simple average, potentially increasing false-positive rates
15 factors across 5 dimensions require comprehensive input data -- missing data for any factor must be handled (currently defaults to medium risk, which may over-penalize data-sparse cases)
Config-driven thresholds add operational complexity -- tenants need guidance on what constitutes a safe configuration
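The missing-data behaviour noted above can be illustrated with a small sketch. Two points here are assumptions rather than documented behaviour: that "medium risk" corresponds to a score of 50, and that a dimension score is the plain mean of its 3 factor scores. The function name is hypothetical.

```python
from typing import Mapping, Optional

MEDIUM_DEFAULT = 50.0  # assumed numeric value for "medium risk"


def dimension_score(
    factors: Mapping[str, Optional[float]],
    default: float = MEDIUM_DEFAULT,
) -> tuple[float, list[str]]:
    """Mean of factor scores, substituting the default for missing (None) factors.

    Also returns the names of the defaulted factors so that data-sparse
    cases can be flagged rather than silently scored.
    """
    missing = [name for name, value in factors.items() if value is None]
    filled = [default if value is None else value for value in factors.values()]
    return sum(filled) / len(filled), missing


if __name__ == "__main__":
    score, missing = dimension_score(
        {"entity_type": 10, "ownership_complexity": None, "pep_exposure": 30}
    )
    print(score, missing)
```

Returning the list of defaulted factors alongside the score makes the over-penalization visible: a reviewer can see that a dimension landed near medium because data was absent, not because risk was observed.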

Neutral

The 5-dimension structure is fixed by EBA guidelines -- this is not a design choice but a compliance requirement
Risk configurations are versioned -- changing thresholds does not retroactively alter past assessments
The previous ARIA scorer code remains in the codebase but is no longer invoked by the pipeline

Alternatives Considered

Alternative 1: Pure weighted average

Why rejected: A company incorporated in a FATF blacklist country with PEP directors scores Geographic=95 and Customer=90. With clean Product (20), Transaction (15), and Channel (10) dimensions, the weighted average is: (95 * 0.25 + 90 * 0.30 + 20 * 0.20 + 15 * 0.15 + 10 * 0.10) = 23.75 + 27.00 + 4.00 + 2.25 + 1.00 = 58.0 (MEDIUM). This is a regulatory failure -- such a company must score HIGH or CRITICAL. Weighted-max, with a floor boost factor of 0.85, produces max(58.0, 95 * 0.85) = 80.75, which falls in the HIGH band under the thresholds in the Decision section (HIGH from 70, CRITICAL above 85) and is the correct regulatory outcome.

Alternative 2: Machine learning risk models

Why rejected: ML models require labeled training data (historical cases with known outcomes) that Trust Relay does not yet have. More fundamentally, ML models are non-deterministic -- the same input can produce slightly different outputs across model versions or inference runs. EU AI Act Art. 12 requires automatic logging that enables reproducibility. A non-deterministic model cannot satisfy this requirement without additional infrastructure (model versioning, input/output snapshots per inference). The deterministic rule-based approach satisfies Art. 12 by construction.

Alternative 3: Hard-coded factor trees

Why rejected: Hard-coding scoring rules means every tenant customization requires a code change, review, test, and deployment. With 50+ potential tenants, each with different risk appetites and regulatory contexts, this creates an O(N) maintenance burden. The config-driven approach makes customization an operational task rather than a development task.
