Tiered Scanning Architecture

TrustRelay implements a four-tier scanning model that balances investigation depth against cost, latency, and LLM usage. Each tier builds on the previous one, and entities can be escalated from lighter tiers to deeper tiers based on risk signals. The architecture supports three entry points: investigation-time discovery, portfolio batch scanning, and continuous monitoring (tiered re-scanning is deferred; see Continuous Monitoring below).

Four-Tier Model

Tier	Name	Data Sources	LLM Calls	Latency	Cost/Entity	Risk Output
0	E-VAL	Neo4j graph only	0	< 100ms	$0.00	`eval_score` (0.0--1.0)
1	Lightweight Scan	KBO + PEPPOL + sanctions list	0	2--8s	~$0.01	green / amber / red + flags
2	Standard Scan	Tier 1 + adverse media + LLM synthesis	1--2	10--20s	~$0.05	green / amber / red + narrative
3	Full Investigation	Existing OSINT pipeline via Temporal	5+	60--160s	~$0.50	Full investigation report

Tier 0: E-VAL (Graph-Only Risk Scoring)

Pure graph traversal. Answers the question: "How suspicious is this entity based on what we already know?" Computes a score from six weighted signals without any external API calls or LLM invocations. Designed for bulk screening of thousands of entities in seconds.

Source: app/services/graph_service.py -- compute_eval_score()

Tier 1: Lightweight Scan

Queries three external data sources with zero LLM involvement:

KBO/BCE Registry -- company status, legal name, NACE codes, directors, establishments
PEPPOL Verification -- directory registration, inhoudingsplicht (tax/social debt)
Sanctions Screening -- exact match (normalized string equality) + fuzzy match (Jaro-Winkler similarity >= 0.80)

Produces a ScanResult with structured flags (SANCTIONS_HIT, WITHHOLDING_OBLIGATIONS, COMPANY_INACTIVE, etc.) and a risk tier determination. Results are persisted as ScanResult nodes in the knowledge graph.

Source: app/agents/scan_agent.py -- ScanAgent.tier1_scan()

Tier 2: Standard Scan

Extends Tier 1 with three additional capabilities:

Adverse Media Search -- searches news and media sources for negative coverage of the entity
LLM Synthesis -- a PydanticAI agent produces a human-readable risk narrative from the combined Tier 1 + adverse media data
Sanctions LLM Resolution -- ambiguous fuzzy matches (similarity 0.80--0.95) are sent to an LLM resolver agent to confirm or reject

The LLM synthesis call is the primary cost driver. Adverse media searches that return results add ~$0.03 to the per-entity cost.

Source: app/agents/scan_agent.py -- ScanAgent.tier2_scan(), app/agents/scan_synthesis_agent.py

Tier 3: Full Investigation

Delegates to the existing Temporal-based OSINT investigation pipeline. Creates a compliance case, starts a ComplianceCaseWorkflow, and runs the full agent DAG (document validation, registry agent, person validation, adverse media, synthesis, MCC classification, task generation). This is the same pipeline used for case-driven investigations.

Tier 3 is triggered via the scan API but executes as a standard compliance workflow. The scan endpoint creates the case in PostgreSQL, starts the Temporal workflow, and returns the workflow_id for status tracking.

Source: app/api/scan.py -- _create_tier3_case()

Tier Escalation Flow

Escalation can be automatic (via RecursiveDiscoveryService configuration) or manual (officer triggers escalation through the API).
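
The automatic path can be pictured as a small decision function. The following is a minimal sketch, assuming that the `auto_escalate_to_tier2` and `auto_escalate_to_tier3` switches from the Recursive Discovery configuration gate amber and red results respectively. `EscalationConfig` and `next_tier` are hypothetical names used for illustration and are not taken from the codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationConfig:
    """Hypothetical container mirroring the two auto-escalation switches."""
    auto_escalate_to_tier2: bool = False
    auto_escalate_to_tier3: bool = False


def next_tier(current_tier: int, risk_tier: str, config: EscalationConfig) -> Optional[int]:
    """Return the tier to escalate to, or None when no automatic escalation applies."""
    # Assumption: red results go straight to a full investigation when enabled.
    if risk_tier == "red" and config.auto_escalate_to_tier3 and current_tier < 3:
        return 3
    # Assumption: amber results get a Tier 2 standard scan when enabled.
    if risk_tier == "amber" and config.auto_escalate_to_tier2 and current_tier < 2:
        return 2
    # Green results, or disabled switches, leave the entity at its current tier.
    return None
```

With both switches at their default of `False`, the function always returns `None`, which corresponds to leaving escalation to the officer.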

E-VAL Formula

The Entity Valuation and Assessment Layer (E-VAL) computes a risk score from six graph-derived signals, each with a configurable weight:

E-VAL = w_shared * shared_connections_score
      + w_degree * graph_degree_score
      + w_inv    * cross_investigation_score
      + w_risk   * country_risk_score
      + w_rel    * relationship_type_score
      - w_prior  * prior_clean_scan_score

Default Weights

Signal	Weight	Description	Normalization
`shared_connections`	0.25	Number of shared graph neighbors with the target entity	`min(count / max(count, 1), 1.0)`
`cross_investigations`	0.25	Number of prior investigations targeting this entity	`min(count / 5, 1.0)`
`country_risk`	0.20	Country-level risk score stored on the entity node	Raw value (0.0--1.0)
`graph_degree`	0.15	Total relationship count in the graph	`min(degree / 10, 1.0)`
`relationship_type`	0.10	Weight based on the type of relationship to the target	Fixed 0.5 (placeholder for future per-type weights)
`prior_scan_recency`	0.05	Subtractive -- reduces score if a recent clean scan exists	1.0 if prior green scan, else 0.0

The final score is clamped to [0.0, 1.0]. An E-VAL score at or above the configurable threshold (default 0.3) triggers a Tier 1 scan in the recursive discovery flow.
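
To make the arithmetic concrete, here is a minimal sketch of the formula using the default weights and normalizations from the table above. It is not the code in `compute_eval_score()`. The function names, the input parameters, and the choice to merge partial weight overrides onto the defaults are assumptions made for illustration.

```python
# Default weights copied from the Default Weights table.
DEFAULT_WEIGHTS = {
    "shared_connections": 0.25,
    "cross_investigations": 0.25,
    "country_risk": 0.20,
    "graph_degree": 0.15,
    "relationship_type": 0.10,
    "prior_scan_recency": 0.05,
}


def normalize_signals(shared_count, investigation_count, country_risk, degree, has_prior_green_scan):
    """Turn raw graph counts into 0.0-1.0 signals using the table's normalizations."""
    return {
        # As written in the table, this evaluates to 1.0 for any non-zero count.
        "shared_connections": min(shared_count / max(shared_count, 1), 1.0),
        "cross_investigations": min(investigation_count / 5, 1.0),
        "country_risk": country_risk,
        "graph_degree": min(degree / 10, 1.0),
        "relationship_type": 0.5,  # fixed placeholder value
        "prior_scan_recency": 1.0 if has_prior_green_scan else 0.0,
    }


def eval_score(signals, weights=None):
    """Weighted sum of five additive signals minus the prior-clean-scan term, clamped to [0, 1]."""
    # Assumption: a partial override is merged onto the defaults.
    w = {**DEFAULT_WEIGHTS, **(weights or {})}
    additive = sum(w[name] * value for name, value in signals.items() if name != "prior_scan_recency")
    raw = additive - w["prior_scan_recency"] * signals["prior_scan_recency"]
    return max(0.0, min(1.0, raw))
```

For example, an entity with 3 shared neighbors, 2 prior investigations, a country risk of 0.5, a degree of 5, and no prior green scan scores 0.25 + 0.10 + 0.10 + 0.075 + 0.05 = 0.575, which is above the default threshold of 0.3.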

Weight Overrides

Weights can be overridden per-call via the weights parameter:

eval_result = await scan_agent.tier0_eval(
    registration_number="0123456789",
    target_reg="0987654321",
    weights={
        "shared_connections": 0.40,
        "country_risk": 0.30,
        "graph_degree": 0.10,
        "cross_investigations": 0.10,
        "relationship_type": 0.05,
        "prior_scan_recency": 0.05,
    },
)

Source: app/services/graph_service.py -- _DEFAULT_EVAL_WEIGHTS, compute_eval_score()

ScanResult Data Model

The ScanResult model captures the output of Tier 1 and Tier 2 scans. All fields are populated progressively -- Tier 1 fills the registry and sanctions fields, Tier 2 adds adverse media and synthesis.

from datetime import datetime

from pydantic import BaseModel


class ScanResult(BaseModel):
    # Identity
    scan_id: str                        # e.g. "scan-0123456789-t1-20260227143000"
    registration_number: str            # Belgian enterprise number
    tier: int                           # 0, 1, 2, or 3
    risk_tier: str = "green"            # "green" | "amber" | "red"
    confidence: float                   # 0.0-1.0 (0.8 with KBO data, 0.3 without)

    # Tier 0
    eval_score: float = 0.0             # E-VAL graph-based risk score

    # Tier 1: Registry data
    company_status: str = ""            # e.g. "active", "ceased", "bankrupt"
    legal_name: str = ""                # Official company name from KBO
    nace_codes: list[str] = []          # NACE activity codes
    director_count: int = 0             # Number of directors found
    ubo_count: int = 0                  # Number of UBOs found
    sanctions_exact_matches: int = 0    # Normalized string equality matches
    sanctions_fuzzy_matches: int = 0    # Jaro-Winkler >= 0.80 matches
    peppol_registered: bool = False     # In PEPPOL e-invoicing directory
    withholding_obligations: bool = False  # Tax or social debt detected
    tax_debt_detected: bool = False     # Belgian tax authority debt
    social_debt_detected: bool = False  # Belgian social security debt

    # Tier 2: Media + synthesis
    adverse_media_hits: int = 0         # Count of adverse signals
    adverse_media_summary: str = ""     # Human-readable summary
    synthesis_summary: str = ""         # LLM-generated risk narrative

    # Metadata
    flags: list[str] = []               # Machine-readable flags
    scan_cost_cents: int = 0            # Actual cost in cents
    scanned_at: datetime | None = None  # Scan timestamp (UTC)
    cached: bool = False                # True if served from graph cache

Possible Flags

Flag	Trigger	Tier
`SANCTIONS_HIT`	Exact sanctions match found	1+
`SANCTIONS_FUZZY`	Fuzzy sanctions match (no exact)	1+
`WITHHOLDING_OBLIGATIONS`	Tax or social debt via PEPPOL inhoudingsplicht	1+
`COMPANY_INACTIVE`	Company status is ceased/dissolved/bankrupt	1+
`KBO_UNAVAILABLE`	KBO lookup failed or returned null	1+
`PEPPOL_UNAVAILABLE`	PEPPOL verification failed	1+
`ADVERSE_MEDIA_FOUND`	Adverse media hits > 0	2
`ADVERSE_MEDIA_UNAVAILABLE`	Adverse media search failed	2

Source: app/models/scan.py
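
The trigger column can be read as a set of simple predicates over the scan fields. The sketch below derives flags from a plain dictionary that uses the `ScanResult` field names. The `derive_flags` function and its `kbo_ok`, `peppol_ok`, and `media_ok` parameters are hypothetical, and the real logic in `ScanAgent` may differ.

```python
INACTIVE_STATUSES = {"ceased", "dissolved", "bankrupt"}


def derive_flags(scan, *, kbo_ok=True, peppol_ok=True, media_ok=True):
    """Return machine-readable flags for a scan, following the trigger table."""
    flags = []
    # SANCTIONS_FUZZY is only raised when there is no exact match.
    if scan.get("sanctions_exact_matches", 0) > 0:
        flags.append("SANCTIONS_HIT")
    elif scan.get("sanctions_fuzzy_matches", 0) > 0:
        flags.append("SANCTIONS_FUZZY")
    if scan.get("withholding_obligations"):
        flags.append("WITHHOLDING_OBLIGATIONS")
    if scan.get("company_status", "").lower() in INACTIVE_STATUSES:
        flags.append("COMPANY_INACTIVE")
    if not kbo_ok:
        flags.append("KBO_UNAVAILABLE")
    if not peppol_ok:
        flags.append("PEPPOL_UNAVAILABLE")
    # Adverse media flags only apply to Tier 2 scans.
    if scan.get("tier", 1) >= 2:
        if not media_ok:
            flags.append("ADVERSE_MEDIA_UNAVAILABLE")
        elif scan.get("adverse_media_hits", 0) > 0:
            flags.append("ADVERSE_MEDIA_FOUND")
    return flags
```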

Graph Integration

Scan results are stored as nodes in the Neo4j knowledge graph, creating a persistent, queryable record of every scan ever performed.

Graph Schema

Node types:

Node	Primary Key	Created By
`ScanResult`	`scan_id`	`ScanAgent.tier0_eval()`, `tier1_scan()`, `tier2_scan()`
`Company`	`registration_number`	`MERGE` on first scan
`Portfolio`	`portfolio_id`	`PortfolioService.batch_scan()`

Relationship types:

Relationship	From	To	Meaning
`SCANNED`	`ScanResult`	`Company`	This scan result assessed this company
`CONTAINS`	`Portfolio`	`Company`	This company is in this portfolio

Cache via Graph

Tier 1 scans check the graph for a recent, non-stale ScanResult before performing a fresh scan. The cache check queries:

MATCH (s:ScanResult)-[:SCANNED]->(c:Company {registration_number: $reg})
WHERE s.tier >= $min_tier
  AND (s.stale_after IS NULL OR s.stale_after > datetime())
RETURN properties(s)
ORDER BY s.scanned_at DESC LIMIT 1

A cache hit returns cached: true in the result. The force parameter on tier1_scan() bypasses the cache.

Source: app/services/graph_service.py -- upsert_scan_result(), get_latest_scan_result(), get_scan_results()
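
The selection rule in the Cypher query can be restated in plain Python, which may help when reasoning about cache behavior in tests. This is a sketch over in-memory dictionaries rather than Neo4j. `latest_fresh_scan` and `scan_with_cache` are hypothetical helpers, and `run_fresh_scan` stands in for the real scan call.

```python
from datetime import datetime, timedelta, timezone


def latest_fresh_scan(scans, min_tier, now=None):
    """Mirror the query: tier >= min_tier, not stale, newest scanned_at first."""
    now = now or datetime.now(timezone.utc)
    fresh = [
        s for s in scans
        if s["tier"] >= min_tier
        and (s.get("stale_after") is None or s["stale_after"] > now)
    ]
    if not fresh:
        return None
    return max(fresh, key=lambda s: s["scanned_at"])


def scan_with_cache(scans, min_tier, force, run_fresh_scan, now=None):
    """Serve a cached result unless `force` is set or nothing fresh exists."""
    if not force:
        hit = latest_fresh_scan(scans, min_tier, now)
        if hit is not None:
            return {**hit, "cached": True}
    return {**run_fresh_scan(), "cached": False}
```

Note that a result with no `stale_after` value never expires under this rule, which matches the `IS NULL` branch of the query.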

Portfolio Batch Scanning

The PortfolioService orchestrates batch Tier 1 scans across a portfolio of entities with rate-limited concurrency.

Architecture

Concurrency Control

The service uses an asyncio.Semaphore with a configurable limit (default 20) to prevent overwhelming external APIs:

self._semaphore = asyncio.Semaphore(max_concurrent)

async def _scan_one(self, registration_number: str) -> dict | None:
    async with self._semaphore:
        result = await self._scan_agent.tier1_scan(registration_number)
        return result.model_dump()

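All scans are submitted together via asyncio.gather(), and the semaphore keeps at most max_concurrent of them in flight. A portfolio of 100 entities with max_concurrent=20 therefore takes roughly as long as 5 sequential batches of 20, although each scan starts as soon as a slot frees up rather than in strict waves.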
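
The effect of the semaphore can be checked with a self-contained sketch that replaces the real scan with a stub and records the peak number of concurrent calls. `scan_all`, `demo`, and the stub are illustrative names. Treating raised exceptions as failed entities via `return_exceptions=True` is an assumption about how failures are counted, not a description of `PortfolioService`.

```python
import asyncio


async def scan_all(registration_numbers, scan_one, max_concurrent=20):
    """Run scan_one for every entity with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(reg):
        async with semaphore:
            return await scan_one(reg)

    # return_exceptions=True keeps one failing entity from cancelling the batch.
    outcomes = await asyncio.gather(
        *(guarded(reg) for reg in registration_numbers),
        return_exceptions=True,
    )
    results = [o for o in outcomes if not isinstance(o, BaseException)]
    failed = len(outcomes) - len(results)
    return results, failed


async def demo():
    """Scan 100 fake entities and report (peak concurrency, scanned, failed)."""
    state = {"in_flight": 0, "peak": 0}

    async def fake_scan(reg):
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.001)  # stand-in for external API latency
        state["in_flight"] -= 1
        if reg.endswith("7"):
            raise RuntimeError("simulated upstream failure")
        return {"registration_number": reg, "risk_tier": "green"}

    regs = [f"{i:010d}" for i in range(100)]
    results, failed = await scan_all(regs, fake_scan, max_concurrent=20)
    return state["peak"], len(results), failed
```

Running `asyncio.run(demo())` shows that the peak never exceeds the configured limit, regardless of how many coroutines are handed to `asyncio.gather()`.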

Response Format

{
    "portfolio_id": "portfolio-a1b2c3d4e5f6",
    "portfolio_name": "Q1 2026 Merchant Portfolio",
    "total_entities": 100,
    "scanned": 97,
    "failed": 3,
    "summary": {"green": 82, "amber": 12, "red": 3},
    "results": [...]
}

Source: app/services/portfolio_service.py, app/api/scan.py -- POST /scan/portfolio
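
The aggregate fields in the response can be derived from the per-entity results. The following sketch shows one way to do that, assuming failed scans are simply absent from the results list. `build_portfolio_response` is a hypothetical helper, not a function in `portfolio_service.py`.

```python
from collections import Counter


def build_portfolio_response(portfolio_id, portfolio_name, registration_numbers, results):
    """Assemble the response shape shown above from successful scan results."""
    counts = Counter(r["risk_tier"] for r in results)
    return {
        "portfolio_id": portfolio_id,
        "portfolio_name": portfolio_name,
        "total_entities": len(registration_numbers),
        "scanned": len(results),
        # Assumption: every entity without a result counts as failed.
        "failed": len(registration_numbers) - len(results),
        # Always emit all three tiers so consumers can rely on the keys.
        "summary": {tier: counts.get(tier, 0) for tier in ("green", "amber", "red")},
        "results": results,
    }
```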

Recursive Discovery

After a full OSINT investigation completes (Tier 3), the RecursiveDiscoveryService traverses the knowledge graph to find connected entities, scores them with E-VAL, and auto-scans the highest-risk ones.

Configuration

Parameter	Default	Description
`max_depth`	3	Maximum graph traversal depth
`max_entities_per_level`	10	Top K entities scanned per hop distance
`eval_threshold`	0.3	Minimum E-VAL score to trigger Tier 1 scan
`auto_escalate_to_tier2`	`False`	Auto-escalate amber results to Tier 2
`auto_escalate_to_tier3`	`False`	Auto-escalate red results to Tier 3

Source: app/services/recursive_discovery_service.py
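
How the three traversal parameters interact is easiest to see in a small breadth-first sketch over an in-memory adjacency map. This is not the `RecursiveDiscoveryService` implementation. The `discover` function, the `graph` dictionary, and the `score` callable are stand-ins, and the choice to keep traversing through entities that fall below the threshold is an assumption.

```python
def discover(graph, start, score, max_depth=3, max_entities_per_level=10, eval_threshold=0.3):
    """Return {hop distance: entities selected for a Tier 1 scan}."""
    visited = {start}
    frontier = [start]
    selected = {}
    for depth in range(1, max_depth + 1):
        # Collect unvisited neighbors of the current frontier (one hop further out).
        level = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    level.append(neighbor)
        if not level:
            break
        # Keep entities at or above the threshold, highest score first, top K only.
        candidates = [n for n in level if score(n) >= eval_threshold]
        candidates.sort(key=score, reverse=True)
        selected[depth] = candidates[:max_entities_per_level]
        # Assumption: traversal continues through all neighbors, not only selected ones.
        frontier = level
    return selected
```

In this sketch `max_depth` bounds how far the walk goes, `eval_threshold` filters each level, and `max_entities_per_level` caps the number of Tier 1 scans per hop, so the worst-case scan count is `max_depth * max_entities_per_level`.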

Three Entry Points

1. Investigation-Time Discovery

Triggered after a Tier 3 full investigation completes. The recursive discovery engine finds connected entities in the graph and scans the highest-risk ones automatically.

Flow: Investigation completes --> graph populated with entities/relationships --> RecursiveDiscoveryService.discover_and_scan() --> Tier 1 scans + optional auto-escalation

2. Portfolio Batch Scanning

Triggered by a compliance officer uploading a list of entity registration numbers. The portfolio service runs rate-limited parallel Tier 1 scans and produces an aggregate risk summary.

Flow: Officer submits portfolio --> POST /scan/portfolio --> PortfolioService.batch_scan() --> per-entity Tier 1 scans --> aggregate results

3. Continuous Monitoring

Continuous monitoring now exists as a Temporal scheduled workflow (app/workflows/continuous_monitoring.py + app/services/monitoring_schedule_service.py). It runs as a per-tenant, cron-triggered Temporal Schedule (monitoring-{tenant_id}) that executes the configured monitoring checks. As built, the workflow monitors active customs cases (Module C); it does not re-run Tier 0--2 scans over arbitrary graph entities. Periodic re-scanning of graph entities for new sanctions hits or adverse media via the tiered scan engine remains a future extension on top of this scheduling foundation.

Sanctions Matching Tiers

Sanctions screening uses a three-stage matching pipeline that progressively increases confidence:

Stage Details

Stage	Method	When	LLM?	Confidence
Exact	Normalized string equality lookup against an in-memory index	Tier 1+	No	1.0 (definitive)
Fuzzy	Jaro-Winkler similarity >= 0.80 threshold	Tier 1+	No	0.80--1.0 (similarity score)
LLM Resolution	PydanticAI agent analyzes ambiguous matches (similarity 0.80--0.95)	Tier 2 only	Yes	Agent confidence score

The LLM resolver is intentionally deferred to Tier 2. Tier 1 scans report fuzzy matches as SANCTIONS_FUZZY flags (amber risk tier) without LLM confirmation, keeping Tier 1 at zero LLM calls. When an entity is escalated to Tier 2, the resolver agent re-evaluates each ambiguous match and either confirms (llm_confirmed) or rejects it.

Source: app/services/sanctions_matcher_service.py
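
The two non-LLM stages can be illustrated with a small, dependency-free sketch. It implements standard Jaro-Winkler similarity (prefix scale 0.1, prefix length capped at 4) and applies the 0.80 and 0.95 thresholds from the table. The normalization rules, the linear scan in place of an in-memory index, the inclusive upper bound of the ambiguous band, and all function names are assumptions, not the behavior of `sanctions_matcher_service.py`.

```python
import re
import unicodedata


def normalize(name):
    """Assumed normalization: strip accents, casefold, drop punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_marks.casefold())
    return " ".join(cleaned.split())


def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (half-transpositions).
    k = half_transpositions = 0
    for i, ch in enumerate(s1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if ch != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def screen(name, sanctioned_names, fuzzy_threshold=0.80, ambiguous_upper=0.95):
    """Return the matching stage, best score, and whether Tier 2 LLM resolution is needed."""
    target = normalize(name)
    best = 0.0
    for candidate in sanctioned_names:
        cand = normalize(candidate)
        if cand == target:
            return {"stage": "exact", "score": 1.0, "needs_llm": False}
        best = max(best, jaro_winkler(target, cand))
    if best >= fuzzy_threshold:
        return {"stage": "fuzzy", "score": best, "needs_llm": best <= ambiguous_upper}
    return {"stage": "none", "score": best, "needs_llm": False}
```

In a Tier 1 scan only the `stage` would matter, since fuzzy matches are reported as `SANCTIONS_FUZZY` without confirmation. The `needs_llm` field marks the matches that a Tier 2 resolver would re-evaluate.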

API Endpoints

Method	Path	Description
`POST`	`/scan/entity/{reg_number}`	Scan entity at specified tier (0--3)
`GET`	`/scan/entity/{reg_number}/results`	Get scan history for an entity
`POST`	`/scan/entity/{reg_number}/escalate`	Escalate entity to a higher tier
`POST`	`/scan/portfolio`	Batch scan a portfolio of entities
`GET`	`/scan/portfolio/{portfolio_id}/results`	Get results for a portfolio scan

Source: app/api/scan.py
