Verification Tools

Trust Relay includes a suite of verification tools that perform targeted, point-in-time checks during OSINT investigation. They split into two groups by how they are invoked. Three checks always run as part of the workflow's run_verification_checks activity because their compliance value is unconditional: OpenSanctions screening, jurisdiction risk, and email-security DNS posture. Four further checks are registered as LLM-callable tools on the OSINT agent (tool_check_wayback, tool_check_reviews, tool_check_interpol, tool_detect_virtual_office) and are invoked selectively when the investigation context warrants. The verification package also contains supporting modules — gleif.py (LEI lookup), license_registry.py (regulatory-license verification), name_matcher.py (person/company fuzzy matching), screening_derivation.py (director screening-status derivation), and opensanctions_bulk.py (bulk dataset refresh/match).

Each tool is implemented as a module-level async function (e.g. screen_opensanctions(), assess_jurisdiction_risk(), check_email_security()), not as an instantiated tool class. There is no VerificationToolRegistry class; the always-run checks are called directly from app/workflows/activities.py, and the LLM-callable checks are wrapped as tool_* functions in app/agents/osint_agent.py.

The seven core checks each return a VerificationResult:

class VerificationResult(BaseModel):
    tool: str                   # tool identifier, e.g. "open_sanctions"
    status: str                 # "clear", "hit", "warning", "error", "unknown"
    summary: str                # human-readable one-line summary
    details: dict               # structured tool-specific data
    source_url: str | None      # canonical source URL for evidence attribution
    confidence: float           # 0.0–1.0 confidence in the result

The VerificationResult feeds into the confidence scoring engine (Pillar 1) and the findings list presented to the officer.

Always-Run (Deterministic) Tools

These three checks execute for every entity regardless of EVOI score or agent selection, because their cost is low and the compliance value is unconditional. They are called directly from the run_verification_checks activity in app/workflows/activities.py.

1. OpenSanctions

File: backend/app/services/verification/opensanctions.py (entry point screen_opensanctions(); bulk dataset refresh/match in opensanctions_bulk.py)

OpenSanctions maintains a consolidated dataset of sanctioned entities, politically exposed persons (PEPs), and wanted individuals sourced from 300+ official lists worldwide.

Metric	Value
Total entities	~831,000
Sanctioned entities	~71,000
PEPs	~700,000
Wanted persons	~64,000
Data source	Local bulk download (Parquet/JSON), refreshed on schedule
Storage	PostgreSQL with `pg_trgm` extension for fuzzy text search

Matching Strategy

The tool uses a dual matching strategy to balance precision and recall:

Trigram similarity (pg_trgm similarity() function) — catches name variations, transliterations, and misspellings. Threshold: 0.7 similarity score.
Substring containment — detects cases where the query name is contained within a longer entity name (e.g., searching "Bolloré" matches "Bolloré Transport & Logistics SA"). Threshold: 0.8 containment ratio.

A match on either strategy is returned as a hit. Results are ranked by combined score and capped at the top 10 matches to avoid overwhelming the investigation result.

Auto-Refresh Schedule

Dataset	Refresh Frequency
Sanctions lists	Daily (00:00 UTC)
PEPs	Weekly (Sunday 01:00 UTC)

Refresh is handled by a background Temporal schedule activity that downloads the latest bulk export, loads it into a staging table, and atomically swaps it with the live table.

Diacritics Normalization

Before matching, both the query name and stored entity names are normalized using Unicode NFKD decomposition to strip diacritical marks. This ensures "Müller" matches "Muller" and "François" matches "Francois". Director names from NorthData are also normalized before being passed to the tool, enabling fuzzy matching across transliterated name variants.

2. Jurisdiction Risk

File: backend/app/services/verification/jurisdiction_risk.py (entry point assess_jurisdiction_risk())

A static lookup that classifies a country's risk level based on authoritative financial crime watchlists.

List	Source	Update Frequency
FATF Grey List	Financial Action Task Force (FATF)	Quarterly (FATF plenary sessions)
FATF Black List	High-Risk Jurisdictions subject to a Call for Action	Quarterly
EU High-Risk Third Countries	European Commission Delegated Regulation	Periodic Commission updates

Risk classification output:

Classification	Meaning
`LOW`	Not on any watchlist
`MEDIUM`	EU high-risk but not FATF listed
`HIGH`	FATF Grey List
`CRITICAL`	FATF Black List

assess_jurisdiction_risk(country_code) evaluates a single ISO 3166-1 alpha-2 country code and returns the highest applicable classification (Black List > Grey List/EU high-risk). Callers invoke it per country of interest — typically the entity's country of incorporation — and the verification activity aggregates the worst result.

This tool is entirely deterministic — no API calls, no LLM inference. The watchlist data (fatf_black_list, fatf_grey_list, eu_high_risk_third_countries) is loaded from JSON reference-data files in config/reference_data/ via ReferenceDataService, not hardcoded constants.

3. Email Security

File: backend/app/services/verification/email_security.py (entry point check_email_security())

Validates the email security posture of the entity's declared domain by checking three DNS record types using dnspython.

Record	Check	Pass Condition
SPF	`TXT` record at apex domain	Record exists and contains `v=spf1`
DKIM	`TXT` record at `default._domainkey.{domain}`	Record exists and contains `v=DKIM1`
DMARC	`TXT` record at `_dmarc.{domain}`	Record exists and contains `v=DMARC1`

A domain with none of these records is associated with a higher likelihood of domain spoofing and phishing, which is a risk indicator for shell companies and fraudulent operators. A MEDIUM severity finding is generated when all three are missing; LOW when one or two are present.

The tool also checks whether the domain is a free provider (Gmail, Outlook, Yahoo, etc.). A regulated business using a free email domain generates a LOW severity finding.

LLM-Available Tools

These four checks are registered as tools on the OSINT agent (the tool_check_wayback, tool_check_reviews, tool_check_interpol, and tool_detect_virtual_office wrappers in app/agents/osint_agent.py) and are invoked selectively by the LLM based on investigation context. They are more expensive (latency or rate limits) and are reserved for cases where they add clear investigative value.

4. Wayback Machine

File: backend/app/services/verification/wayback_history.py (entry point check_wayback_history())

Queries the Internet Archive CDX (Capture/Display/Access) API to retrieve the capture history of the entity's website.

API: https://web.archive.org/cdx/search/cdx?url={domain}&output=json&limit=100&fl=timestamp,statuscode

Output field	Description
`first_capture`	Date of the earliest recorded capture
`last_capture`	Date of the most recent capture
`capture_count`	Total number of captures in the archive
`domain_age_years`	Derived from first_capture date

A domain registered very recently (< 6 months) with no Wayback captures is a strong indicator of a newly-created entity — a risk pattern in KYB investigations. The tool generates a HIGH severity finding for domains under 3 months old and MEDIUM for domains under 12 months.

The tool is invoked by the LLM agent when the investigation includes a business website and the registry data shows a recently incorporated company.

5. Consumer Reviews

File: backend/app/services/verification/consumer_reviews.py (entry point check_consumer_reviews())

Searches Trustpilot for consumer reviews of the entity to assess public reputation and detect complaint patterns.

API: Trustpilot public search API (https://www.trustpilot.com/search?query={company_name})

Output field	Description
`trustpilot_score`	Star rating (1–5)
`review_count`	Number of reviews
`trust_level`	Trustpilot's own classification (Excellent/Great/Average/Poor/Bad)
`top_complaints`	Common complaint themes extracted from recent reviews

A very low score (< 2.0) or a high volume of fraud-related complaints generates a MEDIUM severity finding. The tool is invoked when the entity operates in a consumer-facing sector (retail, financial services, property).

6. Interpol Red Notices

File: backend/app/services/verification/interpol_notices.py (entry point check_interpol_notices())

Queries the public Interpol API for Red Notices (international wanted persons alerts) matching director names associated with the entity.

API: https://ws-public.interpol.int/notices/v1/red?name={surname}&forename={forename}&resultPerPage=20

Output field	Description
`notices_found`	Number of matching Red Notices
`matched_names`	List of matched person names with notice IDs
`nationalities`	Nationalities listed in matched notices
`charges`	Summary of alleged offences

A Red Notice match generates a CRITICAL severity finding and immediately sets p_critical in the EVOI belief state, triggering full pipeline execution regardless of other EVOI scores.

The tool is invoked when person validation surfaces names of natural persons who are directors or UBOs, particularly for high-risk jurisdictions.

:::caution Interpol public API limitations The public Interpol API returns only unclassified notices and may not reflect the full Interpol database. A "clear" result does not constitute a comprehensive Interpol check. Compliance officers should treat this tool as an initial screening layer. :::

7. Virtual Office Detection

File: backend/app/services/verification/virtual_office.py (entry point detect_virtual_office())

Detects whether the entity's registered address is associated with a virtual office, mail forwarding, or co-working provider — a common characteristic of shell companies.

Detection uses two approaches:

Pattern matching against a curated list of known virtual office and registered agent providers. The list includes major providers (Regus, WeWork, IWG, Spaces, HQ) as well as national registered agent companies for BE, NL, LU, UK, IE, and CY.
Address normalization — the registered address is compared against the provider list after stripping floor/unit references. A match on street name and postal code is sufficient to flag the address.

Finding severity	Condition
HIGH	Address exactly matches a known virtual office provider
MEDIUM	Address pattern matches a co-working provider
LOW	Address is a PO Box or c/o address

The tool is invoked when the entity's registered address is in a jurisdiction known for high shell company density (LU, CY, MT, BVI, Cayman) or when the address matches a co-working space postcode cluster.

How Tools Are Wired In

There is no central tool-registry class. backend/app/services/verification/__init__.py re-exports only VerificationResult, check_licenses, and LicenseCheckResult. The verification checks are wired in at two distinct call sites:

Always-run checks are invoked directly from the run_verification_checks activity in app/workflows/activities.py:

from app.services.verification.opensanctions import screen_opensanctions
from app.services.verification.jurisdiction_risk import assess_jurisdiction_risk
from app.services.verification.email_security import check_email_security

company_result = await screen_opensanctions(company_name, country, entity_type="company")
jur_result = await assess_jurisdiction_risk(country)
email_result = await check_email_security(domain)

LLM-callable checks are exposed as tool_* async wrappers on the OSINT agent in app/agents/osint_agent.py, which the LLM may call when the investigation context warrants:

async def tool_check_wayback(url: str) -> str: ...
async def tool_check_reviews(company_name: str, country: str) -> str: ...
async def tool_check_interpol(person_name: str, nationality: str = "") -> str: ...
async def tool_detect_virtual_office(address: str, company_name: str = "") -> str: ...

Evidence Attribution

Every VerificationResult includes a source_url pointing to the canonical source (e.g., the specific OpenSanctions entity page, the Wayback Machine capture, or the Interpol notice URL). Tool invocations are logged through the @audited_tool decorator to the tool_invocations table (SHA-256 input/output hashes), creating a complete evidence chain for regulatory audit purposes. See Agentic OS Foundation for the tool audit layer.

Always-Run (Deterministic) Tools​

1. OpenSanctions​

Matching Strategy​

Auto-Refresh Schedule​

Diacritics Normalization​

2. Jurisdiction Risk​

3. Email Security​

LLM-Available Tools​

4. Wayback Machine​

5. Consumer Reviews​

6. Interpol Red Notices​

7. Virtual Office Detection​

How Tools Are Wired In​

Evidence Attribution​