ADR-0079 — Estonian company financials via RIK e-Äriregister open data
Status: Accepted
Date: 2026-06-28
Deciders: Compliance engineering (with the OB-Holding reachability investigation oracle)
Context source: docs/research/2026-06-28-ob-holding-reachability-investigation.md
Context
OB Holding 1 OÜ's real FY2024 financials (turnover €42,538,000, net profit €24,601,000, assets €57.45m, equity €42.42m) read as "not assessed" in our output. This is a deliberate non-fetch, not a data absence:
- ee_ariregister_service.py:338-341 hard-codes financials_summary="Financial data not available via Äriregister public API (requires e-äriregister portal access)." on every found-company path. We call only the autocomplete + /company/{reg}/json "card" endpoints (identity/persons only).
- osint_agent.py:_fetch_country_financials (625-653) has per-country fetchers for NO/RO/SK/NL but no EE branch, so the synthesis fallback can't fill the gap. The country-agnostic NorthData fallback then returns little (NorthData is DACH-centric and excludes EE/LV/LT).
The figures are free and public. The 2026-06-28 investigation verified live (HTTP 200) the RIK avaandmed open-data annual-report element dump (4.2024_aruannete_elemendid…zip, 23.3MB, application/zip, last-modified 2026-06-09), which carries Revenue / Total assets / Equity / Operating-and-period profit per registrikood per year, CC-BY-4.0, no auth. This also addresses ADR-0073 R11 (financials-ingestion gap: a missing financial statement must read as an honest gap, never a benign "no financials").
The proven pattern already exists for other countries: a thin fetch_financials(reg) module (e.g. no_regnskapsregisteret.py) returning financial_utils.build_fhr(snapshots, source=…), wired into the agent's financial_health_report= field and consumed unchanged by the financial-analysis agent (ADR-0048).
Decision
Add Estonian financials as a real, sourced data channel; replace the hard-coded "not available" string with either sourced figures or an honest, structured gap.
1. RIK avaandmed open-data fetcher (primary, FREE). New app/services/registries/ee_ariregister_financials.py:fetch_financials(reg_code) that:
   - ingests the per-year "report elements" open-data dumps into a local table keyed by registrikood + year (the dumps are 23MB ZIPs, so this is an ingest + monthly-refresh job, not a per-request download), recording the dump's last-modified / as-of date;
   - ingests only the financial elements + registrikood from the "general info" dump, not natural-person director fields, so EE open-data ingestion stays outside the PII regime (app/pii/); CC-BY-4.0 attribution is met by the source= label;
   - builds the report_id → registrikood join, attributes only on exact registrikood match, and runs a corroboration/sanity check (magnitude cross-check against the NorthData/aggregator figures from #5) before presenting a figure as the subject's: a join-key error would surface a real-but-wrong company's numbers, the financial analog of the R9 name-collision (EU AI Act Art. 15 accuracy);
   - maps Revenue → revenue, Total assets → total_assets, Equity → equity, and the named net-profit et-gaap element → profit_loss per year (pin the exact element with documented semantics; do not conflate operating profit with net profit; the verified OB Holding figure is net profit €24,601,000);
   - returns build_fhr(snapshots, source='RIK e-Äriregister Open Data') carrying the as-of date. Mirror the proven no_regnskapsregisteret.py shape.
2. Wire EE into the pipeline. Add an EE branch to osint_agent.py:_fetch_country_financials; replace the hard-coded financials_summary in ee_ariregister_service.py:run_ee_agent with the fetched FHR (or empty + gap). The FHR is consumed unchanged by financial_analysis_agent.run_financial_analysis.
3. Remove the misleading hard-code (Wave 0). Delete the literal financials_summary='Financial data not available…' and the early-return blocks that also set it. Distinguish two gap shapes: an unsupported jurisdiction emits the ADR-0080 country_capability_gap(FINANCIAL_STATEMENTS, country); a supported jurisdiction with the datum absent (e.g. EE post-launch, FY not present as of the snapshot) emits an ADR-0073 R11 financials_ingestion_gap / ADR-0067 data-gap, not a capability gap (declaring "not assessed for Estonia" when the channel works would understate our coverage to a regulator). Mirror the CZ filings_only→upgrade pattern in cz_ares_service.py.
4. Optional higher-fidelity tier (RIK SOAP/XBRL). A second function, fetch_financials_soap, calling arireg.majandusaastaAruanneteKirjed_v1 (after the …Loetelu_v1 list) and parsing et-gaap XBRL <vaartus> line items, selected ahead of open data when credentials are configured. This requires a signed RIK agreement (organizational lead-time), gated behind new config.py fields (ariregister_xml_user/password/enabled). Open data (decision #1) is the no-contract fallback.
5. NorthData financials parse (corroboration stopgap). NorthData already fetches OB Holding's page; extend extraction to parse the financial block it renders and map to build_fhr(source='NorthData'), the lowest-friction corroboration since the HTTP path already exists. Aggregator-sourced figures are labelled as such (not the official register).
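The join and mapping steps of decision #1 can be sketched as follows. This is a minimal illustration, not the shipped module: the CSV column names (report_id, registrikood, year, element, value) and the English element labels are assumptions made for the example, and the real et-gaap element names must be pinned from the dump itself.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical label -> field map. "Operating profit" is deliberately absent
# so it can never be reported as profit_loss.
ELEMENT_FIELD_MAP = {
    "Revenue": "revenue",
    "Total assets": "total_assets",
    "Equity": "equity",
    "Net profit": "profit_loss",
}


def build_snapshots(general_csv: str, elements_csv: str, reg_code: str) -> List[dict]:
    """Return one snapshot dict per year for exactly one registrikood."""
    # Step 1: report_id -> year, kept only on an exact registrikood match.
    report_year: Dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(general_csv)):
        if row["registrikood"] == reg_code:
            report_year[row["report_id"]] = int(row["year"])

    # Step 2: collect mapped elements for those report ids only.
    by_year: Dict[int, dict] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(elements_csv)):
        year = report_year.get(row["report_id"])
        field = ELEMENT_FIELD_MAP.get(row["element"])
        if year is None or field is None:
            continue  # another company's report, or an element we do not map
        by_year[year][field] = float(row["value"])

    return [{"year": year, **fields} for year, fields in sorted(by_year.items())]
```

The resulting list is what would be handed to build_fhr(snapshots, source=…); the magnitude cross-check against aggregator figures would run on it before anything is attributed to the subject.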
Consequences
- The €42.54m FY2024 turnover (and prior years) surfaces from a free, official source, feeding the existing ratio/distress analysis (ADR-0048) instead of going_concern="insufficient_data".
- ADR-0073 R11 satisfied: financials are now fetched where available, and a genuine absence reads as an audited gap (ADR-0080), never a benign "no financials."
- Honest residuals (must remain gaps, never inferred): (a) a recent FY may be absent because the company has not filed or because our monthly snapshot is stale; we cannot observe filing behaviour, so the gap is surfaced as "not present as of <snapshot date>" (carrying the avaandmed last-modified date), never asserted as "not filed"; the per-case as-of snapshot persists into the evidence bundle (ADR-0021/0063), not just the live refreshed table; (b) the free open data carries key indicators only, and note-level statements need the SOAP/XBRL tier (RIK-agreement-gated); (c) GLEIF/BRIS/OpenCorporates carry no EE financials (identity/existence only); (d) aggregators (Inforegister/Okredo/NorthData) are labelled aggregator-sourced unless fetched from the e-Äriregister filing itself.
- New operational surface: the avaandmed ingest needs a scheduled refresh and storage. Budget as M/L, not a trivial fetch.
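The two gap shapes from decision #3 and the "not present as of" wording from residual (a) could be modelled roughly as below. The FinancialsGap dataclass, the classify_gap helper, and the supported-country set are hypothetical names for this sketch; only the two gap kinds come from the ADRs cited above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set


@dataclass(frozen=True)
class FinancialsGap:
    kind: str                      # "country_capability_gap" or "financials_ingestion_gap"
    country: str
    message: str
    as_of: Optional[date] = None   # snapshot date, set only for ingestion gaps


def classify_gap(country: str, supported: Set[str],
                 fiscal_year: int, snapshot_date: date) -> FinancialsGap:
    """Pick the gap shape without asserting anything about filing behaviour."""
    if country not in supported:
        # No working channel for this jurisdiction at all.
        return FinancialsGap(
            kind="country_capability_gap",
            country=country,
            message=f"FINANCIAL_STATEMENTS not assessed for {country}",
        )
    # Channel works, datum absent: say what we observed and when, nothing more.
    return FinancialsGap(
        kind="financials_ingestion_gap",
        country=country,
        message=f"FY{fiscal_year} not present as of {snapshot_date.isoformat()}",
        as_of=snapshot_date,
    )
```

Keeping as_of on the gap record is what lets the per-case snapshot date travel into the evidence bundle alongside the gap itself.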
Addendum — lazy on-demand auto-fetch + parsing correction (2026-06-29)
Three updates from wiring this into the live flow and validating against the real dump.
1. Lazy on-demand ingest (supersedes "scheduled job only" for the default path).
The original decision deferred ingestion to a scheduled bulk job and left
fetch_financials returning {} until that job ran — which meant EE financials
were inert in practice (every live case showed "not assessable"). We now add a
lazy auto-fetch (ee_avaandmed_autofetch.py, default ee_avaandmed_autofetch_enabled=True):
on the first EE financial lookup with no pre-ingested path, the module
downloads the current avaandmed report files once into a local cache and
stream-extracts the company. Design guards:
- URL discovery, not hardcoding: the filenames carry a monthly snapshot suffix (…kuni_31052026…), so the live URLs are scraped from the open-data page each refresh; a pinned URL would break monthly.
- Cache + TTL (ee_avaandmed_cache_dir, ee_avaandmed_cache_ttl_days=30), single-flight (asyncio.Lock so concurrent investigations download once), per-company memoisation, stream parsing of the 200–300MB CSVs (bounded memory, early-break on the report-id-sorted element file).
- Fail-closed: discovery/download/parse failure → [] (honest gap, not cached, retries next time); a genuinely absent company caches [] (fast, honest on repeat). The pre-ingested ee_avaandmed_ingest_path still wins when set (scheduled bulk ingest remains supported for high-volume deployments). The scheduled job is now an optional optimisation, not a precondition.
- Cost/latency trade: the first EE lookup pays a one-time ~40MB download + a multi-hundred-MB scan (cached after). Acceptable for this workload; a future SQLite index would make repeat per-company lookups instant.
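The TTL, single-flight, and fail-closed guards can be sketched together. This is an in-memory simplification under assumed names (SnapshotCache and its download callable are hypothetical); the real module caches files on disk under ee_avaandmed_cache_dir and stream-parses them.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional


class SnapshotCache:
    """In-memory stand-in for the on-disk avaandmed cache."""

    def __init__(self, download: Callable[[], Awaitable[Dict[str, list]]],
                 ttl_seconds: float) -> None:
        self._download = download
        self._ttl = ttl_seconds
        self._lock = asyncio.Lock()
        self._data: Optional[Dict[str, list]] = None
        self._fetched_at = 0.0

    def _fresh(self) -> bool:
        return (self._data is not None
                and time.monotonic() - self._fetched_at < self._ttl)

    async def lookup(self, reg_code: str) -> list:
        if not self._fresh():
            async with self._lock:          # single-flight: one downloader at a time
                if not self._fresh():       # re-check: a peer may have just filled it
                    try:
                        self._data = await self._download()
                        self._fetched_at = time.monotonic()
                    except Exception:
                        # Fail closed: nothing new is cached, so the next call retries.
                        return []
        # A company missing from a good snapshot yields [] without re-downloading.
        return self._data.get(reg_code, [])
```

Five concurrent lookups against a cold cache trigger one download; the other four wait on the lock and then read the populated snapshot.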
2. Parsing correction: OB Holding's financials ARE in the open data. A
mid-investigation grep '"<report_id>"' (quoted) wrongly returned zero and led to a
brief, wrong conclusion that OB Holding (14975047) was a "PDF-only filer" absent
from the elements dump. Cause: the report_id column is unquoted in data rows
(only text fields are quoted), so the quoted grep never matched; the csv parser
reads it correctly. Verified: report_id 3323168 carries the 20 FY2024 elements —
Revenue €42,538,000, Assets €57,453,000, Equity €42,420,000, net profit
€24,601,000 — matching Inforegister exactly. Lesson (encoded as a ground-truth
test): validate structured data with the parser, never a grep on a quoted field.
Also extended _ELEMENT_FIELD_MAP to the real balance-sheet labels (bare
Varad/Assets → total_assets), which the original idealised map missed.
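The failure mode is easy to reproduce. The snippet below is a sketch on a made-up two-row excerpt: the comma delimiter, the column names, and the English labels are assumptions, while the unquoted report_id and the two figures are the ones stated above.

```python
import csv
import io
from typing import List

# Hypothetical excerpt: report_id is unquoted, text fields are quoted.
SAMPLE = (
    'report_id,"element","value"\n'
    '3323168,"Revenue",42538000\n'
    '3323168,"Assets",57453000\n'
)


def grep_quoted(text: str, report_id: str) -> List[str]:
    """Mimic grep '"<report_id>"': match the id only when wrapped in quotes."""
    needle = f'"{report_id}"'
    return [line for line in text.splitlines() if needle in line]


def parse_rows(text: str, report_id: str) -> List[dict]:
    """Let the csv parser resolve quoting, then compare the field value."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["report_id"] == report_id]


print(len(grep_quoted(SAMPLE, "3323168")))  # 0: the quoted needle never occurs
print(len(parse_rows(SAMPLE, "3323168")))   # 2: the parser sees both rows
```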
3. Inforegister.ee fallback + cross-source corroboration. ee_inforegister_financials.py
is a fallback (and second-source check) for the cases avaandmed can't serve —
a company whose report content is not yet in the structured open-data elements
(very recent or micro filings). It reads the same RIK figures that Inforegister
republishes free + login-less as embedded chart JSON
(<script type="application/json"> blocks), parsed per-block by the block's own year header so a
quarterly series can't be mistaken for the annual figure, and forecast columns
("2025*") are excluded. It is fallback-only (used by _fetch_country_financials
when the official avaandmed path returns {}) and labelled aggregator-sourced
(source="Inforegister.ee (aggregator)" + source_url), per consequence (d) —
never presented as the primary registry. Honest residuals: equity is not in the
chart JSON (omitted, not guessed); a slug mismatch 404s to an honest gap (no fuzzy
fetch). Validated live: OB Holding FY2024 revenue €42,538,000 / net profit
€24,601,000 / assets €57,453,000 — identical to avaandmed, giving an independent
cross-source confirmation.
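A rough sketch of the embedded-JSON extraction follows. The payload shape (metric, labels, values keys) is purely an assumption for illustration, since the actual Inforegister chart JSON layout is not described here; the part that carries over is the rule of keeping only bare four-digit year labels, which drops both forecast columns such as "2025*" and quarterly labels.

```python
import json
import re
from html.parser import HTMLParser
from typing import Dict, List

ANNUAL_LABEL = re.compile(r"\d{4}")  # bare year only: no "2025*", no "2024 Q1"


class JsonScriptCollector(HTMLParser):
    """Collect the decoded bodies of <script type="application/json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json = False
        self._buf: List[str] = []
        self.blocks: List[object] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json = True
            self._buf = []

    def handle_data(self, data):
        if self._in_json:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json:
            self._in_json = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than guess


def annual_series(html: str, metric: str) -> Dict[int, float]:
    """Return {year: value} for one metric, from annual labels only."""
    collector = JsonScriptCollector()
    collector.feed(html)
    collector.close()
    out: Dict[int, float] = {}
    for block in collector.blocks:
        # Hypothetical block shape: {"metric": ..., "labels": [...], "values": [...]}
        if not isinstance(block, dict) or block.get("metric") != metric:
            continue
        for label, value in zip(block.get("labels", []), block.get("values", [])):
            if ANNUAL_LABEL.fullmatch(str(label)):
                out[int(label)] = float(value)
    return out
```

A metric that has no annual label in any block simply yields an empty dict, which the caller treats as an honest gap.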
Alternatives considered
- Commercial EE credit bureau (Creditinfo Eesti / Inforegister REST) as primary. Rejected as first move — contract + cost — when the official open data is free and verified. Kept as an optional config-gated channel (Wave 6) if the free path proves insufficient.
- Per-request download+parse of the 23MB ZIP. Rejected — not viable per request; requires the ingest table + refresh job in decision #1.
- Keep the hard-coded string. Rejected — it is actively misleading (asserts unavailability where free data exists) and violates ADR-0067 honesty.
References
ADR-0073 (R11 financials-ingestion gap), ADR-0048 (financial-analysis agent — unchanged consumer), ADR-0068 (country-capability registry), ADR-0080 (FINANCIAL_STATEMENTS capability signal — pairs with this), ADR-0067 (fail-closed honesty), ADR-0021 (evidence bundle) / ADR-0063 (append-only persistence — per-case as-of snapshot retention), ee_ariregister_service.py, no_regnskapsregisteret.py, financial_utils.py, research doc docs/research/2026-06-28-ob-holding-reachability-investigation.md.