Joining five Norwegian datasets on the organisation number
The Norwegian 9-digit organisasjonsnummer is the cleanest primary key in any business-data system we have ever worked with. Every legal entity in Norway has exactly one. It is checksummed (mod-11), unambiguous, and reused across every public dataset that touches Norwegian business. That makes it tempting to assume the join is trivial. It is not, and the failure modes are the kind that ship a confidently wrong result.
This post walks through how we merge five public Norwegian datasets into a single queryable graph keyed on the org number. We cover the schemas, the refresh cadences, the joins that are not what you think they are, and the half-dozen edge cases that we still see in production.
The five datasets
Each of these is independently published by a Norwegian government or EU body. None of them was designed to be joined to the others.
| Dataset | Publisher | Refresh cadence | Volume (NO entities) |
|---|---|---|---|
| Brønnøysundregistrene (Foretaksregisteret + Enhetsregisteret) | Brønnøysund Register Centre | Real-time event feed | ~1.2 million |
| Aksjonærregisteret (shareholder registry) | Skatteetaten | Annual snapshot (April) | ~330,000 with declared owners |
| Doffin (public procurement filings) | Direktoratet for forvaltning og økonomistyring | Real-time, as filings are published | ~25,000 active notices, 60,000+ historical awards |
| EU Horizon & EIC R&D grants | European Commission (CORDIS) | Monthly batch | ~3,500 Norwegian recipients across the dataset |
| Sanctions lists (EU, UN, OFAC) | Various | Daily merge | Variable; thousands of natural-person and legal-entity hits |
The "obvious" join
The shape an outsider expects is a star schema. Brønnøysund in the middle; every other dataset hangs off it via org number.
This is correct enough to ship a working v1. It is wrong enough that it loses real signal on companies that matter. Three things make it more complicated:
- Aksjonærregisteret references entities that are not in Brønnøysund. About 6% of declared owners are foreign entities or natural persons identified only by birth year and name. The join is on org number where it exists, on a (name, birth_year) tuple otherwise.
- Doffin filings list the buyer and supplier with org numbers, but historical filings (pre-2021) list the supplier as free-text. We run a fuzzy backfill that resolves about 84% of legacy filings to org numbers.
- EU grant recipients are identified by a "PIC" — Participant Identification Code — not by org number. CORDIS publishes a PIC-to-VAT-number mapping, and Norwegian VAT numbers are derived from org numbers ("NO" + orgnr + "MVA"), so the join is two hops, not one.
The schema we landed on
After three rewrites, the join graph in production looks like this:
create table companies (
orgnr char(9) primary key,
name text not null,
status text not null, -- active | dissolved | bankrupt
founded_date date,
nace_code text,
nace_label text,
organisational_form text,
registered_address jsonb,
postal_address jsonb,
employees int,
is_vat_registered boolean,
is_bankrupt boolean,
source_data jsonb, -- enriched contacts live here
last_brreg_event timestamp,
layer4_attempted_at timestamp
);
create table roles (
orgnr char(9) references companies,
person_id text, -- natural person hash
name text,
role text, -- CEO | board_member | signatory
birth_year int,
added_at timestamp,
removed_at timestamp
);
create table ownership (
child_orgnr char(9) references companies,
parent_orgnr char(9), -- nullable: foreign owner
parent_name text,
parent_birth_year int, -- natural-person owners
parent_country text,
percentage numeric(5,2),
snapshot_year int not null,
primary key (child_orgnr, snapshot_year, parent_orgnr, parent_name)
);
create table contracts (
notice_id text primary key,
buyer_orgnr char(9),
supplier_orgnr char(9),
supplier_name_raw text, -- pre-2021 free-text
contract_title text,
contract_value_nok bigint,
awarded_at date,
source text not null -- doffin | ted
);
create table eu_grants (
grant_id text primary key,
recipient_orgnr char(9),
recipient_pic text,
recipient_vat text,
programme text, -- horizon_europe | eic | etc
topic text,
awarded_eur bigint,
start_date date,
end_date date
);
create table sanctions_hits (
hit_id serial primary key,
target_orgnr char(9),
target_name text,
target_birth_year int,
list text, -- eu | un | ofac | global_affairs_canada
list_added_at date,
reason text
);
The two columns that get a lot of mileage are parent_birth_year in ownership and target_birth_year in sanctions_hits. They are how we identify natural persons across datasets without storing fødselsnummer (Norwegian personal IDs), which are PII and we deliberately do not touch.
Refresh cadences and how they collide
The five sources do not refresh on a common heartbeat. The trick to running the join sanely is to be explicit about the staleness of every field at read time.
The annual cadence on Aksjonærregisteret is the painful one. A company can change ownership in May and the registry will not reflect that until April of next year. We tag every shareholder edge with the snapshot_year it was sourced from and surface that in the API response. A user who joins our owner data into a KYB workflow must understand that the answer can be up to 11 months stale through no fault of ours.
Edge cases that bit us
Org-number reuse on takeover
When a Norwegian company is acquired and the legal entity merges into another, Brønnøysund can assign the surviving entity the same org number under the new name. We hit this once with a small consultancy that had been our test fixture for a month. The day it was acquired, every test silently produced different name + same orgnr. We now key our internal cache on (orgnr, last_brreg_event_id) and log a warning when the name attached to an orgnr changes.
Doffin's supplier_orgnr is sometimes the wrong entity
Procurement filings include a "supplier_organisation" field. About 1.4% of the time, what is listed there is not the entity that actually won the contract but a parent group or a partner that submitted the bid jointly. We do not try to fix this; we surface the raw field plus a flag supplier_uncertain when the supplier appears to differ from the awarded entity in the filing's text body.
Foreign-owned subsidiaries with no Norwegian parent in Aksjonær
The shareholder snapshot only includes Norwegian entities and natural persons. A wholly foreign-owned Norwegian AS will show up as having no declared owners in the registry, which is technically correct but practically misleading. We surface a has_declared_owners boolean separately so consumers can distinguish "no Norwegian owners" from "no owners declared at all".
Bankrupt vs dissolved vs slettet
Norwegian status terminology has three states that map confusingly to English:
- Konkurs — bankruptcy in progress. Legal entity still exists.
- Slettet etter konkurs — dissolved post-bankruptcy. Entity legally gone.
- Slettet etter avvikling — dissolved after voluntary wind-down.
We normalise these into status: active | bankrupt | dissolved with a dissolution_reason sub-field. Consumers asking "is this company safe to invoice?" want to know all three at once.
The mod-11 check digit
A "9-digit org number" is actually 8 random digits plus a mod-11 checksum. Free-text input fields that lose a digit are common. We always run the checksum on input and reject before issuing a join query, which catches about 1 in 400 user inputs that would otherwise return cleanly-looking but wrong data.
function isValidOrgnr(s: string): boolean {
if (!/^\d{9}$/.test(s)) return false;
const weights = [3, 2, 7, 6, 5, 4, 3, 2];
const digits = s.split("").map(Number);
const sum = digits.slice(0, 8).reduce((acc, d, i) => acc + d * weights[i], 0);
const check = (11 - (sum % 11)) % 11;
return check < 10 && check === digits[8];
}
What the join unlocks
Most of the engineering work pays off when you can answer questions that none of the source datasets can answer alone:
- "Which Norwegian consultancies have won more than NOK 10M in public contracts in 2025 and are not on any sanctions list?" — that is a four-way join across Brønnøysund (NACE filter), Doffin (contract value), sanctions, and Aksjonærregisteret (filter by Norwegian-owned).
- "Show me every Norwegian recipient of Horizon Europe grants in clean-tech, and their current revenue and headcount." — Brønnøysund + financial filings + EU grants joined on org number via the PIC-VAT-orgnr chain.
- "Which startups received their first public contract in 2024?" — Doffin filtered by year, joined to Brønnøysund founding date, joined to Aksjonær to confirm independent ownership.
None of these queries take more than 100ms in production. The slow part is always the join planning when the user query touches all five datasets at once.
What we got wrong the first three times
- Wide tables with NULLs everywhere. The v1 schema crammed every dataset into a single denormalised
companiestable with columns likedoffin_total_value. It was unbearable to maintain and the staleness of each column was hidden from the consumer. We moved to one table per source with explicit join paths. - Treating org number as the only key. The reality is that 8% of the relationships we care about (foreign owners, natural-person shareholders, free-text supplier names) cannot be joined by org number. We added secondary keys (PIC, VAT, name+birth_year) and now treat the org-number join as the primary path with documented fallbacks.
- Aggressive caching. Caching Brønnøysund responses for 24 hours felt fine until a bankruptcy filing didn't show up in the API for half a day. We now invalidate per-orgnr cache entries on the matching event-feed signal, and the rest of the system inherits the freshness.
If you are building something similar
Three things we wish we had known on day one:
- Subscribe to Brønnøysund's event feed from day one. Polling endpoints with a cron job means you find out about status changes hours later than you need to. The event feed is published as Atom/RSS and is approximately real-time.
- The Aksjonær annual snapshot is the single biggest constraint on freshness. Surface its snapshot year in every shareholder response. Consumers will understand "this was correct as of April 2025"; they will not forgive silent staleness.
- The PIC-VAT-orgnr chain for EU grants is not documented in one place. We assembled it from CORDIS schema docs, Skatteetaten's VAT rules, and three iterations of test data. The mapping is mostly stable but the corner cases (foreign branches, sole proprietors with a number-derived rather than register-derived VAT ID) account for almost all the errors we ever ship.
Everything described in this post is what powers GET /companies/:orgnr on our API. The full schema is documented at /docs. The benchmark comparing what's available in our join versus what other vendors provide is at /coverage.
Try the join on a real Norwegian company
5,000 requests / month, no card. See all five datasets joined in one response.
Open the live lookup →