— Engineering · data architecture

Joining five Norwegian datasets on the organisation number

The Norwegian 9-digit organisasjonsnummer is the cleanest primary key in any business-data system we have ever worked with. Every legal entity in Norway has exactly one. It is checksummed (mod-11), unambiguous, and reused across every public dataset that touches Norwegian business. That makes it tempting to assume the join is trivial. It is not, and the failure modes are the kind that ship a confidently wrong result.

This post walks through how we merge five public Norwegian datasets into a single queryable graph keyed on the org number. We cover the schemas, the refresh cadences, the joins that are not what you think they are, and the half-dozen edge cases that we still see in production.

The five datasets

Each of these is independently published by a Norwegian government or EU body. None of them was designed to be joined to the others.

DatasetPublisherRefresh cadenceVolume (NO entities)
Brønnøysundregistrene (Foretaksregisteret + Enhetsregisteret)Brønnøysund Register CentreReal-time event feed~1.2 million
Aksjonærregisteret (shareholder registry)SkatteetatenAnnual snapshot (April)~330,000 with declared owners
Doffin (public procurement filings)Direktoratet for forvaltning og økonomistyringReal-time, as filings are published~25,000 active notices, 60,000+ historical awards
EU Horizon & EIC R&D grantsEuropean Commission (CORDIS)Monthly batch~3,500 Norwegian recipients across the dataset
Sanctions lists (EU, UN, OFAC)VariousDaily mergeVariable; thousands of natural-person and legal-entity hits

The "obvious" join

The shape an outsider expects is a star schema. Brønnøysund in the middle; every other dataset hangs off it via org number.

[ Aksjonærregisteret ] │ │ orgnr ▼ [ Doffin ] ◀──── [ Brønnøysund ] ────▶ [ Sanctions ] ▲ │ orgnr │ [ EU R&D grants ]

This is correct enough to ship a working v1. It is wrong enough that it loses real signal on companies that matter. Three things make it more complicated:

  1. Aksjonærregisteret references entities that are not in Brønnøysund. About 6% of declared owners are foreign entities or natural persons identified only by birth year and name. The join is on org number where it exists, on a (name, birth_year) tuple otherwise.
  2. Doffin filings list the buyer and supplier with org numbers, but historical filings (pre-2021) list the supplier as free-text. We run a fuzzy backfill that resolves about 84% of legacy filings to org numbers.
  3. EU grant recipients are identified by a "PIC" — Participant Identification Code — not by org number. CORDIS publishes a PIC-to-VAT-number mapping, and Norwegian VAT numbers are derived from org numbers ("NO" + orgnr + "MVA"), so the join is two hops, not one.

The schema we landed on

After three rewrites, the join graph in production looks like this:

create table companies (
  orgnr               char(9) primary key,
  name                text not null,
  status              text not null,            -- active | dissolved | bankrupt
  founded_date        date,
  nace_code           text,
  nace_label          text,
  organisational_form text,
  registered_address  jsonb,
  postal_address      jsonb,
  employees           int,
  is_vat_registered   boolean,
  is_bankrupt         boolean,
  source_data         jsonb,                  -- enriched contacts live here
  last_brreg_event    timestamp,
  layer4_attempted_at timestamp
);

create table roles (
  orgnr               char(9) references companies,
  person_id           text,                    -- natural person hash
  name                text,
  role                text,                    -- CEO | board_member | signatory
  birth_year          int,
  added_at            timestamp,
  removed_at          timestamp
);

create table ownership (
  child_orgnr         char(9) references companies,
  parent_orgnr        char(9),               -- nullable: foreign owner
  parent_name         text,
  parent_birth_year   int,                 -- natural-person owners
  parent_country      text,
  percentage          numeric(5,2),
  snapshot_year       int not null,
  primary key (child_orgnr, snapshot_year, parent_orgnr, parent_name)
);

create table contracts (
  notice_id           text primary key,
  buyer_orgnr         char(9),
  supplier_orgnr      char(9),
  supplier_name_raw   text,              -- pre-2021 free-text
  contract_title      text,
  contract_value_nok  bigint,
  awarded_at          date,
  source              text not null      -- doffin | ted
);

create table eu_grants (
  grant_id            text primary key,
  recipient_orgnr     char(9),
  recipient_pic       text,
  recipient_vat       text,
  programme           text,              -- horizon_europe | eic | etc
  topic               text,
  awarded_eur         bigint,
  start_date          date,
  end_date            date
);

create table sanctions_hits (
  hit_id              serial primary key,
  target_orgnr        char(9),
  target_name         text,
  target_birth_year   int,
  list                text,              -- eu | un | ofac | global_affairs_canada
  list_added_at       date,
  reason              text
);

The two columns that get a lot of mileage are parent_birth_year in ownership and target_birth_year in sanctions_hits. They are how we identify natural persons across datasets without storing fødselsnummer (Norwegian personal IDs), which are PII and we deliberately do not touch.

Refresh cadences and how they collide

The five sources do not refresh on a common heartbeat. The trick to running the join sanely is to be explicit about the staleness of every field at read time.

Brønnøysund
< 5 min
event feed
Doffin
< 15 min
RSS poll
Sanctions
daily
merged at 04:00 CET
EU grants
monthly
CORDIS batch
Aksjonær
annual
released in April

The annual cadence on Aksjonærregisteret is the painful one. A company can change ownership in May and the registry will not reflect that until April of next year. We tag every shareholder edge with the snapshot_year it was sourced from and surface that in the API response. A user who joins our owner data into a KYB workflow must understand that the answer can be up to 11 months stale through no fault of ours.

Edge cases that bit us

Org-number reuse on takeover

When a Norwegian company is acquired and the legal entity merges into another, Brønnøysund can assign the surviving entity the same org number under the new name. We hit this once with a small consultancy that had been our test fixture for a month. The day it was acquired, every test silently produced different name + same orgnr. We now key our internal cache on (orgnr, last_brreg_event_id) and log a warning when the name attached to an orgnr changes.

Doffin's supplier_orgnr is sometimes the wrong entity

Procurement filings include a "supplier_organisation" field. About 1.4% of the time, what is listed there is not the entity that actually won the contract but a parent group or a partner that submitted the bid jointly. We do not try to fix this; we surface the raw field plus a flag supplier_uncertain when the supplier appears to differ from the awarded entity in the filing's text body.

Foreign-owned subsidiaries with no Norwegian parent in Aksjonær

The shareholder snapshot only includes Norwegian entities and natural persons. A wholly foreign-owned Norwegian AS will show up as having no declared owners in the registry, which is technically correct but practically misleading. We surface a has_declared_owners boolean separately so consumers can distinguish "no Norwegian owners" from "no owners declared at all".

Bankrupt vs dissolved vs slettet

Norwegian status terminology has three states that map confusingly to English:

We normalise these into status: active | bankrupt | dissolved with a dissolution_reason sub-field. Consumers asking "is this company safe to invoice?" want to know all three at once.

The mod-11 check digit

A "9-digit org number" is actually 8 random digits plus a mod-11 checksum. Free-text input fields that lose a digit are common. We always run the checksum on input and reject before issuing a join query, which catches about 1 in 400 user inputs that would otherwise return cleanly-looking but wrong data.

function isValidOrgnr(s: string): boolean {
  if (!/^\d{9}$/.test(s)) return false;
  const weights = [3, 2, 7, 6, 5, 4, 3, 2];
  const digits = s.split("").map(Number);
  const sum = digits.slice(0, 8).reduce((acc, d, i) => acc + d * weights[i], 0);
  const check = (11 - (sum % 11)) % 11;
  return check < 10 && check === digits[8];
}

What the join unlocks

Most of the engineering work pays off when you can answer questions that none of the source datasets can answer alone:

None of these queries take more than 100ms in production. The slow part is always the join planning when the user query touches all five datasets at once.

What we got wrong the first three times

If you are building something similar

Three things we wish we had known on day one:

  1. Subscribe to Brønnøysund's event feed from day one. Polling endpoints with a cron job means you find out about status changes hours later than you need to. The event feed is published as Atom/RSS and is approximately real-time.
  2. The Aksjonær annual snapshot is the single biggest constraint on freshness. Surface its snapshot year in every shareholder response. Consumers will understand "this was correct as of April 2025"; they will not forgive silent staleness.
  3. The PIC-VAT-orgnr chain for EU grants is not documented in one place. We assembled it from CORDIS schema docs, Skatteetaten's VAT rules, and three iterations of test data. The mapping is mostly stable but the corner cases (foreign branches, sole proprietors with a number-derived rather than register-derived VAT ID) account for almost all the errors we ever ship.

Everything described in this post is what powers GET /companies/:orgnr on our API. The full schema is documented at /docs. The benchmark comparing what's available in our join versus what other vendors provide is at /coverage.

Try the join on a real Norwegian company

5,000 requests / month, no card. See all five datasets joined in one response.

Open the live lookup →