In digital heritage, you occasionally hear the term data space (in Dutch: dataruimte). Over the past year it suddenly started popping up — just enough to make you wonder: what does it actually mean? And more importantly: what can it really do? Is it a new term for policymakers, or is there concrete technology underneath it? Out of curiosity, in late December 2025 I started digging. In a series of experiments I looked at what it is, how it works, and whether I could build a data space myself. You need something to do between the Christmas dinner and the New Year’s doughnuts. And by “build”, I really mean: working code for a real data space transaction. Something you can install yourself. On your own computer.
You can find the first experiment in the P-322 GitHub dataspace-experiments repository.
This is the first blog post about those experiments. I will do my best to keep it accessible. But: data spaces are still just servers talking to each other. To show how that works, I cannot avoid some technical detail and sharing code. For the more technical readers it may not go deep enough, but for them I’m happy to point to the repository, where you can find all the code and container configuration.
Reading guide — choose your route
This is a long article because I am doing two things at once:
- explaining what a data space is, and
- showing how a data space actually works, with a concrete experiment.
You do not have to read it all in one sitting. Feel free to choose a route:
- 🕐 ±5 minutes — Concept & context
  What is a data space, why does it show up in heritage policy, and what makes it different from “old wine in new bottles”?
  → Read: [A data space — what is it, really?](#the-data-space-and-the-national-digital-heritage-strategy)
- 🕒 ±15 minutes — Deeper insight without code
  A general explanation of the problem a data space tries to solve, plus an overview of concepts, terms, and roles.
  → Read: [Why a data space? And what does it look like?](#why-a-data-space-and-what-does-it-look-like)
- 🧠 Full — The complete experiment including code
  All steps, code fragments, and the experimental demo.
  → Read: [Step by step through the data space experiment](#step-by-step-through-the-data-space-experiment)
A data space — what is it, really?
The data space and the National Digital Heritage Strategy
A little over a year ago, the dedicated people at the Netwerk Digitaal Erfgoed were working hard on the new National Digital Heritage Strategy for 2025–2028. Around that time, I had a few conversations in which they explained that one important component would be the “data space” (dataruimte). And indeed: “Towards a data space for cultural heritage for the whole kingdom” is the first of the four strategic goals.
From the digital strategy:
"Interoperability and the ability to exchange data effectively is one of the primary objectives of EU data policy. The EU’s 2020 data strategy calls for the development of ‘data spaces’ to make it easier to exchange data within and across sectors. For the cultural sector, the European Commission initiated the common European data space for cultural heritage, where heritage information from all EU member states is made findable and reuseable."
That did not mean much to me until, in October 2025, NDE invited me to speak at the event From data to blueprint. Europeana and the Jewish Heritage Network are working on the [European Memory Data Space](https://blueprint.memorydataspace.eu) with the aim of bringing Holocaust-related material together.
Old wine in new bottles?
At that event, I spoke with a specialist from a think tank that, among other things, works on the implementation of data spaces in the Netherlands.
I asked her, honestly.
“Is a data space actually something new? As in real technology? Or did we just give ideas we have been funding for fifteen years a new name for the next generation of funders and policymakers?”
She started with the usual nuance.
“That depends on who you ask. There are multiple sides.”
Then she paused for a moment and said:
“Honestly, for most people it’s mainly policy. The need to exchange data in a manageable way within a chain or network of departments and organisations. If you want that and you’re working on it, then it’s already a data space.”
I frowned.
“By that standard, the Netwerk Digitaal Erfgoed has been a data space for a long time… as has the Colonial Collections consortium. And NWO projects like CLARIAH and ODISSEI.”
She nodded, but immediately added something.
“For the technical people in our community, that often does not go far enough. They say: you only have a data space if you also implement the technology.”
I have to admit that made sense to me. But then: which technology?
A data space between the New Year’s doughnuts
Between Christmas and New Year’s I thought: “Well, if there is technology… could I do it too? Build a data space?” So I took the first steps. You are going to see my learning path here. I will do things that others might look at and think: “But why??” Or: “How can you do that / miss that?”
I am writing this blog to help. Maybe it means you won’t have to make the same mistakes.
A data space at its smallest is two parties that want to exchange data: a provider and a consumer. In NDE terms, that is closest to Bronhouder (data holder) and Gebruiker/Dienstplatform (user/service platform). The consumer is in any case not an end user, but a party that wants to process the data further via an app, website, or other infrastructure.
Why a data space? And what does it look like?
What is the current situation?
Within NDE, data holders and users already exchange data via the web. Heritage data lives on a data holder’s server and is openly available through an API. Think of that as a web address intended not for humans, but for machines.
The user’s software calls that web API, asks for data, and gets it back. Simple.
That means the data holder has zero control. It is open data, so an open web API. Anyone can ask for data, and everyone gets it.
What problem should a data space solve?
As a data holder/provider, you do not know who is requesting the data and what they will do with it.
And as a user/consumer, you do not know what is in the data.
That is a problem.
Even in an open landscape like heritage.
Over the past year and a half, I have thought multiple times: if only we had a solution for this.
For example, when we started making Colonial Collections data available about objects containing human remains. Ethically, you may not want to show that to the entire world. Sometimes, as a data holder, you want to share certain data only with a specific group, such as researchers.
Or that time I had a conversation at an NDE hub about publishing data openly — just not for training AI models.
Or the realisation that more control over access is a consequence of a changing and polarising world. We can no longer naïvely make everything available to everyone. There are groups and countries that do not have our best interests at heart. For example, actors who benefit from sabotaging the relationship between the Netherlands and Indonesia. Or extremists who want to misuse information: to promote a particular image of our culture — or to destroy it.
More grip on access to data, with as little damage as possible to the open character of heritage and society.
That is a major challenge.
Maybe a data space is, in fact, about making open infrastructure resilient.
The Dataspace Connector
Just like in the current situation, in a data space the consumer and provider still communicate directly. The basis remains a peer-to-peer system without a central authority that manages, validates, and controls everything.
The difference is that consumer and provider talk through an infrastructure layer: the Eclipse Dataspace Connector (EDC). The older technical readers who once programmed Java may frown for a second — because Eclipse was that programming environment, right? Yes: that Eclipse. Out of that project grew the Eclipse Foundation, a vendor-neutral foundation (nowadays headquartered in Brussels) that maintains open-source software.
And that matters: the EDC layer is open and anyone can implement it freely. That means the technology beneath a data space does not belong to an EU institution, a company, or a country. It is also not a short-term solution tied to a temporarily funded project. The EDC is inherently open and, so to speak, here to stay.
EDC handles publishing and discovery, negotiation, evaluation, and the controlled execution of the data transfer between provider and consumer. Without a connector like EDC, there is no data space.
The data holder (Provider)
In a data space, the provider registers a dataset (an Asset) in its catalogue. That might sound like the dataset register, except it is not central. It is not a network service. The provider publishes the catalogue itself via its own EDC. It is therefore more comparable to publishing the dataset description — which, according to DERA, also sits with the data holder.
A provider catalogue does not contain the data, and it is not direct access to the web API where the data lives. In a data space catalogue, you register a proposal (an Offer): under these conditions, a consumer may approach me to obtain access to the data.
The provider does this via the Provider Management API of the EDC implementation running on its server.
The catalogue
How does a consumer know which assets are offered within the data space? Because it is a peer-to-peer system, there cannot be a mandatory central truth where “all provider offers” live. That is a deliberate design choice.
A data space has no central authority.
No single point of failure.
No implicit requirement to index everything.
The consumer knows (through configuration, governance, or an onboarding process) certain providers, and maintains its own list of provider endpoints. For each known provider, the consumer requests a catalogue. Gathering the offers from those catalogues is then done locally by the consumer. That gives the consumer maximum sovereignty — but I am afraid it does not scale at all. This may work for pilots, closed consortia, and an experiment like mine, but not for large networks with 100+ providers.
Even though a central index is not required, that does not mean it cannot exist. In practice, in a data space there will be one or more parties offering a federated catalogue. This is the place where, intuitively, the NDE dataset register fits for me.
A federated catalogue is a service that iterates over provider endpoints and fetches the metadata from their catalogues. Like the dataset register, it plays a role in dataset discovery. It does not contain the data itself. It is not an authority. And it does not negotiate between consumer and provider. The federated catalogue sits alongside the data space.
In larger data spaces there will likely be multiple thematic or domain-specific catalogues. In the NDE context, that could be a register for colonial datasets or WWII datasets; or catalogues specifically for libraries, museums, or archives.
Just like within NDE, providers publish dataset descriptions in the form of Offers — only now via their EDC catalogue. In an offer, the provider determines what can be retrieved, by whom, and under which conditions.
User/Service platform (Consumer)
To start a data transaction, the consumer always approaches the provider. It does not matter whether the consumer consulted a federated catalogue first.
To contact a provider, the consumer uses the Consumer Management API of its own EDC server.
Important: every participant in a data space has its own server running an EDC data space connector. Depending on whether you act as provider or consumer, you use the Provider or Consumer Management API of your own EDC. In conversations I met people who thought a broker or intermediary “provides” the EDC. That is not the case. There is no middle party that facilitates the EDC. A data space is still a peer-to-peer network.
The EDCs of participants in the data space communicate using the [Dataspace Protocol (DSP)](https://eclipse-dataspace-protocol-base.github.io/DataspaceProtocol/2025-1-err1/).
The consumer contacts the provider’s catalogue. If there is an interesting offer for a dataset, the consumer asks the provider whether an agreement can be made about the use of that data. In data space terms, a Contract Negotiation is initiated.
The provider can express requirements in such a way that the EDC can automatically evaluate them. That does require a trustworthy party capable of attesting to the consumer’s claims. If, for instance, the provider wants to grant access only to researchers, someone needs to be able to guarantee that the consumer really is a researcher. Behind that sits a complex world of infrastructure and organisational arrangements.
Fortunately, the data space specifications leave open how conditions are verified. The provider can also formulate conditions more generally and then manually verify afterwards whether the consumer complied. That is comparable to licenses such as CC-BY-SA or GPL. In both cases, you accept the conditions attached to the license. Ultimately, if you break those conditions, the other party has grounds to take legal action afterwards.
Once the consumer’s EDC accepts the conditions, the provider’s EDC makes a Contract Agreement available that forms the basis for the Transfer Process. This is the point where a formal agreement becomes operational access.
To actually gain access to the data, the Consumer EDC retrieves a key called an Endpoint Data Reference (EDR). This key is linked to a contract agreement, a transfer process, and a specific asset. The key is only valid for a limited time. After it expires, the consumer must start contract negotiations again.
With the EDR, the consumer goes to the provider’s data EDC endpoint. With a valid EDR, the consumer obtains access to the data.
Contract negotiation and data transfer
A complete data space transaction consists of two processes: contract negotiation and the transfer process. In EDC terms, these are the Control Plane and the Data Plane.
The Control Plane establishes agreements and governs. It does not move data. The control plane handles catalogue exchange, contract negotiation, policy evaluation, identity and trust checks, and the initiation and monitoring of transfer processes. The output of the control plane is permission and instruction: who gets access to which data representation, under which conditions. The control plane is about trust, agreements, and legitimacy.
The Data Plane executes the actual data transfer based on an existing agreement. The data plane receives an EDR from the control plane, validates the associated credentials, retrieves the data from the source, and delivers it to the consumer. The data plane does not decide anything about access or policy; it merely executes what has already been contractually allowed. The data plane is about transport, security, and execution.
Step by step through the data space experiment
Data server
In the experiment, I started by building a basic server with an API that exposes data at http://localhost:7070/hello. It is not even 20 lines of code.
```ts
import Fastify from "fastify";

const app = Fastify({ logger: true });

app.get("/hello", async () => {
  return {
    message: "Hello, Dataspace",
    ts: new Date().toISOString(),
    dataset: [{ id: "a1", title: "Example record", license: "CC0" }],
  };
});

app.get("/health", async () => ({ ok: true }));

const port = Number(process.env.PORT ?? 7070);
const host = process.env.HOST ?? "0.0.0.0";

await app.listen({ port, host });
```

When you call that address, you get back:
{ "message": "Hello, dataspace", "ts": "2026-01-07T14:47:50.462Z", "dataset": [ { "id": "a1", "title": "Example record", "license": "CC0" } ]}
Not a thrilling dataset, but good enough for now.
Provider and Consumer EDCs
In the current situation, all traffic goes to that basic server to retrieve data. In a data space, we do not want that anymore. In a data space, there is a DSP connection from the edc-consumer to the edc-provider. So we need to set that up.
You can implement EDC from scratch based on the connector code on GitHub. But that sounds like a lot of work. I was told it is a rather low-level codebase where you have to do a lot yourself: configure, wire, and manage. I was warned there is a pretty steep learning curve. Not where you want to start for a first experiment.
By now, there are a few vendors that offer higher-level EDC implementations with extra features that make configuration easier and reduce complexity in handling connections. One of them is the German company Sovity. In addition to their enterprise offering, they provide a Sovity Community Edition that you can run yourself. That is what I used to start these experiments.
Since I have not tested other implementations, this is not the moment to compare different EDC solutions. If there is a need for that, do let me know.
To use EDC, you need a place to store state. Sovity CE requires a PostgreSQL database. The provider stores, for example, which assets and offers are available, and the conditions attached.
For the experiment we need two EDCs: one for the provider that sits in front of the basic web server, and one for the consumer that wants to retrieve data from the provider. To spin up all moving parts, I use five Docker containers:
- api: the basic web server with the actual data
- provider-db: a PostgreSQL database that stores the provider EDC’s state
- consumer-db: a PostgreSQL database that stores the consumer EDC’s state
- edc-provider: the provider’s EDC, sitting in front of the api container
- edc-consumer: the consumer’s EDC, which will retrieve the data
The data space transaction
I implemented the data space transaction in TypeScript and run it with Node. The transaction steps are modelled separately, and in the experiment they are executed sequentially.
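Before diving into the individual steps, here is how they hang together. Below is a minimal sketch of the orchestration; the file paths and the loadConfig helper are my own assumptions, but the seven step functions and their signatures match the fragments shown in the rest of this post:

```ts
// Hypothetical orchestrator sketch — file paths and loadConfig are my
// assumptions; the step functions match the fragments below.
import { loadConfig } from "./config.js";
import { ensureAsset } from "./steps/ensureAsset.js";
import { ensureContractDefinition } from "./steps/ensureContractDefinition.js";
import { fetchCatalog } from "./steps/fetchCatalog.js";
import { negotiateContract } from "./steps/negotiateContract.js";
import { startTransfer } from "./steps/startTransfer.js";
import { fetchEdr } from "./steps/fetchEdr.js";
import { accessData } from "./steps/accessData.js";

const cfg = loadConfig(); // endpoints, asset/policy ids, API key

await ensureAsset(cfg);                        // 1. provider registers the asset
await ensureContractDefinition(cfg);           // 2. provider attaches the policy
const cat = await fetchCatalog(cfg);           // 3. consumer discovers the offer
const neg = await negotiateContract(cfg, cat); // 4. consumer negotiates a contract
const tr = await startTransfer(cfg, cat, neg); // 5. consumer starts the transfer
const edr = await fetchEdr(cfg, tr);           // 6. consumer obtains the EDR key
const data = await accessData(cfg, edr);       // 7. consumer fetches the data

console.log(JSON.stringify(data, null, 2));
```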
1. Provider: ensure asset (Control Plane)
As a Provider, you are essentially building a wall around your basic web server. In that wall, you open a gate where you control access.
```ts
import type { Config } from "../config.js";
import { provider } from "../helpers.js";
import { edcJson } from "../http.js";

export async function ensureAsset(cfg: Config) {
  console.log(
    provider(
      `Checking whether asset '${cfg.assetId}' already exists in the provider catalog`
    )
  );

  const list = await edcJson(
    cfg,
    "POST",
    `${cfg.providerMgmt}/v3/assets/request`,
    {
      "@type": "https://w3id.org/edc/v0.0.1/ns/QuerySpec",
      "https://w3id.org/edc/v0.0.1/ns/offset": 0,
      "https://w3id.org/edc/v0.0.1/ns/limit": 50,
    }
  );

  const exists =
    Array.isArray(list) && list.some((x) => x?.["@id"] === cfg.assetId);

  if (exists) {
    console.log(
      provider(`Asset '${cfg.assetId}' already exists — skipping creation`)
    );
  } else {
    console.log(
      provider(
        `Asset '${cfg.assetId}' not found — creating new asset definition`
      )
    );
    console.log(
      provider(
        `Registering asset metadata and linking it to the underlying HTTP data source`
      )
    );

    await edcJson(cfg, "POST", `${cfg.providerMgmt}/v3/assets`, {
      "@id": cfg.assetId,
      "@type": "https://w3id.org/edc/v0.0.1/ns/Asset",
      "https://w3id.org/edc/v0.0.1/ns/properties": {
        name: "Hello asset",
      },
      "https://w3id.org/edc/v0.0.1/ns/dataAddress": {
        "@type": "https://w3id.org/edc/v0.0.1/ns/DataAddress",
        "https://w3id.org/edc/v0.0.1/ns/type": "HttpData",
        "https://w3id.org/edc/v0.0.1/ns/baseUrl": "http://api:7070",
        "https://w3id.org/edc/v0.0.1/ns/path": "/hello",
      },
    });

    console.log(
      provider(
        `Asset '${cfg.assetId}' successfully registered in the provider catalog`
      )
    );
  }

  /**
   * Didactic step: show what the underlying data source returns,
   * without involving EDC, contracts, or transfer tokens.
   */
  const sourceUrl = `${cfg.sourceHostBaseUrl}${cfg.sourcePath}`;
  console.log(
    provider(
      `Fetching data directly from the underlying HTTP source (no EDC, no contract)`
    )
  );
  console.log(provider(`Source URL (host-mapped): ${sourceUrl}`));
  console.log(
    provider(
      `This call bypasses the dataspace completely and is shown for reference only`
    )
  );

  try {
    const res = await fetch(sourceUrl, {
      headers: { Accept: "application/json" },
    });
    const text = await res.text();
    let parsed: any = text;
    try {
      parsed = JSON.parse(text);
    } catch {}
    console.log(provider(`Raw source response (reference):`));
    console.log(
      typeof parsed === "string" ? parsed : JSON.stringify(parsed, null, 2)
    );
    console.log(
      provider(
        `NOTE: This is the same data the asset points to, but accessed without any dataspace guarantees`
      )
    );
    console.log(
      provider(
        `In step 7, the consumer will retrieve this data via EDC, using a negotiated contract and transfer token`
      )
    );
  } catch (err) {
    console.log(
      provider(
        `WARNING: Failed to fetch data directly from the underlying source`
      )
    );
    console.log(err instanceof Error ? err.message : err);
  }
}
```

Step 1 is fundamental: the provider declares that there is something to negotiate about. An asset is registered in the Provider Management API, including a reference to the actual source data. This creates neither access nor publication, but it does create an anchor point: a named object that can later become visible in the data space through conditions and contracts. Without this step there is nothing to offer; without a contract definition the offer remains invisible. So this is not publication, but a preparatory promise: “this data exists, and I am willing to make it available under conditions.”
The implementation is deliberately transparent. The code starts with a check: does the asset-id already exist in the provider catalogue? If it does, nothing is changed. But each time you run my experiment, all containers and data are destroyed and rebuilt, so most of the time this step will register a new asset. Assets have minimal metadata and a DataAddress of type HttpData for now. That DataAddress points to the internal container URL (http://api:7070/hello, the same data server that is mapped to http://localhost:7070/hello on the host), which is where the EDC will later fetch the data on the consumer’s behalf.
After registration, the experiment includes an explicit verification moment. The data is fetched from the data server directly, outside of EDC. That dual path is intentional: once in the open, as usual, and later through the city gate. The difference is not in the bytes, but in the promise around them. This allows us to verify that the data exists and that we receive the same content once we later approach the gate. This part is therefore not part of the data space transaction.
For experiment 1, there is only one protocol, one endpoint, and one asset, with minimal metadata.
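A quick aside before step 2: the helpers provider, consumer, edcJson, edcJsonRaw, and firstObj are imported everywhere but never shown in this post. So that you can read the fragments without opening the repository, here is a minimal sketch of what they could look like. This is my readability reconstruction, not the repository’s literal code; I assume the Management APIs are protected with the same X-Api-Key header that step 6 uses.

```ts
// Sketch of the shared helpers — a readability reconstruction, not the
// repository's literal code.
import type { Config } from "./config.js";

// Log-prefix helpers: tag console output by role.
export const provider = (s: string) => `[provider] ${s}`;
export const consumer = (s: string) => `[consumer] ${s}`;

// JSON call against an EDC Management API, with the API key header.
export async function edcJson(
  cfg: Config,
  method: string,
  url: string,
  body?: unknown
): Promise<any> {
  const res = await fetch(url, {
    method,
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json",
      "X-Api-Key": cfg.apiKey, // assumption: management APIs are key-protected
    },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  const text = await res.text();
  if (!res.ok) throw new Error(`HTTP ${res.status} ${url}\n${text}`);
  return text ? JSON.parse(text) : null;
}

// Variant that takes a pre-serialised JSON string (as used in steps 4 and 5).
export async function edcJsonRaw(
  cfg: Config,
  method: string,
  url: string,
  body: string
): Promise<any> {
  const res = await fetch(url, {
    method,
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json",
      "X-Api-Key": cfg.apiKey,
    },
    body,
  });
  const text = await res.text();
  if (!res.ok) throw new Error(`HTTP ${res.status} ${url}\n${text}`);
  return text ? JSON.parse(text) : null;
}

// Returns the first object from a value that may be an object or an array.
export function firstObj<T>(v: T | T[] | null | undefined): T | undefined {
  return Array.isArray(v) ? v[0] : v ?? undefined;
}
```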
2. Provider: ensure contract-definition (Control Plane)
The provider can, like a medieval city council, define conditions that the guards at the gate must enforce.
```ts
import type { Config } from "../config.js";
import { provider } from "../helpers.js";
import { edcJson } from "../http.js";

export async function ensureContractDefinition(cfg: Config) {
  console.log(
    provider(
      `Ensuring contract definition '${cfg.contractDefId}' exists for asset '${cfg.assetId}'`
    )
  );
  console.log(
    provider(
      `A contract definition links assets to policies that govern negotiation and access`
    )
  );
  console.log(
    provider(`This experiment uses policy '${cfg.policyId}' (always-true)`)
  );
  console.log(
    provider(
      `'always-true' means: the policy contains no constraints and therefore always evaluates to ALLOW`
    )
  );
  console.log(
    provider(`This does NOT mean the data is public or freely accessible`)
  );
  console.log(
    provider(
      `It means: if a consumer reaches contract negotiation, the policy itself will not block it`
    )
  );
  console.log(
    provider(
      `In later experiments, this policy will be replaced by time-, purpose-, or identity-based rules`
    )
  );
  console.log(
    provider(
      `Binding asset '${cfg.assetId}' to this policy via a contract definition`
    )
  );

  const contractDefBody = {
    "@id": cfg.contractDefId,
    "@type": "https://w3id.org/edc/v0.0.1/ns/ContractDefinition",
    "https://w3id.org/edc/v0.0.1/ns/accessPolicyId": cfg.policyId,
    "https://w3id.org/edc/v0.0.1/ns/contractPolicyId": cfg.policyId,
    "https://w3id.org/edc/v0.0.1/ns/assetsSelector": [
      {
        "@type": "https://w3id.org/edc/v0.0.1/ns/Criterion",
        "https://w3id.org/edc/v0.0.1/ns/operandLeft": "id",
        "https://w3id.org/edc/v0.0.1/ns/operator": "=",
        "https://w3id.org/edc/v0.0.1/ns/operandRight": cfg.assetId,
      },
    ],
  };

  const byIdUrl = `${
    cfg.providerMgmt
  }/v3/contractdefinitions/${encodeURIComponent(cfg.contractDefId)}`;

  try {
    await edcJson(cfg, "GET", byIdUrl);
    console.log(
      provider(`Contract definition '${cfg.contractDefId}' already exists`)
    );
  } catch (e: any) {
    const msg = String(e?.message ?? e);
    const is404 =
      msg.includes("HTTP 404") ||
      msg.includes("404 Not Found") ||
      msg.includes("ObjectNotFound");
    if (!is404) throw e;

    await edcJson(
      cfg,
      "POST",
      `${cfg.providerMgmt}/v3/contractdefinitions`,
      contractDefBody
    );
    console.log(provider(`Contract definition '${cfg.contractDefId}' created`));
  }

  console.log(provider(`Contract definition '${cfg.contractDefId}' is active`));
  console.log(
    provider(
      `Result: the asset will appear in the catalog with an offer that is always negotiable`
    )
  );
  console.log(
    provider(
      `Access is still impossible without a successful contract and transfer process`
    )
  );
}
```

This is where an asset stops being a quiet promise and becomes an offer. With a contract definition, the provider specifies under which policy rules an asset is negotiable. Only at this point does something emerge that can appear in a catalogue: not data, but an offer. The contract definition ties a selection of assets to one or more conditions (policies) and thereby says not who may enter, but when a negotiation is meaningful at all. Without this step, the catalogue remains empty, no matter how rich the underlying data is.
The code is set up as a management action you can run repeatedly without thinking. First, a contract definition is built with a fixed id, explicit references to both an access policy and a contract policy, and a simple selector that points to exactly one asset based on its id. Then we query the Provider Management API to see whether this definition already exists. If it does, nothing changes. If it does not, the offer is created.
My logging takes the time to explain what is happening. The policy used here is called “always-true”, but that name describes its contents (no constraints), not a free pass. The data does not suddenly become freely accessible. Access still requires a successful contract negotiation and an explicit transfer process. “Always-true” only makes clear that there are no deliberate blockers in negotiation or transfer.
For experiment 1, this is deliberately cut down. There is exactly one asset, one selector, and one policy, used as both access policy and contract policy. There is no versioning, no variation in conditions, and no semantics beyond the absolute minimum.
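One thing the fragments above quietly assume is that the policy with id cfg.policyId already exists; the experiment creates it during setup, which I do not show here. As a hedged sketch, registering such a constraint-free policy definition via the EDC Management API could look roughly like this. The exact JSON-LD shape depends on the EDC version, so treat it as an illustration rather than the repository’s literal request:

```ts
// Sketch: registering the "always-true" policy definition.
// The JSON-LD shape is an assumption based on the EDC Management API v3;
// the experiment's setup code may differ.
await edcJson(cfg, "POST", `${cfg.providerMgmt}/v3/policydefinitions`, {
  "@context": { "@vocab": "https://w3id.org/edc/v0.0.1/ns/" },
  "@id": cfg.policyId,
  "@type": "PolicyDefinition",
  policy: {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    // No permissions, prohibitions, or obligations: nothing to evaluate,
    // so policy evaluation never blocks a negotiation.
    permission: [],
  },
});
```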
3. Consumer: fetch catalog (Control Plane)
Now the consumer goes out to see who is selling what at the market.
```ts
import type { Config } from "../config.js";
import { consumer } from "../helpers.js";
import { edcJson, firstObj } from "../http.js";
import { normalizeOfferForContractRequest } from "../normalizeOfferForContractRequest.js";
import type { CatalogResult } from "../types.js";

export async function fetchCatalog(cfg: Config): Promise<CatalogResult> {
  console.log(
    consumer(`Requesting catalog from provider via consumer management API`)
  );
  console.log(
    consumer(`We ask our own EDC to fetch the provider's catalog over DSP`)
  );
  console.log(consumer(`Provider DSP address: ${cfg.providerDspDocker}`));
  console.log(consumer(`Protocol: dataspace-protocol-http`));

  const catalog = await edcJson(
    cfg,
    "POST",
    `${cfg.consumerMgmt}/v3/catalog/request`,
    {
      "@type": "https://w3id.org/edc/v0.0.1/ns/CatalogRequest",
      "https://w3id.org/edc/v0.0.1/ns/counterPartyAddress":
        cfg.providerDspDocker,
      "https://w3id.org/edc/v0.0.1/ns/protocol": "dataspace-protocol-http",
      "https://w3id.org/edc/v0.0.1/ns/querySpec": {
        "@type": "https://w3id.org/edc/v0.0.1/ns/QuerySpec",
        "https://w3id.org/edc/v0.0.1/ns/offset": 0,
        "https://w3id.org/edc/v0.0.1/ns/limit": 50,
      },
    }
  );

  console.log(
    consumer(
      `Catalog response received. Now extracting provider participantId and first dataset`
    )
  );

  const providerPid = catalog?.["dspace:participantId"] ?? "provider";
  console.log(consumer(`providerPid=${providerPid}`));

  const dataset = firstObj<any>(catalog?.["dcat:dataset"]);
  if (!dataset) {
    console.log(
      consumer(
        `No dataset found. This usually means the provider did not publish any assets (or contract definition does not match)`
      )
    );
    throw new Error(
      `No dcat:dataset in catalog:\n${JSON.stringify(catalog, null, 2)}`
    );
  }

  const assetId = dataset?.["@id"];
  const offer = dataset?.["odrl:hasPolicy"];
  if (!assetId || !offer) {
    console.log(
      consumer(
        `Dataset exists but is missing '@id' or 'odrl:hasPolicy'. Without these we cannot negotiate a contract`
      )
    );
    throw new Error(
      `Missing @id or odrl:hasPolicy in dataset:\n${JSON.stringify(
        dataset,
        null,
        2
      )}`
    );
  }

  console.log(consumer(`assetId=${assetId}`));
  console.log(
    consumer(
      `Offer found. This offer is what we will send back in step 04 as part of the ContractRequest`
    )
  );

  const consumerPid = cfg.consumerPid ?? "consumer";
  console.log(consumer(`consumerPid=${consumerPid}`));

  console.log(
    consumer(
      `Normalizing offer for ContractRequest (adding target/assigner/assignee)`
    )
  );
  console.log(
    consumer(
      `Why: the catalog offer may be incomplete for the management API, and missing fields previously caused 400/500 errors`
    )
  );

  const normalizedOffer = normalizeOfferForContractRequest({
    offer,
    assetId,
    providerPid,
    consumerPid,
  });

  console.log(
    consumer(
      `Normalized offer ready. Next step can submit a ContractRequest without guessing missing ODRL fields`
    )
  );

  return { providerPid, assetId, offer: normalizedOffer };
}
```

For the first time, the consumer looks outward. Not to retrieve data, but to discover what is being offered and under which conditions. Through a catalogue request, the consumer asks its own connector to — on its behalf — inquire with the provider which datasets and offers are available. What comes back is not a list of files, but an overview of negotiable proposals: datasets coupled to policies. This step is pure discovery. It marks the transition from internal preparation (steps 1 and 2) to interaction within the data space.
The implementation makes that explicit. Via its own Management API, the consumer sends a CatalogRequest that fixes the provider address and the protocol used. The edc-provider builds a response, which is received and interpreted by the edc-consumer. From that response, three things are distilled: the provider’s participant-id, the available dataset, and the corresponding offer policy. That offer is essential; without a policy there is nothing to negotiate. Because catalogue offers in this experimental practice can sometimes be incomplete, a normalisation step follows. Missing fields are explicitly filled so that the next step — submitting a ContractRequest — does not fail due to implicit assumptions in the management API. What happens here is therefore less “reading” than “preparing to respond”.
For experiment 1, the simplifications are intentionally rough. Exactly one (the first) dataset is chosen, without filtering or pagination. Missing participant ids get fixed defaults. The catalogue is not fully validated; only what is strictly required for negotiation is touched.
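The normalizeOfferForContractRequest helper is imported but never shown. Based on how it is used here, and on the missing fields named in step 4 (target, assigner, assignee), here is a sketch of what it plausibly does; the repository version may fill more fields:

```ts
// Sketch of the offer normalisation — inferred from its call sites in
// steps 3 and 4, not copied from the repository.
export function normalizeOfferForContractRequest(args: {
  offer: any;
  assetId: string;
  providerPid: string;
  consumerPid: string;
}): any {
  const { offer, assetId, providerPid, consumerPid } = args;
  return {
    ...offer,
    "@type": offer?.["@type"] ?? "odrl:Offer",
    // Fields the management API expects but catalog offers sometimes omit:
    "odrl:target": offer?.["odrl:target"] ?? assetId,
    "odrl:assigner": offer?.["odrl:assigner"] ?? providerPid,
    "odrl:assignee": offer?.["odrl:assignee"] ?? consumerPid,
  };
}
```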
4. Consumer: negotiate contract (Control Plane)
The consumer knows there is an interesting trader at the market and decides to show up at the city gate.
```ts
import type { Config } from "../config.js";
import { consumer } from "../helpers.js";
import { edcJsonRaw, waitForState } from "../http.js";
import { normalizeOfferForContractRequest } from "../normalizeOfferForContractRequest.js";
import type { CatalogResult, NegotiationResult } from "../types.js";

export async function negotiateContract(
  cfg: Config,
  cat: CatalogResult
): Promise<NegotiationResult> {
  const consumerPid = cfg.consumerPid ?? "consumer";

  console.log(consumer(`Preparing ContractRequest for asset '${cat.assetId}'`));
  console.log(
    consumer(
      `This is where we formally ask the provider for permission to use the asset`
    )
  );
  console.log(
    consumer(
      `Provider participantId=${cat.providerPid}, Consumer participantId=${consumerPid}`
    )
  );
  console.log(consumer(`Counterparty DSP address=${cfg.providerDspDocker}`));
  console.log(
    consumer(`Normalizing offer into a valid ODRL Offer for a ContractRequest`)
  );
  console.log(
    consumer(
      `This step is crucial: missing target/assigner/assignee will cause negotiation to fail`
    )
  );

  const req = {
    "@context": {
      "@vocab": "https://w3id.org/edc/v0.0.1/ns/",
      edc: "https://w3id.org/edc/v0.0.1/ns/",
      odrl: "http://www.w3.org/ns/odrl/2/",
    },
    "@type": "edc:ContractRequest",
    "edc:counterPartyAddress": cfg.providerDspDocker,
    "edc:counterPartyId": cat.providerPid,
    "edc:protocol": "dataspace-protocol-http",
    "edc:policy": normalizeOfferForContractRequest({
      offer: cat.offer,
      assetId: cat.assetId,
      providerPid: cat.providerPid,
      consumerPid,
    }),
  };

  console.log(
    consumer(`Submitting ContractRequest to consumer management API`)
  );
  console.log(
    consumer(`Endpoint: ${cfg.consumerMgmt}/v3/contractnegotiations`)
  );

  const created = await edcJsonRaw(
    cfg,
    "POST",
    `${cfg.consumerMgmt}/v3/contractnegotiations`,
    JSON.stringify(req)
  );

  const negotiationId = created?.["@id"];
  if (!negotiationId) {
    console.log(
      consumer(`Contract negotiation request failed. No @id returned`)
    );
    throw new Error(
      `No negotiation @id returned:\n${JSON.stringify(created, null, 2)}`
    );
  }

  console.log(
    consumer(`Contract negotiation created with id=${negotiationId}`)
  );
  console.log(
    consumer(`Negotiation is now asynchronous and handled via the DSP protocol`)
  );
  console.log(consumer(`Waiting for negotiation to reach state FINALIZED`));

  const finalized = await waitForState(
    cfg,
    `${cfg.consumerMgmt}/v3/contractnegotiations/${negotiationId}`,
    (b) => b?.state,
    "FINALIZED",
    60
  );

  const agreementId = finalized?.contractAgreementId;
  if (!agreementId) {
    console.log(
      consumer(
        `Negotiation reached FINALIZED state but no contractAgreementId was returned`
      )
    );
    throw new Error(
      `No contractAgreementId on finalized negotiation:\n${JSON.stringify(
        finalized,
        null,
        2
      )}`
    );
  }

  console.log(consumer(`Contract negotiation finalized successfully`));
  console.log(consumer(`AgreementId=${agreementId}`));
  console.log(
    consumer(
      `This agreement is the legal basis for any subsequent data transfer`
    )
  );

  return { negotiationId, agreementId };
}
```

In EDC terms, this is the formal act of knocking. The consumer turns a catalogue offer into an explicit request to the provider: may I use this asset under the conditions you have published? This is not a technical data transfer, but a legal and semantic action. If successful, the negotiation results in a contract agreement that binds both connectors. Only from this point onward does a shared reality exist in which “access” has meaning within the data space.
The code makes this step tangible. Based on the previously fetched catalogue information, a ContractRequest is constructed in JSON-LD form, explicitly stating provider and consumer, the protocol in use, and the policy that functions as the offer. Because EDC is strict here, the offer is normalised again: missing fields such as target, assigner, and assignee are explicitly set to avoid ambiguity. The request is then submitted via the Consumer Management API, after which the negotiation proceeds asynchronously. The code does not wait blindly; it polls deliberately until the negotiation reaches the FINALIZED state. At that point, the contract agreementId is extracted and stored as the legal foundation for everything that follows.
For experiment 1, this is stripped down. The negotiation uses a single fixed protocol, without alternatives or fallback. Status handling is binary: success or failure. There are no events, no retries, and no semantic error handling. Offer normalisation is a pragmatic intervention, not an ideal conditions model. The goal is not completeness, but visibility of the mechanism.
5. Consumer: start transfer (Control Plane)
We are now approaching the end of the metaphor’s shelf life…
A small hatch opens in the gate through which the gatekeeper will soon hand over a key that allows the visitor to open the gate themselves.
```ts
import type { Config } from "../config.js";
import { consumer } from "../helpers.js";
import { edcJsonRaw, waitForState } from "../http.js";
import type {
  CatalogResult,
  NegotiationResult,
  TransferResult,
} from "../types.js";

export async function startTransfer(
  cfg: Config,
  cat: CatalogResult,
  neg: NegotiationResult
): Promise<TransferResult> {
  console.log(
    consumer(`Preparing transfer request for agreement '${neg.agreementId}'`)
  );
  console.log(
    consumer(
      `This step turns a legal agreement into an actual right to access data`
    )
  );
  console.log(
    consumer(
      `Transfer type is HttpData-PULL: the consumer will actively pull data from the provider`
    )
  );
  console.log(consumer(`Provider participantId=${cat.providerPid}`));
  console.log(consumer(`Provider DSP address=${cfg.providerDspDocker}`));
  console.log(consumer(`Building TransferRequest`));
  console.log(
    consumer(
      `NOTE: Sovity EDC CE expects a simplified TransferRequest shape here`
    )
  );
  console.log(
    consumer(
      `Using a full JSON-LD TransferRequest would cause a server error (this is a known pitfall)`
    )
  );

  const req = {
    "@context": { "@vocab": "https://w3id.org/edc/v0.0.1/ns/" },
    "@type": "TransferRequest",
    contractId: neg.agreementId,
    protocol: "dataspace-protocol-http",
    connectorId: cat.providerPid,
    counterPartyAddress: cfg.providerDspDocker,
    transferType: "HttpData-PULL",
  };

  console.log(
    consumer(`Submitting TransferRequest to consumer management API`)
  );
  console.log(consumer(`Endpoint: ${cfg.consumerMgmt}/v3/transferprocesses`));

  const created = await edcJsonRaw(
    cfg,
    "POST",
    `${cfg.consumerMgmt}/v3/transferprocesses`,
    JSON.stringify(req)
  );

  const transferProcessId = created?.["@id"];
  if (!transferProcessId) {
    console.log(consumer(`Transfer process creation failed. No @id returned`));
    throw new Error(
      `No transfer @id returned:\n${JSON.stringify(created, null, 2)}`
    );
  }

  console.log(
    consumer(`Transfer process created with id=${transferProcessId}`)
  );
  console.log(
    consumer(`Transfer process is now orchestrated by both connectors`)
  );
  console.log(consumer(`Waiting for transfer process to reach state STARTED`));

  await waitForState(
    cfg,
    `${cfg.consumerMgmt}/v3/transferprocesses/${transferProcessId}`,
    (b) => b?.state,
    "STARTED",
    60
  );

  console.log(consumer(`Transfer process is STARTED`));
  console.log(
    consumer(
      `This means the provider has accepted the transfer and prepared access`
    )
  );
  console.log(
    consumer(
      `No data has flowed yet; only the conditions for access are now in place`
    )
  );

  return { transferProcessId };
}
```

An abstract agreement becomes tangible in this step. The contract from the previous step exists, but as long as no transfer process has been started, it remains a “paper” reality. Here, the consumer asks the provider to turn the agreed right into operational access. No data flows yet, but the infrastructure is prepared: the guards receive instructions. This is the transition from legal permission to technical possibility — a crucial link between negotiation and use.
The code makes this transition explicit and verifiable. Based on the previously obtained contract agreementId, a TransferRequest is built. It specifies which contract is in force, which protocol is used, and that this is an active pull by the consumer. The request is submitted via the Consumer Management API, after which both connectors orchestrate the transfer process together. The consumer does not wait blindly; it tracks status until the process reaches STARTED. That moment matters: the provider has accepted the transfer and prepared everything internally to grant access. The logging emphasises that this is not data traffic yet, but a prepared possibility.
For experiment 1, clear simplifications were made. Only a pull variant is used and only one fixed protocol is supported. The TransferRequest is deliberately not fully JSON-LD compliant, but adapted to the expectations of the EDC implementation used. The STARTED status is treated as sufficient; completion and error paths remain out of scope. The goal is insight, not completeness.
6. Consumer: fetch EDR (Control Plane)
The key is handed over.
```ts
import type { Config } from "../config.js";
import { consumer } from "../helpers.js";
import { firstObj } from "../http.js";
import type { EdrResult, TransferResult } from "../types.js";

export async function fetchEdr(
  cfg: Config,
  tr: TransferResult
): Promise<EdrResult> {
  const url = `${cfg.consumerMgmt}/v3/edrs/${tr.transferProcessId}/dataaddress`;

  console.log(
    consumer(
      `Fetching EDR (Endpoint Data Reference) for transferProcessId='${tr.transferProcessId}'`
    )
  );
  console.log(
    consumer(
      `This step is where the contract world turns into an actual HTTP call you can make`
    )
  );
  console.log(
    consumer(
      `We ask the consumer connector: “Given this transfer, what temporary access details did the provider issue?”`
    )
  );
  console.log(consumer(`Endpoint: ${url}`));
  console.log(
    consumer(`Polling because the EDR may not exist immediately after STARTED`)
  );

  let last: any = null;
  for (let i = 0; i < 60; i++) {
    const attempt = i + 1;
    console.log(consumer(`EDR poll attempt ${attempt}/60`));

    const res = await fetch(url, {
      headers: { "X-Api-Key": cfg.apiKey, Accept: "application/json" },
    });
    const text = await res.text();
    try {
      last = JSON.parse(text);
    } catch {
      last = text;
    }

    const obj = firstObj<any>(last);
    const token = obj?.authorization;
    const endpointDocker = obj?.endpoint;

    if (token && endpointDocker) {
      console.log(consumer(`EDR is available`));
      console.log(
        consumer(`Provider public endpoint (docker): ${endpointDocker}`)
      );
      console.log(
        consumer(
          `Received authorization token (length=${String(token).length})`
        )
      );
      console.log(
        consumer(
          `NOTE: this token is short-lived and scoped to this agreement/transfer`
        )
      );
      console.log(
        consumer(
          `Next step will use this token to call the provider's public API`
        )
      );

      const endpointHost = endpointDocker.replace(
        "http://edc-provider:11005/api/public",
        cfg.providerPublicHostBase
      );
      console.log(
        consumer(`Provider public endpoint (host): ${endpointHost}`)
      );
      console.log(
        consumer(
          `The docker->host rewrite exists only because we are calling from the host machine, not from inside the docker network`
        )
      );

      return { endpointDocker, endpointHost, token };
    }

    console.log(
      consumer(
        `EDR not ready yet (missing 'authorization' and/or 'endpoint'). Waiting...`
      )
    );
    await new Promise((r) => setTimeout(r, 1000));
  }

  console.log(consumer(`ERROR: Timed out waiting for EDR`));
  throw new Error(
    `EDR not ready or unexpected response from ${url}\nLast=${JSON.stringify(
      last,
      null,
      2
    )}`
  );
}
```

At this point, the data space stops being abstract. The transfer has started, the conditions have been accepted, and now the EDR appears: the Endpoint Data Reference. It is a temporary right to use the existing gate. The EDR combines two things that strictly belong together: a data endpoint and an authorisation code (a token). Together they form the concrete translation — the operationalisation — of the contract and transfer into an executable action. Without an EDR, access remains a promise; with an EDR, it becomes actionable.
The code treats this as a patient moment of negotiation with its own connector. Via the Consumer Management API, the consumer repeatedly asks the provider whether a data address is already available for the given transfer process. Because the provider does not always issue this immediately, the code polls until an object appears that contains both an endpoint and an authorisation token. Once that combination is present, it is made explicit what has been received: a short-lived token, strictly linked to this contract and this transfer.
Because the experiment script runs on the host machine while the connectors run inside a Docker network, the endpoint returned by the provider is rewritten to a host address. This rewrite is purely practical; conceptually it remains the same gate guarded by the edc-provider. In earlier steps this distinction does not matter, because the consumer never calls a provider endpoint directly; it always does so via its EDC. Only once the EDR is issued do we leave the Control Plane and enter the Data Plane.
For experiment 1, the assumptions are simple. We assume exactly one EDR with a straightforward token model. Polling is coarse and time-bound, without nuance or error feedback. Endpoint rewriting is hardcoded and context dependent. The goal is not robustness, but to show where the key comes from and what exactly it opens.
7. Consumer: data access (Data Plane)
Through the gate.
```ts
import type { Config } from "../config.js";
import { consumer } from "../helpers.js";
import type { EdrResult } from "../types.js";

export async function accessData(cfg: Config, edr: EdrResult): Promise<any> {
  console.log(consumer(`Accessing provider data via the EDR-issued endpoint`));
  console.log(
    consumer(`This is the first moment where we leave the EDC APIs entirely`)
  );
  console.log(
    consumer(
      `From here on, this is a plain HTTP request — but one that only works because a contract exists`
    )
  );

  const auth =
    cfg.authHeaderMode === "bearer" ? `Bearer ${edr.token}` : edr.token;

  console.log(
    consumer(`Using Authorization header mode: ${cfg.authHeaderMode}`)
  );
  console.log(
    consumer(`Calling provider public endpoint: ${edr.endpointHost}/`)
  );
  console.log(
    consumer(
      `NOTE: Without a valid contract, this endpoint would reject the request`
    )
  );
  console.log(
    consumer(
      `NOTE: The token is scoped, temporary, and tied to the negotiated agreement`
    )
  );

  const res = await fetch(`${edr.endpointHost}/`, {
    headers: {
      Authorization: auth,
      Accept: "application/json",
    },
  });

  console.log(consumer(`Provider responded with HTTP ${res.status}`));

  const text = await res.text();
  let parsed: any = text;
  try {
    parsed = JSON.parse(text);
    console.log(consumer(`Response body is valid JSON`));
  } catch {
    console.log(consumer(`Response body is not JSON, returning raw text`));
  }

  if (!res.ok) {
    console.log(consumer(`ERROR: Data access failed`));
    throw new Error(
      `Data access failed: HTTP ${res.status}\n${
        typeof parsed === "string" ? parsed : JSON.stringify(parsed, null, 2)
      }`
    );
  }

  console.log(consumer(`Data access successful`));
  console.log(
    consumer(
      `What you are seeing now is the protected resource, delivered through the dataspace contract`
    )
  );
  console.log(
    consumer(
      `The connector is no longer involved in the data path — only in making this access legitimate`
    )
  );

  return parsed;
}
```

This is where everything comes together. The contract negotiation has been finalised, the transfer has started, and the EDR has been retrieved. Now the consumer actually accesses the protected data. This happens outside the EDC Management APIs, via a regular HTTP request to the provider’s public data endpoint. So this is not the localhost:7070/hello endpoint; the data server itself stays hidden behind the provider’s EDC.
The difference from a normal API call is not the protocol — we are now using plain HTTP instead of DSP — but the context: this access works only because a valid contract and an associated transfer process exist. This is where data space logic translates into actual bytes.
The code makes that transition explicit. Based on the previously obtained EDR, an HTTP Authorization header is constructed, either as a raw token value or as a Bearer token. Then a request is made to the provider’s public data endpoint. No EDC API is called anymore; technically, this call is no different from a standard fetch. The provider validates the token, links it to the correct contract, and retrieves the underlying data from the internal source (http://api:7070/hello) on behalf of the consumer. The response is checked for status and content, after which the body — if possible — is parsed as JSON. What comes back is the same data that was fetched directly in step 1, but now delivered through an explicit, enforceable agreement within a data space.
Yay! Experiment 1 succeeded: we built a working data space transaction!
For experiment 1, this step is deliberately minimal. The root endpoint is always called, without distinguishing assets or representations. Authentication is simplified to a single header variant and there is no retry or error strategy. Content negotiation is entirely absent. The goal is not to build a robust client, but to show where the data space ends and the regular web begins again.
What can break in experiment 1?
First: this is an experiment, not production code. So yes, it can break — and I certainly will not have seen everything on my Mac. The biggest issue is probably race conditions, which I kept running into during the experiment. I had to make a few adjustments to address them in at least two places: (1) containers start asynchronously, and sometimes the experiment wanted to begin before all required APIs were available. In the start-containers.sh script — which performs initialisation — there are checks that keep waiting and polling until everything is up.
```bash
# 1) Management APIs
wait_for_v3_query "EDC provider management" "$PROVIDER_MGMT"
wait_for_v3_query "EDC consumer management" "$CONSUMER_MGMT"

echo "== Waiting for DSP endpoints (host-facing) =="
# DSP endpoints often respond with 200/404/405 depending on method/path; we just need them alive.
wait_for_http "EDC provider DSP" "$PROVIDER_DSP_HOST" '^(200|404|405)$'
wait_for_http "EDC consumer DSP" "$CONSUMER_DSP_HOST" '^(200|404|405)$'

echo "== Waiting for public dataplanes (host-facing) =="
wait_for_public_dataplane "EDC provider" "$PROVIDER_PUBLIC_HOST"
wait_for_public_dataplane "EDC consumer" "$CONSUMER_PUBLIC_HOST"

echo "== Containers ready =="
```

In addition, (2) the experiment itself is the interaction between two EDC servers via the Dataspace Protocol (DSP). That communication is, of course, asynchronous too, and sometimes the two must wait for each other.
For example, in step 4 during contract negotiation:
```ts
const finalized = await waitForState(
  cfg,
  `${cfg.consumerMgmt}/v3/contractnegotiations/${negotiationId}`,
  (b) => b?.state,
  "FINALIZED",
  60
);
```

I added the waitForState function, which in this case polls for up to 60 seconds before timing out. That is already quite a lot, but if, when repeating this experiment, you notice communication issues between consumer and provider, one possible remedy is simply to wait longer.
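waitForState itself is another helper that is not shown in this post. Here is a minimal sketch of what it could look like, reusing the edcJson wrapper sketched earlier; the repository version may differ in detail:

```ts
// Sketch of the polling helper — matches the call sites in this post,
// but the repository implementation may differ.
import type { Config } from "./config.js";
import { edcJson } from "./http.js"; // the wrapper sketched earlier

export async function waitForState(
  cfg: Config,
  url: string,
  getState: (body: any) => string | undefined,
  expected: string,
  attempts: number
): Promise<any> {
  let last: any = null;
  for (let i = 0; i < attempts; i++) {
    last = await edcJson(cfg, "GET", url);
    const state = getState(last);
    if (state === expected) return last;
    // One-second pause between polls, like the EDR loop in step 6.
    await new Promise((r) => setTimeout(r, 1000));
  }
  throw new Error(
    `Timed out waiting for state '${expected}' at ${url}\nLast=${JSON.stringify(
      last,
      null,
      2
    )}`
  );
}
```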
And in any case: if you run into problems, or just have questions and comments, do not hesitate to contact me — and we can look at it together.
What next?
I put this experiment together fairly quickly, and it left me wanting more. In the next experiment, I made the code from experiment 1 more robust and structured it for what follows. I also introduce a second consumer. That does not differ much from experiment 1, but it sets the stage for experiment 3, where I will explore what happens if consumer-2 tries to hijack consumer-1’s contract negotiation, the start of the transfer, or the EDR key.
In experiments 4 and 5 I will really dig into the policy conditions a provider can attach to an offer, and how those can be evaluated automatically. Right now I use “always-true”, and that is obviously not a real condition.
This is what I have lined up.
After that, I have a list of follow-up steps I want to explore — but please do send me a message if you have ideas for other data space experiments.
The experiments will not necessarily appear one right after the other, so keep following this blog for updates!