Catalog Management System (CMS): A Collaborative Workspace for Descriptive Asset Metadata

Public Description and Whitepaper

Working title: Catalog Management System (CMS)
Working URL: catalog.org/cms/
Last updated: 2026-05-14
Author: Roberto Bourgonjen
Copyright: © 2026 Roberto Bourgonjen. All rights reserved.
Project: Catalog CMS
Supersedes: Asset Catalog Whitepaper (2026-05-09), Asset Library Whitepaper (2026-05-08)


1. Overview

The Catalog Management System (CMS) is the collaborative workspace for descriptive asset metadata in the Catalog ecosystem. Where CPR records signed claims about digital assets, Encrypted Filestorage preserves and serves the encrypted bytes, and Catalog.ID identifies people, organizations, delegates, and AI agents, the CMS is what professionals and institutions use together to describe, organize, search, publish, and transfer the assets those modules underpin. It is the day-to-day collaborative working surface of the ecosystem — the place where curators, archivists, creators, delegates, and authorized AI agents share, review, and refine descriptive metadata.

The Catalog ecosystem comprises:

  • Catalog.ID: pseudonymous identity for people, organizations, delegates acting for organizations, and AI agents acting for principals.
  • CPR: durable signed claims about digital assets, AssetID allocation, and ownership records.
  • Encrypted Filestorage (EFS): preservation, replication, and retrieval for encrypted file packages. EFS stores ciphertext.
  • Licensing Agent (LA): bidirectional license issuance — issuer-signed offer (with wrapped key and terms digest) plus recipient-signed acceptance — serving every Catalog module that needs to admit a recipient to encrypted material.
  • Catalog Management System (CMS): collaborative workspace for descriptive asset metadata — standards-aware description, profile-driven export, transfer workflows, and discovery.
  • Bitcash: prepaid micropayment and metering layer for service consumption.

The CMS is positioned as the primary interface to the ecosystem for archival institutions, libraries, museums, governments, individual creators, and the AI agents that increasingly act on their behalf. It is not just a description tool. It is a durable, privacy-preserving, standards-aware control plane for digital assets and archival records, designed to integrate with the existing archival network rather than displace it.

The platform's core organizing principle is:

AssetID is the stable first-class identifier for an Asset: a first-class cultural, archival, administrative, creative, or evidential object, whether the object is born-digital, physically originated, digitally represented, hybrid, or aggregating.

This principle distinguishes the CMS from systems that treat the file as the primary unit of identity. A photograph, a manuscript, a sculpture, a website, an album, a fonds, a scientific dataset, a contract, a record series — each can be a first-class Asset under one AssetID, with files and packages attached as representations rather than as competing identities.

Catalog was created by Roberto Bourgonjen, who brings over twenty years of professional experience as an application developer for major institutional archives in the Netherlands. He engineered the CMS from the ground up as a private undertaking after retiring from business. It is designed for global scale, built on archival practice proven at national scale, and shaped around the founder's own needs as an active photographer, musician, and software developer managing a lifetime of work, while remaining fit for the largest institutional archives.

The CMS's ambition is high: to serve, with one canonical model, the international archival community, national libraries, university special collections, museums, government agencies, individual artists, photographers' estates, scholarly publishers, and the AI agents that operate in those contexts — and to do so while preserving the ecosystem's blockchain-anchored evidence and end-to-end encryption posture. The platform aims to be a reference application that integrates seamlessly with existing archival aggregation portals, deposit endpoints, content-management systems, repository platforms, harvest protocols, and persistent-identifier services.

The CMS is also designed to be unique in its first-class support for AI agents. Catalog.ID defines a dedicated agent identity type for AI models, automation scripts, and human operators acting on behalf of a personal or delegate principal. The CMS treats agents as full participants: agent-signed claims, agent-attributed derivatives, recorded session and scope context, and disclosed AI provenance attached to any artefact an agent produced or modified. Archival institutions need to know whether a description, transcription, or enhancement was authored by a human, an authorized agent, or an external AI service, and on whose authority that agent acted. The CMS answers that question cryptographically and visibly.


2. The Catalog Ecosystem and Where the CMS Sits

The Catalog ecosystem is a set of cooperating but independently operable modules. The CMS draws on four of them:

2.1 CPR

CPR records signed claims about digital assets and provides durable evidence that a claim existed at or before a certain time. AssetIDs are 12-character logical identifiers allocated by CPR operators. Registration claims, attribution claims, custody claims, and other signed statements are anchored into hourly blocks and timestamped against the XRP Ledger.

The CMS uses CPR as its evidence layer. Registration claims, attribution claims, custody claims, and conformance claims pass through CPR. The CMS never invents its own immutable provenance store; every claim that needs durability goes through CPR.

Reference: catalog.org/cpr_whitepaper/

2.2 Encrypted Filestorage

EFS stores encrypted byte packages associated with AssetIDs. One AssetID can have many EFS packages, each identified by a PackageID of the form {assetID}.{operatorID}.{role}.{serial}. Packages have protocol-level roles — source, preservation, preview, access, edition, text, metadata, and submission — which control retrieval and currentness independently. A package may contain many internal files of many file types, addressed through the encrypted package manifest.
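The PackageID layout described above can be illustrated with a small parser. This is a minimal sketch, not the EFS implementation: the names PackageID and parse_package_id are hypothetical, and it assumes that operatorID contains no dots and that the serial is a positive decimal integer, neither of which is specified here.

```python
from dataclasses import dataclass

# The eight protocol-level package roles named in the text.
PACKAGE_ROLES = frozenset({
    "source", "preservation", "preview", "access",
    "edition", "text", "metadata", "submission",
})


@dataclass(frozen=True)
class PackageID:
    asset_id: str
    operator_id: str
    role: str
    serial: int

    def __str__(self) -> str:
        return f"{self.asset_id}.{self.operator_id}.{self.role}.{self.serial}"


def parse_package_id(text: str) -> PackageID:
    """Split '{assetID}.{operatorID}.{role}.{serial}' into its four fields."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {len(parts)}")
    asset_id, operator_id, role, serial = parts
    if len(asset_id) != 12:  # AssetIDs are 12-character identifiers
        raise ValueError("AssetID must be 12 characters long")
    if not operator_id:
        raise ValueError("operatorID must not be empty")
    if role not in PACKAGE_ROLES:
        raise ValueError(f"unknown package role: {role!r}")
    # Assumption: serials are positive decimal integers.
    if not (serial.isascii() and serial.isdigit()) or int(serial) < 1:
        raise ValueError(f"invalid serial: {serial!r}")
    return PackageID(asset_id, operator_id, role, int(serial))
```

Because the role is a field of the identifier itself, a client can route a retrieval request by role without decrypting anything.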

The CMS uses EFS as its byte-storage layer. It never stores plaintext files. It stores metadata about packages and their contents, and resolves references to specific packages and files when authorized clients request them.

Reference: catalog.org/efs_whitepaper/

2.3 Catalog.ID

Catalog.ID is the privacy-preserving identity layer. It supports four identity types:

  • Personal identities for natural persons.
  • Organization identities for studios, archives, institutions, and other entities.
  • Delegate identities for natural persons acting in the capacity of an organization.
  • Agent identities for autonomous or semi-autonomous entities — AI models, automation scripts, or human operators — acting on behalf of a personal or delegate principal.

All four types share a common cryptographic and claims architecture. They differ in lifecycle control, username format, key custody, and legal framework. Agents have no independent key custody: their signing, encryption, and messaging keys are generated by and stored in the principal's key bag, and reach the agent only through an authorized session in a local runtime.

The CMS uses Catalog.ID for authentication, identity claim sharing, agent tokenization, and the resolution of pseudonymous party identifiers in claims and metadata.

Reference: catalog.id/whitepaper/
Agent identities (Catalog.ID §3c): catalog.id/whitepaper/#3c

2.4 Bitcash

Bitcash is the prepaid micropayment layer. AssetIDs are purchased from CPR operators against Bitcash; EFS storage and retrieval are funded through Bitcash; editions may carry payment markers for subscription gates, micropayments, and royalty splits.

The CMS records the existence and category of these payment relationships as metadata facets but is not itself a payment system. Bitcash settlement happens in Bitcash; the CMS merely references it.

Reference: bit.cash/whitepaper/

2.5 What the CMS adds

The CMS adds five things on top of the four modules above:

  1. A canonical metadata model that treats AssetIDs as first-class objects with versioned representations, file membership, claims, custody, rights, access conditions, preservation events, and agentic provenance.
  2. Profile-driven standards compatibility — one canonical graph projects to MDTO, E-ARK, NARA, RiC-O, EAD, EAC-CPF, DCAT, IIIF, CIDOC-CRM, Linked Art, LIDO, EDM, Schema.org, and others, without any one external standard becoming the internal model.
  3. Transfer workflows that take an AssetID from an artist's local workspace to an institutional custodian without losing provenance, cryptographic evidence, preservation metadata, agent provenance, or rights context.
  4. Aggregation and harvest endpoints — OAI-PMH, ResourceSync, IIIF Change Discovery, SWORD v2/v3, Signposting, Schema.org JSON-LD — so that the catalog plugs into existing archival network infrastructure rather than asking institutions to abandon it.
  5. First-class agent and AI provenance — every AI-derived artefact records who produced it, on whose authority, with which model and parameters, and whether it has been reviewed by a human curator.


3. Design Goals

The CMS is designed to satisfy ten goals.

  1. AssetID-first modeling. AssetID is the identity of the thing being archived, transferred, licensed, described, or preserved. PackageID names one encrypted package generation under that AssetID. File UUIDs name exact binary files. These three layers remain distinct.

  2. Profile-driven, not nation-driven. The internal canonical graph is broad enough to preserve nuance; profile projections are strict enough to satisfy receiving systems. No single external standard — MDTO, NARA, EAD, DCAT, RiC-O, CIDOC-CRM — becomes the internal model.

  3. Privacy-preserving by default. Public, restricted, and EFS-encrypted exposure classes are explicit. Public metadata is privacy-linted before publication. Immutable claim bodies carry no arbitrary free-text personal data.

  4. Encryption belongs in EFS. The CMS metadata layer itself is plaintext, gated by Catalog.ID authentication and AccessCondition rules. Anything that must be unreadable to the operator goes to EFS as an encrypted metadata-role package and is referenced from the catalog by an EncryptedMaterialReference. The catalog stays searchable, indexable, and live.

  5. Cryptographic evidence is preserved across transfer. CPR claims, ciphertext digests, and plaintext digests survive movement between catalog instances, jurisdictions, custodians, and aggregator portals.

  6. Progressive metadata completeness. Individual artists can register an asset with minimal metadata and grow it into transfer-ready, custodian-accessioned, government-transfer-ready state without re-identifying the asset.

  7. Effortless but not casual transfer. The transfer from an individual archive to an institutional custodian is simple for creators, cryptographically and archivally rigorous for custodians, and produces a signed custody claim plus a profile-conformant package.

  8. First-class AI agents. Agent identities are full participants — they sign claims, upload files, author derivatives, and accept custody when the principal has scoped them to do so. Every agent action records its session, scope, and AI provider context.

  9. Seamless integration with the existing archival network. OAI-PMH and ResourceSync for harvest, SWORD v2/v3 for deposit, Signposting for discovery, Schema.org for general-purpose web crawling, IIIF for compound-object presentation, ARK and Handle for legacy persistent identifiers — the CMS speaks the protocols institutions already use.

  10. Long-horizon design. Choices are made with century-scale preservation in mind: post-quantum cryptography by default, algorithm agility through explicit identifiers, immutable claim history that never invalidates retroactively, and storage decoupled from description so that obsolete media can be migrated without rewriting the metadata graph.


4. AssetID-First Modeling

4.1 What AssetID identifies

An AssetID identifies a first-class Asset. The Asset may be:

  • a born-digital object — a photograph, a video, a recording, a software release, a dataset, a 3D model, a website snapshot;
  • a physical object — a book, a manuscript, a painting, a sculpture, a tape, a film reel, a folder, a box;
  • a conceptual work — a composition, a literary work, a research project;
  • a record — a government document, a court file, a piece of correspondence;
  • an aggregation — a collection, a series, a fonds, an album, an issue, a portfolio, an edition;
  • a hybrid — a physical object whose digital representation is also catalog-relevant.

AssetID is not the file. A single AssetID can have zero or many EFS packages, each in one of the protocol-level package roles, each with multiple immutable serial generations, and each containing many files of many file types. The same AssetID can carry source masters, preservation copies, low-resolution previews, OCR text, IIIF tiles, edition bundles, and submission packages — all as packages and files attached to the same identity.

This design lets a physical book exist as a single AssetID. The book's source digitization, preservation master, OCR text, public preview, IIIF reader, MDTO-SIP transfer bundle, and edition publication are all packages under that one AssetID. The book does not become a different object because it was scanned.

4.2 When a child AssetID is appropriate

A child AssetID is appropriate when a sub-object needs first-class identity in its own right. Examples:

  • an individual scan of a manuscript page that has its own claims, rights, custody, or transfer trajectory;
  • a single photograph in a collection that is licensed independently;
  • a track within an album that is published separately;
  • a component within a compound object that needs its own provenance record.

A child AssetID is not the right model for derivatives that exist purely to serve presentation: thumbnails, low-resolution previews, OCR text, IIIF tiles, video chunks, ALTO XML files, captions, search indexes. Those are file roles inside a representation package under the parent AssetID.

The rule: derivatives become AssetIDs only when they have to be cited, owned, transferred, accessioned, or described independently from their source.
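The rule can be expressed as a simple predicate. The sketch below is illustrative only; SubObjectNeeds and needs_child_asset_id are hypothetical names, and a real deployment would likely derive these flags from catalog state rather than set them by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubObjectNeeds:
    """Which kinds of independence a sub-object requires from its source."""
    cited: bool = False
    owned: bool = False
    transferred: bool = False
    accessioned: bool = False
    described: bool = False


def needs_child_asset_id(needs: SubObjectNeeds) -> bool:
    """True when any form of independent identity is required."""
    return any((
        needs.cited,
        needs.owned,
        needs.transferred,
        needs.accessioned,
        needs.described,
    ))
```

A thumbnail or an OCR layer sets none of these flags and stays a file role under the parent; a manuscript page with its own transfer trajectory sets at least one and receives its own AssetID.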

Asset hierarchy is arbitrary-depth. An encyclopedia contains volumes, each volume contains page scans, each page scan may itself have a sub-component. An album contains discs, each disc contains tracks. A fonds contains series, sub-series, files, and items. Each level that needs first-class identity gets its own AssetID. Levels that do not need first-class identity are expressed as descriptive structure inside the parent's metadata or as files inside its packages.

The structural parent/child link is anchored cryptographically in CPR via the child_of claim relation (CPR §4.11). CPR records the bare structural fact that one AssetID is a child of another, enforces single-parent and acyclicity, and uses the link to determine effective ownership and to cascade ownership transfers through the subtree. The Catalog records the descriptive layer on top: archival level (fonds, series, sub-series, item), domain structural role (page, track, scene, chapter, canvas), and ordinal sequence within the parent. CPR's primitive is deliberately vocabulary-free; archival and domain labels are projected to MDTO, RiC, EAD, CIDOC-CRM/Linked Art, and similar standards at the Catalog layer (§9).

Structural parent membership is single and cascading; descriptive collection membership is many and non-cascading. A track on its master album release is in a structural parent/child relationship with the album: child_of in CPR, transfers with the album, governed by the root's ownership. The same track appearing on a "Best of 2025" compilation, a film soundtrack, or a curator's playlist is descriptive collection membership, recorded as a member_of_collection relation in Catalog metadata only. Multi-membership lives in the Catalog because membership does not imply ownership and is not cascading.
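The difference between the two relations can be sketched with an in-memory model. This is an illustration under stated assumptions, not CPR or CMS code: the class AssetGraph and its method names are hypothetical, and claims, signatures, and persistence are left out.

```python
from collections import defaultdict


class AssetGraph:
    """In-memory stand-in for child_of (structural) and member_of_collection (descriptive)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}           # child -> its single structural parent
        self._collections = defaultdict(set)        # asset -> descriptive collections

    def ancestors(self, asset_id: str) -> list[str]:
        """Walk the child_of chain upward, nearest ancestor first."""
        chain = []
        current = self._parent.get(asset_id)
        while current is not None:
            chain.append(current)
            current = self._parent.get(current)
        return chain

    def add_child_of(self, child: str, parent: str) -> None:
        # Single-parent rule: a child may have at most one structural parent.
        if child in self._parent:
            raise ValueError(f"{child} already has a structural parent")
        # Acyclicity rule: the new parent must not be the child or one of its descendants.
        if child == parent or child in self.ancestors(parent):
            raise ValueError("child_of link would create a cycle")
        self._parent[child] = parent

    def add_to_collection(self, asset_id: str, collection_id: str) -> None:
        # Descriptive membership is multi-valued and has no ownership effect.
        self._collections[asset_id].add(collection_id)

    def collections_of(self, asset_id: str) -> set[str]:
        return set(self._collections.get(asset_id, ()))
```

In this sketch a track can join any number of collections, but a second add_child_of call for the same track is rejected.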

4.3 AssetKind

Every Asset carries an AssetKind — a controlled value drawn from physical_object, conceptual_work, record, collection, born_digital_object, digital_surrogate, component, derivative, dataset, and similar values. Standards mappings depend on the kind: a parent physical-object Asset projects to MDTO Informatieobject and RiC RecordResource; a child digital surrogate projects to MDTO file/representation and RiC Instantiation; a collection projects to MDTO aggregation and RiC RecordSet; a museum object projects additionally to CIDOC-CRM E22 Human-Made Object and Linked Art.
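A kind-dependent mapping of this sort can be pictured as a lookup table. The sketch below only restates the examples given above; the function name projection_targets and the is_museum_object flag are hypothetical, and the real mapping rules live in versioned profiles rather than in a hard-coded dictionary.

```python
# Projection targets per AssetKind, limited to the examples named in the text.
KIND_PROJECTIONS = {
    "physical_object": ["MDTO Informatieobject", "RiC RecordResource"],
    "digital_surrogate": ["MDTO file/representation", "RiC Instantiation"],
    "collection": ["MDTO aggregation", "RiC RecordSet"],
}

# Additional targets for museum objects.
MUSEUM_EXTRAS = ["CIDOC-CRM E22 Human-Made Object", "Linked Art"]


def projection_targets(asset_kind: str, is_museum_object: bool = False) -> list[str]:
    """Return the standards targets an Asset of this kind projects to."""
    if asset_kind not in KIND_PROJECTIONS:
        raise ValueError(f"no projection example for AssetKind {asset_kind!r}")
    targets = list(KIND_PROJECTIONS[asset_kind])  # copy so callers cannot mutate the table
    if is_museum_object:
        targets.extend(MUSEUM_EXTRAS)
    return targets
```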

4.4 Local identifiers

Custodians have existing identifiers — accession numbers, transfer-request numbers, museum object IDs, fonds IDs, finding-aid IDs, DOIs, ARKs, Handles, ISBNs, OCLC numbers. These travel with the Asset as local_identifiers, never overwriting AssetID. AssetID is the global, immutable identifier; local identifiers carry institutional and historical context and remain on the record across transfers, so that a custodian's finding aids and database systems continue to resolve naturally.

4.5 Three layers, never confused

AssetID        identifies the Asset (the thing being preserved, described, transferred)
PackageID      names one encrypted EFS package generation under one AssetID
File UUID      identifies one exact binary file inside a package

A package belongs to exactly one AssetID. A file UUID belongs to exactly one package generation, with a stable plaintext digest that survives repackaging. AssetIDs persist through migrations, repackaging, and custodial transfers; PackageIDs are immutable, although a later generation can supersede an earlier one; file UUIDs preserve binary identity.

4.6 Ownership over time

AssetID is a stable identifier; the owner of an AssetID is not. A creator who sells their archive to a foundation, an estate that passes to a new executor, a small label acquired by a larger one, an artist who gifts their work to a museum — these are routine events. CPR handles ownership change as a registration claim that rebinds the AssetID to a new authorised key (CPR §6.2). The CMS observes the rebinding through CPR and updates its own party references accordingly.

Three things matter at the catalog level. First, ownership rebinding is a CPR concern; the catalog does not invent its own ownership-transfer mechanism. Second, local_identifiers (§4.4) survive ownership changes — a transferred record retains its prior accession numbers, finding-aid IDs, museum object IDs, and DOIs as historical context, never overwritten on transfer. Third, licenses issued before transfer remain valid by default; the new owner inherits authority to issue further licenses and to revoke or extend pre-existing ones going forward, but cannot retroactively invalidate them.

Custody (who holds and manages the asset) is a separate axis from ownership (who holds the intellectual property rights), and the two may transfer independently or together. An institutional donation often transfers custody fully while ownership remains with the donor's estate; a sale typically transfers ownership while custody may stay with a third-party preservation service. The CMS records each axis distinctly through the Transfer workflow (§11) for custody and through CPR rebinding for ownership.

For hierarchical assets, only the root holds an explicit owner key in CPR; descendants inherit effective ownership from the root via the child_of chain (CPR §4.11). Rebinding the root cascades through the entire subtree in the same block, with no per-descendant claim and no per-descendant fee. The Catalog observes this by consulting CPR's effective-owner resolution for any asset in the subtree rather than reading a per-asset owner field. A child asset cannot be rebound in place: to move a single child to a different owner, the current effective owner submits a detachment claim that simultaneously clears the structural parent link and assigns a new explicit owner (CPR §4.11). Moving a child between parents that share an effective owner is a re-parenting, not a rebinding, and requires no detach.
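Effective-owner resolution, cascading rebinds, and detachment can be sketched together. This is a simplified in-memory model, not CPR's implementation: OwnershipLedger and its methods are hypothetical names, and claims, signatures, blocks, and fees are omitted.

```python
class OwnershipLedger:
    """Only roots carry an explicit owner; descendants resolve through child_of."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}   # child -> structural parent
        self._owner: dict[str, str] = {}    # root -> explicit owner key

    def register_root(self, asset_id: str, owner_key: str) -> None:
        self._owner[asset_id] = owner_key

    def add_child(self, child: str, parent: str) -> None:
        self._parent[child] = parent

    def root_of(self, asset_id: str) -> str:
        while asset_id in self._parent:
            asset_id = self._parent[asset_id]
        return asset_id

    def effective_owner(self, asset_id: str) -> str:
        # Resolved at read time, so a root rebind is visible to the whole subtree.
        return self._owner[self.root_of(asset_id)]

    def rebind(self, asset_id: str, new_owner_key: str) -> None:
        if asset_id in self._parent:
            raise ValueError("a child cannot be rebound in place; detach it first")
        self._owner[asset_id] = new_owner_key

    def detach(self, asset_id: str, new_owner_key: str) -> None:
        """Clear the parent link and assign an explicit owner in one step."""
        if asset_id not in self._parent:
            raise ValueError("only a child asset can be detached")
        del self._parent[asset_id]
        self._owner[asset_id] = new_owner_key
```

Because ownership is resolved through the chain rather than stored per asset, rebinding the root needs no per-descendant update in this model.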


5. Conceptual Model

The CMS's canonical graph contains the entities below. They are described in summary here; full schemas live in the CMS Design document.

5.1 Core entities

Entity                Meaning
Asset                 First-class object identified by an AssetID.
AssetKind             What the AssetID identifies (physical object, conceptual work, record, collection, etc.).
AssetVersion          A registered state/snapshot of an Asset's claims, file membership, relationships, and package references.
AssetFile             Exact binary file with file UUID, digest, size, format, file role, derivation lineage, and package membership.
RepresentationSet     Coherent representation of an Asset, formed from one or more package roles, generations, files, and child AssetIDs.
EncryptedPackage      EFS package generation under one AssetID and one package role, immutable after sealing.
PackagePart           Addressable encrypted part of a package, used for transport and audit.
PackageRole           Controlled, versioned role of an EFS package.
PackageFileEntry      File entry inside an encrypted package manifest.
Claim                 Signed statement about an asset, file, attribution, custody, rights, or event, registered through CPR.
Party                 Person, organization, delegate, agent, family, group, operator, or pseudonymous participant.
Agent                 Specialization of Party where party_type = agent: a Catalog.ID agent identity acting under a principal.
AgentSession          Authorized run of an agent within a defined scope.
AgenticProvenance     Metadata recording how, by whom, and on whose authority an AI- or automation-derived artefact was produced.
RoleAssertion         Claim that a Party had a role in relation to an Asset.
CustodyState          Current or historical custody, holding, or management relationship.
Transfer              Proposed, active, accepted, rejected, or completed transfer workflow.
Collection            Aggregation of Assets, often itself an AssetID. Members that form a single structural parent/child relationship are anchored in CPR via child_of and transfer with the collection (CPR §4.11).
CollectionMembership  Descriptive, non-cascading membership of an Asset in one or more Catalog-level collections (e.g., a track that appears on a compilation alongside its master album release, or a photograph included in multiple curated exhibits). Catalog-only, multi-valued, does not affect ownership or transfer.
PhysicalEmbodiment    Physical-material facet of an Asset (carrier, dimensions, condition, shelf mark).
AccessCondition       Legal, ethical, donor, privacy, classification, or policy condition affecting access.
RightsExpression      Copyright, license, reuse, moral rights, preservation rights, publication rights.
License               A bidirectional license issued through the Licensing Agent: an issuer-signed offer carrying a wrapped key and a terms digest, plus a recipient-signed acceptance over the same digest. EFS honours retrieval only when both halves are present.
PreservationEvent     Event affecting preservation state: ingest, validation, fixity, migration, replication, deletion, rewrap.
ConformanceClaim      Claim that an asset, package, or export conforms to a profile.
Profile               Versioned standard, jurisdictional, institutional, or workflow profile.

5.2 The package-role registry

EFS packages occupy one of eight protocol-level roles, each with its own currentness rule and access expectation:

Role          Definition
source        Original capture, acquisition, or creator-master materials.
preservation  Archive-managed preservation representation, validated and migrated as part of preservation activity.
preview       Lightweight browsing and identification material: thumbnails, contact sheets, snippets.
access        Authorized access representation: reading copies, streaming proxies, accessible copies.
edition       Curated, publishable, distributable, or licensed representation.
text          Textual extraction and representation: OCR, HTR, transcripts, captions, subtitles, search indexes.
metadata      Metadata-only or metadata-dominant encrypted bundles, versioned independently from payload files.
submission    Transfer or deposit packages prepared for a receiving custodian.

The role string appears in the PackageID and is therefore visible outside the decrypting client. Roles are kept broad, privacy-safe, and protocol-versioned. They do not encode legal categories, sensitive personal data, or specific workflow states; the CMS tracks richer institutional meanings in catalog_purpose and profile_id fields that live in the catalog graph rather than in the EFS PackageID.

5.3 Package and file role semantics

Inside a package, files carry a file_role (master, preservation_master, access_copy, ocr, alto, hocr, pagexml, transcript, caption, thumbnail, tile, manifest, premis, mets, mdto, runtime, signature, validation_report, etc.) and a structural_role (page, cover, track, scene, canvas, chapter, layer, component, attachment).

A text package may contain ALTO, hOCR, PageXML, transcripts, captions, and a search index together. An edition package may contain a browser runtime, IIIF manifest, image tiles, OCR overlays, and edition signatures. Distinct file roles inside a single package are normal. Distinct package roles for distinct retrieval, currentness, and grant lanes are also normal.

5.4 Versioning and currentness

The CMS uses a mixed versioning model:

Axis                           Rule
CPR claims                     Immutable. Corrections are additive new claims that supersede earlier ones.
EFS package generations        Immutable per (AssetID, role, serial). Revisions create new serials.
Package currentness            Resolved separately from existence, by latest-serial rule or by signed set_current.
File derivative version        Recorded at file level (OCR v1, OCR v2, thumbnail v3) inside the relevant package generation.
Public/restricted description  Mutable, versioned graph; historical states retrievable by date or revision.
Rights and access policy       Versioned with effective dates and decision provenance.
Custody and transfer           Event-sourced; completed events are not silently mutated.

A new AssetID is not created merely because a preview, OCR, thumbnail, PDF, IIIF bundle, or public edition has been regenerated. Those are package or file versions. New AssetIDs appear only when a sub-object becomes independently citable, owned, transferred, accessioned, or described.
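The package-currentness rule in the table above can be sketched as a small resolver. The function name current_serial is hypothetical, and the sketch assumes that a set_current pin, when present, takes precedence over the latest-serial rule; verifying the signature on set_current is out of scope here.

```python
from typing import Iterable, Optional


def current_serial(existing_serials: Iterable[int],
                   set_current: Optional[int] = None) -> Optional[int]:
    """Resolve which generation is current for one (AssetID, role) lane.

    existing_serials: serials of the immutable generations that exist.
    set_current: serial pinned by a (separately verified) signed set_current.
    """
    serials = set(existing_serials)
    if not serials:
        return None  # nothing exists, so nothing is current
    if set_current is not None:
        # Currentness never implies existence: the pin must name a real generation.
        if set_current not in serials:
            raise ValueError(f"set_current names unknown serial {set_current}")
        return set_current
    return max(serials)  # latest-serial rule
```

Pinning an older serial leaves newer generations in place; it changes only which one retrieval treats as current.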


6. Storage Architecture: Public, Restricted, EFS-Encrypted

The CMS metadata layer is not encrypted at rest. It is a professional collaboration and content-management surface whose access is gated by Catalog.ID authentication plus role-based and policy-based access controls. This separation lets the catalog remain searchable, indexable, and live for its members while keeping anything that genuinely needs encryption in EFS where it belongs.

The design rule is straightforward:

Any data that needs encryption belongs in EFS, not in the CMS metadata layer.

Sensitive material — personal addresses, contracts, donor agreements, identity evidence, private notes, sensitive provenance — is stored as files inside encrypted EFS packages (typically metadata-role packages under the relevant AssetID) and referenced from the catalog by package_id and internal_file_no. The catalog records the existence and category of the encrypted bundle, never its plaintext.
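A catalog-side pointer of this kind might look like the record below. It is a sketch only: the field set follows the description above (package_id, internal_file_no, and a category), while the class layout, the catalog_entry helper, and the assumption that internal_file_no is a non-negative integer are illustrative.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EncryptedMaterialReference:
    """Pointer to a file inside an encrypted EFS package. Holds no plaintext."""
    package_id: str        # e.g. a metadata-role package under the relevant AssetID
    internal_file_no: int  # position of the file in the encrypted package manifest
    category: str          # what kind of material this is, never its content

    def __post_init__(self) -> None:
        # Assumption: file numbers are non-negative integers.
        if not isinstance(self.internal_file_no, int) or self.internal_file_no < 0:
            raise ValueError("internal_file_no must be a non-negative integer")
        if not self.package_id or not self.category:
            raise ValueError("package_id and category are required")


def catalog_entry(ref: EncryptedMaterialReference) -> dict:
    """The only data the catalog graph stores for encrypted material."""
    return asdict(ref)
```

The catalog can index and search on category, while the operator learns nothing beyond the fact that the bundle exists.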

This rule has two consequences. First, the catalog cannot store fields whose plaintext should never be visible to the operator: if a field needs that property, it goes to EFS. Second, the public-web side of the ecosystem — curated editorial editions hosted on any domain, possibly behind micropayment gates — is explicitly separate from the catalog itself. Editions are exported edition-role EFS packages built from catalog state. The catalog remains the live, authoritative, member-facing store; the export side carries the public-web reach.

6.1 Three exposure classes

Class      Public
Storage    CMS public graph, served without authentication
Examples   AssetID URI, public title, public creator display name or pseudonym, public rights statement, public IIIF manifest references, claim verification references

Class      Restricted
Storage    CMS restricted graph, served only to authenticated members with appropriate roles and access conditions
Examples   appraisal notes, accession notes, donor restrictions, government transfer context, embargo review dates, internal processing status, sensitive provenance

Class      EFS-encrypted reference
Storage    Catalog stores only a pointer; bytes live in EFS as encrypted package contents
Examples   personal identity evidence, contracts, donor agreements, private creator notes, confidential donor correspondence, private rights negotiations

Public-immutable evidence — AssetID, ClaimID, package ciphertext digest, anchor timestamp, pseudonymous party identifier — lives on CPR rather than in the catalog graph. Secret material — private keys, plaintext package keys — never enters the CMS or EFS at all; only wrapped licenses from the Licensing Agent circulate.

6.2 The Licensing Agent

Licenses are issued through the Licensing Agent (LA), a Catalog-wide service with its own whitepaper at catalog.org/la/. A license in this ecosystem has three parts: a permission part (RightsExpression, AccessCondition, AccessDecision) that lives in the CMS; a capability part (a wrapped key) that lives in LA; and an acceptance part (a recipient-signed countersignature over the licensing terms) that also lives in LA. The license is a bidirectional handshake: the issuer offers, the recipient countersigns, and only then is the license effective. EFS consults LA on retrieval and refuses to deliver ciphertext unless both signatures are present and verified.

Why a separate service rather than an EFS-internal subsystem: licenses are needed by multiple consumers beyond EFS (CMS EncryptedMaterialReference resolution, Asset Market settlement, edition publication, transfer review, agent flows); they must remain valid across EFS-operator migration; they have a fundamentally different workload profile from storage; and the policy that authorises them lives in the CMS, not in EFS. The Licensing Agent serves all consumers symmetrically and travels with the catalog rather than with any one storage operator.

The Licensing Agent has two components: an LA Client that runs on the issuer or recipient side (in the desktop Catalog client for interactive operations, or as a Catalog.ID persistent-mode agent on a self-hosted hub for offline-publisher cases), and an LA Server that runs as a federated brokerage in the same trust posture as CPR operators. The LA Server is zero-knowledge: it stores wrapped licenses and acceptance signatures but can neither decrypt the wrapped keys nor fabricate a license.

A license proves technical capability to decrypt only when both the issuer's offer and the recipient's acceptance are signed. Legal permission still lives in RightsExpression and AccessCondition and is evaluated separately. Public access can be mediated by licenses without exposing key material publicly: for example, an institution's public renderer can hold a service license whose terms it has accepted.
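The two-signature condition can be sketched as a check that a retrieval service might run. Everything here is illustrative: the record layouts, the use of SHA-256 for the terms digest, and the assumption that each signature covers only the terms digest are not taken from the LA specification, and signature verification is delegated to a caller-supplied function because the actual signature scheme is not described here.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

# (signer identity, signed message, signature) -> valid?
Verifier = Callable[[str, bytes, bytes], bool]


@dataclass(frozen=True)
class Offer:
    issuer: str
    wrapped_key: bytes   # opaque to the server; never unwrapped here
    terms_digest: str
    signature: bytes


@dataclass(frozen=True)
class Acceptance:
    recipient: str
    terms_digest: str
    signature: bytes


def terms_digest(terms: bytes) -> str:
    """Digest of the licensing terms (SHA-256 is an assumption)."""
    return hashlib.sha256(terms).hexdigest()


def license_is_effective(offer: Offer,
                         acceptance: Optional[Acceptance],
                         verify: Verifier) -> bool:
    """True only when offer and acceptance are both signed over the same terms."""
    if acceptance is None:
        return False  # an unaccepted offer confers nothing
    if offer.terms_digest != acceptance.terms_digest:
        return False  # the recipient accepted different terms
    message = offer.terms_digest.encode("ascii")
    return (verify(offer.issuer, message, offer.signature)
            and verify(acceptance.recipient, message, acceptance.signature))
```

The check says nothing about legal permission; that remains a separate evaluation of RightsExpression and AccessCondition.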

6.3 Privacy linting of public metadata

Encryption of files does not protect catalog metadata. The metadata layer must still prevent leakage through titles, notes, relationships, identifiers, timestamps, geolocation, and aggregation patterns. The CMS runs a privacy linter before:

  • public metadata publication;
  • claim submission to CPR;
  • export to any profile;
  • transfer proposal;
  • IIIF or aggregator publication;
  • search-index promotion.

The linter detects personal names in pseudonymous mode, addresses, email addresses, phone numbers, precise geolocation, accidentally exposed private filenames, donor agreement text marked public by mistake, and similar leakage patterns. Issues are surfaced as warnings or hard blocks depending on the profile and the deployment policy.
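A few of these checks lend themselves to pattern matching. The sketch below covers only email addresses, phone numbers, and precise coordinates; the regular expressions are deliberately rough illustrations rather than the CMS's rules, the function name lint_public_metadata is hypothetical, and detecting personal names or mislabeled donor text would need more than regexes.

```python
import re

# Rough, illustrative patterns; a production linter would need far more care.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
    # Latitude/longitude pairs with four or more decimals count as "precise".
    "precise_geolocation": re.compile(
        r"-?\d{1,3}\.\d{4,}\s*,\s*-?\d{1,3}\.\d{4,}"
    ),
}


def lint_public_metadata(fields: dict[str, str]) -> list[tuple[str, str]]:
    """Return (field name, issue kind) pairs for every suspected leak."""
    findings = []
    for name, value in fields.items():
        for kind, pattern in PATTERNS.items():
            if pattern.search(value):
                findings.append((name, kind))
    return findings


def is_blocked(findings: list[tuple[str, str]], hard_block_kinds: set[str]) -> bool:
    """Policy decides which issue kinds block publication outright."""
    return any(kind in hard_block_kinds for _, kind in findings)
```

Separating detection from the blocking decision mirrors the text: the same finding can be a warning under one profile and a hard block under another.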


7. Profile-Driven Standards Compatibility

The CMS uses a single canonical metadata graph internally and exposes validated projections for many archival, governmental, and creator workflows. No external standard becomes the sole internal model: MDTO, NARA metadata, EAD, DCAT, RiC-O, CIDOC-CRM, and others all represent different views of the same reality.

7.1 The Profile Registry

Every standard, jurisdictional, institutional, or workflow profile lives in a versioned registry with a defined lifecycle:

draft → candidate → active → deprecated → retired

National profiles are accepted into candidate when proposed by a recognized national archive, ministry, or institutional consortium with a sponsor of record. They are promoted to active when at least one production deployment validates against the profile and the sponsor signs off, and they move to deprecated when the upstream standard is superseded. Institutional or operator-local profiles do not require central review; they live in their own namespace (local.<operator_id>.<profile_id>) and are not eligible for global active status. Profile versions follow semantic versioning; mapping rules are tracked per version.
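
The promotion rules can be read as a small state machine. The Python sketch below is one possible reading under stated assumptions: the function name can_advance and its keyword flags are hypothetical, transitions are assumed to move one stage at a time, and local.* profiles are modelled only as being refused global active status.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    CANDIDATE = "candidate"
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    RETIRED = "retired"


ORDER = list(Stage)  # Enum iteration preserves definition order


def is_local(profile_id):
    # Operator-local profiles live under local.<operator_id>.<profile_id>.
    return profile_id.startswith("local.")


def can_advance(profile_id, current, target, *, has_sponsor=False,
                production_validated=False, sponsor_signed_off=False):
    """Return True if the registry may move the profile from current to target."""
    # Assumption: the lifecycle only ever moves one stage forward.
    if ORDER.index(target) != ORDER.index(current) + 1:
        return False
    if is_local(profile_id):
        # No central review, but never eligible for global active status.
        return target is not Stage.ACTIVE
    if target is Stage.CANDIDATE:
        return has_sponsor
    if target is Stage.ACTIVE:
        return production_validated and sponsor_signed_off
    return True
```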

A profile declares its applicable scope (asset, file, package, transfer, discovery, preservation, rights, identity), validation methods (SHACL, JSON Schema, XML Schema, CSV rules, policy engine, human review), serializers (JSON-LD, RDF Turtle, XML, CSV, METS, BagIt, IIIF), required and recommended fields, forbidden fields, privacy rules, and references to mapping rules and test fixtures.

7.2 Conformance claims

Every export or package build produces a signed ConformanceClaim recording the subject (asset, package, transfer, export, or metadata graph), the profile and version, the validator, the result (pass, fail, warning, partial, not_applicable), missing required fields, unmapped fields, privacy warnings, and human-review status. Conformance claims are first-class evidence and survive in the catalog and CPR alongside the artefacts they describe.

7.3 Initial profile families

Family                                                  Purpose
cms.core                                                Minimal canonical CMS model.
artist.creator.minimal / artist.creator.transfer-ready  Lightweight and transfer-ready profiles for individual creators.
ica.ric-cm / ica.ric-o                                  Archival semantic context and linked-data publication.
oais.core / premis.3                                    Preservation reference model and event/object metadata.
mets.1 / mets.2 / bagit.1                               Structural and transfer packaging.
eark.csip / eark.sip / eark.aip / eark.dip              European eArchiving package interoperability.
ape.ead / ape.eac-cpf / ape.eag / ape.mets              Archives Portal Europe discovery.
nl.mdto / nl.mdto-sip                                   Dutch government metadata and transfer.
us.nara.uerm / us.nara.metadata-permanent-electronic-records / us.36cfr1236.54 / us.dacs  U.S. federal records management and digitization.
dcat-ap.3 / dcat-us.3                                   European and U.S. public-sector dataset discovery.
iiif.presentation.3                                     Image and compound-object presentation.
museum.cidoc-crm / museum.linked-art / museum.lido      Museum and cultural-heritage publication.
aggregation.edm                                         Europeana cultural-heritage portal aggregation.
harvest.oai-pmh / harvest.resourcesync                  Aggregation harvest endpoints.
deposit.sword.v2 / deposit.sword.v3                     Institutional repository deposit.
provenance.prov-o / provenance.c2pa                     Provenance and content-credentials emission.
agent.disclosure                                        Public disclosure of agent involvement and AI provider context.
rights.public                                           RightsStatements.org / Creative Commons / ODRL policy publication.
accessibility.eu-us                                     WCAG / EN 301 549 / Section 508 conformance evidence.
security.eu-us                                          ISO 27001 / NIST / FedRAMP / NIS2 control evidence.

8. Government Archive Profiles

8.1 European government profile

The European profile targets interoperability rather than a single national rule set. It exports E-ARK CSIP/SIP/AIP/DIP packages where required; supports METS and PREMIS in package construction; supports Archives Portal Europe profiles for discovery; supports DCAT-AP for dataset-style publication; supports GDPR Article 89 safeguards and data minimization; supports jurisdictional replication and access policy controls; and supports national profile adapters such as MDTO and SEDA.

European conformance reports document E-ARK CSIP and SIP results, Archives Portal Europe EAD and EAC-CPF results, DCAT-AP, GDPR privacy review, accessibility review, and per-country national profile results.

8.2 U.S. federal archive profile

The U.S. profile focuses on NARA compatibility while remaining extensible for state, local, tribal, university, and special-collection archives. It supports NARA Universal ERM lifecycle requirements, the NARA metadata requirements for permanent electronic records, 36 CFR 1236.54 metadata for digitized permanent records, NARA finding-aid and ERA transfer support, records schedules and disposition authorities, DACS for archival description, DCAT-US for dataset publication, FADGI digitization-quality metadata, and access-policy regimes for CUI, FOIA, Privacy Act, classified records (where the deployment is authorized for them), donor restrictions, and copyright.

8.3 Dutch national profile

The Dutch profile is a first-class adapter for MDTO. AssetIDs export as MDTO Informatieobject where object boundaries align; files export as MDTO Bestand; actors, locations, and business activities map to MDTO context objects; aggregating AssetIDs export as MDTO aggregations. Full conformance produces MDTO XML and MDTO-SIP. Minimal conformance shows that mandatory MDTO metadata is captured and machine-exportable. DUTO and Archiving by Design guidance influences user workflows and validation prompts.

8.4 Classification levels and authorized deployments

Some records bear formal classification levels — U.S. CONFIDENTIAL/SECRET/TOP SECRET, EU TRÈS SECRET UE/EU TOP SECRET, national equivalents — that may be stored or processed only on systems lawfully accredited to handle them. The CMS can be deployed inside an accredited environment, but a public or commercial deployment cannot lawfully accept those records.

The catalog therefore carries a deployment-level classification_authority setting that declares what the deployment is accredited to handle (unclassified_only, cui, up_to_secret, etc.). The system rejects ingest of any AssetID whose AccessCondition.regime includes a classification level the deployment is not authorized for, refuses to import EFS packages whose sealed metadata declares such a classification, and provides an "out-of-scope" rejection path that informs the submitter without storing the classified material itself. Classified deployments are separate environments — never a runtime flag on a public deployment.
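
A minimal Python sketch of the ingest check follows. Only the setting names (unclassified_only, cui, up_to_secret) come from the text; the level names, the AUTHORITY_ALLOWS mapping, the IngestDecision type, and the fail-closed handling of unknown settings are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping from a deployment's classification_authority setting
# to the classification levels it may hold.
AUTHORITY_ALLOWS = {
    "unclassified_only": {"unclassified"},
    "cui": {"unclassified", "cui"},
    "up_to_secret": {"unclassified", "cui", "confidential", "secret"},
}


@dataclass(frozen=True)
class IngestDecision:
    accepted: bool
    reason: str
    material_stored: bool


def check_ingest(classification_authority, declared_levels):
    """Reject any asset declaring a level the deployment is not authorized for."""
    # Assumption: an unknown setting authorizes nothing (fail closed).
    allowed = AUTHORITY_ALLOWS.get(classification_authority, set())
    excess = sorted(set(declared_levels) - allowed)
    if excess:
        # Out-of-scope path: inform the submitter, store nothing.
        return IngestDecision(False, "out-of-scope: " + ", ".join(excess), False)
    return IngestDecision(True, "within deployment authority", True)
```

The decision carries material_stored so that the rejection path makes explicit that the classified material itself was never retained.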


9. Museum, Library, and Cultural-Heritage Profiles

Museums, national libraries, and cultural-heritage portals require their own ontologies and aggregation models. The CMS supports them as first-class profiles for museum-type AssetIDs, alongside the archival standards above.

9.1 CIDOC-CRM and Linked Art

CIDOC-CRM (ISO 21127:2023) is the reference ontology for cultural-heritage information. Linked Art is its JSON-LD profile, used by major museums for linked-data publication. The CMS projects museum-type Assets (asset_kind: physical_object with object_kind such as painting, sculpture, manuscript, artifact) to CIDOC-CRM E22 Human-Made Object and Linked Art types, with relationships to creators (E21 Person), production events (E12 Production), places (E53 Place), and time-spans (E52 Time-Span).

9.2 LIDO and EDM

LIDO 1.1 is the established XML schema for harvesting museum object metadata. EDM (Europeana Data Model) is the aggregation model used by European cultural-heritage portals. The CMS produces both as profile exports for museum-type Assets and collection-type Assets.

9.3 External authority links and Lemmas

The catalog records external authority links — Wikidata Q-IDs, Getty AAT/TGN/ULAN, VIAF, ISNI, ORCID, LCNAF — as sameAs pointers on parties, places, periods, concepts, and works. Where the catalog needs a curator-controlled, signature-bearing reference record of its own — for entities with no acceptable external authority record, or for curator overlays on existing external records — it maintains internal Lemmas: editable, versioned, signed reference entries.

The boundary rule is:

  • prefer external authority links when the entity has a clear counterpart in Wikidata, Getty, VIAF, or another upstream authority;
  • create a Lemma when no acceptable upstream entry exists, or when a curator-controlled overlay supplements an external record;
  • never duplicate Wikidata or Getty in Lemmas — extend them.

For museum-type Assets, the catalog preferentially links to Wikidata Q-IDs for persons, places, periods, and concepts, and to Getty for art-and-architecture concepts. Lemmas appear when the curator needs versioned, signed, in-catalog control.

9.4 IIIF, ResourceSync, and aggregator support

Museum and library publication relies on IIIF Presentation API 3.0 for compound-object presentation, IIIF Change Discovery and IIIF Content Search 2.0 for federated discovery, ResourceSync for resource synchronization with aggregators, and Schema.org JSON-LD for general-purpose web discoverability. The CMS exposes all of these as standard endpoints.


10. Creator and Artist Profiles

Professional archives need rigorous metadata, but many culturally important archives begin as personal archives. The CMS makes archival-quality metadata possible without making it intimidating.

10.1 Supported creator types

Photographers, filmmakers, musicians and composers, writers, visual artists, designers, architects, software developers, journalists, researchers, independent estates, small organizations, and community archives are all supported as first-class users. The creator profiles are not "simplified archive modes": they are proper professional profiles tuned for individual workflows.

10.2 Creator minimal profile

A creator can register an asset and produce durable evidence with very little metadata: AssetID; asset type; file digest and size; signed registration claim; creator or claimant key. Title, creation date, and rights intention are recommended. An encrypted source EFS package is recommended where preservation matters.

This is enough for a photographer to register a RAW capture from a tethered shoot, or a writer to register a manuscript draft, with durable timestamped evidence and an encrypted preservation copy.

10.3 Creator transfer-ready profile

A creator profile is transfer-ready when:

  • AssetID, asset_kind, and any local_identifiers are present;
  • one or more registered files have format identification (PRONOM PUID where available) and a plaintext digest;
  • at least one signed registration claim exists;
  • a RightsExpression resolves to a known statement (RightsStatements.org, Creative Commons, or unknown_with_reason);
  • an AccessCondition carries a non-default access_status;
  • a creator or claimant party reference is present (Catalog.ID username, even if pseudonymous);
  • a donor_or_deposit_intent value is recorded;
  • sensitive_material_flags are set.

Creator biographical notes, asset scope-and-content notes, provenance notes, and preferred custodian remain recommended rather than required. The validation engine produces a readiness score and a punch list rather than a hard gate; the receiving institution decides whether to accept partial readiness.
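
The readiness score and punch list could be computed along the following lines. This is a sketch under assumptions: one check per bullet above, equal weighting, dictionary field names such as format_id and creator_party that are hypothetical, and "unknown" taken to be the default access_status.

```python
# One check per bullet of the transfer-ready list; field names not quoted in
# the text are hypothetical.
CHECKS = {
    "identifiers": lambda a: bool(a.get("asset_id")) and bool(a.get("asset_kind")),
    "file_format_and_digest": lambda a: any(
        f.get("plaintext_digest") and f.get("format_id") for f in a.get("files", [])
    ),
    "signed_registration_claim": lambda a: bool(a.get("registration_claims")),
    "rights_expression": lambda a: bool(a.get("rights_expression")),
    # Assumption: "unknown" is the default access_status.
    "access_status": lambda a: a.get("access_status", "unknown") != "unknown",
    "creator_or_claimant": lambda a: bool(a.get("creator_party")),
    "donor_or_deposit_intent": lambda a: bool(a.get("donor_or_deposit_intent")),
    # An empty list counts as "set": the creator has declared no flags apply.
    "sensitive_material_flags": lambda a: a.get("sensitive_material_flags") is not None,
}


def readiness(asset):
    """Return (score between 0 and 1, punch list of failed check names)."""
    punch_list = [name for name, check in CHECKS.items() if not check(asset)]
    score = (len(CHECKS) - len(punch_list)) / len(CHECKS)
    return score, punch_list
```

Because the result is a score plus a list rather than a boolean, the receiving institution can decide for itself whether a partially ready asset is acceptable.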

10.4 Creator-facing language

The user interface avoids archival jargon during capture but preserves archival meaning internally:

User-facing phrase            Internal concept
"What is this?"               Asset type, title, scope/content.
"When was it made?"           Creation/capture/publication date with certainty.
"Who was involved?"           Role assertions and attributions.
"Is it public?"               Rights expression plus access condition.
"Can an archive have it?"     Transfer intent and custody proposal.
"What should survive?"        Preservation representation and EFS package.
"What should stay private?"   Encrypted EFS package reference and access condition.
"Who can unlock it?"          Licensing Agent license (issued, accepted by recipient).

11. Creator-to-Custodian Transfer

The transfer from an individual archive to an institutional custodian should be simple for creators, but cryptographically and archivally rigorous for custodians. The CMS makes the correct workflow the easiest workflow.

11.1 The transfer workspace

A transfer is represented as a structured workspace, not a ZIP file. The workspace contains an asset manifest with current registration claim IDs and package IDs; claim verification status for each registered claim; encrypted package references with ciphertext digests and availability state; metadata bundle references for the public, restricted, and encrypted layers; rights and access summaries; donor-agreement references where applicable; scoped decryption grants for the receiving institution; selected standards profiles and validation report IDs; and an acceptance state.

11.2 The flow

Creator → Catalog            Create AssetIDs, describe assets, register claims, upload encrypted packages.
Creator → Catalog            Build a transfer proposal naming the receiving institution.
Catalog → Catalog.ID         Share private identity and rights claims with the institution.
Catalog → Licensing Agent    Issue scoped review-licenses (offered to the institution; institution countersigns).
Institution → Catalog        Review metadata, claims, ciphertext fixity, plaintext file metadata, rights, restrictions, sensitive flags, and standards profile readiness.
Institution → Catalog        Accept transfer; the catalog records a signed custody claim through CPR.
Catalog → EFS                Confirm preservation grants and replication policy.
Catalog → Institution        Produce E-ARK / MDTO-SIP / NARA / EAD validation reports as required.

11.3 What acceptance produces

Acceptance produces a signed CustodyClaim recording the transferring and receiving parties, the asset IDs in scope, the custody type (legal_custody, physical_custody, digital_custody, preservation_custody, access_custody, descriptive_custody), the effective date, the agreement reference, the access conditions reference, the decryption grant references, and the receiving archive's profile.

Acceptance does not require public exposure of private donor data. The public catalog can say "held by X archive" while the agreement remains encrypted in EFS or visible only to authorized institutional staff.

11.4 Partial exports under encryption

When a transfer or export workflow runs against an AssetID whose descriptive layer references encrypted EFS packages — donor agreements, identity evidence, sensitive provenance — and the exporter does not hold the necessary decryption grants, the exporter never silently drops fields and never inserts plaintext placeholders. It produces three artefacts: the export bundle in the requested profile populated with whatever it can resolve; an EncryptedMaterialReference manifest listing every reference it could not decrypt with category, package_id, and key_grant_policy_ref; and a signed export-conformance report stating the export is partial-by-design and what was withheld. The receiving custodian can then either request the appropriate grants and re-export, or accept the partial bundle with the manifest as part of its own custody record.
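
The three-artefact behaviour can be sketched as follows. The function name export_partial, the caller-supplied decrypt callable, and the report keys are assumptions for illustration; signing of the conformance report is omitted, and only the manifest keys (category, package_id, key_grant_policy_ref) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncryptedMaterialReference:
    category: str
    package_id: str
    key_grant_policy_ref: str


def export_partial(fields, held_grants, decrypt):
    """Return (bundle, manifest, report). `decrypt` is a caller-supplied stand-in."""
    bundle, manifest = {}, []
    for name, value in fields.items():
        if not isinstance(value, EncryptedMaterialReference):
            bundle[name] = value
        elif value.key_grant_policy_ref in held_grants:
            bundle[name] = decrypt(value)
        else:
            # Withheld: listed in the manifest, no placeholder in the bundle.
            manifest.append({
                "category": value.category,
                "package_id": value.package_id,
                "key_grant_policy_ref": value.key_grant_policy_ref,
            })
    # A real exporter would sign this report; signing is out of scope here.
    report = {
        "partial_by_design": bool(manifest),
        "withheld_categories": [entry["category"] for entry in manifest],
    }
    return bundle, manifest, report
```

A custodian that later obtains the missing grants can rerun the same export and expect an empty manifest and a report with partial_by_design set to False.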


12. EFS Integration and Encrypted-File Metadata

12.1 Package inventory per AssetID

The CMS mirrors EFS package state: every AssetID has a package inventory recording which packages exist under which roles and serials, which serial is current for each role, and what the storage state is (online, standby, nearline, vault-only, last audited). The catalog is the source of truth for catalog-side metadata; EFS remains the source of truth for storage state, currentness, and ciphertext existence.

The inventory supports high cardinality. A professional archive may keep many source, preservation, preview, access, edition, text, metadata, and submission generations for the same AssetID. A creator may keep only one source package and one preview package. The model scales from one package per AssetID to at least 99,999 package-generation records per AssetID where the deployed EFS protocol permits it.

12.2 File-level semantics across encryption

The CMS preserves file-level semantics even when files are encrypted inside EFS packages. A PackageFileEntry records the file UUID and the AssetID it belongs to, the file role and structural role, sequence within the structure, plaintext digest and size, media type and PRONOM identification, derivative lineage (which file or AssetID it is derived from), generation context (process, software, parameters, reproducibility), and visibility class.

This information is recorded before encryption where possible, inside the encrypted package manifest after encryption, or after authorized decryption by a custodian. It survives migration and re-packaging.

12.3 Plaintext and ciphertext fixity

The CMS distinguishes plaintext fixity (verifies intellectual file content) from ciphertext fixity (lets operators audit storage without decryption). Both regimes are recorded:

  • Plaintext digests are computed before encryption and bound into the encrypted package manifest.
  • Ciphertext digests are computed after encryption and recorded where storage operators and mirrors can read them, so replication and fixity audits run without decryption.

Ciphertext fixity alone does not prove that the decrypted original file remains semantically valid. The encrypted manifest binds plaintext digests so that an authorized decrypting client can verify both layers.
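
The two layers can be illustrated with a short Python sketch. It assumes SHA-256 digests and uses a repeating-key XOR as a stand-in cipher purely so the example runs; the XOR is not real encryption, and the package layout and function names are hypothetical rather than the EFS format.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def toy_cipher(data, key):
    # NOT real encryption: a repeating-key XOR used only so the sketch runs.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def seal(plaintext, key):
    """Compute the plaintext digest before encryption, the ciphertext digest after."""
    ciphertext = toy_cipher(plaintext, key)
    return {
        "ciphertext": ciphertext,
        "ciphertext_digest": sha256_hex(ciphertext),  # readable by operators
        # In practice this manifest would itself sit inside the encrypted package.
        "manifest": {"plaintext_digest": sha256_hex(plaintext)},
    }


def audit_ciphertext(package):
    """Operator-side check: needs no key."""
    return sha256_hex(package["ciphertext"]) == package["ciphertext_digest"]


def verify_plaintext(package, key):
    """Client-side check: decrypt, then compare against the bound plaintext digest."""
    plaintext = toy_cipher(package["ciphertext"], key)
    return sha256_hex(plaintext) == package["manifest"]["plaintext_digest"]
```

A package can pass audit_ciphertext and still fail verify_plaintext, which is the case the bound plaintext digest exists to catch.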

12.4 Public access with encrypted storage

Public access does not require unencrypted storage. The CMS supports several paths:

  • a public decryption grant for a preview or edition package;
  • a service-side grant held by a public access renderer under institutional control;
  • a derived IIIF, image, or video package with separate access grant;
  • downloadable ciphertext plus public key material where open redistribution is intended;
  • an embargoed package with a future release rule.

The metadata records which path is used, which package role and serial it serves, and whether the public view is a source, preview, edition, or derivative representation.


13. Privacy-Preserving Claims and Identity

13.1 Immutable claim hygiene

CPR stores claims forever. The catalog therefore lints every claim before submission to keep the immutable layer free of arbitrary personal data. The linter rejects claims with email addresses, phone numbers, home addresses, sensitive personal categories, accidental geolocation, filenames that reveal private data, and excessive free text. Pseudonymous party identifiers are used wherever possible. Encrypted evidence is stored in EFS, not in the immutable claim body.

13.2 Private identity sharing

Private identity proofs — passport scans, business registrations, signed donor letters, key-recovery instructions — are shared through Catalog.ID's encrypted claim mechanism or through encrypted EFS metadata-role packages. The catalog records the existence of the share (subject, recipient, purpose, expiry, revocability, audit event) but never the content. Catalog.ID's domain validation, revocation, and rotation flows govern lifecycle.

13.3 Attribution strength

The catalog represents attribution as evidence rather than as editorial certainty:

Strength              Meaning
unclaimed             No attribution claim recorded.
claimant_asserted     Asset owner or claimant attributes another party.
self_asserted         A party attributes themselves.
mutual                Both claimant and party agree (two-sided attribution).
agent_signed          Produced by an authorized Catalog.ID agent acting under a named principal; resolvable via the agent's session and principal link.
institution_verified  A custodian or authority reviewed the evidence.
legally_determined    A court, contract, statute, or formal determination established the attribution.
disputed              Conflicting claims exist.
deidentified          Former identity mapping has been removed or sealed.

13.4 Erasure and de-identification

When a person exercises an erasure or de-identification right, immutable CPR references may remain as dead pseudonymous identifiers (the cryptographic record cannot lawfully be tampered with), while resolvable personal mappings are removed or sealed according to Catalog.ID's policy. The catalog displays such records as deidentified rather than as live attributions.


14. AI Agents and Agentic Provenance

The CMS's first-class support for AI agents is one of its distinguishing features. It rests on Catalog.ID §3c, which defines a dedicated agent identity type, and on the catalog's own metadata domain for agent sessions and agentic provenance.

14.1 Why a dedicated metadata domain

PREMIS Agents and PROV-O Agents already cover the abstract concept of an actor in a preservation or provenance event. The CMS adds three things on top:

  1. Cryptographic agent identity. Catalog.ID agent identities are cryptographically distinct from their principals while remaining resolvable to them. Every agent action can be traced to an authorized session under a specific principal.
  2. AI provider disclosure. When an agent uses an external AI service to generate content, the provider, model identifier, version, and parameter context are recordable. This is essential for archival authenticity and reproducibility, and complements C2PA Content Credentials.
  3. Authority scope. An agent's signature is only valid within the scope of its session. Catalog metadata preserves enough context that a verifier can determine whether the action was within scope at the time it was taken.

14.2 Agent sessions

Every agent session is configured by the principal and recorded in the catalog. A session carries:

  • the agent's Catalog.ID username and the principal's username;
  • a set of authorized scopes (registration_signing, attribution_signing, custody_signing, file_upload, chat, chat_readonly, org_key_rotation, org_key_flattening, description_authoring, derivative_generation, rights_publication, access_decision, transfer_proposal);
  • an optional asset filter (e.g., a specific collection AssetID or file-type restriction);
  • start and expiry times;
  • the runtime kind (Catalog MCP server, local daemon, remote service);
  • the authentication path (delegation grant in session mode, stored refresh token in persistent mode, scoped API key for remote services);
  • AI provider details where applicable (provider name, model ID, model version, parameter blob references, disclosure visibility).

The runtime — typically a Catalog MCP server running locally on the principal's machine or institutional infrastructure — enforces scope independently of the agent. A compromised or misbehaving agent cannot escalate its own permissions: every request is checked against the session scope before execution.
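
A runtime-side check of this kind might look like the sketch below. The AgentSession fields are a reduced, hypothetical subset of the session record described above, and authorize is an illustrative name; the point is only that the decision depends on the recorded session, never on anything the agent asserts about itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AgentSession:
    agent: str
    principal: str
    scopes: FrozenSet[str]
    starts_at: datetime
    expires_at: datetime
    asset_filter: Optional[FrozenSet[str]] = None  # None means no asset restriction


def authorize(session, scope, asset_id, now):
    """Check one request against the session before it is executed."""
    if not (session.starts_at <= now < session.expires_at):
        return False  # outside the session's validity window
    if scope not in session.scopes:
        return False  # scope was never granted by the principal
    if session.asset_filter is not None and asset_id not in session.asset_filter:
        return False  # asset lies outside the configured filter
    return True
```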

14.3 Agent-signed claims

Every claim signed by an agent must reference an AgentSession whose scopes cover the claim type. The claim header carries issuer_party_type: agent, the signed_on_behalf_of_party_id (the principal), the agent_session_id, and the specific agent_session_scope under which the signature was produced.

Verifiers reject claims whose session was out of scope at signature time. Public Asset pages display, for each claim, whether the signer was a person, a delegate, or an agent, and (for agents) whether the principal's published CPR signing key chain confirms the agent's authority. Where an agent identity has been revoked, historical claims signed before revocation remain valid but display with a "retired agent" indicator.

Agents may sign claims of any type for which the principal has granted the corresponding scope, including custody claims. There is no built-in "must be human" requirement: if a delegate has configured an agent with custody_signing, that agent's custody claim is valid and equivalent to a delegate-signed claim. The principal remains accountable through the immutable principal link in Catalog.ID.

14.4 Agentic provenance

Any catalog artefact — a description, claim, OCR file, transcript, caption, embedding, classification, summary, restoration, redaction recommendation — that was produced or substantially modified by an agent carries an AgenticProvenance record. The record names the agent, the principal, the session, the generation role (original_authoring, enhancement, derivation, review, translation, redaction_proposal, classification, embedding, restoration, summarization), the human review status, and reproducibility context (method, parameters, seed, deterministic flag).

Public Asset pages disclose, by default, whether a description, transcription, or enhancement was machine-generated and whether it has been human-reviewed. Institutional profiles may forbid the publication of unreviewed AI-generated description for permanent records; the validation engine enforces this where the profile requires it.

14.5 Default-on AI provider disclosure

The CMS's default policy is to disclose AI provider context. Provider, model identifier, and model version are public catalog fields for any AI-derived artefact. The principal or institution may explicitly downgrade specific parameter blobs — proprietary RAG context, fine-tune dataset identity, prompt engineering details — to restricted or to an EFS-encrypted reference, but the bare provider/model/version triple is published unless an explicit exception with a documented reason is recorded.

This protects the catalog's authenticity posture and helps reviewers and aggregators distinguish human-authored from machine-authored content.

14.6 C2PA Content Credentials

Where an institution publishes AI-generated derivative files externally — for example, AI-restored images in a public access package, or AI-generated tiles in an edition — the file carries a C2PA Content Credentials manifest. The manifest is generated alongside the file inside the EFS package and is referenced from the AgenticProvenance record.

C2PA and CPR are complementary, not competing, evidence streams. C2PA is authoritative for what happened to a file's bytes outside the catalog (camera capture, editor, AI generation, format conversion), tied to the file's intrinsic provenance chain. CPR is authoritative for catalog-internal claims (registration, attribution, custody, rights). When the two disagree (for example, a C2PA chain names an editor who is not a Catalog.ID member, or a CPR attribution claim names a creator the C2PA chain does not list), the catalog displays both, marks the disagreement as a provenance_conflict event for human review, and never silently picks a winner. Conflicting evidence is preserved, not flattened.
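
The conflict rule can be sketched in Python as a comparison that keeps both evidence sets and only adds an event. The sketch assumes the actor names have already been extracted from a C2PA manifest and from CPR claims by other tooling; compare_provenance, its inputs, and the event fields other than the provenance_conflict type are hypothetical.

```python
def compare_provenance(c2pa_actors, cpr_attributed, catalog_members):
    """Keep both evidence streams; flag disagreement without picking a winner."""
    c2pa_actors, cpr_attributed = set(c2pa_actors), set(cpr_attributed)
    reasons = []
    for actor in sorted(c2pa_actors - set(catalog_members)):
        reasons.append(f"C2PA actor is not a Catalog.ID member: {actor}")
    for party in sorted(cpr_attributed - c2pa_actors):
        reasons.append(f"CPR attribution not listed in C2PA chain: {party}")
    events = []
    if reasons:
        events.append({
            "type": "provenance_conflict",
            "reasons": reasons,
            "status": "needs_human_review",
        })
    return {
        "c2pa_actors": sorted(c2pa_actors),        # preserved as recorded
        "cpr_attributed": sorted(cpr_attributed),  # preserved as recorded
        "events": events,
    }
```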

14.7 Serving unreviewed AI description to the public

By default, AI-generated fields with human_review_status: not_reviewed are visible in the catalog member UI but excluded from the public graph and from OAI-PMH, ResourceSync, IIIF, and Schema.org outputs. Once reviewed (reviewed_unchanged, reviewed_corrected), the field is published and emitted in change feeds with a fresh lastModified so aggregators refresh. Corrections after publication trigger a ResourceSync change-list entry and an Updated record on OAI-PMH. Major retractions emit an OAI-PMH Deleted tombstone or a change="updated" ResourceSync entry with a human-readable note.

Institutions that want pre-review fields published anyway must explicitly opt in per profile, and those fields carry an ai_unreviewed_disclosed marker downstream. This is conservative but defensible: aggregators cache safely, curators retain editorial authority, and the public never sees unverified machine-authored description without disclosure.
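
A minimal sketch of the publication filter, assuming each field is a dictionary with hypothetical keys value and ai_generated alongside the human_review_status named above, and a per-profile opt-in flag called publish_unreviewed here:

```python
REVIEWED = {"reviewed_unchanged", "reviewed_corrected"}


def public_projection(fields, publish_unreviewed=False):
    """Select the fields that may appear in the public graph and harvest outputs."""
    public = {}
    for name, field in fields.items():
        status = field.get("human_review_status")
        if not field.get("ai_generated", False) or status in REVIEWED:
            public[name] = {"value": field["value"]}
        elif status == "not_reviewed" and publish_unreviewed:
            # Explicit opt-in: published, but marked for downstream consumers.
            public[name] = {"value": field["value"], "marker": "ai_unreviewed_disclosed"}
        # Anything else stays visible only in the member UI.
    return public
```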

14.8 Institutional enrichment workflows

The typical institutional pattern: a delegate creates an agent identity for an enrichment pipeline (label "Description and OCR enrichment", agent_type ai_model), authorizes a session with appropriate scopes and an asset filter, runs the Catalog MCP server on the institution's infrastructure with a stored refresh token in persistent mode, and lets the agent generate OCR, structured description fragments, suggested subject headings linked to Catalog Lemmas or Wikidata Q-IDs, and draft rights inferences.

Each artefact is written with an AgenticProvenance record disclosing model, version, and parameter context. Generated derivative files in the access and edition packages carry C2PA Content Credentials. The catalog flags every agent-produced field with human_review_status: not_reviewed until a curator reviews and either confirms, corrects, or rejects.

The principal can revoke the agent's refresh token, the device record, or the agent identity itself at any time, halting all further enrichment instantly.


15. Rights, Access, and Licenses

The CMS separates five concepts that other systems often conflate:

  • Policy: who is allowed to access, use, preserve, publish, modify, migrate, or redistribute.
  • Capability: who holds a wrapped key (the cryptographic half of a license).
  • Acceptance: whether the recipient has signed acceptance of the licensing terms.
  • Evidence: why the policy, the offer, and the acceptance exist.
  • Decision: who approved access and when.

RightsExpression   → says what may be done
AccessCondition    → says who may access and under what restrictions
License            → issuer-signed offer carrying the wrapped key and a terms_digest
LicenseAcceptance  → recipient-signed countersignature over the same terms_digest
AccessDecision     → records who approved a specific disclosure or grant

A license is effective only when both the License and the LicenseAcceptance records exist, both signatures verify, and neither has been revoked or expired. EFS verifies the bidirectional handshake on retrieval.
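
The effectiveness rule can be written as a single predicate. In this sketch signature verification is abstracted into boolean fields, and the class and field names other than terms_digest are assumptions; the Licensing Agent's actual record formats are defined elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class License:
    terms_digest: str
    issuer_signature_ok: bool  # stand-in for real signature verification
    revoked: bool = False
    expires_at: Optional[datetime] = None


@dataclass(frozen=True)
class LicenseAcceptance:
    terms_digest: str
    recipient_signature_ok: bool  # stand-in for real signature verification
    revoked: bool = False


def is_effective(offer, acceptance, now):
    """Both halves must exist, verify, agree on terms, and be neither revoked nor expired."""
    if offer is None or acceptance is None:
        return False
    if not (offer.issuer_signature_ok and acceptance.recipient_signature_ok):
        return False
    if offer.terms_digest != acceptance.terms_digest:
        return False
    if offer.revoked or acceptance.revoked:
        return False
    if offer.expires_at is not None and now >= offer.expires_at:
        return False
    return True
```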

15.1 Rights expressions

Use case                                   Recommended expression
Public cultural-heritage object            RightsStatements.org URI.
Open licensed creator work                 Creative Commons license URI.
Public domain assertion                    CC Public Domain Mark or institutional public-domain statement.
Dataset or API publication                 DCAT rights and license fields.
Complex permission/prohibition/obligation  ODRL policy.
Preservation permissions                   PREMIS Rights.
Government restriction                     AccessCondition with jurisdictional regime.
Donor restriction                          AccessCondition plus encrypted or restricted agreement evidence.

15.2 Access regimes

The policy engine supports multiple regimes simultaneously: GDPR Article 89 archiving in the public interest, donor agreements, FOIA, Privacy Act, CUI, classified records (where authorized), court orders, child-protection, cultural sensitivity, institutional policy, and others. An AccessCondition carries the access status (public, restricted, embargoed, closed, review_required, classified, unknown), the regime, the jurisdiction, the basis reference, applicable dates, the release decision, dissemination controls, redaction requirement, and a license policy reference.

15.3 Licenses

A license is the bidirectional record produced through the Licensing Agent (§6.2). The issuer-signed License carries the wrapped key, the recipient identity, the scope (read, replicate, review, publish, migrate, preserve), the licensing terms (referenced by URI and bound by terms_digest), and signed authority context. The recipient-signed LicenseAcceptance countersigns the same terms_digest. Together they form an effective license; either alone is incomplete.

License lifecycle states are offered → accepted → effective for the happy path, with declined, withdrawn (issuer retracts before acceptance), withdrawn_by_recipient (recipient releases after acceptance), revoked, expired, and superseded as terminal or branch states.

Licenses prove technical capability to decrypt only when fully accepted. Legal permission lives in RightsExpression and AccessCondition. Public access is still mediated through licenses — for example, a service license held by an institutional public renderer with terms it has accepted — even when the underlying material is intended for unrestricted reading. See the Licensing Agent Whitepaper (catalog.org/la/) for the full record formats, federation behaviour, and protocol.


16. Preservation Events and Evidence

16.1 PREMIS-compatible event model

Every preservation-relevant action — ingest, validation, fixity check, replication, repair, migration, characterization, virus scan, encryption, decryption, key rewrap, package seal, package retrieval, access render, export, deletion, disposition, transfer, audit — produces a PreservationEvent record carrying event type, subject, timestamp, outcome, agent identifiers, software identification, and references to evidence reports and related claims.

Events are PREMIS-compatible by construction. They can also be projected to PROV-O for linked-data provenance publication.
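To make the mapping concrete, here is a minimal Python sketch of a PreservationEvent and its projection onto PREMIS Event semantic units. The Python field names, the identifier type labels ("local", "catalog-id", "asset-id"), and the decision to carry software identification in eventDetail are assumptions for illustration; the PREMIS unit names themselves (eventType, eventDateTime, eventOutcomeInformation, linkingAgentIdentifier, linkingObjectIdentifier) are taken from the PREMIS Data Dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class PreservationEvent:
    # Illustrative field names; the normative schema is in the design document.
    event_type: str
    subject: str                  # identifier of the object the event concerns
    timestamp: datetime
    outcome: str
    agent_ids: List[str]
    software: str
    evidence_refs: List[str] = field(default_factory=list)
    claim_refs: List[str] = field(default_factory=list)


def to_premis(event: PreservationEvent, event_id: str) -> dict:
    """Project an event onto PREMIS Event semantic units (as a plain dict)."""
    return {
        "eventIdentifier": {
            "eventIdentifierType": "local",
            "eventIdentifierValue": event_id,
        },
        "eventType": event.event_type,
        "eventDateTime": event.timestamp.isoformat(),
        "eventDetailInformation": {"eventDetail": event.software},
        "eventOutcomeInformation": {"eventOutcome": event.outcome},
        "linkingAgentIdentifier": [
            {"linkingAgentIdentifierType": "catalog-id",
             "linkingAgentIdentifierValue": agent}
            for agent in event.agent_ids
        ],
        "linkingObjectIdentifier": [
            {"linkingObjectIdentifierType": "asset-id",
             "linkingObjectIdentifierValue": event.subject}
        ],
    }


event = PreservationEvent(
    event_type="fixity check",
    subject="A-0001",                      # hypothetical AssetID
    timestamp=datetime(2026, 5, 14, 12, 0, tzinfo=timezone.utc),
    outcome="success",
    agent_ids=["agent-42"],                # hypothetical agent identifier
    software="example-fixity-tool 1.0",    # hypothetical tool name
)
premis = to_premis(event, "evt-0001")
print(premis["eventType"], premis["eventDateTime"])
```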

16.2 Format identification

Formats are identified by PRONOM PUID where possible, using DROID, Siegfried, FIDO, or a local tool. The catalog records the identification tool, version, signature-file version, identification date, preservation risk level, preferred preservation format, and migration recommendation. Plaintext characterization happens before encryption or after authorized decryption; ciphertext fixity is independent.

16.3 Digitization quality

For digitized physical materials, the catalog records a DigitizationQualityProfile referencing FADGI, Metamorfoze, ISO 19264, or local quality standards, with target quality level, equipment profile, calibration target, measurement tool and result, operator, capture date, quality review status, and any deviations or remediation events.

16.4 Two fixity regimes

Regime | Scope | Purpose
Plaintext fixity | Original files before encryption; files after authorized decryption | Verifies intellectual file content and file-level preservation.
Ciphertext fixity | EFS packages and parts | Lets operators and mirrors audit storage without decryption.

Both regimes matter. Ciphertext fixity alone does not prove decrypted content remains semantically valid; the encrypted manifest binds plaintext digests so an authorized client can verify both.
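The separation can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: SHA-256 is used for both digests, the manifest layout (a "plaintext_digests" map keyed by file name) is hypothetical, and the "package" bytes are an opaque stand-in rather than real EFS ciphertext.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_ciphertext(package: bytes, recorded_digest: str) -> bool:
    """Storage-side audit: needs no key and never sees plaintext."""
    return sha256_hex(package) == recorded_digest


def verify_plaintext(decrypted: bytes, manifest: dict, name: str) -> bool:
    """Client-side check after authorized decryption, against the manifest."""
    return sha256_hex(decrypted) == manifest["plaintext_digests"][name]


original = b"scan of page 1"
# Opaque stand-in for an encrypted package; no real encryption happens here.
package = b"\x8f\x02opaque-ciphertext-stand-in"

# Hypothetical manifest layout binding the plaintext digest.
manifest = {"plaintext_digests": {"page1.tif": sha256_hex(original)}}
recorded_package_digest = sha256_hex(package)

# An operator or mirror can audit storage without any key material.
print(audit_ciphertext(package, recorded_package_digest))

# An authorized client verifies the decrypted bytes against the manifest.
print(verify_plaintext(original, manifest, "page1.tif"))

# The package can pass its audit while the decrypted content is still wrong.
print(verify_plaintext(b"corrupted bytes", manifest, "page1.tif"))
```

The last line is the case the text warns about: intact ciphertext says nothing on its own about whether the decrypted content matches what was originally deposited.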


17. Discovery and the Public Web

17.1 The canonical Asset URI

The canonical public URI for an Asset is:

https://catalog.org/{asset_id}

The page exposes the public title, public description, public preview if authorized, public rights label, public claims and verification evidence, public attributions, package availability state where appropriate, institutional custody information, related assets and collections, and export links subject to profile and access policy.

Per-operator alternates resolve at {operatorid}.catalog.org/{asset_id} and may 302-redirect to the in-jurisdiction operator. Content negotiation supports text/html, application/ld+json (JSON-LD), text/turtle (RDF Turtle), application/rdf+xml, and application/json. Historical views are available through ?at=<ISO 8601 timestamp> and ?commit=<n> query parameters.
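A simplified view of the content negotiation step, as a Python sketch. The media types are those listed above; the selection rules (highest q-value first, then the more specific media range, then header order) and the fallback to text/html are assumptions for illustration, and a production server would more likely rely on its web framework's negotiation support.

```python
SUPPORTED = [
    "text/html",
    "application/ld+json",
    "text/turtle",
    "application/rdf+xml",
    "application/json",
]


def parse_accept(header: str):
    """Parse an Accept header into (media_range, q, position) tuples."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q, position))
    return ranges


def negotiate(header: str, supported=SUPPORTED, default="text/html") -> str:
    ranges = parse_accept(header)
    best = None  # ((q, specificity, -position), media_type)
    for media_type in supported:
        top_level = media_type.split("/")[0]
        match = None  # (specificity, q, position) of the most specific range
        for media_range, q, position in ranges:
            if media_range == media_type:
                specificity = 2
            elif media_range == top_level + "/*":
                specificity = 1
            elif media_range == "*/*":
                specificity = 0
            else:
                continue
            if match is None or specificity > match[0]:
                match = (specificity, q, position)
        if match is not None and match[1] > 0:
            key = (match[1], match[0], -match[2])
            if best is None or key > best[0]:
                best = (key, media_type)
    return best[1] if best else default


print(negotiate("text/turtle;q=0.9, application/ld+json"))
print(negotiate("application/xml, */*;q=0.1"))
```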

17.2 Linked data and Schema.org

The CMS supports JSON-LD and RDF/Turtle exports of the canonical graph for every public Asset. URIs follow predictable patterns:

https://catalog.org/{asset_id}
https://catalog.org/{asset_id}/claims/{claim_id}
https://catalog.org/{asset_id}/representations/{representation_id}
https://catalog.org/{asset_id}/packages/{package_id}
https://catalog.org/party/{party_id}
https://catalog.org/profile/{profile_id}

Public Asset pages also embed Schema.org JSON-LD (CreativeWork, Photograph, Book, ArchiveComponent, Museum, Person, Organization) so general-purpose web crawlers and search engines can index the catalog without parsing IIIF or RDF/XML.
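For illustration, the embedded block might be produced along these lines. The Python function names, the example AssetID, and the choice of properties (name, description, creator) are assumptions; only the Schema.org type names and the canonical URI pattern come from the text.

```python
import json


def schema_org_jsonld(asset_id, schema_type, title,
                      description=None, creator_name=None):
    """Build a minimal Schema.org description keyed to the canonical Asset URI."""
    uri = f"https://catalog.org/{asset_id}"
    doc = {
        "@context": "https://schema.org",
        "@type": schema_type,
        "@id": uri,
        "url": uri,
        "name": title,
    }
    if description:
        doc["description"] = description
    if creator_name:
        doc["creator"] = {"@type": "Person", "name": creator_name}
    return doc


def embed_script(doc):
    """Serialize for embedding in HTML; '</' is escaped so that text inside
    the metadata cannot close the script element early."""
    payload = json.dumps(doc, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


doc = schema_org_jsonld(
    "A-0001",                       # hypothetical AssetID
    "Photograph",
    "Harbour at dawn",
    description="Public description only.",
    creator_name="Example Creator",
)
html = embed_script(doc)
print(html)
```

Only public fields should ever reach this function; the privacy linter described elsewhere in this whitepaper, not the serializer, is the place to enforce that.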

17.3 IIIF

IIIF Presentation API 3.0 manifests are generated from representation-set and package metadata. IIIF Change Discovery and IIIF Content Search 2.0 expose federated discovery and full-text search where the catalog has authoritative text packages. IIIF manifests do not expose restricted images. Public IIIF for encrypted assets is backed by authorized rendering through public access packages or service-side grants. OCR and transcription link through ALTO, Web Annotation, or IIIF annotations as appropriate.

17.4 OAI-PMH and Dublin Core

OAI-PMH serves Dublin Core as a low-friction harvesting layer for legacy aggregators. It is not the canonical archival description: it is an interoperability layer for harvesters that have not yet migrated to ResourceSync.

17.5 DCAT

DCAT-AP and DCAT-US apply when the Asset or Collection is dataset-like, API-like, data-service-like, or public-sector data-catalog relevant. The catalog projects to dcat:Dataset, dcat:DataService, dcat:Distribution, dcat:DatasetSeries, and dcat:Catalog with dcterms:license, dcterms:rights, and dcterms:accessRights as appropriate.


18. Aggregation, Harvest, and Deposit

Seamless integration into the existing archival network depends on standardized harvest, deposit, and synchronization protocols. The CMS speaks them.

18.1 ResourceSync

ResourceSync (ANSI/NISO Z39.99-2017) is the modern web-scale resource synchronization protocol. The catalog publishes a capability list, resource lists, and change lists, allowing downstream aggregators (Europeana, Archives Portal Europe, DPLA, university aggregators, national libraries) to synchronize without long polling. ResourceSync complements OAI-PMH and is the preferred harvest path where the aggregator supports it.
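As a sketch of what a published change list might contain, the following Python builds one with the standard library. The element and attribute names follow the ResourceSync use of the Sitemap format (an rs:md element with capability="changelist", and per-resource rs:md elements with change and datetime attributes); the function name, the example URIs, and the timestamps are hypothetical.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SM)
ET.register_namespace("rs", RS)


def build_change_list(changes, from_time, until_time):
    """changes: iterable of (uri, change, datetime) where change is one of
    'created', 'updated', or 'deleted'."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    ET.SubElement(urlset, f"{{{RS}}}md", {
        "capability": "changelist",
        "from": from_time,
        "until": until_time,
    })
    for uri, change, when in changes:
        if change not in ("created", "updated", "deleted"):
            raise ValueError(f"unknown change type: {change}")
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = uri
        ET.SubElement(url, f"{{{RS}}}md", {"change": change, "datetime": when})
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_change_list(
    [
        ("https://catalog.org/A-0001", "updated", "2026-05-14T09:00:00Z"),
        ("https://catalog.org/A-0002", "deleted", "2026-05-14T10:30:00Z"),
    ],
    from_time="2026-05-14T00:00:00Z",
    until_time="2026-05-15T00:00:00Z",
)
print(xml_text)
```

A "deleted" entry gives downstream aggregators an explicit signal to drop or tombstone a record instead of silently keeping a stale copy.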

18.2 Signposting

Every public Asset page emits typed Link headers (describedby, cite-as, item, author, license, type) following the Signposting profile. This makes machine-readable discovery low-friction for scholarly tooling that does not parse RDF.
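The header itself is simple to assemble. The Python sketch below is illustrative: the link relations are those named above, while the function name, the example URLs, and the decision to point cite-as at the canonical Asset URI are assumptions.

```python
def signposting_link_header(asset_uri, described_by=(), license_uri=None,
                            type_uri=None, items=(), authors=()):
    """Assemble a Link header value from typed links.

    described_by and items are sequences of (uri, media_type) pairs;
    authors is a sequence of URIs.
    """
    links = [(asset_uri, "cite-as", None)]
    links += [(uri, "describedby", media) for uri, media in described_by]
    if license_uri:
        links.append((license_uri, "license", None))
    if type_uri:
        links.append((type_uri, "type", None))
    links += [(uri, "item", media) for uri, media in items]
    links += [(uri, "author", None) for uri in authors]

    parts = []
    for uri, rel, media in links:
        part = f'<{uri}>; rel="{rel}"'
        if media:
            part += f'; type="{media}"'
        parts.append(part)
    return ", ".join(parts)


header = signposting_link_header(
    "https://catalog.org/A-0001",                                    # hypothetical
    described_by=[("https://catalog.org/A-0001.jsonld", "application/ld+json")],
    license_uri="https://example.org/licenses/open",
)
print("Link:", header)
```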

18.3 SWORD deposit

The catalog supports SWORD v2 (Atom-based) and SWORD v3 (JSON-LD-native) deposit endpoints so external repositories — Hyrax, Islandora, DSpace, EPrints, Zenodo, ResourceSpace — can deposit content into the catalog without bespoke integration. SWORD deposits arrive as submission-role packages and pass through the standard transfer workflow before promotion to permanent custody.

18.4 Persistent identifier services

AssetIDs are the canonical identifier, but the catalog also supports ARK and Handle resolution for institutions that already maintain those services. DOIs through DataCite are available where the institution has a DataCite contract. External persistent-identifier services resolve to the same underlying Asset; AssetID remains immutable and is never replaced by them.

18.5 The export and edition side

A clear architectural separation runs between the CMS — the live, authoritative, member-facing content-management surface — and editions, exported edition-role EFS packages built from catalog state and hosted on any domain. The CMS is where description is live, search is current, and curators collaborate. Editions are the public-web, possibly micropayment-gated, possibly subscription-gated, possibly free-to-read versions of catalog material.

The CMS produces editions; it does not host them. Open APIs and interoperability at the live, authoritative layer remain in the CMS. Editions inherit the public CMS metadata at the time of export and carry edition-specific metadata for distribution.


19. API Surface

The CMS exposes RESTful APIs for assets, files, representations, packages, claims, rights, access conditions, transfers, profiles, exports, agent sessions, agentic provenance, verification, and aggregation. Representative endpoints:

GET    /assets/{assetId}
POST   /assets
PATCH  /assets/{assetId}/metadata
GET    /assets/{assetId}/claims
POST   /assets/{assetId}/claims/registration
POST   /assets/{assetId}/claims/attribution
POST   /assets/{assetId}/claims/custody
GET    /assets/{assetId}/files
POST   /assets/{assetId}/files/characterize
GET    /assets/{assetId}/representations
GET    /assets/{assetId}/packages
POST   /assets/{assetId}/packages/efs
GET    /assets/{assetId}/rights
PATCH  /assets/{assetId}/rights
GET    /assets/{assetId}/access-conditions

POST   /transfers
GET    /transfers/{transferId}
POST   /transfers/{transferId}/grant-review-access
POST   /transfers/{transferId}/accept
POST   /transfers/{transferId}/reject
GET    /transfers/{transferId}/exports/{profileId}

GET    /profiles
POST   /profiles/{profileId}/validate
GET    /assets/{assetId}/exports/ric-o
GET    /assets/{assetId}/exports/ead3
GET    /assets/{assetId}/exports/eac-cpf
GET    /assets/{assetId}/exports/mdto
GET    /assets/{assetId}/exports/linked-art
GET    /assets/{assetId}/exports/lido
GET    /assets/{assetId}/exports/edm
GET    /assets/{assetId}/exports/schema-org
GET    /assets/{assetId}/exports/prov-o
GET    /assets/{assetId}/c2pa-manifest
GET    /transfers/{transferId}/exports/mdto-sip
GET    /transfers/{transferId}/exports/eark-sip
GET    /collections/{collectionId}/exports/dcat-ap
GET    /iiif/{assetId}/manifest

GET    /verify/asset/{assetId}
GET    /verify/claim/{claimId}
GET    /verify/package/{packageId}
GET    /verify/file-digest/{digest}
GET    /verify/conformance/{conformanceId}

GET    /parties/{partyId}/agent-sessions
GET    /agent-sessions/{sessionId}
POST   /agent-sessions/{sessionId}/end
GET    /assets/{assetId}/agent-provenance
POST   /assets/{assetId}/agent-provenance
PATCH  /assets/{assetId}/agent-provenance/{provenanceId}/review

GET    /oai-pmh
GET    /resource-sync/capabilitylist.xml
GET    /resource-sync/changelist.xml
GET    /sword/v3/service-document
POST   /sword/v3/deposit
GET    /signposting/{assetId}

Verification works without exposing private metadata or plaintext files. Mutating endpoints reject any agent action whose session scope does not cover the requested operation. Aggregation, harvest, and deposit endpoints respect access conditions: a harvester sees only what its access scope allows.
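The scope rule for mutating endpoints might look roughly like the following Python sketch. The route templates are taken from the endpoint list above, but the scope names, the AgentSession shape, and the deny-by-default behaviour for unmapped routes are assumptions for illustration. Read access is out of scope here; it is governed by access conditions.

```python
from dataclasses import dataclass

MUTATING_METHODS = {"POST", "PATCH", "PUT", "DELETE"}

# Hypothetical scope names; the real scope vocabulary is not defined here.
REQUIRED_SCOPE = {
    ("PATCH", "/assets/{assetId}/metadata"): "metadata:write",
    ("POST", "/assets/{assetId}/claims/attribution"): "claims:attribution",
    ("POST", "/assets/{assetId}/agent-provenance"): "provenance:write",
}


class ScopeError(PermissionError):
    pass


@dataclass(frozen=True)
class AgentSession:
    session_id: str
    scopes: frozenset
    ended: bool = False


def authorize(session: AgentSession, method: str, route: str) -> None:
    """Raise ScopeError unless the session covers a mutating operation."""
    method = method.upper()
    if method not in MUTATING_METHODS:
        return  # reads are filtered by access conditions, not by this check
    if session.ended:
        raise ScopeError("agent session has ended")
    required = REQUIRED_SCOPE.get((method, route))
    if required is None:
        # Deny by default: an unmapped mutating route is never allowed.
        raise ScopeError(f"no scope mapping for {method} {route}")
    if required not in session.scopes:
        raise ScopeError(f"session lacks scope {required!r}")


session = AgentSession("s-1", frozenset({"metadata:write"}))
authorize(session, "PATCH", "/assets/{assetId}/metadata")   # allowed
try:
    authorize(session, "POST", "/assets/{assetId}/claims/attribution")
except ScopeError as err:
    print("rejected:", err)
```

Placing the check in the API runtime, not in the agent, matches the mitigation described in §22.8: a compromised or prompt-injected agent cannot widen its own scope.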


20. Cryptography

The CMS adopts the post-quantum primitives chosen by Catalog.ID and CPR.

20.1 V1 algorithms

  • Signatures: ML-DSA-65 (FIPS 204).
  • Key encapsulation: ML-KEM-1024 (FIPS 203).
  • Symmetric authenticated encryption: AES-256-GCM (FIPS 197 + NIST SP 800-38D).
  • Hashing: SHA-256 and SHA-512 (FIPS 180-4).
  • Conservative archival signatures (where required): SLH-DSA (FIPS 205), for example a 128s parameter set such as SLH-DSA-SHA2-128s.

20.2 Algorithm agility

Algorithm agility is implemented through:

  1. Explicit algorithm identifiers in every claim header, package header, manifest entry, and grant record. There are no implicit defaults.
  2. Multi-signature support, so a claim can carry both a primary algorithm and a transitional or backup algorithm during a migration window.
  3. Preservation events for key rotation and rewrap, recorded as key_rewrap and migration events.
  4. A deprecation register inside the Profile Registry that lists retired algorithms and the conformance impact of holding claims signed under them.
  5. Retention of historical claims: claims signed under retired algorithms are never invalidated. They remain valid evidence of past state, with the deprecation noted in display, because long-horizon evidence must survive algorithm churn.
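The interplay of explicit identifiers, multi-signature claims, and the deprecation register can be sketched as follows. This Python example is an assumption-laden illustration: the claim layout, the "LEGACY-SIG-1" identifier, and the register entry are hypothetical, and no cryptographic verification is performed (that would need a post-quantum library outside the standard library).

```python
# Hypothetical deprecation register: algorithm identifier -> note.
DEPRECATION_REGISTER = {
    "LEGACY-SIG-1": "retired; retained as historical evidence",
}


def assess_claim(claim: dict, register=DEPRECATION_REGISTER) -> dict:
    """Classify a claim by the algorithms its signatures declare.

    Signature verification is deliberately not attempted here; this only
    shows the bookkeeping around explicit identifiers and deprecation.
    """
    signatures = claim.get("signatures", [])
    if not signatures:
        raise ValueError("claim carries no signatures")
    algorithms = []
    for signature in signatures:
        alg = signature.get("alg")
        if not alg:
            # No implicit defaults: a missing identifier is an error.
            raise ValueError("signature without explicit algorithm identifier")
        algorithms.append(alg)

    deprecated = [a for a in algorithms if a in register]
    current = [a for a in algorithms if a not in register]
    if not deprecated:
        status = "current"
    elif current:
        status = "transitional"      # primary plus retired algorithm side by side
    else:
        status = "deprecated_retained"   # never invalidated, only flagged
    return {"status": status, "current": current, "deprecated": deprecated}


migrating = {"signatures": [
    {"alg": "LEGACY-SIG-1", "value": "..."},
    {"alg": "ML-DSA-65", "value": "..."},
]}
historical = {"signatures": [{"alg": "LEGACY-SIG-1", "value": "..."}]}
print(assess_claim(migrating)["status"])
print(assess_claim(historical)["status"])
```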

XRPL anchoring caveat: the timestamping anchor uses XRPL transaction signing schemes (Ed25519 / secp256k1), which are not post-quantum. The post-quantum trust root remains the CPR claim signature itself; the XRPL transaction is an additional public timestamp anchor, not the security primitive.


21. Validation Strategy

Validation runs at well-defined points in the workflow rather than continuously.

21.1 Validators

Validator | Used for
SHACL | Canonical graph, RiC-O, SKOS, DCAT, policy shapes.
JSON Schema | API payloads, profile configuration, metadata fragments.
XML Schema | MDTO XML, EAD3, EAC-CPF, METS, ALTO, SEDA.
CSV rules | NARA transfer metadata CSV and tabular exports.
Cryptographic validator | Signatures, claim chains, anchors, ciphertext digests, plaintext digests.
Package validator | BagIt, E-ARK, METS, MDTO-SIP.
Privacy linter | Immutable claim hygiene and public metadata leakage.
Human review | Legal restrictions, donor terms, sensitive material, classification, appraisal.

21.2 Profile readiness

The catalog shows users a profile-readiness score. For example, a NARA permanent-electronic-records readiness check might report a 0.72 score, list missing required fields (disposition_authority, transfer_request_number), warn about probable-but-not-exact format identification, and prescribe next actions (add_records_schedule_item, run_nara_csv_export_validator).

Readiness is a punch list, not a hard gate. The receiving institution decides whether to accept partial readiness.
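A readiness report of this kind could be assembled as in the Python sketch below. The scoring rule (the share of required fields that are present) is an assumption and is not claimed to reproduce the 0.72 in the example above. The required-field list, the report layout, and the field-to-action mapping are likewise hypothetical, though the field and action names reuse those mentioned in the text where possible.

```python
def readiness_report(metadata, required_fields, remediation, format_match="exact"):
    """Build a non-blocking punch list for one profile.

    Assumed scoring rule: fraction of required fields that are present.
    """
    missing = [name for name in required_fields if not metadata.get(name)]
    total = len(required_fields)
    score = round((total - len(missing)) / total, 2) if total else 1.0

    warnings = []
    if format_match != "exact":
        warnings.append(f"format identification is {format_match}, not exact")

    next_actions = [remediation[name] for name in missing if name in remediation]
    return {
        "score": score,
        "missing_required": missing,
        "warnings": warnings,
        "next_actions": next_actions,
        "blocking": False,   # the receiving institution decides, not the tool
    }


# Hypothetical required fields and remediation mapping for illustration.
required = ["title", "creator", "date_range",
            "disposition_authority", "transfer_request_number"]
remediation = {
    "disposition_authority": "add_records_schedule_item",
    "transfer_request_number": "add_transfer_request_number",
}
metadata = {"title": "Field notes", "creator": "Example Office",
            "date_range": "1998/2004"}

report = readiness_report(metadata, required, remediation, format_match="probable")
print(report)
```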


22. Risks and Mitigations

22.1 AssetID boundary ambiguity

Risk: users may assign one AssetID to too much (e.g., an entire collection in one record) or too little (e.g., one AssetID per page scan). Mitigation: object-boundary guidance by asset type, splitting and merging through relationships rather than by changing historical AssetIDs, collection AssetIDs for aggregations, file-level verification without file-level identity replacing AssetID.

22.2 Metadata privacy leakage

Risk: even encrypted files can leak information through public metadata — titles, notes, relationships, identifiers, timestamps, geolocation, aggregation patterns. Mitigation: explicit exposure classes, mandatory privacy linting before publication and claim submission, EFS-encrypted references for sensitive material, public/restricted search separation, no arbitrary free text in immutable claims.

22.3 Key loss

Risk: encrypted files survive but keys do not. Mitigation: creator key-backup UX, organization delegate model, institutional key grants, optional split-key and succession workflows, paper and offline key-custody guidance, preservation events for key rotation and rewrapping.

22.4 Institutional acceptance without legal clarity

Risk: transfer metadata implies rights or custody not actually granted. Mitigation: separate custody, ownership, storage, description, access, and decryption capability into independent fields; require agreement references; force human review for donor or legal restrictions; record signed custody claims with explicit scope.

22.5 Standards drift

Risk: MDTO, E-ARK, NARA, DCAT, RiC-O, CIDOC-CRM, or other upstream standards evolve. Mitigation: versioned Profile Registry, mapping version history, validation report versioning, profile deprecation and migration tools, export reports that state exact profile versions.

22.6 Cross-jurisdiction replication

Risk: encrypted packages may be replicated into jurisdictions where policy disallows storage. Mitigation: jurisdiction-aware placement policies, metadata-level replication constraints, package-level storage policy, operator jurisdiction metadata, transfer-agreement terms linked to EFS placement.

22.7 Overpromising certification

Risk: the platform claims compatibility with NARA, national archives, ISO 16363, or FedRAMP without operational certification. Mitigation: use "profile-ready" and "export-compatible" language; maintain conformance evidence; separate technical validation from institutional acceptance; support audit evidence but do not claim certification automatically.

22.8 Agent compromise and AI authenticity

Risk: an authorized agent identity is misused (compromised refresh token, principal-side malware, prompt-injected agent output) and produces or signs content that pollutes the catalog; alternatively, AI-generated description is published as if it were human-authored. Mitigation: scope enforcement at the runtime, not at the agent; mandatory AgentSession references on every agent-signed claim; principal-side kill switches (refresh token revocation, device record deletion) prominent in the UI; AgenticProvenance mandatory on AI-derived artefacts; C2PA Content Credentials on AI-generated derivatives published externally; mandatory human review before promoting AI-generated description to permanent-record profiles; preservation events for suspicious agent activity; "AI-suggested" never silently collapsed into "AI-confirmed".

22.9 Aggregator drift and harvest divergence

Risk: external aggregators cache stale views of the catalog, exposing description that has since been corrected, restricted, or revoked. Mitigation: ResourceSync change lists alongside OAI-PMH; record-level revision identifiers and lastModified timestamps in every harvest manifest; tombstone records for revoked or sealed assets; coordination with aggregator policies on update cadence; explicit distinction between "public-immutable" (CPR claims) and "public-mutable" (description) so aggregators understand what they may cache long-term.


23. Status, Roadmap, and Open Questions

23.1 Status

This whitepaper describes the CMS as an architectural target. Implementation proceeds in phases against this design.

23.2 Roadmap

Phase | Deliverable
1 | Canonical metadata foundation: entities, AssetID URI strategy, public/restricted/EFS-reference separation, CPR / EFS / Catalog.ID / Licensing Agent adapters, privacy linter, JSON-LD canonical serialization.
2 | Creator workflows: creator minimal and transfer-ready profiles, offline AssetID assignment, EFS encrypted package upload, public Asset page, rights wizard, collection grouping.
3 | Preservation foundation: PREMIS-compatible event model, format characterization pipeline, plaintext/ciphertext fixity separation, package audit events, preservation-risk dashboard, BagIt and METS exports.
4 | Institutional transfer: transfer workspace, private identity sharing, scoped decryption grants, institutional review dashboard, custody claim creation, transfer validation report.
5 | European archive profiles: E-ARK CSIP/SIP package builder, Archives Portal Europe profiles, DCAT-AP, GDPR Article 89 review template, MDTO and MDTO-SIP, SEDA adapter.
6 | U.S. archive profiles: NARA UERM, NARA permanent-electronic-records metadata, 36 CFR 1236.54 digitization metadata, NARA finding-aid support, DACS, DCAT-US, FADGI, CUI/FOIA/Privacy Act access templates.
7 | Discovery and public access: IIIF Presentation 3.0, OAI-PMH Dublin Core, RiC-O RDF, public verification endpoints, public/restricted search separation, accessibility evidence.
8 | Trust and certification readiness: ISO 16363 evidence, CoreTrustSeal evidence, NDSA Levels self-assessment, ISO 27001 / NIST / FedRAMP / NIS2 evidence hooks, audit-ready operational dashboards.
9 | Agent and AI provenance: Catalog.ID agent token acceptance on every endpoint, Agent / AgentSession / AgenticProvenance entities, Catalog MCP server reference implementation, C2PA emission, PROV-O export, public AI-disclosure UI, agent-aware privacy linter, profile rules forbidding unreviewed AI in permanent-record exports.
10 | Museum and aggregation interoperability: CIDOC-CRM / Linked Art profile, LIDO, EDM, Schema.org embedding, ResourceSync endpoints, SWORD v2/v3, Wikidata Q-ID integration, Getty AAT/TGN/ULAN integration.

23.3 Open questions

The following items remain open and welcome external input:

  • The exact thresholds at which institutional profiles forbid pre-review AI-generated description — uniform across NARA, MDTO, and E-ARK, or per-profile?
  • Whether ResourceSync change-list cadence should be operator-configurable or globally fixed for predictability.
  • Whether agents should be allowed to originate attribution claims about external human parties, or only to countersign them after a human originator has signed first.
  • The right governance forum for adding national archive profiles beyond the Netherlands, France, EU-wide E-ARK, and U.S. NARA.
  • Whether Lemmas should be exportable to Wikidata as upstream contributions, and on what governance basis.

24. References

Source design document

The CMS Design document (CMS Design.md in the project repository) carries the full canonical schemas, illustrative examples, every YAML structure referenced in this whitepaper, and the resolved-decisions log. This whitepaper summarizes; the design document is normative for implementers.