Encrypted Filestorage
Encrypted File Storage and Retrieval for the Catalog Ecosystem
Working title: Encrypted Filestorage
Working URL: catalog.org/efs/
Author: Roberto Bourgonjen
Last updated: 2026-05-26
1. Introduction
Encrypted Filestorage (EFS) is the storage service of the Catalog ecosystem. Where the other Catalog modules describe what assets mean, who they belong to, and how they are exchanged, EFS preserves and serves the actual encrypted file bytes.
The Catalog ecosystem comprises:
- Catalog.ID: pseudonymous identity for participants and signed actions.
- CPR: durable signed claims about digital assets, AssetID licensing, and the bindings of AssetID licences to owner keys.
- Catalog Management System (CMS): collaborative workspace for descriptive asset metadata — description, relationships, articles, editions, and active licensing state.
- Asset Market: bilateral sales agreements through which licences, deliverables, and decryption grants are exchanged.
- Bitcash: prepaid micropayment and metering layer for service consumption.
- Encrypted Filestorage: preservation, replication, retrieval, and key management for encrypted files.
EFS is intentionally narrow. It does not mint AssetIDs, describe assets, broker sales, or define what content is. It accepts encrypted files that an authorized party has registered against an existing AssetID, replicates them across operators and jurisdictions, serves them upon authorized retrieval, and meters its services through Bitcash.
EFS is positioned as long-term preservation infrastructure rather than as a working file share. The expected use case is a publisher (an artist, photographer, musician, writer, archivist, or institution) depositing encrypted copies of their work for preservation and distribution, with the publisher's primary working copy remaining on their own machine. Catalog positions EFS explicitly as a candidate standard for the long-term preservation of digital assets, including assets that should outlive their authors. Section 2 develops the preservation problem this is intended to address; section 3 describes the approach EFS takes to it.
EFS requires an active Catalog.ID account for both storage and retrieval. Every publisher writing to EFS, and every retriever reading from it, is a Catalog.ID identity (personal, organization, delegate, or agent) in good standing at the moment of the operation. The credentials EFS requires for a write are therefore three: an active Catalog.ID account, an AssetID registered with CPR, and a funded Bitcash wallet. A retrieval requires an active Catalog.ID account, a funded Bitcash wallet, and an effective wrapped-key record at EFS addressed to one of the account's efs_encrypt keys.
Tying retrieval to an active Catalog.ID account is a security choice as well as an identity choice. EFS holds the wrapped-key records itself (§12) — each package key sealed under the recipient's Catalog.ID efs_encrypt hybrid public keys — and rechecks the recipient's Catalog.ID standing on every retrieval. If the recipient's account is terminated, locked down, or its efs_encrypt key revoked through key rotation (Catalog.ID §3.3), further EFS retrievals against any wrapped-key record addressed to that key stop immediately. Wrapped keys already in the attacker's hands at the moment of compromise remain cryptographically decryptable, but the verification gate prevents further ciphertext from being delivered against the revoked key. This bounds the damage of a key compromise to material the attacker had already retrieved at the time of revocation.
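The retrieval preconditions described above can be read as a single gate that is re-evaluated on every request. The following is a minimal sketch under stated assumptions, not the EFS implementation: the Account and WrappedKeyRecord types, their field names, the price parameter, and the may_retrieve function are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical view of a Catalog.ID account as seen by the gate."""
    active: bool                 # in good standing: not terminated or locked down
    wallet_balance: int          # Bitcash balance, in a hypothetical smallest unit
    efs_encrypt_keys: set = field(default_factory=set)  # currently valid key ids


@dataclass
class WrappedKeyRecord:
    """Hypothetical record: one package key sealed for one recipient key."""
    package_id: str
    recipient_key_id: str


def may_retrieve(account, package_id, records, price):
    """Return True only if all three retrieval preconditions hold."""
    if not account.active:
        return False
    if account.wallet_balance < price:
        return False
    # A record counts only while the key it is addressed to is still valid;
    # a key revoked through rotation no longer unlocks delivery.
    return any(
        r.package_id == package_id
        and r.recipient_key_id in account.efs_encrypt_keys
        for r in records
    )
```

Because the check runs per retrieval rather than once at grant time, removing a key id from the account's valid set is enough to stop further delivery against every record addressed to that key.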
2. Problem Statement
EFS is intended for the long-term preservation of digital assets. Long term here means decades to centuries: a horizon over which the author may no longer be alive, the publishing organisation may no longer exist, and the technology landscape will certainly have changed several times. Catalog positions itself explicitly as a new standard for this kind of preservation. The remainder of this section examines why long-term preservation of digital data is genuinely hard, and why the conventional answers are not adequate to the horizon Catalog targets.
2.1 No durable medium
Unlike paper, whose printed contents can be preserved for hundreds of years under reasonable conditions, digital data depends on a continuously evolving stack of media, formats, drives, and software. Every layer of this stack is subject to obsolescence on a timescale much shorter than the preservation horizon.
The most durable optical medium currently available is the 100 GB M-DISC Blu-ray. NIST's digital evidence preservation guidance lists M-DISC as acceptable for archival use, with a manufacturer-claimed longevity in the range of one hundred years. Two structural problems still apply.
First, the medium is not the limiting factor: the reader is. The optical-disc ecosystem has been contracting for years. Sony ended production of Blu-ray Disc media in February 2025 with no successor models. Drive manufacture and recordable-media supply now depend on a small number of vendors. Whether reading hardware will still be manufactured, supported, and serviceable in fifty or one hundred years is unknown, and the trend points the wrong way.
Second, 100 GB is too small for serious archival use. A professional archive of image and video material commonly runs to tens of terabytes. A 30 TB archive needs three hundred discs, which makes routine deposit and retrieval impractical without a robotic library. And those three hundred discs hold one publisher's body of work, not the volume an archival service must handle.
2.2 Tape: long media life, short equipment availability
Linear Tape-Open (LTO) is the industry's dominant cold-archive medium. Manufacturer-rated archival life of modern LTO cartridges is around thirty years under controlled conditions, and tape stored unpowered consumes no energy.
The medium-life figure understates the actual constraint, which is again the equipment. Each LTO generation has a finite production and support window. The LTO consortium's read-back specification has historically extended two generations back, so a cartridge of any given generation has a practical read window bounded not by the medium but by drive supply for itself and for the two generations that follow it. LTO-4, released in 2007, has no current production in 2026; a contemporary LTO-4 deployment depends entirely on the refurbished-drive market. The constraint is also tightening: LTO-10, released in 2025, broke with the historical pattern by reading only LTO-10 cartridges, which means the practical read window for current generations is narrower than for earlier ones. A tape committed to the vault today will outlive several generations of drive availability over a century-scale horizon, on a curve that has recently bent against the publisher.
Cost evolution is also non-monotonic. The capacity ratio between LTO-10 (30 TB native) and LTO-9 (18 TB native) is a factor of about 1.7, but new-tape pricing in 2026 is roughly 250 EUR for an HPE LTO-10 cartridge against around 78 EUR for an HPE LTO-9 cartridge, a factor of about 3.2. Cost per native terabyte therefore rises between the two generations, from about 4.3 EUR to about 8.3 EUR, rather than falling. The two generations are not backward compatible, so adopting LTO-10 means buying both new media and a new drive. Patent disputes have in the past blocked manufacture of certain LTO generations for years, and similar disputes can recur, affecting either tapes or drives.
In summary, tape has the lowest cost per gigabyte for cold storage, the longest unattended lifetime of magnetic media, and effectively zero idle energy. But over a horizon aimed at eternity the total cost is substantial, the cost trajectory is unpredictable, and the supply chain may face disruptions that cannot be foreseen.
For routine retrieval, tape is unsuitable on its own because of mount and seek latency. Tape is part of an answer, not the whole answer.
2.3 Spinning disk and SSD: short life, energy-bound
The next candidate is online disk: spinning hard drives or solid-state drives. These offer immediate retrieval and conventional service models, but for preservation they introduce two problems that grow worse the longer the horizon.
Energy. Powered drives consume electricity continuously. Energy prices in much of Europe have risen sharply in recent years, the supply mix is shifting, and the continuity of cheap and abundant electricity is itself an open question for the coming decades. The cost of keeping a multi-petabyte fleet powered 24x7 over a century is not just high, it is unknown, and committing to that profile in a fixed-fee preservation service is a commitment to costs that cannot be predicted.
Drive availability. The number of independent hard-drive manufacturers has shrunk to three. Manufacturing depends on increasingly complex processes and tightly held patents. As of 2026, retail channels in the Netherlands restrict drive purchases per customer (one drive per order at Azerty, four at Alternate), and orders that exceed the limit are cancelled. Replacing a sixteen-drive RAID set has become an exercise in working around purchase quotas. SSDs face similar concentration in NAND fabrication. A long-term commitment to disk-based preservation is a commitment to a supply chain that is narrowing, not broadening.
2.4 Cost predictability
A service that promises preservation without recurring fees is a service that must price in all future costs at acceptance time. With tape, that means future migration and equipment cycles. With disk and SSD, that means future energy. Both involve forecasts of decades to a century, against media, equipment, energy, and supply-chain trajectories that have shown they can move in surprising directions.
Even with an absurd price tag, a service that offered a hard guarantee of free 24x7 online availability indefinitely would not in fact be able to keep the promise across a deep recession in supply, a sustained energy crisis, or a regional disruption. The honest design choice is not to make a promise that cannot be kept.
2.5 Distribution versus control
Even with perfect storage media, long-term preservation faces a logical problem that traditional publishing systems compound rather than solve. Survival is a numbers game. The probability that a work survives a century is dominated by how many copies of it exist and how widely they are distributed. A 1750 book printed in 10,000 copies, scattered across libraries, private collections, and used-book shelves, is almost certain to survive in at least some copies today. A 1750 book printed in 100 copies, kept in one workshop, faces overwhelming odds of being lost: fire, flood, disinterest, a single bad actor in the chain of custody, a war that incinerates the city. Archivists and book historians can read survival rates off issue counts directly. Few copies and narrow custody mean extinction risk; many copies and wide custody mean survival.
Publishing systems that aim to control access to their material are forced to be strict in three directions, all of which fight redundancy:
- Access authorisation. Only paying or licensed readers should reach the content, so wide redistribution undermines the access regime: every copy in unauthorised hands is a leak.
- Copyright. Many works cannot lawfully be redistributed in digital form, so a custodian who forwards bytes to a third party may be infringing on the publisher's behalf.
- Provenance. Readers and licensees need confidence that the bytes they receive really originate from the claimed publisher and have not been altered or substituted, and loose distribution creates substituted-in-transit risk.
To honour these three, traditional systems narrow their custody: a small number of trusted machines, a small number of trusted operators, no redistribution rights, no third-party caching. The unintended consequence is direct and severe. The protections built to keep the work safe make it more likely to be lost. A small custody surface is structurally more vulnerable to single-point catastrophe than a wide one, and the very mechanisms that enforce these three constraints increase preservation risk. Honouring access control and honouring durability pull in opposite directions, and a long-horizon preservation system that does not resolve this conflict will eventually fail at one or the other.
2.6 Summary
Long-term digital preservation must be designed against the realities of:
- no durable digital medium with a hundred-year reader story;
- tape that is cheap to hold but expensive and risky to migrate forever;
- disk that is fast to retrieve but energy-bound and supply-bound;
- a cost trajectory that cannot be predicted over the relevant horizon;
- a structural conflict between wide distribution (which preservation requires) and access control (which publishers require), where traditional systems have resolved against distribution and therefore against survival.
A serious preservation service has to combine media, has to plan for migration, has to decouple the unconditional preservation commitment from the convenience commitment, has to price its service so that it can survive cost shocks without breaking faith with the publishers who paid into it, and has to resolve the distribution-control conflict structurally rather than by choosing one side. Section 3 describes the approach EFS takes.
3. Catalog's Approach
EFS responds to the preservation problem with seven connected design choices.
1. Encryption at the format layer, with access control alongside storage. A package accepted by EFS is sealed once at ingestion using authenticated symmetric encryption. The ciphertext is the unit of storage and transport throughout the system. Storage operators, peer mirrors, and self-hosted hubs hold and serve ciphertext only; none ever see plaintext or hold keys to recover it. Authorisation lives in wrapped-key records that EFS stores alongside the ciphertext (§12): the package key is sealed once under the publisher's own efs_encrypt keys at ingestion, and additionally sealed under any recipient's efs_encrypt keys whenever the publisher shares the package. EFS verifies the wrapped-key record on every retrieval and refuses delivery if none exists for the requesting party. The policy side of access — what a recipient may do, under what restrictions, on whose decision — lives in the CMS (RightsExpression, AccessCondition, AccessDecision) and is independent of EFS's cryptographic gate.
This separation is what allows EFS to chase wide redundancy without breaking the publisher's grip on access. Encrypted bytes carry no information to anyone who lacks the decryption key, so the publisher and the legal regime around the work have no preservation-driven reason to constrain where the ciphertext lives. Federation operators across jurisdictions, self-hosted hubs at the user level (§17), peer mirrors, and archive caches can all hold copies of the same package without breaching the constraints of §2.5. Wide distribution is now compatible with strict control rather than at war with it.
Encryption also delivers cryptographic provenance. Every package is signed at ingestion with the publisher's signing key and timestamped against the ingestion event the federation records. A reader can independently verify that the bytes they hold are the bytes the publisher signed, and that they were sealed no later than the recorded timestamp. Substitution and forgery are detected by signature failure rather than by trust in the storage operator.
The encryption is end-to-end at the user layer, not the application layer. The decryption key is held by the recipient outside the storage system, and EFS never sees it. This distinction matters. Most consumer "end-to-end" services, WhatsApp included, generate and manage keys inside the application itself; the user is never asked to write down the key, and the app holds it. The practical result is an app-to-app channel, secure against external eavesdroppers but defeated by anyone who can access the app's key store on either device, including the app vendor under legal compulsion. EFS asks the user to hold the key, and the protections rest on the user holding it.
A decryption key is also small enough to fit on a piece of paper, and paper, unlike digital media, costs nothing to preserve and survives for centuries under ordinary conditions. A user can print the key as a hex string or QR code on paper, or write it down by hand, then place it in a safe or distribute it as secret shares across trusted parties. Modern inks are highly durable, so the resulting paper inherits the survival horizon of the analog medium for the small artefact that gates access. The expensive durability problem set out in §2 applies to the encrypted bytes, which are large and need active preservation; the key that decrypts them sidesteps that problem entirely, on a medium that has already proven itself across centuries.
The rule is always encrypt, not "encrypt confidential material and skip encryption for public content". Even archives that hold predominantly public material carry caveats: copyright wrappers around third-party items, embargo periods, donor access conditions, redactions for living individuals, jurisdictional clearance differences. An architectural rule that encrypts everything absorbs the entire class of edge cases without forcing the archive to maintain two pipelines. The verifiability benefit alone justifies the small overhead, particularly in an era of trivially generated forgeries: an institution that publishes through EFS gives its readers an instrument they can independently use to verify what was actually published, and when. That assurance is at least as valuable to public archives as it is to commercial ones, and arguably more so, since public archives are exactly the targets attackers most want to forge.
2. Multiple storage states, with offline tape as the unconditional floor. A package accepted by EFS is held on multiple physical copies across multiple media. Online disk is the primary serving copy. Tape inside an automated library serves as an online recovery copy. Offline tape, ejected from the library and stored off-site or in a data-safe vault, is the unconditional preservation copy: physically unreachable from the operator's network, immune to ransomware, credential compromise, malicious or mistaken commands, and replication-cascade errors. Section 14 develops the storage states; section 16 develops the redundancy floors.
3. Separately stated convenience and preservation commitments. Online retrieval is a convenience commitment, offered under normal operating conditions. Preservation on offline tape is an unconditional commitment, kept regardless. When sustained external pressure (energy cost spikes, supply disruption, regional crisis) threatens an operator's ability to keep online infrastructure running, the operator may scale back the convenience commitment, retire online copies into lower-energy states, or in the extreme suspend online availability for the duration of the pressure event, while continuing to honour the offline-tape preservation commitment. The publisher's reasonable expectation is therefore not "always online", but "always preserved, online under normal conditions". Section 15 develops the lifecycle and pressure responses.
4. Pay once for ingestion, pay per use for retrieval. A single ingestion fee funds preservation indefinitely. Retrieval is metered separately. There is no subscription, no renewal, no expiry. The lifecycle is governed by two fixed idle thresholds: a package that has not been retrieved for five years becomes eligible to have its disk-side copies moved to Standby (low-power disk) at the next storage-unit migration, and a package that has not been retrieved for ten years becomes eligible to have its disk-side copies released entirely at the next storage-unit migration, leaving it on tape only. A retrieval rehydrates the package back to full availability and resets the idle clock. Sections 15 and 18 develop the lifecycle and pricing.
5. Best-effort online, guaranteed preserved. Online availability is best-effort: the operator commits to the convenience tier under normal conditions, with named pressure responses for the degraded conditions in which it cannot. The preservation commitment, by contrast, is unconditional and rests on the offline-tape pair. This separation makes it possible to honour the long-horizon promise without a forecast of energy and supply costs that nobody can credibly make.
6. Continuous migration as a planned operational cost. Tape is rewritten on a fifteen-year migration cycle (well inside the manufacturer-rated medium life, well inside the historical drive-availability window). Disk volumes are migrated forward as new generations of capacity arrive. Migration is planned, scheduled, and budgeted from the ingestion fee. Section 13 develops the volume migration model.
7. A certified operator platform for interoperability and handover. EFS operators do not run arbitrary infrastructure. The EFS software is licensed under terms that specify a certified platform end-to-end: the operating system, the filesystem (XFS), the volume labelling and registry conventions, the on-disk directory layout, and a closed list of approved hardware suppliers and models for storage units, tape libraries, and tape drives. Approved hardware is named per-generation in a versioned hardware schedule that accompanies the license; an operator builds new capacity from the schedule current at the time of acquisition and migrates forward as later generations supersede earlier ones.
The certification has two purposes. First, operator infrastructure built today is interoperable with infrastructure built by other operators today, down to controller firmware and library robotics, so cross-operator placement is not complicated by silent format or behaviour differences. Second, if an operator ceases operation, its hardware can be transferred to another certified operator and integrated without on-disk conversion and without firmware-quirk reverse-engineering: the volumes mount, the labels resolve, the registries import, the tape libraries accept the cartridges, and the archive continues to be served.
A class-of-hardware specification (any RAID controller of class X, any tape library of class Y) leaves enough room for compatibility surprises that long-horizon preservation cannot afford; pinning to specific suppliers and models removes that room. Section 13 develops the architectural specifics; the hardware schedule itself is maintained as a versioned annex to the license.
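The two idle thresholds in design choice 4 above can be expressed as a small classification function. This is an illustrative sketch only: the Eligibility enum, the eligibility function, and the choice to treat each threshold as inclusive are assumptions, and the sketch says nothing about when the next storage-unit migration actually occurs.

```python
from enum import Enum

STANDBY_AFTER_YEARS = 5    # idle threshold for Standby eligibility (from the text)
RELEASE_AFTER_YEARS = 10   # idle threshold for disk-release eligibility (from the text)


class Eligibility(Enum):
    ONLINE = "remains on primary disk"
    STANDBY = "eligible for Standby at the next storage-unit migration"
    TAPE_ONLY = "eligible for disk release at the next storage-unit migration"


def eligibility(idle_years: float) -> Eligibility:
    """Classify a package by the time elapsed since its last retrieval.

    Assumption: thresholds are inclusive (exactly five idle years qualifies).
    """
    if idle_years >= RELEASE_AFTER_YEARS:
        return Eligibility.TAPE_ONLY
    if idle_years >= STANDBY_AFTER_YEARS:
        return Eligibility.STANDBY
    return Eligibility.ONLINE
```

A retrieval resets the idle clock, so in this model it corresponds to calling eligibility(0), which returns Eligibility.ONLINE again.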
The remainder of this whitepaper specifies these mechanisms and the protocol surfaces that expose them.
4. Scope
Encrypted Filestorage is a paid, encrypted, append-only storage network for digital files associated with AssetIDs registered in CPR.
EFS is concerned with:
- packaging files into immutable encrypted containers;
- assigning stable AssetID-scoped identifiers to those containers;
- storing them across operators, jurisdictions, and storage media;
- replicating, auditing, and repairing them;
- maintaining the key material that authorised parties use to decrypt them;
- propagating availability information across the federation;
- serving them on authorised retrieval;
- metering storage and retrieval through Bitcash.
EFS is not responsible for:
- AssetID minting or registration (CPR);
- ownership records or attribution claims (CPR, CMS);
- mutable description, tags, articles, collections, or editorial state (CMS);
- offers, acceptance, payment routing, or sales agreements (Asset Market);
- user identity (Catalog.ID, required for every writer and retriever in v1, §8.3);
- adjudicating what content is.
5. Identifiers
EFS uses three identifiers, each with a distinct role.
5.1 AssetID
An AssetID identifies a logical asset registered with CPR. It is the namespace under which all EFS storage for that asset is rooted. AssetIDs are minted by CPR, sold for BIT, and authorise subsequent storage actions. EFS does not allocate AssetIDs and treats them as opaque tokens for namespacing and authorisation.
5.2 PackageID
A PackageID identifies one logical package generation under an AssetID. A package is the unit of submission and retrieval (§6). Its form is:
{assetID}.{operatorID}.{role}.{serial}
Where operatorID identifies the EFS operator that accepted the package, role is the package role (§6.4), and serial is a per-role generation number that increases monotonically.
Example:
qjrm4821xwpa.b4np.source.000001
qjrm4821xwpa.b4np.preservation.000002
qjrm4821xwpa.b4np.preview.000003
qjrm4821xwpa.b4np.access.000001
qjrm4821xwpa.b4np.text.000001
qjrm4821xwpa.b4np.edition.000002
qjrm4821xwpa.b4np.metadata.000001
qjrm4821xwpa.b4np.submission.000001
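The four-field form above lends itself to a small parser. The sketch below is illustrative rather than normative: the character classes (lowercase alphanumeric assetID and operatorID, lowercase role, six-digit serial) are inferred from the examples only, and the PackageID class and parse_package_id function are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed lexical shapes, inferred from the examples rather than specified.
_PACKAGE_ID = re.compile(r"([a-z0-9]+)\.([a-z0-9]+)\.([a-z]+)\.([0-9]{6})")


class PackageID(NamedTuple):
    asset_id: str
    operator_id: str
    role: str
    serial: int

    def __str__(self) -> str:
        # Re-render in the canonical dotted form with a zero-padded serial.
        return f"{self.asset_id}.{self.operator_id}.{self.role}.{self.serial:06d}"


def parse_package_id(text: str) -> PackageID:
    """Split a PackageID string into its four fields, or raise ValueError."""
    m = _PACKAGE_ID.fullmatch(text)
    if m is None:
        raise ValueError(f"not a PackageID: {text!r}")
    asset_id, operator_id, role, serial = m.groups()
    return PackageID(asset_id, operator_id, role, int(serial))
```

Parsing and re-rendering round-trips: str(parse_package_id("qjrm4821xwpa.b4np.preview.000003")) gives back the same string.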
5.3 PartNr
A package whose total ciphertext size exceeds the package part limit (§6.2), or whose parts are produced incrementally rather than as a complete batch, is split into parts. PartNr is a five-digit zero-padded counter, platform-uniform within a protocol version:
p00001 (first part)
p00472 (four-hundred-seventy-second part)
p99999 (last permitted part of the generation)
The fully-qualified address of a part appends the PartNr to the PackageID:
qjrm4821xwpa.b4np.source.000001.p00007
The PackageID alone is the address of the package; the part address requires the PartNr suffix in all cases, including single-part packages where it is always p00001. Higher-level references (CMS locators, public package URLs) use the PackageID and resolve to part addresses on retrieval; operator-level operations (storage, replication, audit, retrieval) act on part addresses.
PartNr is a structural attribute of a part inside its package, not a global identifier. Numbering starts at p00001, so the five-digit width caps a package generation at 99,999 parts. Combined with the 16 GiB part-size limit (§6.2), this is roughly 1.5 PiB of ciphertext per generation: about 87 years of continuous capture at a 5 Mbps security-camera bitrate, or about 4.4 years at 100 Mbps high-bitrate cinema capture. A publisher whose package approaches the limit seals the current generation and continues under the next serial (§7).
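The per-generation ceiling follows directly from the part count and the part-size limit. The sketch below works through that arithmetic; the function names are hypothetical, bitrates are taken as decimal megabits per second, a year is taken as 365.25 days, and container overhead is ignored.

```python
PART_LIMIT_BYTES = 16 * 2**30          # 16 GiB, the v1 part-size limit
MAX_PARTS = 99_999                     # p00001 through p99999
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def generation_capacity_bytes() -> int:
    """Maximum ciphertext bytes in one package generation."""
    return PART_LIMIT_BYTES * MAX_PARTS


def years_of_capture(bitrate_mbps: float) -> float:
    """Years of continuous capture that fit in one generation."""
    bytes_per_second = bitrate_mbps * 1_000_000 / 8
    return generation_capacity_bytes() / bytes_per_second / SECONDS_PER_YEAR


capacity_pib = generation_capacity_bytes() / 2**50   # capacity in PiB
```

Because capture duration scales inversely with bitrate, a stream at twenty times the bitrate fills a generation in one twentieth of the time.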
The current part count of a package is the number of parts EFS has accepted under its PackageID. A publisher may submit an optional signed package-level marker that asserts a final part count, sealing the generation against further parts. Submitting the marker at first-part time gives a batch package a fixed, advertised part count from upload onward; deferring it (or never submitting it) leaves the package open to further parts. Each part binds to exactly one canonical ciphertext digest (§10) under the same rules that apply to single-part packages.
5.4 Identity rules
Each package refers to exactly one AssetID. Multi-asset packages are not supported. Assets that group other assets are recorded as container assets in CPR (§6.4) rather than as packages spanning multiple AssetIDs.
The AssetID portion of an identifier is a namespace, not a cryptographic key. Encryption uses fresh random per-package keys (§11), wrapped for recipients through the wrapped-key records EFS stores (§12). The identifier names the object; the cryptographic key protects it.
5.5 Identifiers and digests
EFS distinguishes between naming and verification:
- PackageID and PartNr name the object.
- Ciphertext digest verifies the encrypted bytes.
- Plaintext digests in the encrypted manifest verify individual files after decryption.
Any party with the encrypted bytes can verify the ciphertext digest. Plaintext-level verification requires possession of the decryption material.
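Ciphertext-level verification needs nothing but the bytes and the expected digest. The following sketch assumes SHA-256 purely for illustration; the actual digest algorithm and encoding are defined in §10, and the function names here are hypothetical.

```python
import hashlib
import hmac


def ciphertext_digest(chunks) -> str:
    """Hash encrypted bytes incrementally so large parts need not fit in memory.

    Assumption: SHA-256 with lowercase hex output stands in for the canonical
    digest that §10 defines.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def verify_ciphertext(chunks, expected_hex: str) -> bool:
    """Return True if the bytes hash to the expected digest."""
    return hmac.compare_digest(ciphertext_digest(chunks), expected_hex.lower())
```

No key is involved at this level, which is why a storage operator or mirror can run the same check as a recipient. Plaintext digests, by contrast, can only be checked after decryption.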
5.6 On-disk filenames
The bare address is the canonical identifier of a part. When parts are materialised in a general-purpose filesystem (self-hosted hubs §17, client-side download caches, legal exports §19.4), the on-disk layout is a directory per package and a file per part:
qjrm4821xwpa.b4np.source.000001/
├── p00001.cfc
├── p00002.cfc
└── p00003.cfc
The package directory name is the PackageID; each part file is named by its PartNr with a .cfc extension. A signed seal marker (§5.3), if submitted, lives alongside the parts in the same directory. Single-part packages get a directory containing one p00001.cfc file; the directory is retained as a uniform shape rather than collapsed to a single file at the package's path.
This convention applies only to general-purpose filesystem materialisation. Operators store parts internally on volumes, buckets, and tar chunks under their own conventions (§13).
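The mapping from a part address to its place in this layout can be sketched as follows. The path_for_part function and the root argument are hypothetical; the sketch validates only the PartNr suffix and treats everything before the last dot as the PackageID.

```python
from pathlib import PurePosixPath


def path_for_part(root: str, address: str) -> PurePosixPath:
    """Map a part address to <root>/<PackageID>/<PartNr>.cfc."""
    package_id, sep, part_nr = address.rpartition(".")
    digits = part_nr[1:]
    valid = (
        sep == "."
        and package_id != ""
        and len(part_nr) == 6
        and part_nr[0] == "p"
        and all(c in "0123456789" for c in digits)
        and digits != "00000"        # numbering starts at p00001
    )
    if not valid:
        raise ValueError(f"not a part address: {address!r}")
    return PurePosixPath(root) / package_id / f"{part_nr}.cfc"
```

A bare PackageID is rejected because its last field is a serial rather than a PartNr, which matches the rule that a part address always carries the suffix, even for single-part packages.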
6. Packages
The unit of submission and retrieval in EFS is a package. A package is a sealed encrypted container of files associated with a single AssetID under a single role (§6.4). Files travel into and out of EFS as packages. A package is what the publisher pays to ingest and what a retriever receives on retrieval.
6.1 Why packages
A package is the right unit of operation for several reasons. Most preservation submissions are not single files: a source deposit is a master file plus its sidecars, an edition deposit is a browser-ready bundle with payloads and signatures, a preview deposit is a small set of derivatives. Packaging them together gives them a single ciphertext digest, a single signature, a single license target, a single audit unit, and a single price. It also keeps the encryption boundary tight: the public header carries identifiers and digests, and everything else (file names, file structure, file count, plaintext) is inside the encryption.
6.2 Single-part and multi-part packages
A package whose total ciphertext fits within the platform part-size limit is a single-part package (p00001). A package whose total ciphertext exceeds the limit is split into parts, each addressable independently for replication, audit, and transport, but only the complete package can be decrypted: the encrypted manifest binds all parts together.
The v1 part-size limit is 16 GiB (17,179,869,184 bytes) per part. The choice balances several considerations:
- Client reliability. A part is the granularity at which a retry happens. At 16 GiB, a residential 1 Gbps link transfers a part in roughly 140 seconds, a 100 Mbps link in roughly 23 minutes. Both are short enough that a transient network failure costs at most one part-retry rather than a multi-hour upload.
- Server-side buffering. Each part is received into a temporary location before being moved into a bucket (§13.4). 16 GiB is comfortable for typical operator hardware without requiring large dedicated upload buffers.
- Tape efficiency. An LTO-9 cartridge holds 18 TB native, an LTO-10 cartridge 30 TB. At 16 GiB per part, more than a thousand parts fit on one tape, leaving ample room for the bucket structure (§13.4) and for batched writes.
- Round-number practicality. 16 GiB is a power-of-two byte count, which sits cleanly in filesystem allocation, in network transfers, and in part counts: a 1 TiB package is 64 parts, a 30 TB archive is roughly 1,750 parts.
The boundary is configurable per protocol version. A future version may raise it as residential bandwidth and operator hardware improve, or lower it if reliability data argues for finer granularity. It is platform-uniform within a version: operators do not set their own part-size limit.
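The part counts and transfer times quoted in the considerations above come from simple arithmetic on the part-size limit. The sketch below reproduces it; the function names are hypothetical, link speeds are decimal megabits per second, and protocol overhead is ignored, so real transfers will take somewhat longer.

```python
PART_LIMIT_BYTES = 16 * 2**30   # v1 part-size limit: 17,179,869,184 bytes


def part_count(ciphertext_bytes: int) -> int:
    """Number of parts needed for a package of the given ciphertext size."""
    if ciphertext_bytes <= 0:
        raise ValueError("ciphertext size must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (ciphertext_bytes + PART_LIMIT_BYTES - 1) // PART_LIMIT_BYTES


def part_transfer_seconds(link_mbps: float,
                          part_bytes: int = PART_LIMIT_BYTES) -> float:
    """Idealised time to move one part over a link of the given speed."""
    return part_bytes * 8 / (link_mbps * 1_000_000)
```

For example, part_count(2**40) returns 64 for a 1 TiB package, and a package exactly at the limit stays a single part while one byte more makes it two.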
A package may also be published incrementally, with parts produced over time rather than as a single batch. Long desktop or task recordings, security-camera streams, and live event captures fit this pattern. A streaming-published package shares the same per-part format and per-part operational rules as a batch multi-part package; what differs is that the part count is not known when the first part is written and the seal marker (§5.3) is deferred or omitted. The streaming TOC arrangement (§9.5) supports navigation while the part list is still growing. EFS treats a streaming package as the same logical object as a batch package: a single PackageID, a single package key (§11), parts arriving and replicating under the same lifecycle (§15) and redundancy floors (§16) regardless of whether they arrived in one upload session or over months.
6.3 What a package contains
A package contains:
- A public sealed header (visible to operators without decryption): PackageID, AssetID, role, serial, format version, encryption parameters, ciphertext digest, public ciphertext size, part count, part number, and the issuer's signature over the header.
- An encrypted index and an encrypted framed body, encrypted with a fresh per-package symmetric key. The body holds the file contents as one or more authenticated frames; the index lists those frames and the files they hold.
- The internal manifest (inside the encrypted index): per-file entries giving the file number, plaintext digest, plaintext size, optional original path or name, optional media type, and the frame indices that hold each file's bytes. The manifest is what a recipient uses to verify and extract individual files after decryption.
- A signature block: the publisher's signature over the public header and ciphertext digest, using a key authorised under §8.
Section 9 specifies the on-the-wire and on-disk container format.
6.4 Package roles
Files belonging to an AssetID published by an EFS member are grouped into packages by role. The v1 role vocabulary is closed, small, and protocol-versioned. The role string appears in the PackageID and is therefore visible outside the decrypting client, so roles are kept privacy-safe and generic. They do not encode legal categories, sensitive personal data, classification levels, or specific workflow states.
This section defines the user package roles. System packages — packages produced by an operator's own internal services (EFS database preservation, future operator-level feeds) for ingestion through the same pipeline — use a separate system role registry defined in §6.5 and are catalogued in the system registry (§13.8a) rather than the package registry (§13.8).
source. A source package contains the original or master files associated with an asset: RAW image files, project files, master video or audio renders, manuscript files, high-resolution scans, acquisition sidecars, and source checksums. Source packages are optimised for preservation rather than for browsing; they are typically large and the body usually has few frames since partial retrieval of a master file is rarely meaningful. The current source generation is the preferred master, but all source serials remain preservation-relevant.
preservation. A preservation package contains an archive-managed preservation representation derived from or validated against source material: normalised preservation masters, validated TIFF, WAV, PDF/A and XML, PREMIS records, METS records, preservation manifests, migration outputs, validation reports. The current preservation generation is the active preservation representation; migrations create new serials. preservation is distinct from source: source is the origin, preservation is custodian-managed preservation state. The two travel separately because their lifecycles are different. Source serials are immutable history; preservation serials are routinely rewritten as formats migrate.
preview. A preview package contains derivative material intended for browsing and identification: thumbnails, low-resolution previews, poster frames, contact sheets, short clips, snippets. Preview packages are typically much smaller than source packages and are optimised for low-cost retrieval. The current preview generation is the default low-cost display package.
access. An access package contains an authorised user-facing access representation that is not necessarily a published edition: medium- or full-resolution JPEGs and PDFs, reading copies, streaming proxies, redacted access copies, accessible copies, viewer manifests. The current access generation is the default authorised access representation. access is the right role where preview is too small and edition is too publication-specific.
edition. An edition package is a curated, publishable, distributable, or licensed representation of an asset. It is the publication artifact: a browser-ready bundle containing structured data, thumbnails, image tiles, media chunks, browser runtime files, payment markers, and signatures sufficient for the bundle to be opened in a Catalog client or unpacked to a static web host. Editions are typically generated from a Catalog edition specification by a publishing engine that runs against catalog state. Multiple simultaneous editions for an AssetID are modelled with explicit edition identifiers and package references rather than only by the current pointer.
text. A text package contains textual extraction or textual representation derived from source or access material: OCR text, ALTO, hOCR, PageXML, transcripts, captions, subtitles, search indexes, text-alignment files. The current text generation is the preferred searchable or readable text representation. text is the v1 protocol role for OCR, HTR, transcription, caption, subtitle, and search-index workflows. Specific outputs are distinguished by file role inside the package manifest, not by separate package roles. This avoids fragmenting the small protocol-level role registry across narrow textual variants while still allowing text material to be versioned, granted, and retrieved independently from previews or editions.
metadata. A metadata package contains a metadata-only or metadata-dominant encrypted bundle versioned independently from payload files: descriptive metadata, administrative metadata, PREMIS, METS or MDTO records, EAD fragments, rights documents, validation reports, restricted finding-aid material, encrypted donor agreements, encrypted identity evidence. The current metadata generation is the preferred package-level metadata bundle for a given profile or purpose. metadata is the role used when sensitive material that does not belong in the plaintext catalog metadata layer needs encrypted, package-versioned, independently grantable storage. The CMS references such material through EncryptedMaterialReference pointers and never holds plaintext copies of its own.
submission. A submission package is a transfer or deposit package prepared for a receiving custodian or formal workflow: an E-ARK SIP, an MDTO-SIP, a NARA transfer support bundle, a BagIt package, an accession bundle, a transfer manifest, a SWORD deposit envelope. The current submission generation is the active transfer package for a target profile, unless transfer workflow selects a specific serial. Submission packages move through the creator-to-custodian transfer workflow as the carrier of profile-conformant exports.
Container assets. CPR records the parent-child relationships that compose hierarchical works (a music album of tracks, a book of pages, an encyclopedia of volumes). An asset whose role is to group child assets, with no source files of its own, is a container asset. A container asset has no source package. It may have a preview package containing thumbnails and structural overview material, an access or edition package representing the published view of its contained assets, and a metadata package carrying descriptive material that applies to the aggregation. EFS treats container assets identically to other assets; they simply happen to lack a source role.
File roles and structural roles inside a package. Inside the encrypted package manifest, each file carries a file_role and a structural_role. File roles include master, preservation_master, access_copy, ocr, alto, hocr, pagexml, transcript, caption, subtitle, thumbnail, tile, manifest, checksum, premis, mets, mdto, runtime, signature, and validation_report. Structural roles include page, cover, track, scene, canvas, chapter, layer, component, and attachment. EFS does not read these. They live inside the encrypted manifest and are interpreted only by the decrypting client and by the catalog. The split lets a single package contain many file types organised by their function in the asset — a text package, for example, may contain ALTO, hOCR, PageXML, and a search index together — without expanding the small protocol-level package-role registry.
Why the package-role registry stays small. Package roles control retrieval, currentness, routing, and access-grant lanes at the protocol level. They are visible in the PackageID and observable to operators, so the cost of widening them is paid in privacy and in client compatibility. File roles, by contrast, are internal to the encrypted manifest and can be rich and extensible without exposing anything to the operator. A new package role appears only when a category of material needs independent versioning, currentness, retrieval routing, or access grants that cannot be expressed by an existing role plus file roles plus catalog-level metadata.
The role registry is closed and protocol-versioned. Operators may not invent operator-local roles. Additional roles may be introduced in future protocol versions when concrete need is established, so that clients and indexers can validate and route packages on role without operator-specific knowledge.
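Because the registry is closed, a client or indexer can validate a role string against a fixed set. The following is a minimal sketch of that check for the v1 user roles listed in this section; the function name and error handling are illustrative assumptions.

```python
# The eight v1 user package roles defined in this section.
V1_USER_ROLES = frozenset({
    "source", "preservation", "preview", "access",
    "edition", "text", "metadata", "submission",
})


def validate_user_role(role: str) -> str:
    """Return the role unchanged if it is in the closed v1 registry.

    Raises ValueError for anything else, including operator-local
    inventions, which the protocol does not permit.
    """
    if role not in V1_USER_ROLES:
        raise ValueError(f"unknown v1 package role: {role!r}")
    return role
```

A future protocol version would ship its own set; the check itself stays the same.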
6.5 System packages and system roles
Some packages written into the operator's storage substrate are produced by the operator's own internal services rather than by a user-facing publisher: EFS's own database-preservation streams (§16a), and any future operator-level feeds the protocol introduces. These are system packages. System packages share the on-the-wire CFC format (§9), the storage architecture (§13), the storage states (§14), the storage lifecycle (§15 with the refinements specified per system role), and the redundancy floor (§16) of user packages. They are not exposed on any of EFS's external surfaces: there is no HTTP ingestion endpoint for system packages, no HTTP retrieval endpoint, no federation availability claim (§19.2), no wrapped-key record issued against the parts, no Bitcash metering on writes or reads, no CMS browse entry, no public currentness query (§7), and no mute path (§20). They are operator-internal across the board.
What separates them from user packages is identity at three levels:
- Separate role registry. System packages use a system role registry distinct from the user package-role registry of §6.4. The v1 system roles are:
  | Role | Producer | Purpose |
  |---|---|---|
  | efs-wal | EFS database-preservation pipeline (§16a.3) | PostgreSQL WAL segment stream from the operator's EFS database |
  | efs-basebackup | EFS database-preservation pipeline (§16a.3) | Periodic pg_basebackup snapshots of the operator's EFS database |
  Like the user role registry, the system role registry is closed and protocol-versioned; operators may not invent system roles.
- Separate registry (§13.8a). Operators track system packages in the system registry alongside the package registry. Lookups, audits, repair, lifecycle bookkeeping, and migration operate on the same fields and follow the same disciplines, but the two registries are administratively distinct so that user-facing queries (CMS browse, federation availability claims, retrieval billing tied to user wallets) never resolve through system packages.
- Unencrypted permitted. The "always encrypt" rule of §3 (design choice 1) is a property of user packages; specific system roles may be unencrypted where the system role's specification explicitly permits it. The efs-wal and efs-basebackup system roles are unencrypted (§16a.4) because their payload is either already-KEM-ciphertext (the wrapped-key envelopes stored in EFS's database) or operator-visible metadata, and encrypting them would couple long-term recovery to the survival of an operator key. Future system roles that carry user-confidential material remain encrypted under the standard rule.
Write and read paths. A system package is written by an operator-internal service handing a finalised part to the write coordinator (§13.6) via a local IPC channel, bypassing the network ingestion step (§13.5) that user packages use. The coordinator then runs the same fan-out, two-medium redundancy, bucket placement, and registry update as for user packages, recording the result in the system registry. Reads are issued by the same operator's recovery and integrity-check tooling against the system registry, reading directly from the volumes that hold the parts. No external read path exists.
System-package preservation is funded out of the operator's overhead budget for the AssetIDs the system packages support, not by a separate metered fee (§16a.7).
7. Generations and Currentness
Each PackageID carries a serial that orders generations of the same role under an AssetID. Serials are monotonically increasing and never reused. Once published, a PackageID is immutable; revisions create a new serial.
By default the latest published serial of a role is the current one for that role. A publisher who wishes to override this default, for example to keep an older generation current while staging a successor, or to roll back to an earlier generation after a bad upload, does so by submitting a signed set_current record naming the role and the chosen serial. Absent such a record, the latest serial wins.
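The default-plus-override rule can be sketched as follows. This is an illustration under stated assumptions: signature verification of set_current records is taken to have happened already, the `sequence` field used to order overrides is hypothetical, and an override is assumed to stay in force until a later override replaces it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SetCurrent:
    """A verified set_current record (fields beyond role/serial are assumed)."""
    role: str
    serial: int
    sequence: int  # hypothetical ordering of override records


def current_serial(published: Iterable[int],
                   overrides: Iterable[SetCurrent],
                   role: str) -> Optional[int]:
    """Resolve the current serial for one role under one AssetID."""
    serials = set(published)
    if not serials:
        return None  # nothing published for this role
    # Only overrides for this role that name an existing serial count.
    relevant = [o for o in overrides if o.role == role and o.serial in serials]
    if relevant:
        return max(relevant, key=lambda o: o.sequence).serial
    return max(serials)  # absent an override, the latest serial wins
```

A rollback after a bad upload is then a single record: `SetCurrent("preview", 2, sequence=1)` makes serial 2 current even though serial 3 exists.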
EFS is the source of truth for what generations exist and which generation is current. The CMS queries EFS to resolve current generations when constructing public views. Version-agnostic locators of the form {assetID}/preview resolve through EFS to the current preview generation for that asset.
EFS does not interpret what a generation contains. It records existence, ordering, and currentness; the meaning of a particular generation belongs to the CMS.
A streaming-published generation (§6.2, §9.5) accumulates parts over time under a single PackageID and serial. Each new part adds to the same generation rather than creating a new serial; serials still order generations, not part appends. The publisher may close the generation at any time by submitting the signed package-level marker described in §5.3, which fixes the part count and rejects further parts. EFS records and serves the marker alongside the part list.
8. Authorisation for Writes
EFS accepts a write against an AssetID only when both of the following hold: the writing key is recognised by CPR as authorised to act on that AssetID, and the principal behind that key is an active Catalog.ID identity at the moment of write.
8.1 Authorised writer
The authorised writer is the Catalog.ID identity (personal, organization, delegate, or agent) whose per-machine CPR hybrid signing key pair (Ed25519 + ML-DSA-65, Catalog.ID §3.4.1a) is currently active and recognised by CPR as authorised for the AssetID. By default this is the buyer principal bound to the AssetID at purchase; if the asset has subsequently been transferred, the currently bound owner principal is the authorised writer.
EFS does not maintain the underlying authorisation mappings itself. It verifies two bindings before accepting a write:
- Active Catalog.ID account. The signing key belongs to a Catalog.ID identity whose account is currently active.
- CPR authorisation. That same identity is recognised by CPR as authorised to act on the named AssetID.
Both bindings must hold; either failure rejects the write.
8.2 How the bindings are verified
In normal operation EFS does not call Catalog.ID or CPR per write. Both bindings are evidenced by signed artefacts the client presents alongside the upload.
- Catalog.ID session authorisation. When the principal authenticates an interactive or persistent-mode session, Catalog.ID issues a short-lived session authorisation token signed under the operator's Catalog.ID signing key. The token names the principal's username, the active per-machine signing key, the issuance and expiry timestamps, and (for agent sessions) the scope and the principal link. The client presents this token with the upload; EFS verifies the operator signature and the expiry. A live Catalog.ID query is the fallback for an expired or absent token and the canonical recourse if account standing is in question.
- CPR proof-of-purchase. AssetIDs are sold in blocks, and CPR issues a signed proof-of-purchase for each block (or for an individual AssetID rebound by transfer). The proof names the AssetIDs covered, the authorised signing key, and the issuance timestamp; it is signed by the CPR operator. The client presents the proof with the upload; EFS verifies the operator signature and that the targeted AssetID falls within the proof's covered set. A live CPR query is the fallback for AssetIDs whose ownership has changed since the proof was issued, or where the proof is missing.
In both cases EFS additionally verifies that the part-header signature (§9) on the upload is produced by the same key the two artefacts name. Operators may set local policy on the maximum age of a presented session token or proof-of-purchase before a fresh consultation is required, and may run periodic asynchronous reconciliation against Catalog.ID and CPR without putting either service in the per-write critical path.
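The decision logic of this section can be sketched as a pure function. Everything here is illustrative: the record shapes, the `signature_ok` flags (standing in for hybrid signature verification done elsewhere), the key identifiers, and the three outcome strings are assumptions, not protocol definitions. Treating an AssetID outside the proof's covered set as a rejection rather than a live-query trigger is also a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SessionToken:
    username: str
    signing_key_id: str
    expires_at: float
    signature_ok: bool  # outcome of verifying the Catalog.ID operator signature


@dataclass(frozen=True)
class ProofOfPurchase:
    asset_ids: FrozenSet[str]
    signing_key_id: str
    issued_at: float
    signature_ok: bool  # outcome of verifying the CPR operator signature


def check_write(token: Optional[SessionToken],
                proof: Optional[ProofOfPurchase],
                asset_id: str,
                header_key_id: str,
                now: float,
                max_proof_age: Optional[float] = None) -> str:
    """Return 'accept', 'reject', or 'live_check' (fall back to a live query)."""
    if token is not None and not token.signature_ok:
        return "reject"
    if proof is not None and not proof.signature_ok:
        return "reject"
    if token is None or proof is None or now >= token.expires_at:
        return "live_check"
    if asset_id not in proof.asset_ids:
        return "reject"
    # The part-header signing key must be the key both artefacts name.
    if not (header_key_id == token.signing_key_id == proof.signing_key_id):
        return "reject"
    # Optional operator-local policy on the age of the presented proof.
    if max_proof_age is not None and now - proof.issued_at > max_proof_age:
        return "live_check"
    return "accept"
```

Keeping the check free of network calls mirrors the intent that neither Catalog.ID nor CPR sits in the per-write critical path.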
8.3 Active Catalog.ID account requirement
Every writer to EFS is an active Catalog.ID identity. Publication entirely outside the Catalog.ID account system is not supported. Three properties motivate this:
- Symmetry with retrieval. Retrieval requires an active Catalog.ID account by virtue of wrapped-key records addressing recipients by their efs_encrypt keys (§12). Symmetric requirements on writes simplify the trust model and the legal posture.
- Identity-bound revocation. EFS rechecks Catalog.ID standing on every retrieval; making writes equally identity-bound means a compromised or terminated account stops further writes against the principal's AssetIDs at the same moment it stops retrievals against the principal's wrapped-key records.
- Hybrid key handling. The hybrid signing model (Catalog.ID §3.4.1a) and the hybrid efs_encrypt model (Catalog.ID §3.4.1) are uniform across all Catalog.ID identities. Constraining writers to Catalog.ID identities means the same hybrid primitives apply on both sides of every operation without a parallel non-Catalog code path.
Pseudonymous publication remains possible within the Catalog.ID model: a Catalog.ID identity is pseudonymous by default (§9a of the Catalog.ID Whitepaper). What the account-system requirement excludes is publication that bypasses Catalog.ID entirely.
8.4 Bitcash and authorisation
Bitcash funds storage and retrieval (§18) but does not authorise writes. A funded wallet without an active Catalog.ID account and a CPR-authorised signing key cannot publish to an AssetID. An active Catalog.ID account with an authorised signing key but without a funded wallet cannot pay for storage. All three are required.
9. The Catalog File Container (CFC)
EFS packages use a single container format: the Catalog File Container (CFC), version 1. CFC is intentionally minimal. It defines what an EFS package part looks like on the wire and on disk; it does not define application-level packaging conventions.
A CFC v1 part has the following on-disk layout:
[ public sealed header ] fixed size
[ encrypted index ] variable size, declared in header
[ encrypted framed body ] variable size, declared in header
[ signature block ] fixed size
Public sealed header. Visible to operators without decryption. Carries:
- PackageID, AssetID, role, serial, PartNr (§5.3)
- format version
- encryption parameters
- index_size: length in bytes of the encrypted index region
- body_size: length in bytes of the encrypted framed body region
- index_profile: flags describing the cumulative TOC window (zero, the full known part count for part 1 of a batch package, or a fixed sliding-window value for streaming)
- canonical ciphertext digest, public ciphertext size
- the issuer's signature over the header
The header is fixed size so that a client can fetch it with a single ranged GET of the first N bytes of the part.
Encrypted index. A single AEAD frame, encrypted with K_index = HKDF(K_pkg, "index"), containing:
- Local frame index. Per-frame entries for the frames in this part: ciphertext offset within the body, ciphertext length, plaintext digest, frame nonce.
- Local file metadata. Per-file entries for files (or file fragments) held in this part: file number, plaintext path or name, plaintext size, plaintext digest, optional media type, and the local frame indices that hold the file's bytes.
- Cumulative part-range TOC, optional, declared in index_profile. For each part covered by this part's TOC window: part number, frame range, timestamp range (if applicable), and that part's canonical ciphertext digest. Approximately 40 bytes per entry.
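The index-key derivation K_index = HKDF(K_pkg, "index") can be sketched with the standard library alone. This follows the extract-and-expand construction of RFC 5869; the hash (SHA-512), the empty salt, and the 32-byte output length are assumptions made for illustration, since this section does not fix them.

```python
import hashlib
import hmac


def hkdf(ikm: bytes, info: bytes, length: int = 32,
         salt: bytes = b"", hash_name: str = "sha512") -> bytes:
    """HKDF extract-and-expand (RFC 5869) built on hmac."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output too long for HKDF")
    if not salt:
        salt = b"\x00" * hash_len  # RFC 5869 default when no salt is given
    prk = hmac.new(salt, ikm, hash_name).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_index_key(k_pkg: bytes) -> bytes:
    """K_index for the encrypted index, with the label "index" as info."""
    return hkdf(k_pkg, b"index", 32)
```

Deriving a separate key for the index keeps the index frame and the body frames from sharing a key and nonce space.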
Encrypted framed body. A concatenation of AEAD frames. Each frame is one chunk of plaintext content encrypted under K_pkg (§11) with a deterministic per-frame nonce derived from the package key and the frame's local index. Frames are independent: a client that has the package key and one frame's ciphertext can decrypt and verify that frame without reading neighbours. A small archival package may have a single frame containing all plaintext bytes; the format does not require multiple frames, only that frame boundaries exist where partial retrieval is meaningful.
Signature block. The publisher's signature over the public header and ciphertext digest, using a key authorised under §8.
The public header does not expose plaintext file names, file count, file sizes, frame count, or frame sizes. Operators see only what they need to identify, store, and audit the part.
9.1 Multi-part packages
A multi-part package consists of multiple CFC parts. Each part is a complete CFC object with its own public header, ciphertext digest, signature, and replication state. Parts share the PackageID prefix and are linked by their PartNr in their public headers (§5.3). A single part may be retrieved, audited, and replicated independently of its siblings; each part is decryptable for the frames it carries, without requiring any sibling. The package-level navigation index, mapping files to the parts that hold them, lives in part 1 for batch packages and in cumulative form across parts for streaming packages (§9.5).
9.2 Forward compatibility
The CFC format is versioned. Version 1 specifies the layout above. Future versions may introduce media-stream profiles, alternative cipher suites, or other format-level changes. The format version is declared in the public header of each part. Operators that do not support a given format version refuse storage of packages in that version rather than store them opaquely.
9.3 Edition packages
An edition package is a CFC package like any other; its role is edition and its content is the resolved output of a CMS Edition Spec. The publishing engine that resolves the spec is responsible for producing the CFC; EFS accepts and stores it under the same rules as any other package.
9.4 Partial retrieval
A client can retrieve any subset of a part's encrypted bytes without decrypting the whole part. This is what makes CFC suitable for workloads where most content may never be retrieved at all: edition packages with image tiles or media chunks (§6.4), long task or session recordings, security-camera streams. Archival packages where partial retrieval is not meaningful simply produce a single body frame and the same flow degenerates to a whole-part fetch.
Client retrieval flow.
- Issue a ranged GET for the first header_size bytes of the part. Parse the header and learn index_size.
- Issue a ranged GET for the encrypted index region. Decrypt using K_index = HKDF(K_pkg, "index") from the package key obtained through the recipient's wrapped-key record (§12).
- From the local file metadata (and the cumulative TOC, if present), identify the frame ranges that hold the wanted content.
- Issue a ranged GET for each needed frame and decrypt with K_pkg. Verify each plaintext block against the plaintext digest in the index.
Once the encrypted index has been fetched and decrypted for a part, it is cached client-side for the lifetime of the part: the part is immutable (§10), so the index never changes.
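The byte ranges used in this flow follow directly from the part layout: header first, then the encrypted index, then the framed body. The sketch below builds the corresponding HTTP Range header values. The `HEADER_SIZE` constant is hypothetical (the text only says the header is fixed size), and the helper names are illustrative.

```python
HEADER_SIZE = 4096  # hypothetical fixed header size; not specified in this section


def http_range(start: int, length: int) -> str:
    """Format an inclusive HTTP byte range covering `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def header_range() -> str:
    """Step 1: the fixed-size public sealed header."""
    return http_range(0, HEADER_SIZE)


def index_range(index_size: int) -> str:
    """Step 2: the encrypted index, which starts right after the header."""
    return http_range(HEADER_SIZE, index_size)


def frame_range(index_size: int, frame_offset: int, frame_length: int) -> str:
    """Step 4: one frame, addressed by its ciphertext offset within the body."""
    body_start = HEADER_SIZE + index_size
    return http_range(body_start + frame_offset, frame_length)
```

With a 1000-byte index, `index_range(1000)` yields `bytes=4096-5095`, and the first 10-byte frame of the body is `bytes=5096-5105`.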
Package-level TOC in a batch multi-part package. Part 1 carries the package-level table of contents in its encrypted index: total file count, total ciphertext size, total frame count, and the global file map file_number -> [(part_nr, local_frame_idx), ...]. Parts 2..N carry only their local frame index and local file metadata. A client fetches part 1's header and index once per session, caches the TOC, and from then on issues targeted byte-range retrievals against the parts that actually hold the content of interest. Streaming packages distribute the TOC differently; see §9.5.
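Once the global file map has been decrypted and cached, planning a retrieval is a grouping step. The sketch below assumes the map is held as a plain dictionary from file number to a list of `(part_nr, local_frame_idx)` pairs; that in-memory shape and the function name are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FileMap = Dict[int, List[Tuple[int, int]]]


def frames_by_part(file_map: FileMap, file_number: int) -> Dict[int, List[int]]:
    """Group the frames of one file by the part that holds them.

    The result tells the client which parts to contact and which local
    frame indices to request from each, in ascending order.
    """
    grouped: Dict[int, List[int]] = defaultdict(list)
    for part_nr, frame_idx in file_map[file_number]:
        grouped[part_nr].append(frame_idx)
    return {part: sorted(frames) for part, frames in sorted(grouped.items())}
```

A file that straddles a part boundary, such as `{7: [(2, 40), (2, 41), (3, 0)]}`, resolves to frames 40 and 41 of part 2 and frame 0 of part 3.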
Canonical ciphertext digest. Computed over the concatenation of header input, encrypted index, and encrypted body. Replication, audit, and repair operate on the part as a whole; partial retrieval changes nothing about how operators verify a part's bytes.
Subrange retrieval and operator pricing. A partial retrieval is one or more byte-range GETs against a part already resident on the operator's storage. The per-package operational overhead that motivates the §18.3 minimum-billing floor (acceptance, signing, replication, lifecycle bookkeeping) is incurred at part ingestion, not at each subrange retrieval. Subrange retrievals are billed on the per-byte component only, with no minimum.
9.5 Streaming-published packages
A streaming-published package is a package whose parts are produced and ingested incrementally rather than as a complete batch. Examples include long desktop or task recordings, security-camera streams, live event captures, and any workload that produces ciphertext at a steady rate over hours, days, or years.
The packaging differences from a batch multi-part package are minimal:
- Cumulative TOC carriage. Each part's encrypted index carries a cumulative part-range TOC covering the latest W parts ending at and including this one. W is fixed for the package and declared in each part's public header (index_profile). Bounded streaming may set W to the expected lifetime part count; unbounded streaming sets W to a sliding-window size such as 1000.
- Ingestion cadence. Parts arrive over time. Each arriving part follows the same authorisation (§8), redundancy fan-out (§15.2), and ingestion payment (§18.1) rules as any other part. The single ingestion fee per part, the eternal-storage commitment, and the lifecycle treatment are unchanged.
- Deferred or omitted seal marker. A streaming publisher submits the §5.3 seal marker only when ready to close the generation, or never. Until then, the package's part list is open and operators continue to accept further parts under the same PackageID and serial.
Client navigation, cold start.
- Ask EFS for the part list of the PackageID. EFS returns the highest accepted PartNr (and the full list of part numbers if requested).
- Fetch the highest-numbered part's header and encrypted index. The index includes the cumulative TOC for the latest W parts.
- Decrypt the TOC. Use it to map a desired chunk, timecode, or file to a (part_nr, local_frame_idx) pair.
- Fetch that part's header and encrypted index, then the targeted frame range, following the retrieval flow in §9.4.
Reaching parts older than the TOC window. In an unbounded stream where the desired part is older than W parts back from the head, the client either walks back through earlier parts' cumulative TOCs in W-sized hops, or estimates the target part from elapsed time and binary-searches via ranged header GETs. EFS itself remains structurally blind: it serves bytes and counts parts, and does not need to know the frame or timecode boundaries that live inside the encryption.
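The walk-back strategy can be sketched as a small planning function. It assumes part numbers start at 1, that the TOC carried by part p covers parts p - W + 1 through p, and that each hop moves exactly W parts back; the function name is illustrative.

```python
from typing import List


def toc_hops(head: int, target: int, window: int) -> List[int]:
    """Return the parts whose encrypted index must be fetched, newest first.

    The last entry is the part whose cumulative TOC covers `target`.
    """
    if window < 1 or not 1 <= target <= head:
        raise ValueError("need window >= 1 and 1 <= target <= head")
    fetch = head
    path = [fetch]
    # The TOC of part `fetch` covers parts fetch - window + 1 .. fetch.
    while target < fetch - window + 1:
        fetch -= window
        path.append(fetch)
    return path
```

With the head at part 5000, W = 1000, and a target of part 1234, the client fetches the indexes of parts 5000, 4000, 3000, and 2000; the TOC in part 2000 covers parts 1001 to 2000. The binary-search alternative trades these index fetches for header-only GETs.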
EFS service surface. EFS adds two small operations to support streaming-published packages:
- List parts. Given a PackageID, return the set of accepted PartNr values and the highest one. Does not require decryption.
- Subscribe to part arrivals. Optional. A consumer that wants to follow a live stream may subscribe to receive notifications as new parts arrive. The notification carries only the PartNr and the part's canonical ciphertext digest; the consumer fetches and decrypts on its own.
Neither operation gives EFS visibility into the encrypted index or the package contents.
10. Canonical Ciphertext
A package part binds to exactly one canonical ciphertext digest. All replicas of a part across the network store the same encrypted byte sequence. Operators may apply local at-rest encryption to their own disks, but this internal protection must not change the public bytes returned on retrieval.
Canonical ciphertext is the foundation for replication, repair, audit, self-hosted mirroring, and verification. A client that retrieves a part from any operator can verify it against the public digest registered at the original ingress and detect corruption, substitution, or partial damage without access to decryption keys.
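A keyless verification of a retrieved part might look like the following sketch. It follows the §9.4 description of the digest as computed over the concatenation of header input, encrypted index, and encrypted body. The choice of SHA-512 and the hex encoding are assumptions, and what exactly constitutes the "header input" is defined by the container format, not by this example.

```python
import hashlib
import hmac


def canonical_digest(header_input: bytes, enc_index: bytes, enc_body: bytes) -> str:
    """Hash the three regions in order, as if they were concatenated."""
    h = hashlib.sha512()  # assumed digest algorithm
    for region in (header_input, enc_index, enc_body):
        h.update(region)
    return h.hexdigest()


def verify_part(header_input: bytes, enc_index: bytes, enc_body: bytes,
                expected_hex: str) -> bool:
    """Compare a retrieved part against the publicly registered digest.

    No decryption key is involved; any operator, mirror, or client can run it.
    """
    actual = canonical_digest(header_input, enc_index, enc_body)
    return hmac.compare_digest(actual, expected_hex.lower())
```

Because every replica stores the same bytes, the same expected digest applies regardless of which operator served the part.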
11. Encryption Model
EFS stores encrypted bytes. Plaintext access requires possession of decryption material; EFS itself never sees plaintext.
Each package is encrypted with a fresh symmetric key, the package key, generated at packaging time and never reused. The package key is unique to the package and immutable for the lifetime of the ciphertext. AES-256-GCM is the v1 symmetric primitive; a 256-bit symmetric key retains 128 bits of effective security against a quantum adversary under Grover's algorithm, so bulk encryption at this layer is independent of any asymmetric primitive whose hardness might later be revised. The publisher's signature over each part header is a hybrid signature (Ed25519 + ML-DSA-65) produced by the principal's currently active per-machine CPR signing key pair (Catalog.ID §3.4.1a); both halves must verify on ingestion and on later integrity audits.
Access to a package key is mediated through wrapped-key records that EFS stores alongside the ciphertext (§12). The wrap targets a dedicated EFS-purpose hybrid keypair held in every Catalog.ID identity's key bag, the efs_encrypt key. The efs_encrypt keypair is a hybrid of an X25519 classical part and an ML-KEM-1024 post-quantum part (Catalog.ID §3.4.1), so wrapping under it requires breaking both an elliptic-curve and a lattice problem to recover the package key. Key separation by purpose — efs_encrypt is used only by EFS — keeps the blast radius of a key compromise scoped to one functional area: a compromised efs_encrypt exposes EFS-wrapped material the holder had already retrieved, but does not propagate to other Catalog.ID encryption functions, and rotation through Catalog.ID §3.3 immediately stops further EFS retrievals against the revoked key without disturbing the rest of the identity's key surface.
At ingestion, the publisher's client wraps K_pkg under the publisher's own efs_encrypt hybrid public keys and posts the resulting WrappedKey record to EFS; later shares produce further wraps addressed to the named recipients. The package key itself never appears in plaintext outside the wrapping party's and decrypting recipient's local memory. Ciphertext and wrapped keys both live inside EFS, and EFS verifies the recipient's wrapped-key record on every retrieval before delivering ciphertext (§12.6).
12. Wrapped Keys
EFS holds the wrapped-key material that admits a recipient to a package. The CMS holds the policy half of any access relationship — what the recipient may do, under what restrictions, on whose decision, recorded as RightsExpression, AccessCondition, and AccessDecision entries. EFS holds the cryptographic half: the wrapped package key that gives the recipient technical capability to decrypt the corresponding ciphertext. The two halves are independently maintained: the policy half lives in the CMS where it can be amended, refined, and audited without disturbing the cryptographic half; the cryptographic half lives in EFS, co-resident with the ciphertext it unlocks.
A wrapped-key record is recipient-bound from inception. The publisher (or an agent acting under the publisher's authority) wraps the package key under the recipient's Catalog.ID efs_encrypt hybrid public keys (Catalog.ID §3.4.1) and posts the signed record to EFS. There is no countersignature step on the recipient side and no handshake: once the record is in place at EFS and signed by an authorised wrapper, the recipient can retrieve and decrypt. Whether the recipient is contractually bound by terms accompanying the share is a matter for the policy half (Asset Market settlement captures buyer consent at sale time; out-of-band agreements may exist) and is outside EFS's cryptographic gate.
The publisher wraps for themselves first. The same package key K_pkg is wrapped under the publisher's own efs_encrypt keys at ingestion and stored as the publisher's own wrapped-key record; the publisher's later retrievals consult the same store as anyone else's. There is no special path for the owner.
12.1 What EFS stores and what it does not
EFS stores, in its operational database:
- the wrapped-key envelope for each (asset_id, package_role, recipient_party_id) triple it has accepted;
- the recipient binding (recipient_party_id, recipient_pubkey_digest);
- the issuing party's identity, signing-session reference, and hybrid signature;
- the scope tag, the issued_at timestamp, and an optional expires_at;
- the revocation state and revocation signature if revoked.
EFS does not store:
- the publisher's master keys or efs_encrypt private keys (these live in the publisher's Catalog.ID key bag);
- the recipient's efs_encrypt private keys (these live in the recipient's Catalog.ID key bag);
- plaintext K_pkg (it never appears in plaintext outside the wrapping party's and decrypting party's local memory);
- policy material (RightsExpression, AccessCondition, AccessDecision live in the CMS);
- payment routing (Bitcash handles settlement).
The wrapped-key record carries no terms_digest, no terms reference, and no recipient countersignature. The policy half is the CMS's responsibility and is not part of EFS's cryptographic gate.
12.2 The wrapped-key record
WrappedKey:
  asset_id:                 string
  package_role:             string
  recipient_party_id:       string    # the recipient's Catalog.ID username
  recipient_pubkey_digest:  string    # SHA-512 over (classical_pk || pq_pk) of the
                                      # recipient's efs_encrypt hybrid key pair
  wrapped_key:
    classical_ephemeral_pk: bytes     # X25519 ephemeral public key (32 bytes)
    pq_ciphertext:          bytes     # ML-KEM-1024 ciphertext (1568 bytes)
    kdf_salt:               bytes     # fresh 32-byte salt for HKDF-SHA-512
    aead_nonce:             bytes     # 12-byte AES-256-GCM nonce
    aead_ciphertext:        bytes     # AES-256-GCM ciphertext + tag over K_pkg
    algorithm:              string    # "hybrid-x25519-mlkem1024-aesgcm256-v1"
  scope:                    read | replicate | review | publish | migrate | preserve
  expires_at:               datetime | null
  issued_at:                datetime
  issuing_party_id:         string    # the wrapping party's Catalog.ID identity
  issuing_session_id:       string    # the session under which it was signed
  issuing_signature:
    classical_signature:    bytes     # Ed25519 (64 bytes)
    pq_signature:           bytes     # ML-DSA-65
    algorithm:              string    # "hybrid-ed25519-mldsa65-v1"
Records are keyed on (asset_id, package_role, recipient_party_id). A later record for the same triple supersedes the earlier one; this is the path used for recipient key rotation, where the publisher posts a fresh wrap addressed to the recipient's new efs_encrypt keys.
The recipient_pubkey_digest is SHA-512(classical_pk || pq_pk) over the recipient's efs_encrypt hybrid public key pair as fetched from Catalog.ID at wrap time. The digest binds the wrap to the specific key pair: if the recipient rotates their efs_encrypt key (Catalog.ID §3.3), retrieval against the prior record fails verification at EFS until a fresh wrap is posted, even though the prior wrapped bytes remain cryptographically decryptable by whoever already held the corresponding private key.
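The digest computation can be sketched in a few lines. This is a minimal illustration, not the Catalog implementation: the function names are hypothetical, and the hex encoding of the digest is an assumption (the record only types the field as `string`).

```python
import hashlib
import hmac

X25519_PK_LEN = 32        # classical half (X25519 public key)
MLKEM1024_PK_LEN = 1568   # post-quantum half (ML-KEM-1024 encapsulation key)


def recipient_pubkey_digest(classical_pk: bytes, pq_pk: bytes) -> str:
    """SHA-512 over classical_pk || pq_pk; hex output is an assumption."""
    if len(classical_pk) != X25519_PK_LEN or len(pq_pk) != MLKEM1024_PK_LEN:
        raise ValueError("unexpected efs_encrypt public key length")
    return hashlib.sha512(classical_pk + pq_pk).hexdigest()


def wrap_targets_key(record_digest: str, classical_pk: bytes, pq_pk: bytes) -> bool:
    """True if a stored record was wrapped for exactly this key pair."""
    current = recipient_pubkey_digest(classical_pk, pq_pk)
    return hmac.compare_digest(record_digest, current)
```

Because both halves have fixed lengths, the plain concatenation is unambiguous. After a rotation the recipient's current key pair hashes to a different digest, so `wrap_targets_key` returns False for the old record.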
The issuing_signature is a hybrid Ed25519 + ML-DSA-65 signature produced by the issuing party's per-machine CPR signing key (Catalog.ID §3.4.1a). Both halves must verify independently.
12.3 The hybrid KEM wrapping construction
The package key K_pkg is wrapped using a hybrid X25519 + ML-KEM-1024 KEM whose two shared secrets are combined into a single AES-256-GCM key. To recover K_pkg, an attacker must break both layers: solve the elliptic-curve discrete-logarithm problem on Curve25519 and the Module Learning With Errors problem on structured lattices. No reduction between the two problems is known, so a breakthrough against one is not expected to weaken the other.
# Inputs from the recipient's Catalog.ID account
classical_pk = recipient.efs_encrypt.classical_pk    # X25519 (32 bytes)
pq_pk        = recipient.efs_encrypt.pq_pk           # ML-KEM-1024 (1568 bytes)

# Step 1: classical ECDH with a fresh ephemeral keypair
(eph_sk, eph_pk) = X25519.keygen()                   # eph_pk is stored as classical_ephemeral_pk
S_classical      = X25519(eph_sk, classical_pk)      # 32-byte shared secret
zero(eph_sk)                                         # discard ephemeral private key

# Step 2: post-quantum KEM
(ct_pq, S_pq) = ML-KEM-1024.encaps(pq_pk)            # 1568-byte ciphertext (stored as pq_ciphertext),
                                                     # 32-byte shared secret

# Step 3: combine via HKDF with fresh salt
salt = random(32)                                    # stored as kdf_salt
combined_secret = HKDF-SHA-512(
    ikm    = S_classical || S_pq,
    salt   = salt,
    info   = "catalog-efs-wrap-v1",
    length = 32
)                                                    # 32-byte AES key

# Step 4: AEAD-encrypt the package key under the combined secret
aead_nonce = random(12)
AAD = canonical(asset_id || package_role
                || recipient_party_id || recipient_pubkey_digest
                || scope || expires_at || issued_at
                || issuing_party_id || issuing_session_id)
aead_ciphertext = AES-256-GCM.encrypt(combined_secret, aead_nonce, K_pkg, AAD)
AAD (additional authenticated data) binds the wrap to its record context. If an operator tries to substitute a wrap by repackaging the same wrapped-key bytes into a different (asset_id, package_role, recipient_party_id) row, the AAD the recipient rebuilds from that row no longer matches, and AES-GCM authentication fails on decapsulation.
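The text does not pin down how `canonical(...)` serialises the fields. One way to make the concatenation unambiguous is to length-prefix each field. The encoding below is purely an illustrative assumption (hypothetical function names, timestamps passed as strings, a made-up null marker), not the Catalog wire format.

```python
import struct
from typing import Optional

_NULL = 0xFFFFFFFF  # hypothetical marker for an absent field such as expires_at


def canonical(*fields: Optional[str]) -> bytes:
    """Length-prefix every field so that field boundaries cannot shift."""
    out = bytearray()
    for value in fields:
        if value is None:
            out += struct.pack(">I", _NULL)
            continue
        raw = value.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw
    return bytes(out)


def build_aad(asset_id: str, package_role: str, recipient_party_id: str,
              recipient_pubkey_digest: str, scope: str,
              expires_at: Optional[str], issued_at: str,
              issuing_party_id: str, issuing_session_id: str) -> bytes:
    """AAD over the record context, in the field order given in the text."""
    return canonical(asset_id, package_role, recipient_party_id,
                     recipient_pubkey_digest, scope, expires_at, issued_at,
                     issuing_party_id, issuing_session_id)
```

With a bare concatenation, the field pairs ("ab", "c") and ("a", "bc") would produce the same bytes; the length prefixes keep them distinct, so moving a wrap to another row always changes the AAD.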
The issuing hybrid signature covers (asset_id || package_role || recipient_party_id || recipient_pubkey_digest || classical_ephemeral_pk || pq_ciphertext || kdf_salt || aead_nonce || aead_ciphertext || scope || expires_at || issued_at || issuing_party_id || issuing_session_id). A verifier checks both signature components against active Ed25519 and ML-DSA-65 keys on the issuer's Catalog.ID account; both must verify.
Decapsulation. The recipient runs the construction in reverse: derives S_classical = X25519(classical_sk, classical_ephemeral_pk), derives S_pq = ML-KEM-1024.decaps(pq_sk, pq_ciphertext), re-derives combined_secret via the same HKDF call with the published kdf_salt, and AES-256-GCM-decrypts aead_ciphertext with the rebuilt AAD. A mismatch on the AAD, or corruption of either KEM half (classical_ephemeral_pk or pq_ciphertext), causes AES-256-GCM authentication to fail and with it decapsulation. Revocation of the recipient's efs_encrypt keys does not change this arithmetic; it takes effect at the retrieval gate (§12.6), where EFS refuses to deliver against a revoked key.
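The combining step (Step 3) is the one piece both sides compute identically. Below is a minimal sketch using only the Python standard library, with HKDF written out per RFC 5869. The X25519, ML-KEM-1024 and AES-256-GCM steps are not shown, and the function names are hypothetical.

```python
import hashlib
import hmac

WRAP_INFO = b"catalog-efs-wrap-v1"  # the info string from Step 3


def hkdf_sha512(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) instantiated with HMAC-SHA-512."""
    hash_len = hashlib.sha512().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output too long for HKDF")
    prk = hmac.new(salt, ikm, hashlib.sha512).digest()      # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine_secrets(s_classical: bytes, s_pq: bytes, kdf_salt: bytes) -> bytes:
    """Derive the 32-byte AES-256-GCM key from both shared secrets."""
    return hkdf_sha512(s_classical + s_pq, kdf_salt, WRAP_INFO, 32)
```

The wrapper calls `combine_secrets` with the secrets it just produced; the recipient calls it with the secrets recovered from classical_ephemeral_pk and pq_ciphertext plus the stored kdf_salt. If either secret differs, the derived key differs and the AEAD tag check fails.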
12.4 Who may post a wrapped-key record
Three categories of party post wrapped-key records:
- the publisher, at ingestion (the self-wrap) and on any subsequent decision to share with a named recipient;
- the publisher's persistent-mode agent acting under a share_signing scope, when the publisher's interactive workstation is offline and a sale settlement or pre-authorised share pattern matches. The agent is itself a WrappedKey-record recipient at EFS for the AssetIDs in its scope: the publisher delegates by posting, at publication or scope-update time, a WrappedKey record addressed to the agent's CID. To wrap for a further recipient, the agent fetches its own WrappedKey record from EFS, decapsulates K_pkg with the efs_encrypt private key in its Catalog.ID key bag, re-wraps under the new recipient's efs_encrypt public keys, signs the new WrappedKey record, and posts it to EFS. K_pkg lives in plaintext only in the agent's memory during the re-wrap and is zeroized afterwards; the agent holds no separate persistent K_pkg store;
- the seller in an Asset Market agreement, whose settlement triggers a wrap for the buyer at sale time. Asset Market presents the seller's terms in the purchase UI; completing the purchase records the buyer's agreement to terms in the CMS (RightsExpression / AccessCondition) and triggers the seller's wrap and post to EFS. The cryptographic gate at EFS does not depend on this Catalog-side consent capture.
EFS accepts a record on a valid hybrid signature from a party whose per-machine signing key is currently active and recognised by CPR as authorised to act on the AssetID (the same authorisation gate as for writes, §8). A wrap whose signature is invalid, whose issuer is not authorised for the AssetID, or whose recipient pubkey digest does not match the recipient's currently published efs_encrypt keys at Catalog.ID is rejected.
Layered encryption involving an outer ingress-operator wrapper is not part of v1.
12.5 Sharing without handshake
Sharing is one action. The publisher's client fetches the recipient's efs_encrypt hybrid public keys from Catalog.ID, obtains K_pkg in memory (freshly generated if this share rides on ingestion, otherwise recovered by decapsulating the publisher's own WrappedKey record at EFS with the publisher's efs_encrypt private key from the Catalog.ID key bag), wraps K_pkg under the recipient's keys, signs the WrappedKey record, posts it to EFS, and zeroizes K_pkg in memory. The recipient is not asked for anything; they discover the share next time they connect, or via a Catalog.ID notification if the publisher chose to send one. From the moment EFS accepts the record, the recipient can retrieve and decrypt.
A recipient who does not wish to use a shared wrap simply does not retrieve. They may also post a signed self-revocation against the record (§12.7) to release the wrap formally — useful for institutional recipients releasing prior access after a transfer has completed.
12.6 Verification on retrieval
When a client retrieves a package from EFS, EFS:
- looks up the wrapped-key record for (asset_id, package_role, recipient_party_id) in its own store;
- verifies the record has not been revoked and has not expired;
- verifies the record's hybrid issuing signature against an active per-machine CPR signing key on the issuer's Catalog.ID account;
- checks that the issuer was authorised by CPR to act on the AssetID at issued_at;
- checks the requesting party's Catalog.ID account is currently active (not terminated, not under lockdown);
- checks the recipient_pubkey_digest matches an efs_encrypt key that is currently active on the recipient's Catalog.ID account (i.e., has not been revoked through key rotation per Catalog.ID §3.3).
EFS delivers ciphertext on positive verification and refuses on negative verification. The recipient fetches their WrappedKey record from EFS alongside the ciphertext, decapsulates K_pkg locally using the efs_encrypt private key in their Catalog.ID key bag, and decrypts. Operators may cache standing state for a short operational window; Catalog.ID is authoritative for account and key liveness and is consulted live when the operational cache is stale.
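The retrieval gate can be read as a single predicate over the six checks. The sketch below assumes the signature, CPR and Catalog.ID lookups have already been resolved to booleans by other components. The type and function names are hypothetical, and treating a record as expired from `expires_at` onwards is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class GateInputs:
    revoked: bool
    expires_at: Optional[datetime]
    issuing_signature_valid: bool      # both Ed25519 and ML-DSA-65 halves verified
    issuer_authorised_at_issue: bool   # CPR authorisation at issued_at
    requester_account_active: bool     # not terminated, not under lockdown
    recipient_key_active: bool         # digest matches an active efs_encrypt key


def refusal_reason(inputs: Optional[GateInputs], now: datetime) -> Optional[str]:
    """Return None to deliver ciphertext, else the first failed check."""
    if inputs is None:
        return "no wrapped-key record"
    if inputs.revoked:
        return "record revoked"
    if inputs.expires_at is not None and now >= inputs.expires_at:
        return "record expired"
    if not inputs.issuing_signature_valid:
        return "issuing signature invalid"
    if not inputs.issuer_authorised_at_issue:
        return "issuer not authorised at issued_at"
    if not inputs.requester_account_active:
        return "requesting account not active"
    if not inputs.recipient_key_active:
        return "recipient key not active"
    return None
```

A rotated recipient key shows up here only as `recipient_key_active=False`: nothing about the stored envelope changes, the gate simply stops delivering against it.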
The active-account and active-key rechecks are the operational mechanism by which a compromised recipient key, once detected and rotated, stops further EFS retrievals against that key. Wrapped-key bytes already in the attacker's hands at the moment of compromise remain cryptographically decryptable by anyone holding the corresponding private key; the verification gate prevents further ciphertext from being delivered against the revoked key. This bounds the damage of a key compromise to material the attacker had already retrieved at the time of revocation.
12.7 Revocation
The publisher (or an agent acting within scope) may revoke a wrapped-key record by posting a signed revocation against (asset_id, package_role, recipient_party_id, issued_at). The recipient may also self-revoke their own record by signing a withdrawal; this is the path a recipient uses to release access formally without waiting on the publisher.
On revocation, EFS transitions the wrapped-key record to a terminal state. EFS retains the record's metadata (asset_id, package_role, recipient_party_id, issued_at, revoked_at, revocation signature) in its active database, and zeroizes the wrapped-key envelope bytes (classical_ephemeral_pk, pq_ciphertext, kdf_salt, aead_nonce, aead_ciphertext) in the same committed transaction. Subsequent retrieval verification against the record fails.
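The live-database side of revocation amounts to a one-way state change. This is an in-memory sketch with hypothetical names; a real operator would perform the metadata update and the envelope wipe inside one database transaction, which is not modelled here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StoredWrappedKey:
    asset_id: str
    package_role: str
    recipient_party_id: str
    issued_at: datetime
    envelope: Optional[bytes]                    # serialised wrapped_key fields
    revoked_at: Optional[datetime] = None
    revocation_signature: Optional[bytes] = None


def revoke(record: StoredWrappedKey, revoked_at: datetime, signature: bytes) -> None:
    """Move the record to its terminal state and drop the envelope bytes."""
    if record.revoked_at is not None:
        raise ValueError("record is already in its terminal state")
    record.envelope = None                       # envelope removed from the live row
    record.revoked_at = revoked_at               # metadata is retained
    record.revocation_signature = signature
```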
The historical wrapped-key bytes are not destroyed: they survive in EFS's database preservation streams (§16a) for the long-horizon evidentiary record. A legitimate need to inspect a historical wrap (a forensic investigation, a successor operator reconciling state, a recipient disputing a wrongful denial) is served by reading the preservation streams under operator-internal access (§16a.8); it is not served from the live database.
Revocation affects future retrieval. It does not retract material that the recipient has already retrieved and decrypted.
12.8 Privacy
Wrapped-key records are metadata-visible (existence, recipient identity, package role, issuer, timestamps, expiry, status) but content-hidden (the wrapped-key envelope only decapsulates against the corresponding recipient private key). An EFS operator sees which Catalog.ID identities hold wrapped-key records for which assets and roles, and can infer access patterns from retrieval traffic; the operator cannot recover any wrapped key without the corresponding private key.
The recipient's recipient_party_id is their Catalog.ID username. This may be a long-lived identity that links wrapped-key records across packages (revealing a graph of who has been granted what), or a per-share pseudonym at the cost of complicating later rotation. The choice is the recipient's, governed by Catalog.ID's pseudonymous-by-default identity model (§9a of the Catalog.ID Whitepaper).
12.9 Mirror-set replication
Wrapped-key records for an AssetID live at every EFS operator in the AssetID's mirror set (§16.4) — the set of operators where the ciphertext has been ingested. Records do not replicate beyond the mirror set; an operator not in the mirror set does not hold the AssetID's records and cannot answer retrieval against it. The two distribution paths are:
- client-side fan-out: the issuing client queries the AssetID's mirror set from CPR and posts the WrappedKey record (and any subsequent revocation) to each operator in the set independently;
- operator-side forwarding: the issuing client posts to one operator and marks the record for mirror-set forwarding; the receiving operator forwards the signed record to its peer operators and records acknowledgements; the client treats the post as complete only when all peers have acknowledged.
Either path produces the same end state: one copy of the record at every operator in the mirror set, in lockstep with the ciphertext it unlocks.
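The completeness rule shared by both paths (every operator in the mirror set must acknowledge) can be sketched as follows. The `post` callable stands in for whatever transport a client or operator actually uses; it and the function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def fan_out(record: bytes, mirror_set: Iterable[str],
            post: Callable[[str, bytes], bool]) -> Dict[str, bool]:
    """Post the signed record to each operator; map operator -> acknowledged."""
    return {operator: post(operator, record) for operator in mirror_set}


def pending_operators(acks: Dict[str, bool]) -> List[str]:
    """Operators that still need the record before the post counts as complete."""
    return sorted(op for op, acknowledged in acks.items() if not acknowledged)
```

A post is complete only when `pending_operators` returns an empty list; anything else leaves the record out of step with the ciphertext at the listed operators.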
Operator cessation (§16.7, §16a.6) transfers the wrapped-key records to the successor operator together with the ciphertext, registries, and database-preservation streams: an operator's database is the set of records that make its ciphertext interpretable, and the two move as one substrate.
13. Storage Architecture
This section specifies how an EFS operator organises stored package parts on its own infrastructure: how filesystem volumes are labelled, registered, and migrated; how parts are placed into bucket folders; and how one writable bucket per volume coordinates concurrent writes. The architecture is the substrate that the storage states (§14) and the storage lifecycle (§15) operate over.
13.1 Volumes and storage units
The unit of storage allocation inside an operator is a volume. A disk volume is an XFS-formatted filesystem on a RAID set, on a single drive, or on any other block device the operator manages; a tape volume is an LTO cartridge written as sequential tar chunks (§13.9). XFS is mandatory for disk volumes under the certified-platform commitment introduced in §3 (design choice 6), which also fixes the operating system, the labelling and registry conventions of this section, and the directory layout used by buckets (§13.4) and migrated volumes (§13.3).
The hardware unit underlying a typical Online or Standby volume is a storage unit: a drive enclosure with a controller, presenting one volume to the operating system. A server may host several storage units; large operators host many. Each storage unit is independently addressable, independently powerable, and independently mountable, so the failure of one storage unit does not affect the others. Storage units are sourced from the closed list of approved suppliers and models in the licensed hardware schedule; tape libraries and tape drives are likewise sourced from the schedule. Other operators receiving an operator's hardware on cessation rely both on the on-disk specifics covered in this section and on the model-level identity of the storage units, libraries, and drives, so that no firmware-quirk surprise sits between the receiving operator and a working archive.
A volume is identified by a volume label of the form:
EFS{operatorID}{volumeID}
where operatorID is the operator's identifier as it appears in PackageIDs (§5.2), and volumeID is a four-character lowercase hexadecimal string assigned in monotonically increasing order at the operator. Four hex digits accommodate 65,536 volumes per operator, sufficient for very large fleets at present-generation drive capacities. Examples:
EFSb4np0001
EFSb4np00a3
EFSb4npffff
A volume label is never reused, even after the underlying physical media has been migrated to a successor (with the legacy media retained in archival custody, §16.3). A label retired from active service remains in the operator's volume registry as evidence of past existence and migration history.
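A small sketch of building and parsing labels in this form. The helper names are hypothetical, and since the character set of operatorID is not restated in this section, the parser simply treats the last four characters as the volumeID.

```python
import re
from typing import Tuple

_LABEL_RE = re.compile(r"^EFS(?P<operator>.+)(?P<volume>[0-9a-f]{4})$")


def make_volume_label(operator_id: str, volume_number: int) -> str:
    """EFS{operatorID}{volumeID}, with volumeID as four lowercase hex digits."""
    if not 0 <= volume_number <= 0xFFFF:
        raise ValueError("volumeID must fit in four hex digits")
    return f"EFS{operator_id}{volume_number:04x}"


def parse_volume_label(label: str) -> Tuple[str, int]:
    """Split a label into (operatorID, volume number)."""
    match = _LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not an EFS volume label: {label!r}")
    return match.group("operator"), int(match.group("volume"), 16)
```

For example, `make_volume_label("b4np", 0xa3)` gives `EFSb4np00a3`, and a request for volume number 65,536 is rejected because it would need a fifth hex digit.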
13.2 The volume registry
The operator maintains a PostgreSQL volume registry with one row per volume. Each row carries at minimum:
- volume label (primary key);
- nominal capacity in bytes;
- current package-part count (starting at zero, incremented when a part is written, never decremented in routine operation);
- writable flag (boolean): true while the volume can accept new writes, false once it has reached its planned fill level or been frozen for migration;
- state: one of the storage states defined in §14;
- mount point or device path when the volume is currently online;
- parent volume label, if the volume has been migrated into another volume (§13.3);
- creation timestamp, last-state-change timestamp, last-audit timestamp.
The volume registry is operator-internal and is not part of the federation's published metadata. The package registry, separately, records which volume(s) hold which parts (§13.8) and can resolve a part to its current physical location through the chain of parent-volume entries.
13.3 Volume migration
Storage technology evolves. The operator's first volumes might be 20 TB RAID arrays; later it will run 60 TB volumes, then larger. To carry a long-horizon archive forward across this evolution, EFS treats migration as a first-class operation rather than as an ad-hoc copy.
A volume can hold either:
- package parts (the leaf case), or
- migrated child volumes, with each child appearing as a subdirectory whose name is the child's volume label, containing the child's complete on-disk structure.
A migrated volume is always one level deep at any given time. When three 20 TB volumes are migrated into a single 60 TB volume, the new volume has three subdirectories named after the three children; the children's volume registry rows now point to the parent. When that 60 TB volume is later migrated into a 100 TB volume alongside others, the chain is flattened: the child volumes' rows are rewritten to point directly at the new parent, and the intermediate volume is retired. The parent-pointer field always names the volume that physically holds the part on disk now, never an intermediate.
Package-part lookups resolve through the parent pointer transparently: the package registry records the volume label that originally received the part; the volume registry tells the lookup which mounted path corresponds to that label, whether through a direct mount or through a parent volume's subdirectory.
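The flattening rule and the lookup can be sketched with an in-memory registry. This assumes a mount convention of `/mnt/{label}` as in the bucket examples and uses hypothetical names; the real registry is the PostgreSQL table of §13.2.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class VolumeRow:
    label: str
    mount_point: Optional[str] = None   # set only while directly mounted
    parent: Optional[str] = None        # volume that physically holds this one now
    retired: bool = False


def migrate_into(registry: Dict[str, VolumeRow], sources: Iterable[str], target: str) -> None:
    """Migrate source volumes into target, keeping parent pointers one level deep."""
    for src in sources:
        children = [row for row in registry.values() if row.parent == src]
        if children:                     # src was itself a migration target: flatten
            for child in children:
                child.parent = target
            registry[src].retired = True
        else:                            # leaf volume holding package parts
            registry[src].parent = target
        registry[src].mount_point = None


def resolve_root(registry: Dict[str, VolumeRow], label: str) -> Optional[str]:
    """Path under which the volume's original on-disk structure is found now."""
    row = registry[label]
    if row.parent is None:
        return row.mount_point
    return f"{registry[row.parent].mount_point}/{label}"
```

After two generations of migration, a part's original volume label still resolves in a single hop, because the child rows were rewritten to name the newest parent directly.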
13.4 Buckets
Inside a volume, package parts live in bucket folders:
{volume root}/{YYYY-MM-DD}/{bucket-seq}/
where YYYY-MM-DD is the calendar date the bucket was created (operator local time), and bucket-seq is a five-digit zero-padded sequence number that increases within the date. Examples:
/mnt/EFSb4np00a3/2026-05-06/00001/
/mnt/EFSb4np00a3/2026-05-06/00002/
/mnt/EFSb4np00a3/2026-05-07/00001/
A bucket has:
- a maximum size, set by operator policy. The v1 default is 1 TiB. A bucket fills up when the next part to be written would push it over the limit.
- a writable flag: at most one bucket per volume is writable at any moment.
When the writable bucket on a volume is closed (because adding the next part would exceed its size limit, or because the operator schedules a flush), it is set read-only at the filesystem level and a new bucket is created with the next sequence number.
A server with several storage units (§13.1) has several volumes, and therefore several writable buckets in parallel, one per volume. The bucket folders on different volumes may carry the same date-prefixed name without conflict, because each folder lives under its own volume root. A given calendar date can therefore produce, for example, /mnt/EFSb4np00a3/2026-05-06/00001/ and /mnt/EFSb4np00b1/2026-05-06/00001/ simultaneously, each receiving parts independently of the other under its own coordinator.
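The folder naming is mechanical enough to show in a few lines. The helper name is hypothetical, and starting the sequence at 1 is inferred from the examples rather than stated as a rule.

```python
from datetime import date


def bucket_path(volume_root: str, created: date, bucket_seq: int) -> str:
    """Build {volume root}/{YYYY-MM-DD}/{bucket-seq}/ with a five-digit sequence."""
    if not 1 <= bucket_seq <= 99999:
        raise ValueError("bucket-seq must fit in five digits")
    return f"{volume_root.rstrip('/')}/{created.isoformat()}/{bucket_seq:05d}/"
```

Calling it with two different volume roots and the same date and sequence yields two distinct paths, which is why buckets on different volumes can share a date-prefixed name without conflict.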
Bucket boundaries are organisational, not structural. They group parts on disk into a chronological browsing order, limit write concurrency to a single coordinator per volume, eliminate filesystem-level contention, and make the read-only/writable distinction explicit at the directory level so that operator scripts and audits can reason about it without consulting the volume registry. They do not dictate the boundaries of tape writes: tape writes (§13.9) are organised into tar-chunk batches that may aggregate parts from multiple buckets across multiple storage units.
13.5 INCOMING and FAST: volume-agnostic stores
Two volumes inside an operator are special: they carry no permanent data, do not participate in the labelled-volume scheme of §13.1, and serve transient or cache roles. Both are SSD-backed.
INCOMING. The INCOMING volume is the entry and exit point for byte traffic. Every part that the operator receives over the network lands in INCOMING first, where it is verified (signature, ciphertext digest, format) before any labelled volume ever sees it. Every part that the operator restores from tape is extracted into INCOMING (§13.9), from where it is treated as a fresh intake into a new bucket on a labelled volume. INCOMING also holds the transient state of parts that have been copied to one labelled volume but not yet to a second: a part stays catalogued in INCOMING until two-medium redundancy across distinct storage units has been confirmed, at which point its INCOMING copy is released. INCOMING therefore tracks two populations:
- parts in the process of being uploaded by clients or restored from tape;
- parts uploaded and copied to one labelled volume but not yet to a second.
INCOMING is operator-internal and volume-agnostic: clients and the federation never reference an INCOMING path, and the package registry does not bind a part's permanent location to INCOMING.
FAST. The FAST volume is an SSD-backed cache for frequently retrieved packages. The operator's caching heuristic copies a package into FAST when access frequency justifies it, and reclaims FAST entries when they cool. The path schema inside FAST is operator-local; the package registry row for a cached package carries the FAST path so that the retrieval pipeline can locate the cached copy without searching FAST itself. FAST is volume-agnostic in the same sense as INCOMING: it is not part of the labelled-volume scheme, and a missing FAST entry is not a repair condition.
Neither INCOMING nor FAST counts toward the redundancy floor (§16). Their presence does not raise the floor, and their absence does not lower it.
13.6 The write coordinator
A write coordinator owns the writable-bucket lock for each labelled volume. Disk-side ingest of a part goes through it. The flow for placing a verified part on disk:
1. The part has been received over the network and verified into INCOMING (§13.5). The INCOMING entry remains during the entire flow below.
2. The ingestion pipeline asks the coordinator on the first target volume for a target path. The pipeline picks a volume on a different storage unit from any subsequent target it intends to use, so the two disk-side copies live on physically distinct storage units (§13.1).
3. The coordinator acquires the volume's writable-bucket lock.
4. The coordinator inspects the volume registry and the writable bucket: if writing this part would push the bucket over its maximum size, it closes the current bucket (sets read-only at the filesystem level), opens a new bucket with the next sequence number, and updates the volume registry.
5. The coordinator returns the target path inside the writable bucket.
6. The pipeline copies (does not move) the verified part from INCOMING to the target path. The INCOMING source is preserved, so two-medium redundancy is in place from the moment the disk write completes: SSD (INCOMING) plus the labelled disk volume.
7. The pipeline increments the volume's package-part count in the registry, and writes the part's volume binding into the package registry.
8. The coordinator releases the lock.
9. The pipeline now repeats steps 2 through 8 against a second labelled volume on a different storage unit. The second copy may land on an Online volume or a Standby volume, depending on operator policy.
10. Once the second labelled-volume copy is registered, the INCOMING entry for the part is released.
If a volume reaches its overall planned fill level (a configured fraction of nominal capacity, leaving headroom for filesystem metadata and for migration target space), the coordinator sets the volume's writable flag to false, declines further part requests for that volume, and routes new ingests to the next writable volume.
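Steps 3 through 8 of the flow, for a single volume, can be sketched as one locked section. This is a simplified in-process model with hypothetical names: the registry is a dataclass rather than PostgreSQL, the part's file name inside the bucket is made up, and the filesystem-level read-only switch on a closed bucket is only noted in a comment.

```python
import threading
from dataclasses import dataclass, field
from datetime import date
from typing import Callable

BUCKET_MAX_BYTES = 1 << 40  # 1 TiB, the v1 default bucket size


@dataclass
class VolumeState:
    root: str                 # e.g. "/mnt/EFSb4np00a3"
    bucket_date: date
    bucket_seq: int = 1
    bucket_bytes: int = 0
    part_count: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)


def place_part(vol: VolumeState, part_name: str, size: int, today: date,
               copy: Callable[[str], None]) -> str:
    """Place one verified part; `copy` copies it from INCOMING to the path."""
    with vol.lock:                                        # step 3: take the lock
        if vol.bucket_bytes + size > BUCKET_MAX_BYTES:    # step 4: roll the bucket
            # A real coordinator would also set the closed bucket read-only here.
            if today == vol.bucket_date:
                vol.bucket_seq += 1
            else:
                vol.bucket_date, vol.bucket_seq = today, 1
            vol.bucket_bytes = 0
        target = (f"{vol.root}/{vol.bucket_date.isoformat()}/"
                  f"{vol.bucket_seq:05d}/{part_name}")    # step 5: target path
        copy(target)                                      # step 6: copy, not move
        vol.bucket_bytes += size
        vol.part_count += 1                               # step 7: registry update
        return target                                     # step 8: lock released on exit
```

The pipeline would call `place_part` once per target volume (step 9) and release the INCOMING entry only after the second call has returned (step 10).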
13.7 Read-only volumes and hardware enforcement
A volume that has been filled to its planned capacity is set read-only in the volume registry, mounted read-only at the operating-system level, and where the operator's RAID controller supports it, marked read-only at the controller level so that even a misbehaving operating-system command cannot write to it. This is the disk-tier complement of the air-gap discipline applied to vault tapes (§14): once a volume's writable life has ended, the operator's online infrastructure cannot accidentally or maliciously rewrite it.
Read-only does not mean immutable forever: when the volume reaches the end of its operational life, it is migrated into a successor volume (§13.3) and the original is retired. Until then, however, the contents are stable bytes.
13.8 The package registry
Separately from the volume registry, the operator maintains a package registry that records, per user package part:
- the part's address (PackageID and PartNr);
- the canonical ciphertext digest;
- the labelled volume(s) currently holding the part, with the storage state (§14) of each;
- the tape barcode and tar-chunk index of any tape copies (§13.9);
- the FAST path, if a copy is currently held in FAST;
- audit history (last challenge, last successful verification);
- the part's lifecycle position (§15).
The package registry is the operator's primary lookup structure for user-package retrieval. It does not duplicate the federation-wide availability claims (§16); those are derived from it. The package registry tracks only user packages (§6.4); system packages are tracked in the parallel system registry (§13.8a).
13.8a The system registry
Alongside the package registry, and using the same volume infrastructure, the operator maintains a system registry that records, per system package part:
- the part's address (PackageID and PartNr);
- the system role (§6.5: efs-wal, efs-basebackup, or a future protocol-registered system role);
- the producing internal service (EFS database-preservation pipeline identifier, or a future producer identifier);
- the canonical ciphertext or cleartext digest (system packages may be unencrypted, §6.5);
- the labelled volume(s) currently holding the part, with the storage state (§14) of each;
- the tape barcode and tar-chunk index of any tape copies (§13.9);
- audit history (last challenge, last successful verification);
- the part's lifecycle position (§15) with the role-specific refinements that apply (e.g., §16a.5).
The system registry is the operator's lookup structure for operator-internal access to system packages. There is no external read path against the system registry — its entries are not resolved through HTTP retrieval, are not advertised in federation availability claims, are not gated by wrapped-key records, and are not billable under Bitcash. Reads against the system registry are issued by the operator's own services: the EFS database-preservation pipeline during recovery preparedness drills and during actual recovery (§16a.8), and any future operator-internal service that consumes a future system role.
The system registry is administratively distinct from the package registry for three reasons:
- Operator-internal scope. System packages exist to support the operator's own infrastructure (EFS database preservation, future operator-level feeds). They are not advertised to user-facing surfaces, do not appear in federation availability claims for user assets, and are not billed against user wallets. Keeping them in a separate registry prevents accidental cross-contamination of user queries by operator-internal records and makes it structurally impossible for a misconfigured user-facing handler to resolve a system part to a user-facing response.
- Distinct lifecycle adaptations. Specific system roles have lifecycle behaviour that diverges from the user-package defaults (e.g., the efs-basebackup stream does not fall to tape-only under the idle thresholds of §15.3, per §16a.5). Encoding these per-role differences cleanly is simpler in a dedicated registry than as exception flags in the package registry.
- Auditable separation of concerns. A regulator, auditor, or successor operator examining the operator's holdings can read the two registries independently and reason about user-published material and operator infrastructure separately, without resolving them through a discriminator flag on every row.
The system registry shares the volume registry (§13.2), the bucket model (§13.4), the write coordinator (§13.6), the read-only and hardware-enforcement disciplines (§13.7), and the tape volume and tar-chunk architecture (§13.9). It is a parallel index over the same storage substrate, not parallel storage. The write path is operator-internal: the producing service hands a finalised part to the write coordinator via local IPC and the coordinator performs the same two-medium fan-out, bucket placement, and tape integration as it does for user packages — without the §13.5 network-ingestion step that applies to user uploads.
13.9 Tape volumes and tar-chunk writes
A tape cartridge (LTO) is a volume with the same registry obligations as a disk volume: it has a row in the volume registry, a unique label, a recorded capacity, and a parent-pointer if it has been migrated to a successor cartridge. Tape volumes are physically labelled with LTO barcodes that the tape library reads on every mount; the registry row binds the operator's volume label to the barcode.
Tape boundaries do not follow disk-volume or bucket boundaries. Tape writes are organised into tar chunks: a tar chunk is a TAR archive of fixed nominal size that aggregates parts collected from one or more buckets across one or more storage units. The v1 nominal tar-chunk size is 100 GiB. A tape volume holds many tar chunks written sequentially: since 100 GiB is about 107.4 GB, an LTO-9 cartridge (18 TB native) holds roughly 167 tar chunks and an LTO-10 cartridge (30 TB native) roughly 279.
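The chunk counts follow from dividing native capacity by the nominal chunk size. As a rough check that ignores tape-format and per-chunk overhead, and that reads cartridge capacities as decimal terabytes (the usual vendor convention, assumed here):

```python
GIB = 2**30
GB = 10**9
TB = 10**12

TAR_CHUNK_BYTES = 100 * GIB   # v1 nominal tar-chunk size


def chunks_per_cartridge(native_bytes: int, chunk_bytes: int = TAR_CHUNK_BYTES) -> int:
    """Whole tar chunks that fit in the native capacity, ignoring overhead."""
    return native_bytes // chunk_bytes


lto9 = chunks_per_cartridge(18 * TB)    # 167
lto10 = chunks_per_cartridge(30 * TB)   # 279
```

With binary 100 GiB chunks this gives 167 and 279 chunks; reading the chunk size as a decimal 100 GB instead gives 180 and 300.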
When the operator schedules a tape write, the writer assembles the next tar chunk by drawing parts from sealed (read-only) buckets on disk, packs them into the TAR archive, and streams the archive to the next sequential position on the target tape. The chunk header records the parts it contains, with their PackageIDs, PartNrs, sizes, and ciphertext digests. The package registry is updated, per part, with the tape barcode and the tar-chunk index of its new tape copy.
A tape is identified by its barcode for physical purposes (mount, eject, audit, migration) and by its volume-registry label for logical purposes (lookup, parent-pointer chains). The two are bound by the registry row.
Retrieval from tape works on whole tar chunks rather than on individual parts. To retrieve a part the library mounts the tape, the head seeks to the chunk index, the entire 100 GiB chunk is read in one streaming operation, and the chunk is extracted to INCOMING (§13.5) as if it were a fresh intake. From there the requested part flows into the regular intake pipeline (§13.6), landing in a fresh bucket on a writable disk volume and getting its Online state recorded in the package registry. A retrieval triggered for service to a customer therefore returns the package to higher-state availability automatically, restarting the idle clock (§15.3).
A tar chunk normally contains parts other than the one a given retrieval requested, since 100 GiB holds many parts. The handling of those incidental parts is operator-local: typically they are kept available in INCOMING for a short coalescing window (24 hours by default) so that further requests against parts in the same chunk are served from INCOMING without a second tape mount, and are then released. Section 18 describes how this affects retrieval pricing; the open protocol question on incidental promotion is recorded in §21.
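The coalescing window can be pictured as a small time-stamped cache keyed by tape barcode and chunk index. This is a minimal sketch with hypothetical names. Since handling is operator-local, a real operator may key, expire, or release extracted chunks differently.

```python
from datetime import datetime, timedelta

COALESCING_WINDOW = timedelta(hours=24)  # default window from the text


class IncomingCoalescer:
    """Tracks which extracted tar chunks are still held in INCOMING."""

    def __init__(self, window: timedelta = COALESCING_WINDOW):
        self.window = window
        self._extracted_at = {}  # (barcode, chunk_index) -> extraction time

    def record_extraction(self, barcode: str, chunk_index: int, now: datetime) -> None:
        self._extracted_at[(barcode, chunk_index)] = now

    def needs_tape_mount(self, barcode: str, chunk_index: int, now: datetime) -> bool:
        """True if the chunk is not held in INCOMING or its window has elapsed."""
        extracted = self._extracted_at.get((barcode, chunk_index))
        return extracted is None or now - extracted >= self.window

    def release_expired(self, now: datetime) -> list:
        """Drop chunks whose window has elapsed; return their keys."""
        expired = [key for key, t in self._extracted_at.items()
                   if now - t >= self.window]
        for key in expired:
            del self._extracted_at[key]
        return expired
```

A second request against the same chunk inside the window is answered from INCOMING. After the window the entry is released and the next request costs a fresh mount.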
Tape volumes used for Nearline and Vault copies are treated as if they were WORM (Write Once Read Many) media, regardless of whether the underlying cartridge is mechanically WORM-only. Genuine WORM cartridges exist (LTO supports a WORM variant) but are substantially more expensive and less universally available than standard cartridges, so the protocol does not require them. The operational discipline is the equivalent of WORM in any case: a rolling-rotation cartridge accumulates tar chunks by sequential append during its writing life (across many library turns under §15.6), but once filled and sealed no further chunks are ever appended, no chunk is ever edited or rewritten, no part is selectively deleted, the cartridge is never wiped, and the cartridge is never returned to the writable pool. Sealed cartridges live forever in the operator's custody: at the fifteen-year migration cycle (§16.3) the cartridge's contents are read out and rewritten to a fresh-generation successor cartridge, and the legacy cartridge is retained in archival custody alongside the successor (it is never destroyed). Sealed tape volumes occupy the Nearline and Vault states (§14); unsealed rolling-rotation cartridges occupy Rolling Online and Rolling Offline.
At the moment a rolling-rotation cartridge fills, the operator additionally slides the cartridge's hardware write-protect tab to the ON position before the cartridge is moved to its long-term location (slot-C cartridges return to the library shelf as Nearline; slot-A and slot-B cartridges leave for vault A and vault B respectively as Vault, through the vault clearance gate). The hardware tab is a mechanical safeguard on the cartridge itself, independent of the operator's software, the drive firmware, and the library's control plane: a drive that detects the tab in the protected position will refuse to write, regardless of what any higher layer instructs. This is the physical companion to the operational WORM discipline. Combined with the air-gap of Vault cold storage and the retain-don't-destroy rule at migration (§16.3), it removes every routine and accidental path by which a sealed cartridge could lose data.
No EFS cartridge is ever wiped and returned to service, and no EFS cartridge is ever destroyed. Every cartridge enrolled in the volume registry is in one of three conditions: currently rolling (Rolling Online or Rolling Offline), sealed (Nearline or Vault, or Controlled Offline for the slot-C cartridge of an operator without a robotic library, §14.2), or sealed-and-superseded (a legacy cartridge that has been migrated to a successor at the fifteen-year cycle but retained in archival custody as a hedge against migration error or later-discovered defects in the successor generation).
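The WORM discipline amounts to a one-way state machine over a cartridge's write status. The sketch below encodes it with hypothetical names: the only permitted moves are rolling to sealed and sealed to sealed-and-superseded, and nothing ever returns to the writable pool.

```python
from enum import Enum


class WriteStatus(Enum):
    ROLLING = "rolling"
    SEALED = "sealed"
    SUPERSEDED = "sealed-and-superseded"


# One-way transitions only; there is no edge back to ROLLING and no "wiped" state.
ALLOWED = {
    WriteStatus.ROLLING: {WriteStatus.SEALED},
    WriteStatus.SEALED: {WriteStatus.SUPERSEDED},
    WriteStatus.SUPERSEDED: set(),
}


def can_transition(current: WriteStatus, target: WriteStatus) -> bool:
    return target in ALLOWED[current]


def transition(current: WriteStatus, target: WriteStatus) -> WriteStatus:
    """Return the new status, or raise if the move would break the discipline."""
    if not can_transition(current, target):
        raise ValueError(f"forbidden transition: {current.value} -> {target.value}")
    return target
```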
13.10 Tape cartridge labels
Tape volumes accepted by an EFS operator must be drawn from the platform-wide list of permitted cartridge types. At v1 the permitted set is LTO-8 and LTO-9; further generations are admitted to the set by protocol amendment as the ecosystem moves forward. A cartridge whose generation is not on the permitted list cannot be enrolled as a volume in the volume registry (§13.2) and cannot be written to in any state.
Every permitted cartridge carries a physical barcode label sourced exclusively from the founder under the catalog operator licensing agreement. Labels are supplied either at cost price or free of charge, in batches of one thousand labels per order, and are printed on demand. No third-party labels, hand-printed labels, or repurposed labels are admissible: the centralised supply is what makes the labels globally unique across the ecosystem and what binds every cartridge in service to the protocol's lifecycle records.
The barcode follows the format [A-Z]{4}[0-9]{2}{XX}, where the final two characters identify the cartridge generation (L8 for LTO-8, L9 for LTO-9, and so on as further generations are admitted). An example label for an LTO-9 cartridge is CTLG42L9: a four-letter block (CTLG), a two-digit block (42), and the generation suffix (L9). The first six characters form a generation-scoped serial: each generation has its own independent [A-Z]{4}[0-9]{2} series, so the six-character prefix may legitimately repeat across cartridges of different generations (an LTO-8 cartridge labelled CTLG42L8 and the LTO-9 cartridge CTLG42L9 above are distinct, both valid, and both globally unique). The full eight-character code, including the generation suffix, is globally unique.
Cleaning cartridges, which the tape library uses to clean drive heads and which never carry catalog data, use the reserved prefix CLN. Cleaning cartridges do not receive globally unique identifiers because they do not enter the volume registry or the package registry, are never associated with operator data, and are operator-local consumables.
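The label format and the reserved cleaning prefix lend themselves to a mechanical check. The sketch below is an illustration, not the registry's actual validator. It assumes the v1 permitted set (L8, L9) is the full list of valid suffixes, and it reads the reserved CLN prefix as meaning that no data-cartridge serial may begin with those three letters. Function names are hypothetical.

```python
import re

# v1 permitted generations, keyed by barcode suffix.
PERMITTED_SUFFIXES = {"L8": "LTO-8", "L9": "LTO-9"}

# Six-character generation-scoped serial, then a two-character suffix.
_LABEL_RE = re.compile(r"([A-Z]{4}[0-9]{2})([A-Z0-9]{2})")


def parse_label(code: str):
    """Return (serial, generation) for a valid data-cartridge label."""
    match = _LABEL_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"malformed label: {code!r}")
    serial, suffix = match.groups()
    if serial.startswith("CLN"):
        # Assumption: CLN is reserved for cleaning cartridges only.
        raise ValueError(f"reserved cleaning prefix: {code!r}")
    if suffix not in PERMITTED_SUFFIXES:
        raise ValueError(f"generation not on the permitted list: {code!r}")
    return serial, PERMITTED_SUFFIXES[suffix]


def is_valid_label(code: str) -> bool:
    try:
        parse_label(code)
    except ValueError:
        return False
    return True
```

With this check, CTLG42L8 and CTLG42L9 both parse, share the serial CTLG42, and differ only in generation, which matches the generation-scoped uniqueness rule.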
Alongside the machine-readable barcode that the tape library reads on every mount, each barcode sticker carries a human-readable rendering of the same code (for audit, manual handling, and off-site logistics), the Catalog brand, and a URL that resolves to the cartridge's lifecycle metadata record. The lifecycle record begins at the moment the label is assigned by the founder to an operator, and is updated through the cartridge's life with the operator that currently keeps it, its current write status (writable, accumulating, sealed, migrated-and-retained), and its current physical location class (in-library, controlled offline, off-site rolling holding, vault A, vault B, in transit, archival custody). The lifecycle record is the cross-ecosystem source of truth for what a given cartridge is and where it sits; the operator's own registry rows bind the operator's internal volume label to this canonical lifecycle record via the barcode.
The labels are supplied with a licensing condition that restricts their use to labelling backup tapes within the catalog ecosystem, as specified by this whitepaper. Use of the labels for any other purpose, including labelling cartridges outside the catalog ecosystem, repackaging labels for resale, or applying labels to media that has not been enrolled as an EFS volume, is not permitted by the licensing agreement under which the operator obtained the labels.
13.10.1 The tape label range as a protected work
Unlike the Catalog.ID username namespace (§3.1.2 of the Catalog.ID Whitepaper), where the namespace format itself is a creative work, the tape barcode format [A-Z]{4}[0-9]{2}{XX} is heavily constrained by the LTO Ultrium industry standard (a six-character volume serial and a two-character media-type suffix, with CLN reserved for cleaning cartridges). The originality of the format alone is therefore modest, and the protection claimed here does not rest on the format pattern as a creative expression.
What is protected is the issued-label range: the curated, expanding database of eight-character codes that the Developer has allocated within the LTO-conformant space, together with the lifecycle metadata system, the sticker artefacts, and the registry software that bind a given code to a Catalog-ecosystem cartridge.
13.10.2 Intellectual property basis
The Developer's intellectual property in the tape label range rests on multiple, overlapping legal bases:
- Database right (EU Directive 96/9/EC, sui generis right): The set of allocated labels, together with the lifecycle record bound to each label (assignment to operator, write status, location class, migration history), constitutes a database produced through substantial investment in design, allocation discipline, registry operation, and cross-operator verification. The sui generis right protects against unauthorised extraction or reutilisation of a substantial part of the range. This is the primary basis for the range itself.
- Copyright (Berne Convention, EU Directive 2001/29/EC): The sticker artwork (the visual composition of barcode, human-readable code, Catalog branding, and lifecycle URL on the physical label), the lifecycle metadata schema, and the registry software that issues labels and maintains the lifecycle record are original works of authorship by the Developer. Although the eight-character code itself is constrained by the LTO standard, the Developer's restriction of the six-character serial to the [A-Z]{4}[0-9]{2} subset, and the integration of that subset with the lifecycle URL, the branding, and the centralised supply, is an expressive design choice that meets the originality threshold for the sticker artefact taken as a whole.
- Trademark: The Catalog brand appears on every sticker. Use of the brand on physical labels is governed by the Developer's trademark rights, independent of any copyright or database-right claim.
- Contractual protection: The catalog operator licensing agreement under which labels are supplied establishes that labels are Developer property licensed to the operator for the sole purpose of identifying EFS-enrolled cartridges in the catalog ecosystem. The agreement forbids resale, application to non-enrolled media, and any other use outside the scope this whitepaper defines. The contractual layer is enforceable regardless of which statutory IP rights apply in a given jurisdiction.
13.10.3 Licensing chain
The tape label range is licensed in a two-step chain that parallels the Catalog.ID namespace (§3.1.4 of the Catalog.ID Whitepaper):
- Developer → Operator: The Developer licenses the right to apply allocated labels to EFS-enrolled cartridges to each catalog operator under the terms of the catalog operator licensing agreement, in batches of one thousand labels per order, supplied either free of charge or at cost price.
- Operator → cartridge: The operator applies an allocated label to a specific cartridge at the moment that cartridge is enrolled as a volume in the operator's volume registry (§13.2). From that point forward the label, the lifecycle record, and the cartridge form a single bound triple.
The operator does not own the label; the operator holds a sublicense to keep the label affixed to a specific cartridge for as long as that cartridge serves the catalog ecosystem. EFS cartridges are not destroyed at end of life: when a cartridge is migrated to a successor at the fifteen-year cycle (§16.3) the legacy cartridge is retained in archival custody. The eight-character code therefore remains bound to its original cartridge in perpetuity and is never reassigned, even after the cartridge's contents have been carried forward to a successor.
13.10.4 Continuity
"The Developer" and "the founder" throughout §13.10 refer to Roberto Bourgonjen, the natural person who designed and created the Catalog ecosystem, including the tape labelling scheme and the centralised registry that backs it.
The Developer's role in the labelling scheme is durable beyond the Developer as a natural person. To carry the role forward without dependence on a single lifetime, the Developer intends to establish the Catalog Continuity Foundation, a non-profit successor entity (yet to be incorporated as of v1; jurisdiction and legal form to be settled at incorporation) whose sole purpose is to hold and continue the Developer's role in the Catalog ecosystem. That role includes maintaining the label range, issuing and printing label batches, operating the lifecycle metadata registry, granting and enforcing operator licences, and exercising the database, copyright, trademark, and contractual rights set out in §13.10.2.
Where §13.10 refers to "the Developer" or "the founder" in respect of label supply, range allocation, lifecycle registry operation, or licensing, those references extend to the Catalog Continuity Foundation from the moment the role is formally transferred to it, and to any further successor that subsequently inherits the role under the foundation's governing documents. Until the foundation is incorporated and the role is transferred, the Developer holds the role personally.
13.11 Tape formatting, enrolment, and mount verification
The physical barcode sticker (§13.10) identifies a cartridge externally to the library and to human operators. The operator's volume registry (§13.2) identifies the cartridge in the EFS service's database. The cartridge itself carries a third identity record — a tape header written at the start of the tape — that binds the physical cartridge to the barcode and is checked on every mount. The three identifiers must agree at every mount before any read or write operation proceeds; this protects routine operation, and particularly protects manually operated setups (§15.6), against the human-error class in which a wrong cartridge is loaded into the drive.
Tape header
Every cartridge in service carries a tape header at the start of the tape. The header is a small fixed record containing:
- the cartridge's barcode code (matching the physical sticker, §13.10),
- the cartridge's LTO generation,
- the formatting timestamp,
- the operator's Catalog.ID identifier (the operator who formatted the cartridge),
- a hybrid signature (Ed25519 + ML-DSA-65) over the above fields, signed by the operator's signing key.
The header is written once at formatting time and is never updated. The cartridge's lifecycle state (writable, sealed, migrated-and-retained) is tracked in the volume registry and in the cross-ecosystem lifecycle metadata record (§13.10), not in the tape header. The header is a fixed identity record, not a state record.
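A fixed identity record of this kind maps naturally onto an immutable structure. The sketch below is illustrative only. The on-tape encoding is not specified here, so the canonical JSON payload, the field names, and the example values are assumptions, and the hybrid Ed25519 + ML-DSA-65 signature is carried as opaque bytes rather than computed.

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: the header is written once and never updated
class TapeHeader:
    barcode: str          # must match the physical sticker
    lto_generation: int   # e.g. 9 for LTO-9
    formatted_at: str     # formatting timestamp, ISO 8601 (assumed format)
    operator_id: str      # Catalog.ID identifier of the formatting operator
    signature: bytes = b""  # opaque hybrid signature over the fields above

    def signed_payload(self) -> bytes:
        """Deterministic bytes over the identity fields (signature excluded)."""
        fields = {
            "barcode": self.barcode,
            "lto_generation": self.lto_generation,
            "formatted_at": self.formatted_at,
            "operator_id": self.operator_id,
        }
        return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def with_signature(self, signature: bytes) -> "TapeHeader":
        """Return a signed copy; the unsigned header itself is left untouched."""
        return replace(self, signature=signature)
```

Because lifecycle state lives in the volume registry and the lifecycle record, the structure deliberately has no field for write status or location.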
Formatting
A new blank cartridge is brought to a sticker-and-header-paired state through a single workflow that:
- Selects an unused barcode sticker from the operator's allocated stock (§13.10) and applies it to the physical cartridge.
- Mounts the cartridge in a drive and writes the tape header, with the barcode code in the header matching the sticker code.
- Reads the header back to verify the write and confirms the cartridge holds the header and no other data.
For manually operated setups (§15.6) the recommended pattern is batch formatting: a skilled operator runs a dedicated formatting session in which a stock of fresh cartridges is processed under undivided attention, building up an inventory of pre-formatted, ready-to-use cartridges. Day-to-day library operations then pick from this inventory when a fresh cartridge is needed. The justification is that "now insert tape XYZ" and "now format a new cartridge" demand different attention modes: the routine mount-and-rotate workflow is procedural and can be handled by ordinary staff following a runbook, whereas sticker-and-header pairing is an attention-sensitive matching task that must not be interleaved with routine work without elevating the error rate. Concentrating the formatting work into dedicated skilled-staff sessions, separated in time and personnel from daily operations, keeps each mode in its own focus. The pre-formatted inventory is held in a controlled, labelled area near the drive.
For robotically operated libraries an operator may format ad-hoc instead, in the moment a fresh cartridge is required (typically when a slot's previous cartridge has just sealed and a successor is needed), because the formatting workflow runs largely under software control and is less exposed to between-task human confusion. Operators may choose batch formatting in any setup; ad-hoc formatting is appropriate only where the formatting workflow does not depend on undivided human attention.
Enrolment
When a formatted cartridge is presented to the EFS service for enrolment in the volume registry, the service:
- Mounts the cartridge and reads the tape header.
- Validates the header signature against the operator's active signing key.
- Checks that the barcode code in the header matches the physical sticker (cross-checked via the library robot's barcode reader for robotically operated libraries, or via human-operator confirmation against the sticker for manually operated setups).
- Checks that the code is not already registered in the volume registry (no duplicate enrolment, no sticker-swap with an in-service cartridge).
- Checks that the code is in the operator's allocation from the founder (§13.10) and is in unused state in the operator's local code inventory.
- Confirms the cartridge holds the header and no other data (the cartridge is empty beyond the header).
- Creates the volume registry row binding the operator's internal volume label to the cartridge's barcode and marks the code as in-service in the operator's local code inventory.
A cartridge that fails any of these checks (header missing, header signature invalid, code mismatch, code already in use, code not in the operator's allocation, cartridge not empty) is rejected from enrolment and returned to the operator for inspection. Enrolment is the protocol's check against any error introduced during formatting, including batch-formatting errors: a sticker-and-header mismatch surfaces at the enrolment step rather than reaching active service.
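The enrolment checks form an ordered gate in which the first failure decides the rejection reason. The following is a minimal sketch under stated assumptions: the volume registry is modelled as a dictionary from volume label to barcode, the code inventory as a dictionary from barcode to "unused" or "in-service", and signature validation and the emptiness check are reduced to booleans supplied by the caller. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresentedCartridge:
    header_barcode: Optional[str]  # None if no tape header could be read
    signature_valid: bool          # result of validating the header signature
    sticker_barcode: str           # read by the robot or confirmed by a human
    empty_beyond_header: bool


def enrolment_rejection(c: PresentedCartridge, volume_registry: dict,
                        code_inventory: dict) -> Optional[str]:
    """Return None if every check passes, otherwise the rejection reason."""
    if c.header_barcode is None:
        return "header missing"
    if not c.signature_valid:
        return "header signature invalid"
    if c.header_barcode != c.sticker_barcode:
        return "code mismatch"
    state = code_inventory.get(c.header_barcode)
    if c.header_barcode in volume_registry.values() or state == "in-service":
        return "code already in use"
    if state != "unused":
        return "code not in the operator's allocation"
    if not c.empty_beyond_header:
        return "cartridge not empty"
    return None


def enrol(c: PresentedCartridge, volume_label: str, volume_registry: dict,
          code_inventory: dict) -> Optional[str]:
    """Create the registry row on success; return the rejection reason otherwise."""
    reason = enrolment_rejection(c, volume_registry, code_inventory)
    if reason is None:
        volume_registry[volume_label] = c.header_barcode
        code_inventory[c.header_barcode] = "in-service"
    return reason
```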
Mount verification
Every time a cartridge is loaded into a drive — by the robot in a robotically operated library, or by a human operator in a manually operated setup (§15.6) — the EFS service performs a mount-time check before any read or write operation:
- Read the cartridge's tape header.
- Validate the header signature.
- Confirm that the barcode code in the header matches the code the EFS service expected for this mount (the cartridge the volume registry believes is being loaded into this drive at this time).
If the verification fails — for example, if a human operator in a manually operated setup loads cartridge CTLG42L9 when the system expected CTLG41L9 — the EFS service refuses to continue the operation, logs the mismatch to the audit log, and prompts the operator to recover. In a robotically operated library this check catches barcode-reader or robot-arm misbehaviour; in a manually operated setup it catches human-handling confusion. Either way, no write or read against the wrong cartridge ever proceeds past the mount-verification step.
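The mount-time check can be sketched as a single guard that either passes or writes an audit entry and refuses. In this sketch the header is modelled as a plain dictionary with a "barcode" key, the signature check is a caller-supplied callable, and the audit log is a list. All of these are simplifying assumptions.

```python
from typing import Callable, Optional


def verify_mount(expected_barcode: str,
                 header: Optional[dict],
                 signature_ok: Callable[[dict], bool],
                 audit_log: list) -> bool:
    """Return True only if the loaded cartridge is the one the service expected."""
    if header is None:
        reason = "header unreadable"
    elif not signature_ok(header):
        reason = "header signature invalid"
    elif header.get("barcode") != expected_barcode:
        reason = f"expected {expected_barcode}, found {header.get('barcode')}"
    else:
        return True
    # Any failure is logged; the caller must not proceed to read or write.
    audit_log.append(("mount-verification-failed", expected_barcode, reason))
    return False
```

For the example in the text, a mount that expects CTLG41L9 and reads CTLG42L9 from the header returns False and leaves one audit entry naming both codes.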
14. Storage States
A copy of a package part lives in one of seven storage states, distinguished by the medium, by whether the medium is powered, by whether the medium is reachable from the operator's online control plane, and (for tape) by whether the cartridge is sealed or rolling. Storage states are states, not tiers: a single part normally has multiple copies, each in its own state, simultaneously. The operator's user interface presents a part's storage profile as a row of state checkboxes, one box per copy, so that a publisher can see at a glance that, say, two copies are Online, one is Rolling Online, and two are Rolling Offline.
14.1 The seven states
Online. The copy is on a powered, mounted disk or SSD volume reachable from the operator's online control plane. Retrieval is served immediately, in milliseconds for SSD, in tens of milliseconds to seconds for spinning disk. Online is the routine serving state.
Standby. The copy is on a disk volume whose spindle or device has been powered down, but the volume remains catalogued by the operator's online control plane. On a retrieval request the volume is automatically powered up; once spun up (typically tens of seconds for a single drive, around a minute for a RAID array), every part on it can be served at Online speed. Standby reduces idle energy without surrendering reachability or retrievability. Standby is the energy-aware sibling of Online and is the state into which Online copies migrate when their corresponding packages have not been accessed for a long time.
Controlled Offline. The copy is on operator-owned media (disk or sealed tape) that has been removed from the operator's online control plane. The medium may be physically disconnected, parked in a separate rack, held in a controlled-access cabinet, or (for sealed tape) shelved near a drive that the operator does not commit to mounting on a Nearline-class SLA. A retrieval from Controlled Offline requires a documented operator action (mount the medium into a recovery path, run the verification script, copy the part out). Controlled Offline serves two functions: as a hot-spare replacement for an Online, Standby, or Nearline copy that has failed or degraded (allowing the operator to substitute a healthy on-site copy without waiting on Vault retrieval), and as a last-resort defence against compromise of the online control plane.
Rolling Online. The copy is on an LTO tape cartridge that is currently loaded in the operator's tape library as the active rolling-rotation cartridge for one of the three rolling slots A, B, C (§15.6). As with Nearline, retrieval requires the library to mount the tape and read the tar chunk containing the part (§13.9). Unlike Nearline, the cartridge is not sealed: it remains writable and accumulates further tar chunks until it physically fills. Rolling Online is a transitional state: the same cartridge will swap with Rolling Offline at each weekly rotation, and on filling will seal terminally into Nearline (slot C, in operators who hold a Nearline service) or Vault (slots A and B).
Rolling Offline. The copy is on an LTO tape cartridge that has been ejected from the operator's tape library and physically moved to off-site rolling holding under the three-slot rolling rotation (§15.6). Off-site rolling holding is distinct from the two vault sites: vault access is reserved for the insertion of sealed cartridges, while rolling cartridges are held at one or more operator-chosen off-site locations that the rotation can move them in and out of routinely. Like Vault, Rolling Offline is air-gapped: physically unreachable from the operator's network, and any direct retrieval would require a manual workflow. Unlike Vault, the cartridge is unsealed and will return to the library at the next weekly rotation for further append. Rolling Offline is the off-site twin of Rolling Online; the same cartridge oscillates between the two states across rotations until it fills and seals.
Nearline. The copy is on a sealed LTO tape cartridge held permanently inside the operator's robotically operated tape library and reachable through automated mount on demand (typically minutes per mount). Nearline is the operator's on-site tape backstop with a 24x7 mount SLA: it is reachable through automation, sits in a controlled environment, and is the source from which Online copies are repaired when a disk volume fails or fails an audit. Nearline is optional in the v1 architecture: an operator without a robotically operated library holds the equivalent sealed slot-C cartridge as Controlled Offline tape instead, and follows the alternative redundancy profile of §14.2. Where an operator does hold Nearline, every Nearline cartridge is a sealed slot-C cartridge produced by the rolling rotation (§15.6).
Vault. The copy is on a sealed LTO tape cartridge that has been ejected from the operator's tape library and physically moved off-site into a data-safe vault. Vault is air-gapped: the cartridge is physically unreachable from the operator's network, and any retrieval requires a documented manual workflow (operator collects the cartridge from the vault, mounts it in a recovery host, reads the tar chunk containing the part, copies the part out, returns the cartridge to the vault). Vault is the unconditional preservation state; sections 15 and 16 discuss what this means for the durability commitment. In the v1 architecture, every Vault cartridge is a sealed slot-A cartridge at vault site A or a sealed slot-B cartridge at vault site B (§15.6).
The seven states span a continuous range from "powered, mounted, instantly served" to "sealed, off-site, manual recovery". Energy consumption falls roughly monotonically along this range; expected retrieval latency rises along it; reachability from the online control plane disappears at Controlled Offline and remains absent for Rolling Offline and Vault. Rolling Online and Rolling Offline are the transitional tape states; Nearline and Vault are the terminal sealed tape states.
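The distinguishing properties of the seven states can be tabulated compactly. The sketch below is one possible encoding, with hypothetical names. The traits are read off the descriptions above and simplified ("disk" covers both SSD and spinning disk), so it should be taken as a reading aid rather than a normative schema.

```python
from enum import Enum
from typing import NamedTuple, Tuple


class StorageState(Enum):
    ONLINE = "Online"
    STANDBY = "Standby"
    CONTROLLED_OFFLINE = "Controlled Offline"
    ROLLING_ONLINE = "Rolling Online"
    ROLLING_OFFLINE = "Rolling Offline"
    NEARLINE = "Nearline"
    VAULT = "Vault"


class Traits(NamedTuple):
    media: Tuple[str, ...]  # "disk" and/or "tape"
    reachable: bool         # reachable from the online control plane
    offsite: bool           # held off-site and air-gapped


S = StorageState
TRAITS = {
    S.ONLINE: Traits(("disk",), True, False),
    S.STANDBY: Traits(("disk",), True, False),
    S.CONTROLLED_OFFLINE: Traits(("disk", "tape"), False, False),
    S.ROLLING_ONLINE: Traits(("tape",), True, False),
    S.ROLLING_OFFLINE: Traits(("tape",), False, True),
    S.NEARLINE: Traits(("tape",), True, False),
    S.VAULT: Traits(("tape",), False, True),
}

# States in which a copy cannot be reached from the online control plane.
unreachable = {state for state, t in TRAITS.items() if not t.reachable}
```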
14.2 Many copies, many states
A package part normally exists in several states simultaneously, one row in the package registry per state. After the ingestion fan-out (§15.2) and one full rotation cycle (~2 weeks), the part is on:
- 1 copy on a labelled disk volume in Online state (powered, mounted, serving immediately).
- 1 copy on a labelled disk volume on a different storage unit, in Online or Standby state per operator policy (the second storage unit may be in the same facility as the first or in a different facility).
- 1 copy in Rolling Online state (the rolling-rotation cartridge currently loaded in the operator's tape library, whichever of slots A, B, C is in turn).
- 2 copies in Rolling Offline state (the rolling-rotation cartridges for the other two slots, currently held at off-site rolling holding).
That is five copies across two media types, in transitional-rolling tape form. As each of the three rolling-slot cartridges fills and seals (§15.6), the corresponding tape copy transitions from its rolling form into its terminal sealed form:
- the sealed slot-A cartridge becomes a Vault copy at vault site A,
- the sealed slot-B cartridge becomes a Vault copy at vault site B,
- the sealed slot-C cartridge becomes a Nearline copy in the library.
The steady-state terminal profile, reached within at most twelve months of acceptance (§15.6 enforces a twelve-month maximum age on rolling cartridges), is therefore:
- 1 copy in Online state,
- 1 copy in Online or Standby state on a second storage unit,
- 1 copy in Nearline state (sealed slot-C cartridge in the library),
- 2 copies in Vault state (sealed slot-A cartridge at vault A, sealed slot-B cartridge at vault B).
Five copies, three states, two media types, with two off-site vault sites. The structure of the floor is the same in both the rolling and the sealed forms; what changes at sealing is that the cartridge is hardware-write-protected (§13.9) and physically moved into a higher-clearance location (the vault, for slots A and B) or kept in place (the library, for slot C). The choice between Online and Standby for the second disk copy is a state question, not a copy-count question: the copy is always present. Operator pressure responses (§15.4) move copies between states without dropping below the floor commitment of §16.
A second Nearline cartridge per part is reserved for a future hardware-certification level and is not part of the v1 floor.
Alternative profile without a robotic library
An operator that runs the rolling rotation against a single drive without a robotic tape library (§15.6, "Manually operated tape backup") produces the same three sealed tape copies per chunk as a fully-equipped operator: sealed slot-A at vault A, sealed slot-B at vault B, sealed slot-C on a nearby shelf at the library facility. The redundancy count is unchanged at five copies. What changes is only the storage state of the slot-C copy and the SLA associated with it:
- 1 copy in Online state,
- 1 copy in Standby state on a second storage unit (Standby is required, not optional, when Nearline is absent: the Standby disk carries the warm-backstop role that Nearline would otherwise carry, and must be reachable through automatic spin-up),
- 1 copy in Controlled Offline (tape variant) state — the sealed slot-C cartridge on a nearby shelf, reachable through documented human-operator action on a bounded staffing window rather than through automated 24x7 mount,
- 2 copies in Vault state.
Five copies, four states, two media types. The trade-off, set out at §15.3, is that the operator cannot release the second disk copy: the ten-year tape-only release threshold of §15.3 does not apply to operators without Nearline, because there is no Nearline backstop to fall back on. Disk-side copies are retained indefinitely on the new storage unit at each migration. The Vault pair remains the unconditional preservation floor for both profiles.
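The copy counts of the rolling profile and of the two terminal profiles can be checked mechanically. The sketch below uses hypothetical function names and state names as plain strings. It assumes the operator's policy for the second disk copy is expressed as a single boolean.

```python
from collections import Counter


def rolling_profile(second_disk_standby: bool = False) -> Counter:
    """Copies after ingestion fan-out and one full rotation cycle."""
    profile = Counter({"Online": 1, "Rolling Online": 1, "Rolling Offline": 2})
    profile["Standby" if second_disk_standby else "Online"] += 1
    return profile


def terminal_profile(has_nearline: bool, second_disk_standby: bool = False) -> Counter:
    """Copies once all three rolling-slot cartridges have sealed."""
    profile = Counter({"Online": 1, "Vault": 2})
    if has_nearline:
        profile["Nearline"] += 1
        profile["Standby" if second_disk_standby else "Online"] += 1
    else:
        # Without Nearline, Standby is required for the second disk copy and
        # the sealed slot-C cartridge is held as Controlled Offline tape.
        profile["Standby"] += 1
        profile["Controlled Offline"] += 1
    return profile
```

Every profile sums to five copies. The count of distinct states is three for the Nearline profile when both disk copies are Online, and four for the profile without a robotic library.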
14.3 What state transitions are routine, what are bespoke
Some state transitions are part of normal operation:
- Online to Standby and Standby to Online: driven by access frequency and operator energy policy. Standby-to-Online is automatic on retrieval. Online-to-Standby for a specific package happens at storage-unit migration time (§15.3); operator-wide Standby moves under pressure response are also possible (§15.4).
- Disk to tape-only release: a package whose idle time has crossed the ten-year threshold (§15.3) is no longer copied to a disk volume on the next storage-unit migration, leaving it on tape only.
- Nearline to Online (rehydration): triggered by retrieval of a tape-only-floor package (§15.3). The chunk containing the part is read into INCOMING and the part flows back to disk through the regular intake pipeline.
Other transitions are bespoke:
- Vault retrieval: requires the operator to physically retrieve a cartridge from the vault. Section 16.6 describes this flow.
- Controlled Offline restoration: requires the operator to mount the disconnected drive into a recovery host. This is used either as repair of a failed Online or Standby copy, or as a controlled disaster-recovery event.
14.4 The vault is the unconditional state
Of the seven states, only Vault is structurally defended against the failure modes that can take down the operator's online infrastructure as a whole. Online, Standby, Nearline, and to a lesser extent Controlled Offline all sit close enough to the operator's control plane that a single sufficiently bad event (ransomware that crosses replication boundaries, a credential compromise paired with a malicious automation, a fat-fingered bulk command, a regional energy or hardware crisis) can in principle reach them. A cartridge that has been ejected from the library and placed off-site cannot be reached by any of these. Section 16 develops the durability argument that rests on this property.
15. Storage Lifecycle
This section describes how a package part moves through the storage states (§14) over its preservation life. The lifecycle has four parts: ingestion fan-out, the steady state, idle-driven release, and operator pressure response.
15.1 Lifecycle terminology
Three terms denote points in the lifecycle without renaming the storage states:
- Full redundancy floor: the v1 five-copy state that follows ingestion fan-out. Two disk-side copies on different storage units (the first Online, the second Online or Standby per operator policy), one tape copy in Nearline (or Rolling Online during the pre-seal phase), two tape copies in Vault (or Rolling Offline during the pre-seal phase). Whether the second disk copy is in Online or Standby state is a question of state, not of copy count: the copy is present in either case. The full floor is reached within roughly two weeks of acceptance in rolling form (§15.6), and transitions to sealed form as each rolling-slot cartridge fills or reaches the twelve-month rolling-state cap; every chunk is therefore in its terminal sealed Nearline-or-equivalent + 2 Vault profile within at most twelve months of ingestion.
- Tape-only floor: the three-copy state that remains after the disk-side copies have been released at a storage-unit migration following the ten-year idle threshold (§15.3). One Nearline plus two Vault. By the time a package reaches the ten-year threshold, all rolling-slot cartridges covering it have long since sealed.
- Vault-only floor: the two-copy state that remains when the operator has released the Nearline copy as well, under sustained pressure (§15.4). Two Vault.
These terms describe redundancy floors, not service tiers. The operator's user interface continues to present the storage profile as state checkboxes, not as a tier label.
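As an informal illustration, the three floors can be written down as plain data. This is a minimal sketch only: the dictionary keys and the state strings are illustrative names chosen for this example, not protocol identifiers, and the rolling pre-seal variants are left out.

```python
# Copy composition of each redundancy floor, following the terms above.
# Keys and state strings are illustrative, not protocol-defined identifiers.
FLOORS = {
    "full": ["Online", "Online or Standby", "Nearline", "Vault", "Vault"],
    "tape_only": ["Nearline", "Vault", "Vault"],
    "vault_only": ["Vault", "Vault"],
}


def copy_count(floor: str) -> int:
    """Number of copies held at the given floor."""
    return len(FLOORS[floor])


def vault_pair_present(floor: str) -> bool:
    """True when the floor still contains both Vault copies."""
    return FLOORS[floor].count("Vault") == 2
```

Every floor in the sketch keeps the Vault pair; only the disk-side and Nearline copies differ between floors.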
15.2 Ingestion fan-out
When an EFS operator accepts a package, each part follows a deterministic fan-out across the storage states. The fan-out lands each part on all three rolling-rotation cartridges (one Rolling Online in the library, two Rolling Offline at off-site rolling holding) within roughly two weeks of acceptance; each rolling-rotation cartridge transitions to its terminal sealed state (Nearline for slot C, Vault for slots A and B) when it physically fills or reaches the twelve-month rolling-state cap under §15.6, so the part is in its terminal sealed profile within at most twelve months of acceptance. Tape writes use the tar-chunk batch model (§13.9): parts accumulate from buckets across multiple storage units into a 100 GiB tar chunk, and the tar chunk is written to tape as a single sequential stream. Effective fan-out times are often much shorter than the bounds below.
| Time after acceptance (maximum) | Action |
|---|---|
| T+0 | Part received and verified into INCOMING (SSD). |
| Within 1 hour | Part copied from INCOMING to a writable bucket on the first labelled disk volume (first disk copy, Online). The INCOMING source is preserved, so the part is on two media (SSD and disk) from this point. |
| Within 4 hours | Part copied to a writable bucket on a second labelled disk volume on a different storage unit. Second disk copy is held Online or in Standby per operator policy. INCOMING entry is released. |
| Within 24 hours | The tar chunk containing the part is appended to the rolling-rotation cartridge currently loaded in the library (Rolling Online, §15.6). The part is now on its first tape, on-site. |
| Within ~1 week | At the next weekly rotation, the previously loaded rolling cartridge ships to off-site rolling holding (Rolling Offline); the next slot's rolling cartridge is loaded and caught up, appending the chunk. The part is now on two rolling-slot cartridges, one Rolling Online and one Rolling Offline. |
| Within ~2 weeks | At the following weekly rotation, the third slot's rolling cartridge is loaded and caught up, appending the chunk. The part is now on all three rolling-slot cartridges: one Rolling Online and two Rolling Offline. |
| When the slot-C rolling cartridge covering the part fills or reaches twelve months from its own first write, whichever comes first | The slot-C cartridge seals; the part acquires its permanent Nearline copy on a sealed in-library cartridge (or its Controlled Offline tape copy under the §14.2 alternative profile). |
| When the slot-A rolling cartridge covering the part fills or reaches twelve months from its own first write, whichever comes first | The slot-A cartridge seals and is shipped through the vault clearance gate to vault site A; the part acquires one permanent Vault copy. |
| When the slot-B rolling cartridge covering the part fills or reaches twelve months from its own first write, whichever comes first | The slot-B cartridge seals and is shipped through the vault clearance gate to vault site B; the part acquires its second permanent Vault copy. |
The bucket size, the tar-chunk size, and the rolling-rotation cadence specifics are operator-local within the bounds set out at the protocol level (§21). Every rolling-slot cartridge seals within at most twelve months of its own first write (§15.6); chunks written shortly before a cartridge's seal land in sealed state within days, and chunks written shortly after a cartridge's enrolment land in sealed state within up to twelve months. The whole-archive guarantee: every chunk is in its terminal sealed Nearline-or-equivalent + 2 Vault profile within at most twelve months of ingestion. Per-part redundancy during the pre-seal phase is provided by the rolling cartridges themselves under §15.6.
Once all three rolling-slot cartridges covering the part have sealed, the part is fully archived in its steady-state terminal profile: five copies, three states, two media types, two off-site vault sites (§14.2).
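The pre-seal rows of the table above can be read as a list of maximum deadlines measured from acceptance. The sketch below is illustrative only: the step labels are invented for this example, and the table's "~1 week" and "~2 weeks" are treated as exactly 7 and 14 days, which is an assumption.

```python
from datetime import datetime, timedelta

# Maximum time after acceptance for each pre-seal fan-out step.
# Step labels are illustrative; 7 and 14 days stand in for "~1 week" and "~2 weeks".
FAN_OUT_BOUNDS = [
    ("first disk copy (Online)", timedelta(hours=1)),
    ("second disk copy (Online or Standby)", timedelta(hours=4)),
    ("first rolling cartridge (Rolling Online)", timedelta(hours=24)),
    ("second rolling cartridge", timedelta(weeks=1)),
    ("third rolling cartridge", timedelta(weeks=2)),
]


def fan_out_deadlines(accepted_at: datetime) -> list[tuple[str, datetime]]:
    """Return the latest permitted completion time for each fan-out step."""
    return [(step, accepted_at + bound) for step, bound in FAN_OUT_BOUNDS]
```

The sealing rows are not included because their timing depends on when each slot's cartridge fills or reaches the twelve-month cap, not on the acceptance time alone.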
15.3 Idle thresholds and storage-unit migration
The lifecycle is governed by two fixed idle thresholds, applied at storage-unit migration time. The thresholds are protocol-defined and are not configurable per package or per operator.
- Five years idle without customer-initiated retrieval makes a package eligible to have its disk-side copies moved to Standby state at the next storage-unit migration. Up to five years, the first disk copy is required to be in Online state (immediately serving); the second disk copy may already be in Standby per operator policy under §14.2.
- Ten years idle without customer-initiated retrieval makes a package eligible to be released from disk entirely at the next storage-unit migration. Its disk-side copies are not carried over to the new storage unit; the package drops to the three-copy tape-only floor (one Nearline plus two Vault). This threshold applies only to operators that hold a Nearline copy. Operators without Nearline (§14.2 alternative profile, §15.6 "Manually operated tape backup") retain both disk-side copies indefinitely on the new storage unit, because the Standby disk is carrying the warm-backstop role and cannot be released.
A customer-initiated retrieval of a package resets the idle clock from the time of retrieval, regardless of which state served the retrieval. Operator-internal activities (audits, fixity checks, repair, tape migrations, RAID rebuilds) do not count as customer access and do not reset the timer.
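The idle-clock rule can be sketched as a scan over a package's event history. The event labels are hypothetical, and starting the clock at acceptance when no retrieval has occurred is an assumption of this example rather than something stated above.

```python
from datetime import date

CUSTOMER_RETRIEVAL = "customer_retrieval"  # hypothetical event label


def idle_since(accepted_on: date, events: list[tuple[date, str]]) -> date:
    """Return the date from which a package's idleness is measured.

    Only customer-initiated retrievals reset the clock; audits, fixity checks,
    repairs, tape migrations, and RAID rebuilds are ignored.
    Assumption: with no retrieval on record, the clock starts at acceptance.
    """
    retrievals = [when for when, kind in events if kind == CUSTOMER_RETRIEVAL]
    return max(retrievals, default=accepted_on)
```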
Append-only volumes and migration-time release
EFS labelled volumes are append-only. A part written to a volume is not selectively deleted; the part stays on its volume until the entire volume is migrated and retired (§13.3). Idle thresholds therefore do not trigger an immediate state change for a specific package. They define eligibility, not action; the action happens at the next storage-unit migration.
A storage-unit migration is the operational event in which the operator moves the contents of an old storage unit onto a new storage unit. Migrations are triggered by capacity expansion (a new generation of larger drives lets the operator consolidate several old units into a new one) or by enclosure end-of-life. Storage-unit enclosures are typically good for around twenty years (with continuous drive replacement on failure), but operators commonly run a faster capacity-driven migration cycle of every five to ten years as new disk generations arrive. The exact cadence is operator-local and depends on hardware availability and economics.
At each storage-unit migration the operator reapplies the idle thresholds:
- packages with less than five years idle keep both disk copies in their current states on the new storage unit (the first Online, the second Online or Standby per operator policy);
- packages with five to ten years idle have their disk copies placed in Standby on the new storage unit;
- packages with more than ten years idle are not carried over to the new storage unit; they drop to the tape-only floor and the FAST cache, if it held them, is reclaimed.
Because the actual transition happens at migration time rather than at the exact idle anniversary, a package's effective Online retention depends on the operator's migration cadence. A package eligible for Standby at year five may remain Online until year seven if that is when the next migration runs; a package eligible for tape-only release at year ten may remain on Standby disk until year twelve. Operator pressure response (§15.4) can shorten either interval.
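The eligibility rule and its migration-time application can be sketched as two small functions. The function names and return strings are illustrative, and idle time is expressed in whole or fractional years for simplicity.

```python
from typing import Optional


def disk_disposition(idle_years: float, holds_nearline: bool = True) -> str:
    """Disk-side outcome for a package at a storage-unit migration."""
    if idle_years > 10 and holds_nearline:
        return "release to tape-only floor"
    if idle_years >= 5:
        # Operators without Nearline keep disk copies indefinitely (Standby).
        return "Standby"
    return "keep current states"


def effective_transition_year(threshold_years: float,
                              migration_years: list[float]) -> Optional[float]:
    """First migration at or after the idle threshold, or None if none is scheduled."""
    return min((m for m in migration_years if m >= threshold_years), default=None)
```

With migrations at idle years seven and twelve, `effective_transition_year(5, [7, 12])` gives 7 and `effective_transition_year(10, [7, 12])` gives 12, matching the example in the paragraph above.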
A subsequent customer retrieval of a tape-only-floor package rehydrates it. The operator reads the chunk containing the part from a Nearline tape into INCOMING, the part flows through the regular intake pipeline into a fresh bucket on a writable disk volume (§13.6), it acquires Online state again, and the idle clock resets. The retrieval price reflects the chunk-read cost (§18.3).
15.4 Operator pressure response
Sustained external pressure can threaten an operator's ability to keep its full online infrastructure running: order-of-magnitude electricity-price spikes, prolonged grid instability, regional fuel or hardware disruption, an episode that materially affects staffing. The operator has a graduated set of responses, none of which compromises the unconditional Vault floor.
Energy moves: Online to Standby. The operator may move Online copies into Standby. Idle energy drops sharply; reachability from the online control plane is preserved; retrieval picks up a one-minute spin-up latency. This is the lightest response and is largely invisible to publishers.
Lifecycle compression. Under more sustained pressure the operator may transition idle packages out of the disk states ahead of the five-year and ten-year thresholds, scheduling an out-of-band storage-unit migration that applies tighter eligibility cuts (for example three years to Standby, six years to tape-only release). The protocol specifies how much compression an operator may apply and how it must announce it (§21).
Scheduled online availability. Under heavier pressure the operator may power its online tier (disk volumes and tape-library frame) only during declared windows. Outside the windows, retrieval requests queue and are served when the next window opens. The redundancy floor is not changed by scheduled availability; what changes is when retrievals can be served.
Vault-only contraction. Under the most severe pressure the operator may release the Nearline tape copies as well, leaving only the two Vault tapes. The Vault tapes continue under the fifteen-year migration cycle (§16.3). Retrieval becomes a §16.6 manual operation. The operator's service contract identifies under what conditions this contraction may occur, how publishers are notified, and how Vault retrievals are priced.
The four responses compose. The operator chooses the rung that matches the severity of the pressure event and its expected duration, and steps back down the ladder when the pressure eases. The timing of each step, in either direction, is at the operator's discretion.
15.5 What pressure response can affect, and what it cannot
Pressure response can compress the five-year and ten-year idle thresholds, schedule out-of-band storage-unit migrations to apply those tighter thresholds sooner, narrow or close the windows in which online retrieval is offered, contract packages to a smaller set of states ahead of the lifecycle schedule, and in the extreme suspend online retrieval entirely while a pressure event runs its course.
What pressure response cannot affect is the Vault pair: the two air-gapped vault tape copies and their fifteen-year migration cycle (§16.3) remain in force regardless. The eternal-preservation commitment rests on the Vault pair and is therefore unconditional; everything above the Vault pair, including the idle thresholds, the routine 24x7 online retrieval path, and the published convenience commitments, is a normal-conditions optimisation that the operator may scale back to keep the floor intact.
15.6 Three-slot rolling tape rotation
The operator's tape backup runs on a three-slot rolling rotation. The rotation produces each part's Nearline and Vault copies as the terminal sealed forms of its rolling-rotation cartridges, without dedicated per-part Vault tape pairs and without wasteful tape consumption at low ingestion volumes.
The three slots
The rotation uses three slot positions, named A, B, and C. At any moment, exactly one slot's cartridge is loaded in the operator's tape library (robotic or manually operated, see "Manually operated tape backup" below); the other two slots' cartridges sit at off-site rolling holding. At each weekly rotation event the cartridge currently in the library is ejected and shipped to off-site rolling holding, and the next slot's cartridge is returned from off-site rolling holding and loaded into the library. Slots cycle A then B then C then A. Each slot therefore spends roughly one week in three inside the library.
A rolling cartridge stays in its slot, accumulating tar chunks across many library turns, until it physically fills or reaches the twelve-month rolling-state cap (see "Sealing" below). At low ingest volumes the same cartridge may serve its slot for the full twelve months.
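The weekly cycle can be sketched as simple modular arithmetic. Numbering rotation weeks from 0 and starting with slot A in the library are assumptions of this example.

```python
SLOTS = ("A", "B", "C")


def loaded_slot(week: int, first_slot: str = "A") -> str:
    """Slot whose cartridge is in the library during the given rotation week.

    Assumption: weeks are counted from 0 and `first_slot` is loaded in week 0.
    """
    start = SLOTS.index(first_slot)
    return SLOTS[(start + week) % len(SLOTS)]


def offsite_slots(week: int, first_slot: str = "A") -> tuple[str, ...]:
    """The two slots whose cartridges sit at off-site rolling holding that week."""
    current = loaded_slot(week, first_slot)
    return tuple(s for s in SLOTS if s != current)
```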
Catch-up and live append
When a rolling cartridge is loaded into the library, the operator catches it up. Catch-up appends two classes of chunk to the rolling cartridge:
- Pending chunks (the routine class): every tar chunk whose copy on this slot is still pending and which is not already on this cartridge. Catch-up of this class is needed because while the slot was off-site, new chunks accumulated on whichever other slot was loaded during those weeks.
- Ceiling-driven migration chunks (the safety class): every tar chunk in this slot's series whose current copy sits on a sealed predecessor cartridge that has exceeded the fifteen-year ceiling (§16.3). This class is the automatic enforcement of the ceiling and runs whether or not the operator has declared an upgrade event; operator-declared migration projects normally discharge aging cartridges well before the ceiling and are described in §16.3.
During the working week, newly produced tar chunks (§13.9) are appended live to the currently loaded rolling cartridge as ingestion fan-out (§15.2) generates them.
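The two catch-up classes can be sketched as a filter over the chunks in a slot's series. The record layout below is hypothetical and exists only for this example; a real operator would read the equivalent facts from its registries.

```python
from dataclasses import dataclass
from typing import Optional

CEILING_YEARS = 15  # fifteen-year ceiling (§16.3)


@dataclass
class SlotChunk:
    """Hypothetical per-slot view of one tar chunk."""
    chunk_id: str
    pending_on_slot: bool            # this slot's copy has not been written yet
    on_loaded_cartridge: bool        # already present on the cartridge being caught up
    predecessor_age_years: Optional[float] = None  # age of sealed predecessor holding the copy


def catch_up_set(chunks: list[SlotChunk]) -> tuple[list[str], list[str]]:
    """Return (pending chunk ids, ceiling-driven migration chunk ids)."""
    pending = [c.chunk_id for c in chunks
               if c.pending_on_slot and not c.on_loaded_cartridge]
    ceiling = [c.chunk_id for c in chunks
               if c.predecessor_age_years is not None
               and c.predecessor_age_years > CEILING_YEARS
               and not c.on_loaded_cartridge]
    return pending, ceiling
```

Skipping chunks that are already on the loaded cartridge keeps the sketch idempotent: running catch-up twice in the same library turn selects nothing the second time.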
Sealing
When the currently loaded rolling cartridge fills (whether during catch-up or during live append), or when it reaches twelve months from its own first write, or when it reaches the fifteen-year ceiling (§16.3) without having filled (whichever comes first):
- The cartridge is closed; the hardware write-protect tab is slid to the ON position (§13.9). The cartridge's storage state transitions out of Rolling Online.
- A fresh empty cartridge is enrolled in the volume registry under the same slot label and immediately resumes live append for that slot in Rolling Online state.
- At the next weekly rotation event, the fresh successor cartridge leaves for off-site rolling holding to await its next library turn, and the sealed cartridge goes to its permanent destination: a vault site for slots A and B, or the library itself for slot C, where it simply stays.
A sealed cartridge is permanent: never appended to, never wiped, never returned to the writable pool. It stays at its destination until the fifteen-year tape migration cycle (§16.3) rewrites its contents into a successor cartridge.
The twelve-month sealing limit is the rolling-state lifetime cap. Rolling-state cartridges are still handled manually (mounted weekly, transported between library and off-site rolling holding, written to) and are therefore exposed to handling risk and to operator-level access on the writable side. The twelve-month cap guarantees that every chunk reaches the sealed Nearline or Vault state — where the cartridge is hardware-write-protected, no longer handled in routine rotation, and (for Vault) accessible only behind the vault clearance gate — within twelve months of writing, regardless of how slowly the cartridge fills. For low-volume operators this means partially-filled cartridges seal at twelve months and a fresh cartridge takes over; the trade-off is accepted because the security and access-control benefits of the sealed tier dominate over cartridge-utilisation efficiency.
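The sealing decision for a rolling cartridge can be sketched as below. Twelve months is approximated as 365 days, which is an assumption. The fifteen-year condition listed above is omitted from the sketch because, if both limits are measured from first write, the twelve-month cap is always reached first.

```python
from datetime import date, timedelta
from typing import Optional

ROLLING_CAP = timedelta(days=365)  # twelve-month cap, approximated as 365 days


def seal_reason(first_write: date, today: date, is_full: bool) -> Optional[str]:
    """Return why the loaded rolling cartridge must seal, or None if it may keep rolling."""
    if is_full:
        return "full"
    if today - first_write >= ROLLING_CAP:
        return "twelve-month cap"
    return None
```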
Permanent destinations: how the Nearline and Vault copies arise
A simple per-slot rule sets where a sealed cartridge lives permanently:
- Sealed slot A cartridges go to vault site A, permanently. Storage state on arrival: Vault.
- Sealed slot B cartridges go to vault site B, permanently. Storage state on arrival: Vault.
- Sealed slot C cartridges remain in the library permanently. Storage state on sealing: Nearline.
The library therefore accumulates a growing collection of sealed slot-C Nearline cartridges alongside the single currently-loaded rolling cartridge for whichever slot is in turn. Vault site A accumulates sealed slot-A Vault cartridges; vault site B accumulates sealed slot-B Vault cartridges. The Nearline and Vault copies referenced in the redundancy floor (§16.1) are exactly these sealed cartridges; no separate per-part Nearline or Vault fan-out exists.
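The per-slot rule amounts to a small lookup. In this sketch the location strings are illustrative, and the `robotic_library` flag anticipates the manually operated variant described later in this section, under which the slot-C copy is Controlled Offline rather than Nearline.

```python
def permanent_destination(slot: str, robotic_library: bool = True) -> tuple[str, str]:
    """Return (location, storage state) for a sealed cartridge of the given slot."""
    if slot == "A":
        return ("vault site A", "Vault")
    if slot == "B":
        return ("vault site B", "Vault")
    if slot == "C":
        if robotic_library:
            return ("library", "Nearline")
        # Manually operated tape backup: shelved near the drive.
        return ("library facility shelf", "Controlled Offline")
    raise ValueError(f"unknown slot: {slot!r}")
```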
Off-site rolling holding
Off-site rolling holding is distinct from the two vault sites. The vault sites accept incoming cartridges only when they have been sealed: their access regime is correspondingly restrictive, and may be operated under a different clearance level from the operator's day-to-day staff. Rolling cartridges, by contrast, move in and out of the library every week and require routine logistics access; they cannot live behind the vault clearance gate.
Off-site rolling holding may be operated as a single off-site facility, as multiple off-site facilities, or as a logistics-provider intermediary; the choice is operator-local. The only protocol requirement is that the holding location be physically off-site from the operator's library facility. Operators are encouraged but not required to split the three slots across two or more off-site holding locations so that no single off-site loss event removes all currently-rolling cartridges at once.
Block redundancy invariant
A tar chunk produced on day D is appended live to whichever rolling cartridge is loaded that day. At the next library turn for each of the other two slots, the chunk is also appended (under the catch-up rule) to those slots' rolling cartridges. Within roughly two weeks of production, the chunk therefore exists on all three rolling-slot cartridges that were active at the time: one in the library (Rolling Online) and two at off-site rolling holding (Rolling Offline).
When each of those three rolling cartridges eventually seals, the chunk is captured into its slot's permanent home: one permanent copy at vault A (Vault), one at vault B (Vault), one in the library (Nearline). Three permanent copies.
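The two-week bound follows directly from the rotation order, as the sketch below illustrates. It assumes rotation weeks numbered from 0 with slot A loaded in week 0, an uninterrupted weekly cadence, and no cartridge change in the slots concerned during those weeks.

```python
SLOTS = ("A", "B", "C")


def chunk_landing_weeks(production_week: int) -> dict[str, int]:
    """Map each slot to the rotation week in which it receives the chunk.

    The loaded slot receives the chunk by live append in the production week;
    the other two receive it by catch-up at their next library turns.
    """
    return {SLOTS[(production_week + k) % 3]: production_week + k for k in range(3)}
```

For a chunk produced in week 4, the sketch gives slot B in week 4 (live append), slot C in week 5, and slot A in week 6, so the last copy lands two rotation weeks after production.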
The three sealed cartridges that together cover a given chunk do not in general carry identical block ranges. The slot-A sealed cartridge containing chunk N may cover a different range of other chunks from the slot-B and slot-C sealed cartridges that also contain chunk N, because the slots fill and seal at staggered moments. The package registry (§13.8) and the system registry (§13.8a) record, per part, the tape barcode and tar-chunk index of every copy, so per-slot divergence is operationally inert.
Rotation transit and the steady-state floor
In steady state between rotation events, one rolling cartridge is in the library (Rolling Online) and two are at off-site rolling holding (Rolling Offline). During a rotation event itself the inbound cartridge arrives at the library and the outbound cartridge departs for off-site holding; depending on logistics sequencing, brief transit windows may have two rolling cartridges on-site (returning cartridge already arrived, departing cartridge not yet shipped) or, more rarely, zero rolling cartridges on-site (departing cartridge already shipped, returning cartridge not yet arrived). Operators sequence transit so that the departing cartridge leaves the library only after the incoming cartridge has been verified loaded.
The redundancy floor of §16.1 holds continuously from roughly two weeks after first write, in either rolling form or sealed form. The transition from rolling form to sealed form does not change the copy count; it converts a writable cartridge into a hardware-write-protected permanent cartridge and either keeps it where it is (slot C, library) or moves it through the vault clearance gate into permanent off-site storage (slots A and B).
Operator-internal files
The rolling cartridges also carry operator-internal infrastructure files alongside ingested user-package parts. The current case is the EFS database's plain-text log files (§16a.2), which ride on the currently loaded rolling cartridge under the same append rules as user tar chunks. Operator-internal files inherit the same three-copy redundancy as user content: within roughly two weeks of writing they are on all three rolling-slot cartridges (Rolling Online plus two Rolling Offline), and they acquire one Nearline and two Vault copies as the rolling cartridges seal. Operator-internal files do not count toward any user package's availability claim and are tracked through the system registry (§13.8a).
Manually operated tape backup
The rolling rotation does not require a robotically operated tape library. A small operator (and the v1 founder-operator during initial rollout) may run the same three-slot rotation against a single LTO drive without a library robot, attended by a human operator who responds to mount and unmount requests. In this setup:
- The drive sits in the operator's facility, attended by a human operator on a defined staffing window.
- All three slots A, B, C still exist and produce the same three sealed tape copies per chunk as a robotically equipped operator: sealed slot-A at vault A, sealed slot-B at vault B, sealed slot-C on a nearby shelf at the library facility. The redundancy count is unchanged (§14.2).
- Sealed slot-C cartridges accumulate on a shelf next to or near the drive. They are reachable through human-operator action on the operator's staffing window, but not through automated 24x7 mount, and therefore sit in the Controlled Offline (tape variant) state rather than in Nearline state (§14.1).
- The weekly rotation event is performed by hand: the human operator unloads the currently loaded rolling cartridge, hands it off to the off-site rolling holding logistics path, receives the next slot's cartridge from off-site rolling holding, and loads it into the drive.
- Catch-up and live append run the same algorithms as in the robotically operated case; only the mount operations involve human action.
The protocol-visible consequences are confined to two:
- Nearline state is not held by this operator. The slot-C tape copy sits in Controlled Offline (tape variant) instead. The redundancy floor of §16.1 alternative-profile applies.
- The ten-year tape-only release threshold of §15.3 does not apply. Disk-side copies are retained indefinitely, with the second disk copy required to be at least Standby (the warm-backstop role Nearline would otherwise play).
Every other aspect of §15.6 carries forward unchanged: the catch-up rule, the live append rule, the sealing rule (including the fifteen-year ceiling auto-fire), the permanent destinations rule, the off-site rolling holding semantics, the block-redundancy invariant, the operator-internal-files handling, and the operator-visibility framing. An operator that later acquires a robotic library may upgrade to the full §14.2 profile without changing any other protocol commitment; the existing slot-C shelved cartridges become Nearline cartridges in the new robotic library at the moment the library is brought into service.
Suspending the rotation for ceased-intake operators
An operator that has ceased accepting new ingest may suspend the weekly rotation. The three rolling cartridges remain at their then-current locations (one in the library, two at off-site rolling holding) and the operator continues to serve retrievals from all storage states unchanged. The ceiling-driven migration class of the catch-up rule still applies: the operator must re-engage the rotation, or run a §16.3 migration project, before any cartridge in their inventory crosses the fifteen-year ceiling. An operator-declared "new-generation LTO library attached" event under §16.3 is the normal path for a ceased-intake operator; the weekly rotation does not need to run between such events.
If the operator resumes ingest, they re-engage the weekly rotation with the same three rolling cartridges in place; any new chunks accumulated through interim operator-declared migration events ride forward under the routine catch-up rule.
Operator-visibility
The three-slot rolling rotation is operator infrastructure. What it produces (the sealed Nearline and Vault cartridges, plus the transitional Rolling Online and Rolling Offline cartridges) is visible through the package registry on a per-part basis and through the volume registry on a per-cartridge basis. Section 16.7 records the rotation's role in the operator's disaster-recovery posture.
16. Permanence and Redundancy
Every part accepted by an operator is preserved indefinitely. The redundancy floor depends on the package's lifecycle position (§15) and on whether the operator is responding to sustained pressure, but the Vault pair is always present.
16.1 The three floors
The full redundancy floor consists of two disk-side copies on different storage units (the first Online, the second Online or Standby per operator policy; both copies may be in the same facility), one Nearline tape copy inside the operator's robotic library, and two Vault tape copies at two separate off-site vault sites. Five copies, three states, two media types. The two disk copies on different storage units protect against the failure of any single storage unit; the off-site Vault pair protects against any single-facility loss and against compromise of the operator's online infrastructure (§16.2). This floor maps to NDSA Level 2 on the National Digital Stewardship Alliance's preservation scale: at least three complete copies, at least two storage media, copies in geographically distinct locations.
During the pre-seal phase, the part's three tape copies sit on the three rolling-rotation cartridges in their respective Rolling Online and Rolling Offline states. Rolling Online and Rolling Offline provide the same medium, on-site or off-site placement, and air-gap as Nearline and Vault respectively, so the copy count of the floor is held continuously. Sealing adds hardware write-protection and, for slots A and B, the vault clearance gate (§15.6), without changing the copy count or the on-site and off-site split.
EFS does not promise that the two disk-side copies live in separate facilities. Multi-facility redundancy at the disk tier is not part of the within-operator floor; a publisher who wants disk-side copies in separate facilities should place the package at two operators (§16.4), where the disk copies of the second operator are by construction in a different facility, under different staffing, and under a different operating jurisdiction.
The tape-only floor consists of the three LTO tape copies (one Nearline, two Vault) with the disk copies released. Three copies, three locations (one in the library, two in vaults), one medium type. This still maps to NDSA Level 2.
The vault-only floor consists of the two Vault copies. Two copies, two locations, one medium type. This maps to NDSA Level 1, the absolute durability minimum, and is reached only under sustained operator pressure (§15.4).
In every floor, the off-site locations of the Vault pair are chosen so that no single fire, flood, facility loss, or online compromise can take down both Vault copies, and the air-gap defences of the Vault pair (§16.2) remain in force.
Alternative full floor for operators without Nearline
An operator running without a robotically operated tape library (§14.2 alternative profile, §15.6 "Manually operated tape backup") holds a structurally similar full floor in which the Nearline copy is replaced by a Controlled Offline tape copy and the second disk copy is required to be at least Standby. The full floor for such an operator is two disk-side copies (one Online, one Standby on a different storage unit), one Controlled Offline tape copy (the sealed slot-C cartridge on a nearby shelf at the library facility), and two Vault copies at two separate off-site vault sites. Five copies, four states, two media types. The tape-only and vault-only floors of §15.3 / §15.4 are not reachable for this operator: their disk-side copies are retained indefinitely. The NDSA Level 2 mapping holds (three complete copies, two storage media, geographically distinct locations).
16.2 The Vault pair as the unconditional floor
A tape that has been ejected from the library and physically removed from the operator's network is unreachable by:
- operator mistakes: a fat-fingered command, a misconfigured script, a botched migration;
- bad automation: a process that propagates a single bad write across replicas;
- ransomware on the operator's online systems;
- malicious insiders with credentials but no physical access to the off-site location;
- cascading deletion or overwrite through the operator's replication topology;
- compromised credentials or stolen administrative tokens;
- prolonged loss of cheap electricity or of network reachability to the operator's facility.
Online, Standby, Nearline, and Controlled Offline copies, however well managed, share enough of the operator's control plane and energy posture to share its failure modes. A cartridge at an off-site location does not.
The lifecycle in §15.2 and §15.6 brings a part to its Vault protection in stages. Within roughly two weeks of acceptance, the part is on three rolling-rotation cartridges: one Rolling Online in the library, two Rolling Offline at off-site rolling holding. The air-gap defences listed above engage on each rolling-offline cartridge from the moment it leaves the library, before any sealing event. As each rolling-slot cartridge fills, it seals and transitions to its terminal state: slot-A cartridges move through the vault clearance gate to vault site A as Vault, slot-B cartridges to vault site B as Vault, slot-C cartridges remain in the library as Nearline. Sealing does not move bytes; it write-protects the cartridge in hardware and, for slots A and B, walks the cartridge through the vault clearance gate into its permanent home.
If the rolling rotation is interrupted, restoring the three-slot weekly cycle takes precedence over other tape work so that no rolling-slot cartridge spends an extended period off-cadence.
16.3 Tape migration
Tape volumes are write-once and selective deletion of a single part from a tape volume is impractical; operators do not undertake it. Tape migration runs as a separate mechanism from the weekly rolling rotation (§15.6) and is governed by two triggers:
- Operator-declared "new-generation LTO library attached" event. When the operator brings a new-generation LTO library into production, they declare the event in the volume registry with a parameter N in the range 0..15. The declaration schedules migration of every chunk in the operator's archive whose current copy sits on a cartridge that has exceeded N years of write-life. The migration target is the new generation. N = 0 migrates the entire archive; N = 15 migrates only what would otherwise hit the ceiling.
- Fifteen-year ceiling (automatic). Any chunk whose current copy sits on a cartridge that has exceeded fifteen years of write-life is migrated automatically through the rolling-rotation catch-up rule of §15.6, regardless of whether the operator has declared an upgrade event. The ceiling is the protocol's safety floor against drive obsolescence and is non-negotiable.
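The two triggers can be sketched as one selection function over cartridge ages. The function name and the barcode-to-age mapping are hypothetical; an operator would derive the ages from the volume registry.

```python
from typing import Optional

CEILING_YEARS = 15  # automatic fifteen-year ceiling


def migration_eligible(cartridge_ages: dict[str, float],
                       declared_n: Optional[int] = None) -> list[str]:
    """Return barcodes of cartridges whose chunks are due for migration.

    cartridge_ages maps a (hypothetical) barcode to years of write-life.
    declared_n is the parameter N of an operator-declared event, or None
    when only the automatic ceiling applies.
    """
    if declared_n is not None and not 0 <= declared_n <= CEILING_YEARS:
        raise ValueError("N must be in the range 0..15")
    threshold = CEILING_YEARS if declared_n is None else declared_n
    return sorted(b for b, age in cartridge_ages.items() if age > threshold)
```

With ages of 1, 9, and 16 years, N = 0 selects all three cartridges, N = 8 selects the two older ones, and the ceiling alone selects only the 16-year-old cartridge.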
The choice of N is driven by drive backward-compatibility. LTO generations vary in how far back their drives can read. The consortium's historical pattern was two-generations-back read support, so an operator on generation LTO-G could keep cartridges down to LTO-(G-2) without retaining old drives. LTO-10 broke that pattern by reading only LTO-10, which means an operator upgrading from LTO-9 to LTO-10 cannot keep any LTO-9 cartridges without retaining working LTO-9 drives. In that situation the operator may declare N = 0 to migrate everything off LTO-9 and earlier and decommission the LTO-9 read hardware entirely. An operator whose new generation does support reading older cartridges may pick a higher N and budget for keeping working drives of older generations in service. The N = 0 choice is not specific to small archives; it is the natural choice whenever the new generation's lack of backward compatibility makes keeping old read hardware impractical or impossible.
Backstop horizon. The fifteen-year ceiling sits inside the manufacturer-rated 30-year archival life of LTO media. The harder constraint is reader hardware: each LTO generation is manufactured for a finite production window, and the practical read window for cartridges of any given generation is bounded by drive supply rather than by media life. The LTO consortium's historical two-generations-back read specification gave cartridges of any generation a roughly fifteen-year practical read window between release and effective obsolescence; LTO-10's narrower read scope tightens that pattern. As a concrete benchmark, LTO-4 (released 2007) is no longer in manufacture: a 2026 LTO-4 deployment depends entirely on the refurbished-drive market. The fifteen-year ceiling is set so that no chunk is left on a cartridge generation past the worst-case drive availability horizon under either the historical or the tightening pattern. In practice, operator-declared events trigger migration well before the ceiling, following hardware purchase cadence rather than calendar time.
Mechanics of a migration project. When a migration event runs (operator-declared or ceiling-driven), the operator identifies the migration-eligible chunks from the volume registry and groups them by source cartridge. For each source cartridge:
- A sealed slot-C predecessor is mounted in the operator's library directly.
- A sealed slot-A or slot-B predecessor is brought back from vault A or vault B through the vault clearance gate to a controlled migration host (or to the operator's library if the operator's tooling supports it).
- The source cartridge is read end-to-end and its tar chunks are written into one or more fresh-generation successor cartridges in the same slot's series. Because the read source is a single predecessor cartridge, the read pattern is linear (a sequential restore of the predecessor's tar chunks in order), not a fragmented gather across the wider archive.
- The package registry's per-part volume bindings are rewritten to point at the new cartridge.
- The legacy cartridge transitions to migrated-and-retained and returns to its archival location.
For operators running the weekly rolling rotation, the migration writes flow through the rotation's catch-up rule as an additional class of chunks alongside pending chunks. For operators who have suspended the rotation (§15.6), migration writes are scheduled as a dedicated batch project; in that case the successor cartridges are sealed directly on completion of the batch and dispatched to their permanent destinations.
The legacy cartridge is retained in archival custody, not destroyed: a successor cartridge that turns out to carry a migration error or that suffers a later-discovered defect (a write-time read-back miss, a manufacturing flaw in the successor generation, a generation-wide media bug discovered years after deployment) can be replayed from the legacy cartridge as long as the legacy cartridge survives, so destroying it would discard a hedge against the very risks that motivate migration. Legacy cartridges are held under the same archival conditions as their successors: Vault-state cartridges remain at their vault site (in a dedicated legacy section), Nearline-state cartridges remain in the library or in a library-side archival cabinet, and the volume registry records the cartridge as migrated-and-retained with a pointer to its successor.
Vault tapes are never migrated through the operator's routine online tape library traffic; the migration host is a dedicated, controlled path.
16.4 Cross-operator redundancy
The redundancy floors above are within-operator commitments. Placing a package at two operators independently doubles every floor: eight to ten copies at the full-redundancy floor, six at the tape-only floor, four at the vault-only floor, across at least four to six locations under at least two organisational and legal regimes. Cross-operator placement is also the only path to disk-side multi-facility redundancy (§16.1) and the only protection against operator failure and against correlated regional pressure, neither of which within-operator redundancy covers. The choice of how many operators, in which jurisdictions, is the publisher's.
16.5 Permanence is not contingent on continuing payment
A part's permanence is a property of its having been accepted, not of any continuing payment. There is no expiry, no renewal, no lapse to deletion. A part enters EFS once and stays.
Mute (§20) removes a part from public service but does not erase its archival copies. Publication is revocable; history is not.
16.6 Retrieval from Vault
Retrieval from Vault is scheduled physical work, submitted through a dedicated batch retrieval API rather than through the routine retrieval protocol. It is the path used for catastrophic recovery (when an operator has lost all higher-state copies) and for routine retrieval of vault-only-floor packages (when the operator has contracted to that floor under sustained pressure, §15.4). The flow is:
- The requestor submits a batch of PackageIDs and PartNrs to the Vault retrieval API. Retrieval is not restricted to the publisher; any party with a funded Bitcash wallet may submit a Vault retrieval batch. (What the requestor receives is the encrypted ciphertext; decryption requires an effective wrapped-key record under §12, which is a separate matter from the storage-side retrieval gated here.)
- The operator looks up the per-tape, per-chunk, and per-byte breakdown for the batch from the package registry (§13.8) and returns a quote against the structured pricing of §18.3, together with an estimated time to fulfilment based on its current vault-collection cadence.
- The requestor pays the quote through Bitcash (§18.6).
- The operator schedules the vault collection. When the cartridges have been collected, mounted, the relevant tar chunks read into INCOMING, and the requested parts verified, the operator notifies the requestor that the parts are available.
- The requestor retrieves the parts through the regular retrieval protocol. The requested parts are returned to higher-state availability as part of what the retrieval fee paid for: they flow through the regular intake pipeline into a fresh bucket on a writable disk volume, acquire Online state, and have their idle clock reset under §15.3. The non-requested parts that were extracted from the same tar chunks as a byproduct (§13.9) are kept available in INCOMING for a short coalescing window (operator-local, default 24 hours) and are then removed. They are not promoted to Online state and the package registry is not updated to reflect them, because their retrieval was not paid for. While they sit in INCOMING, however, a fresh retrieval request for any of them can be served immediately at the Online retrieval rate (0.001 BIT per KiB, §18.3), without the per-tape and per-chunk penalties: the cartridge is back in the vault, but the chunk has already been read, so a follow-up retrieval is data work, not physical work. After the coalescing window expires, a retrieval against a byproduct part requires a fresh Vault batch and pays the full structured price.
EFS marks a vault-only-floor package in availability claims (§19.2) so that clients know the only retrieval path is the batch API. Routine retrieval requests against vault-only-floor packages are rejected with a pointer to the batch API.
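The pricing consequence of the coalescing window in the flow above can be sketched as follows. The Online rate (0.001 BIT per KiB) and the 24-hour default are taken from the text; the function name, the use of timestamps to model INCOMING residency, and the rounding up to whole KiB are assumptions made for illustration only.

```python
from datetime import datetime, timedelta
from decimal import Decimal

ONLINE_RATE_BIT_PER_KIB = Decimal("0.001")  # Online retrieval rate cited from §18.3
COALESCING_WINDOW = timedelta(hours=24)     # operator-local default


def byproduct_retrieval_fee(size_bytes, extracted_at, requested_at,
                            window=COALESCING_WINDOW):
    """Fee in BIT for a byproduct part still held in INCOMING.

    Returns None once the window has expired, meaning the request needs a
    fresh Vault batch at the full structured price.
    """
    if requested_at - extracted_at > window:
        return None
    kib = -(-size_bytes // 1024)  # assumption: bill whole KiB, rounded up
    return ONLINE_RATE_BIT_PER_KIB * kib
```

A request three hours after extraction is priced as data work only; a request after the window returns None and has to go back through the batch API.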
16.7 Operator disaster recovery
The operator's disaster-recovery posture combines two infrastructure-level elements that supplement the per-package redundancy floor.
The first is the three-slot rolling tape rotation (§15.6), which is the operator's primary backup mechanism and also the mechanism through which each part acquires its permanent Nearline and Vault copies. A tar chunk is on the rolling cartridge in the library (Rolling Online) within 24 hours of ingestion, and on all three rolling-slot cartridges (one Rolling Online plus two Rolling Offline at off-site rolling holding) within roughly two weeks. As each rolling-slot cartridge fills, sealing produces the part's terminal Nearline (slot C) and Vault (slots A, B) copies.
The second is system-level operator backup, in which an operator writes its volume registry, package registry, and other operational metadata to additional media for catastrophe recovery. This is operator-internal infrastructure: the metadata is what makes the operator's tapes interpretable, and an operator that lost its registries while keeping its tapes would be unable to resolve customer queries against them.
Both are part of the operator's certified-platform commitment under §13.1. The per-part availability surface reflects the rolling rotation indirectly, through the storage-state checkboxes in the operator's UI (a part shows as Rolling Online + Rolling Offline + Rolling Offline during the pre-seal phase, transitioning to Nearline + Vault + Vault as the rolling cartridges seal); the rotation itself is operator infrastructure, not a separately claimed surface.
16a. EFS Database Preservation
The wrapped-key records (§12), the package registry (§13.8), the system registry (§13.8a), the volume registry (§13.2), and the operator's audit log are jointly preservation-critical with the ciphertext: loss of either side renders the other inert. An operator that lost its wrapped-key records while keeping its ciphertext would be unable to admit any recipient to retrieval; an operator that lost its package registry while keeping its tapes would be unable to resolve any customer query. The preservation regime for the EFS database therefore mirrors the regime for the ciphertext it makes interpretable, runs on the same physical infrastructure, and is bound to ciphertext preservation at the operator level. This section specifies the regime at the detail level §13–§16 use for ciphertext.
16a.1 What the database holds
The operator's EFS database is a single PostgreSQL database. It is the active store of every record the operator maintains:
- wrapped-key records (§12.2), one per (asset_id, package_role, recipient_party_id) triple the operator has accepted, plus revocation entries against those records;
- the package registry (§13.8), tracking user package parts and their volume bindings, tape barcodes, FAST paths, audit history, and lifecycle position;
- the system registry (§13.8a), tracking system-package parts;
- the volume registry (§13.2), tracking the operator's labelled volumes;
- the audit log, append-only, recording every operation the operator performs (ingestion, retrieval, wrapped-key issuance, revocation, peer replication, mute application, error);
- availability-claim state, signed and renewable, for the federation to read against;
- schema and DDL state for the database itself.
The Catalog-side policy half (RightsExpression, AccessCondition, AccessDecision) is the CMS's responsibility, not EFS's. AssetID licence ownership and authorisation mappings are CPR's responsibility. Catalog.ID identity records and efs_encrypt key state are Catalog.ID's responsibility. EFS holds none of those; it holds the records named above and preserves them.
16a.2 Plain-text log redundancy of the active database
The preservation streams (§16a.3) provide durability for records that have already been preserved — that is, records committed to a closed WAL segment that has been emitted as a system package. They do not protect against loss of records the active database has committed but has not yet written into a closed WAL segment. Without further protection, a disk failure on the PG primary in that window would lose every EFS write — wrapped-key issuance, revocation, package-registry update, audit-log entry — committed since the last segment closure, making the corresponding ciphertext potentially uninterpretable for any record whose only durable copy lived in the lost segment. User-package ingestion takes the opposite posture: a part is on two media within an hour and three within four hours (§15.2). The EFS database must match that posture, on commit, without depending on a single technology stack.
The EFS database provides this redundancy through a parallel plain-text log written in real time to a drive separate from the PG data directory. The log is upstream of the PG commit in the durability ordering, so every EFS record is durable on the log drive before the PG primary acknowledges the commit. Recovery from PG drive loss reconstructs the database from the most recent efs-basebackup snapshot plus replayed WAL plus the on-disk log entries that postdate the last archived WAL segment.
16a.2.1 Commit ordering
For each operation the EFS service performs:
- Prepare the canonical record (wrapped-key issuance, wrapped-key revocation, package-registry insert, package-registry update, volume-registry change, system-registry insert, audit-log entry, schema migration).
- Append the record to the current log file as a JSON-Lines entry.
- fsync the log file; wait for return.
- Issue the PostgreSQL commit; wait for return.
- Only on PG-commit success, acknowledge the operation to the client (for user-facing operations) or to the internal caller (for operator-internal operations).
The log fsync is a hard ordering point: nothing is committed to PG that is not already durable on the log drive. If the EFS service crashes between log fsync and PG commit, the next start replays the log forward and reconciles against PG; any entries in the log that PG never committed are detected and either committed (idempotently) or recorded as committed-aborted in the audit log. If the log drive fails entirely, the EFS service enters write-rejection mode: ingestion, wrapped-key issuance, wrapped-key revocation, and registry updates return a log_drive_unavailable, intake suspended error; read-only retrieval against existing records continues from the PG primary unaffected. Once the operator has restored the log drive under its runbook, the EFS service exits write-rejection mode.
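The ordering above (append, fsync, PG commit, acknowledge) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the EFS service's implementation: pg_commit is a hypothetical callable standing in for the PostgreSQL transaction commit, WriteRejected is a hypothetical exception name, and sorted-key compact JSON is used as a stand-in for whatever canonical serialisation the protocol defines.

```python
import json
import os


class WriteRejected(Exception):
    """Hypothetical error for write-rejection mode (log drive unavailable)."""


def durable_write(log_path, record, pg_commit):
    """Append record to the log, fsync it, then commit to PostgreSQL.

    Returns True (the acknowledgement) only after pg_commit succeeds.
    """
    line = json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"
    try:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(line)
            log.flush()
            os.fsync(log.fileno())  # hard ordering point: durable before PG
    except OSError as exc:
        # Log drive failure: reject the write instead of committing to PG alone.
        raise WriteRejected("log_drive_unavailable, intake suspended") from exc
    pg_commit(record)  # raises on failure; the log entry is reconciled on restart
    return True
```

If pg_commit raises, the entry is already on the log drive, which is the case the restart-time reconciliation described above has to resolve.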
16a.2.2 Drive separation
The log drive and the PG data drive are physically distinct disks. Operators are required to place them on different physical drives and strongly encouraged to place them on different controllers and different power rails where the operator's hardware permits. The intent is to defeat correlated single-component failures (controller fault, drive-firmware bug, power surge).
The log drive is operator infrastructure, like INCOMING (§13.5); it is not catalogued as a labelled volume and is not a system-package store.
16a.2.3 Log file partitioning
The log writes one file per calendar day in a partitioned directory layout that mirrors the EFS bucket folder model (§13.4):
{log root}/{YYYY-MM-DD}/{seq}.log
{seq} is a five-digit zero-padded sequence number starting at 00001. Most days end with one file (00001.log); high-volume days roll to 00002.log, 00003.log, and so on. The default per-file size cap is 1 GiB; if a write would push the current writable file past the cap, the EFS service closes it (read-only at the filesystem level) and opens the next seq for the same date. A new date always starts a new directory and a new 00001.log.
Each log entry is canonical JSON-Lines:
{"ts": "...", "type": "WrappedKey|WrappedKeyRevocation|PackageRegistryInsert|PackageRegistryUpdate|VolumeRegistryChange|SystemRegistryInsert|audit|schema", "id": "...", "record": {...canonical record...}}
record carries the canonical serialisation that was signed at issuance time (for signed records) or the canonical state-change descriptor (for registry updates), suitable for re-verification and replay.
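The path layout and rollover rule can be illustrated with two small helpers. The function names and the example log root are hypothetical; the five-digit sequence, the 1 GiB default cap, and the new-directory-per-date rule follow the description above.

```python
from datetime import date

SIZE_CAP_BYTES = 1 << 30  # default per-file cap of 1 GiB


def log_path(log_root, day, seq):
    """Build {log root}/{YYYY-MM-DD}/{seq}.log with a five-digit sequence."""
    return f"{log_root}/{day.isoformat()}/{seq:05d}.log"


def next_target(current_day, current_seq, current_size, entry_size, today,
                cap=SIZE_CAP_BYTES):
    """Return the (day, seq) the next entry should be written to."""
    if today != current_day:
        return today, 1                      # new date: new directory, 00001.log
    if current_size + entry_size > cap:
        return current_day, current_seq + 1  # write would exceed the cap: roll over
    return current_day, current_seq
```

For example, log_path("/var/efs-log", date(2026, 5, 26), 1) yields "/var/efs-log/2026-05-26/00001.log".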
16a.2.4 Tape capture and the two-rotation retention rule
The rolling-rotation cartridges (§15.6) capture both the efs-wal system-package parts (which travel through the standard ingestion fan-out as system packages) and, by operator-level extension, the closed plain-text log files from the log drive. Each entry written today is appended within 24 hours to whichever rolling-slot cartridge is currently loaded in the library (its first tape copy, in Rolling Online state).
Rotation event ordering. Each weekly rotation event runs in a fixed order, which the rolling rotation requires for the three-tape guarantee below to hold:
- Catch-up first. The newly arrived rolling-slot cartridge is loaded, and the operator runs catch-up against it: every pending chunk on disk that is not yet on this cartridge is appended.
- Outbound shipping. The previously loaded rolling-slot cartridge is ejected and shipped to off-site rolling holding.
- Disk-release evaluation. Entries that have now passed their retention condition are released from disk.
A WAL segment or a closed log file is retained on online (or standby) disk until at least two rotation events have completed (through catch-up) since the entry was first appended. Concretely, for an entry X first appended during slot A's library week:
- Rotation 1 (within at most 8 days from writing) runs catch-up on the arriving slot, placing X on a second rolling-slot cartridge. The previously-loaded slot A cartridge ships out, placing the first cartridge into Rolling Offline.
- Rotation 2 (within at most 15 days from writing) runs catch-up on the third arriving slot, placing X on the third rolling-slot cartridge. The second slot cartridge ships out, placing it into Rolling Offline.
- After rotation 2's catch-up, X is on all three rolling-slot cartridges (1 Rolling Online plus 2 Rolling Offline). Disk release is then safe.
The earliest deletion of an on-disk WAL segment or log file is therefore approximately 15 days after writing, and at that point the entry sits on three rolling-slot tapes. Recovery for any incident in the last 2 weeks runs entirely from on-disk data; only incidents older than 2 weeks require reading an off-site cartridge.
The rule is condition-based, not clock-based: if a rotation is delayed (logistics interruption, library hardware issue, off-site holding access problem), disk release is delayed too. An entry is only released from disk once it has demonstrably reached all three rolling-slot cartridges through actual catch-up events. The "approximately 15 days" figure is the expected lower bound under normal weekly cadence, not a hard deadline at which disk release is forced.
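Because release depends on which cartridges an entry has actually reached rather than on elapsed time, the rule is naturally expressed as set membership. The sketch below is illustrative only: the class and method names are hypothetical, slots are labelled "A", "B", and "C", and outbound shipping is omitted because it does not affect the release condition.

```python
ROLLING_SLOTS = frozenset({"A", "B", "C"})


class DiskRetention:
    """Tracks which rolling-slot cartridges hold each on-disk entry."""

    def __init__(self):
        self.on_disk = {}  # entry id -> set of slots the entry has reached

    def append(self, entry_id, loaded_slot):
        # First tape copy: the cartridge currently loaded in the library.
        self.on_disk[entry_id] = {loaded_slot}

    def rotation_event(self, arriving_slot):
        """Run catch-up on the arriving slot, then evaluate disk release."""
        for slots in self.on_disk.values():
            slots.add(arriving_slot)  # catch-up first
        released = [e for e, s in self.on_disk.items() if s >= ROLLING_SLOTS]
        for entry_id in released:
            del self.on_disk[entry_id]  # disk-release evaluation last
        return released
```

An entry appended during slot A's week is released only after catch-up has run on both B and C; a delayed or repeated rotation onto the same slot never satisfies the condition.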
The two-rotation retention rule is a v1 minimum. Operators may retain longer at their discretion; future protocol versions may extend it as disk economics evolve. The rule is a redundancy floor with a margin: by the time an entry is removed from disk, it is on the same three-tape cohort that user-package parts reach at the same point in their lifecycle. The efs-wal entry is additionally on labelled disk volumes and eventually on sealed Nearline and Vault cartridges under the §16a.5 system-package lifecycle; the log file is on the rolling-rotation cartridges only.
Permanent preservation of EFS records is the responsibility of the WAL preservation stream (§16a.3 Stream A) and the base-snapshot stream (Stream B). The plain-text log is a real-time redundancy and human-readable cross-check layer. Its content is permanently preserved on the eventual sealed Nearline and Vault cartridges that the rolling rotation produces: a log file written today rides the three rolling-slot cartridges through their accumulation life, and is captured into one Nearline copy (sealed slot-C cartridge) plus two Vault copies (sealed slot-A and slot-B cartridges) as those cartridges fill and seal under §15.6. There is no separate wipe-and-disappear date for the on-tape log.
16a.2.5 Wrapped-key retention in the log
A wrapped-key entry in the plain-text log contains the original wrapped-key envelope at issuance time. Under the two-rotation retention rule, those bytes remain on online disk in the log file for approximately 2 weeks after issuance, regardless of when the record enters a terminal state. The terminal-state hygiene of §16a.5a (zeroizing wrapped-key bytes in the active PG database on entry to a terminal state) is unaffected — the active database steady-state still contains no terminal-state wrapped-key bytes — but the log retains the historical wrapped keys for the duration of the 2-week disk window.
On tape, the wrapped-key envelope bytes captured by the log are permanent: the rolling cartridge carrying the log file eventually seals into Nearline (slot C) or Vault (slots A and B), and is retained forever (§13.9). This does not weaken the §16a.5a hygiene posture, because the WAL preservation stream (§16a.3 Stream A) already retains historical envelope bytes permanently in any case — both at the moment of issuance INSERT and as captured by any base snapshot taken before revocation. The on-tape log copy of historical envelope bytes therefore adds a third permanent on-tape pathway alongside WAL and base snapshots, all three living on the same rolling-rotation substrate.
The accepted trade-off: ~2 weeks of disk-resident historical wrapped keys for records that became terminal-state inside the window, in exchange for real-time redundancy at PG commit time and permanent human-readable cross-check on tape. The cost is small: the wrapped keys live only on operator-controlled disk and operator-controlled tape, never on a public surface, and the on-tape copies sit behind the same Nearline/Vault clearance gates that protect the rest of the operator's tape archive.
16a.2.6 Why this rather than synchronous PostgreSQL standby
A synchronous PG hot-standby on a separate machine would also provide commit-time redundancy and would additionally survive whole-machine failure (which the same-machine log drive does not). The log approach is chosen for v1 because: (a) the redundancy is technology-diverse — a PG bug or schema corruption that replicates to a standby is detectable by the independent log writer; (b) the log is human-readable and verifiable without a working PG stack; (c) operations are simpler — one PG instance, no replication topology, no failover runbook; (d) long-horizon technology independence is improved — recovery in 30 years does not depend on a PG version compatible with archived WAL. The whole-machine failure threat is mitigated at the protocol level by the mirror-set redundancy (§12.9, §16.4): if the operator's machine is destroyed, the mirror-set peers at other operators continue serving for the AssetIDs they share, and the operator-cessation regime (§16a.6) transfers responsibility.
16a.3 The two preservation streams
Each EFS operator emits its database preservation as a continuous feed of two parallel streams, written into the operator's storage substrate as system packages. System packages use the same volume, bucket, storage-state, redundancy, and migration architecture as regular user packages (§13–§16) and are catalogued in the system registry (§13.8a) separately from the package registry. They do not use the EFS network ingestion or retrieval surfaces: the database-preservation pipeline writes parts directly into INCOMING via local IPC, the write coordinator (§13.6) places them onto labelled volumes by the same fan-out as user packages, and the system registry binds the part to its volume(s). There is no HTTP ingestion endpoint for system packages, no HTTP retrieval endpoint, no federation availability claim, no wrapped-key record issued against the parts, and no Bitcash metering. Access to a system-package part is operator-internal only.
Stream A: PostgreSQL Write-Ahead Log (WAL) segments (system role efs-wal). The EFS operator's authoritative record store is a PostgreSQL database. WAL archiving is enabled; each completed WAL segment (16 MiB nominal) is closed, fsync'd, and written as a part of a streaming-published system package. The WAL is complete: it captures every committed transaction against the EFS database with full row-level detail — every wrapped-key issuance, revocation, package-registry insert, registry update, audit-log entry, and DDL migration. Replay of the WAL stream against a base snapshot reconstructs the database to any committed transaction within the preservation horizon.
Stream B: PostgreSQL base snapshots (system role efs-basebackup). The WAL stream is only meaningful relative to a base snapshot; without one, WAL records are differential changes against state that no longer exists. The EFS operator runs pg_basebackup on a configured cadence (default weekly, configurable per operator within bounds set at §21) and writes each base snapshot as a part of a streaming-published system package. Each base snapshot is a complete, restorable copy of the EFS database at a recorded transaction position. Point-in-time recovery is "most recent base snapshot before the target time, plus WAL segments from that snapshot's transaction position forward"; snapshot-only recovery to the most recent snapshot requires no WAL replay at all and brings a successor operator (§16a.6) operational immediately.
PostgreSQL is the basis of recovery. Tooling for both streams (pg_basebackup for producing base snapshots, pg_receivewal for streaming WAL ingest, pg_waldump for inspection, and the recovery_target_time setting for point-in-time targeting) is mature, stable across major versions, and one of the most widely deployed durability stacks in existence. The protocol's long-horizon recovery assumption is that a working PostgreSQL of a compatible major version is available at recovery time; the assumption is conservative given Postgres's installed base and longevity.
The two streams are written in parallel via the same operator-internal write path and stored on the same volume infrastructure. Either alone is sufficient for some recovery scenarios — base snapshots alone restore to the most recent snapshot moment; WAL alone allows forward replay from any base snapshot — and together they provide point-in-time recovery to any committed transaction.
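The point-in-time recovery rule (most recent base snapshot before the target time, plus WAL from that snapshot's position forward) can be sketched as a selection over the two streams. The types and names here are hypothetical, and transaction positions are modelled as plain integers as a simplification of PostgreSQL's log sequence numbers.

```python
from datetime import datetime
from typing import NamedTuple


class Snapshot(NamedTuple):
    taken_at: datetime
    position: int   # transaction position recorded with the snapshot


class WalSegment(NamedTuple):
    start: int      # first position covered by the segment
    end: int        # position just past the segment
    name: str


def plan_recovery(snapshots, wal_segments, target_time):
    """Pick the base snapshot and the WAL segments to replay after it."""
    candidates = [s for s in snapshots if s.taken_at <= target_time]
    if not candidates:
        raise LookupError("no base snapshot precedes the target time")
    base = max(candidates, key=lambda s: s.taken_at)
    # Keep every segment that extends past the snapshot's position.
    replay = sorted(seg for seg in wal_segments if seg.end > base.position)
    return base, replay
```

The sketch selects inputs only; stopping replay at the target time itself is left to PostgreSQL's recovery_target_time setting.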
16a.4 The preservation streams are not encrypted
Both streams are emitted unencrypted. Two reasons.
First, no confidentiality is given up by plaintext emission. The wrapped-key envelopes embedded in wrapped-key records are themselves hybrid KEM ciphertext under recipients' efs_encrypt keys (§12.3); their bytes carry no confidentiality risk in plaintext at rest beyond what an EFS operator already sees (§12.8). The remaining payload — recipient party identifiers, AssetIDs, package roles, scopes, timestamps, signatures, ciphertext digests, volume bindings, audit-log entries — is the operational metadata an EFS operator already holds in its database. No additional information is exposed by writing it to a preservation tape.
Second, encrypting the streams would couple long-term recovery to the survival of an operator decryption key, which is exactly the dependency long-term preservation is supposed to avoid. The whole point of writing records to off-site air-gapped tape is that the tape survives the operator's online infrastructure; an encrypted tape that depends on a key the operator no longer holds is not a preservation copy, it is a brick.
This is a deliberate exception to the "always encrypt" rule that governs user packages (§3, design choice 1). The exception applies only to system packages whose payload is already KEM ciphertext or already-exposed operator metadata; it does not apply to any user-published content.
16a.5 Storage states, redundancy floor, and migration
Database-preservation system packages move through the storage states (§14) on the same lifecycle as regular packages, and ride the same rolling-rotation tape substrate (§15.6). The full redundancy floor applies: two disk-side copies on different storage units, one Nearline copy in the operator's robotic library (sealed slot-C cartridge), two Vault copies at two separate off-site vault sites (sealed slot-A cartridge at vault A, sealed slot-B cartridge at vault B). Five copies, three states, two media types. During the pre-seal phase the tape copies sit on rolling-rotation cartridges in Rolling Online and Rolling Offline states; the §16.1 floor is held continuously from roughly two weeks after writing. The fifteen-year tape migration cycle applies (§16.3), with legacy cartridges retained in archival custody rather than destroyed.
Three refinements to the lifecycle apply specifically to database-preservation system packages:
- Base snapshots stay disk-resident indefinitely. The five-year and ten-year idle thresholds (§15.3) that move user packages to Standby and then to tape-only do not apply to the efs-basebackup stream. Base snapshots are the operator's snapshot-recovery floor; release to tape-only would force a tape mount on every recovery attempt and defeat their purpose. The Online/Standby split is at operator discretion (the default is Online for the last 30 days of base snapshots, Standby for older snapshots), but tape-only release does not happen for the base-snapshot stream.
- WAL stays on disk through at least two rotation events, then falls to Nearline once superseded by a base snapshot. WAL segments are retained on online (or standby) disk until two rotation events of the rolling-rotation tape have completed (through catch-up) since the segment was first appended (§16a.2.4), approximately 15 days from segment closure under normal cadence. After both retention conditions hold (two rotation events have completed, and the segment's entire transaction range precedes a base snapshot whose own rolling-rotation cohort has reached the three-tape state), the segment may move from Online to Standby and then to Nearline at the next storage-unit migration, shrinking the working disk set without sacrificing recoverability. It remains on Vault tape under the unconditional floor.
- Migration in lockstep with paired ciphertext, where practical. The operator schedules tape writes so that, where practical, database-preservation tar-chunks share fifteen-year migration windows with the ciphertext packages they correspond to. The records and ciphertext for an AssetID then travel through media transitions on the same cadence and, where the operator can arrange it, on adjacent or same tapes.
The Vault pair remains the unconditional preservation floor for both streams. Sustained-pressure responses (§15.4) may compress online availability just as for any other package; what is unconditional is the off-site air-gapped Vault pair of each operator's WAL and base-snapshot streams.
16a.5a Hygiene for revoked wrapped-key records
A wrapped-key record whose lifecycle reaches the revoked state (§12.7) no longer has any operational value: retrieval verification fails against it, and no further operation consumes its wrapped_key envelope bytes. The protocol moves the envelope bytes out of the EFS service's hot online store the moment the revocation transition is recorded.
Specifically, on entry to the revoked state the EFS service:
- Records the revocation in the audit log (revocation record, per §12.7).
- Zeroizes the wrapped_key.classical_ephemeral_pk, wrapped_key.pq_ciphertext, wrapped_key.kdf_salt, wrapped_key.aead_nonce, and wrapped_key.aead_ciphertext fields in the WrappedKey row of its active PostgreSQL database in a single committed transaction.
- Retains the row's metadata (asset_id, package_role, recipient_party_id, recipient_pubkey_digest, scope, expires_at, issued_at, issuing_party_id, issuing_session_id, issuing_signature, revoked_at, and the revocation signature). Active-database queries for the record continue to return the metadata; they do not return wrapped-key envelope bytes.
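The zeroize-and-retain step can be illustrated on an in-memory row. This is a sketch only: the row is modelled as a plain dictionary rather than a PostgreSQL row, the function name is hypothetical, and overwriting each envelope field with same-length zero bytes is an assumption (the text does not say whether zeroized fields keep their length or become empty).

```python
ENVELOPE_FIELDS = (
    "classical_ephemeral_pk",
    "pq_ciphertext",
    "kdf_salt",
    "aead_nonce",
    "aead_ciphertext",
)


def apply_revocation(row, revoked_at, revocation_signature):
    """Return a copy of a WrappedKey row in its revoked, zeroized form."""
    updated = dict(row)
    wrapped = dict(row["wrapped_key"])
    for field in ENVELOPE_FIELDS:
        # Assumption: overwrite with zero bytes of the original length.
        wrapped[field] = bytes(len(wrapped[field]))
    updated["wrapped_key"] = wrapped
    updated["revoked_at"] = revoked_at
    updated["revocation_signature"] = revocation_signature
    return updated
```

In the real service the same change would be a single committed UPDATE; the metadata columns pass through untouched, so queries against the revoked record still resolve.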
The historical wrapped-key bytes are not lost. They survive in the two preservation streams (§16a.3) and, for the duration of the §16a.2.4 two-rotation retention window, in the on-disk plain-text log:
- the WAL contains the original INSERT of the WrappedKey row at issuance time, including the envelope fields as they were written, and later captures the UPDATE that zeroes those fields on revocation;
- any base snapshot taken between issuance and the revocation transition contains the row with envelope fields intact;
- the plain-text log entry that recorded the record at issuance carries the envelope bytes; this on-disk log entry persists for approximately 2 weeks from issuance under the §16a.2.4 retention rule, regardless of whether or when the record is revoked.
A successor operator inheriting the database under §16a.6 inherits the same hygiene: the active state contains no envelope bytes for revoked records, and historical envelope bytes live in the preservation streams and (within the 2-week window) on the log drive. A legitimate need to recover a historical envelope value — a recipient's claim that a retrieval was wrongly denied, a forensic investigation of an operator dispute, a successor reconciling state across mirror operators — is served by reading the preservation streams from the operator-internal recovery path (§16a.8); it is not served from the live database.
This bounds the damage if the active EFS database is compromised at any moment in the future: an attacker who reads the active database cold finds no envelope bytes for revoked records, only their metadata. The plain-text log on the separate drive retains ~2 weeks of recent wrapped keys including records that were revoked within the window; the operator's threat model treats the log drive at the same security perimeter as the PG drive. Envelope bytes for currently-active wrapped-key records remain online in PG by necessity (they are needed for ordinary retrieval verification), and the steady-state population of online wrapped keys is bounded by the active record count plus the ~2-week recent-issuance window, not by the cumulative record count.
16a.6 Operator cessation and succession
When an EFS operator ceases business, a successor operator inherits ciphertext, wrapped-key records, registries, system packages, and operator registries together under the operator-cessation regime that applies to EFS hardware and registries (§13.1, §16.7). The successor mounts the ceasing operator's volumes, imports the ceasing operator's volume registry, package registry, and system registry, restores the EFS database from the inherited base snapshot and WAL streams (or directly mounts the PG data drive where physical handover is feasible), and resumes ingestion, retrieval, and wrapped-key gating for the inherited AssetIDs. Because the database and the ciphertext share a substrate, no half of the preservation arrives without the other.
If the ceasing operator was part of an AssetID's mirror set alongside another operator that continues to run, the successor takes over the ceasing operator's mirror-set position. The other mirror operator continues serving its share of the mirror set unchanged. The mirror-set claim at CPR is updated to name the successor in place of the ceasing operator.
If the ceasing operator was the sole mirror operator for an AssetID and no successor is willing to inherit, the AssetID's ciphertext, wrapped-key records, and operator registries pass to a federation-managed receivership of last resort, which preserves them together under the same redundancy floor while a long-term home is identified. The receivership is operator infrastructure; the protocol does not define the institutional form, only that it must exist and must hold ciphertext and database together.
16a.7 Funding
Database preservation is funded out of the EFS ingestion fee for the ciphertext packages the records correspond to. There is no separate per-record preservation fee. The fee a publisher pays for ciphertext preservation under §18.1 covers preservation of the matching wrapped-key records, package-registry entries, and audit-log entries. The cost is small in absolute terms — wrapped-key records are kilobytes per package, base snapshots and WAL grow with operator-wide activity rather than per-record — and is absorbed inside the ingestion-fee budget without a separate line item.
Operational costs (retrieval, request handling, the Bitcash boundary for those operations) remain separately metered through Bitcash; the preservation regime described in this section covers only the durable storage of records the EFS service has already accepted.
16a.8 Recovery procedures
The EFS operator runs recovery preparedness drills on a configured cadence (default quarterly). All reads are operator-internal: the operator's recovery tooling reads parts directly from the system registry's volume bindings, with no HTTP retrieval and no wrapped-key gate.
- Log/WAL cross-check drill. Sample a window of recent EFS records (default: the previous 24 hours). For each sampled record, confirm that the entry exists in the on-disk plain-text log (§16a.2), in a closed WAL segment (or in the currently-open segment if recent enough), and in the active PG database with consistent canonical bytes. Any disagreement is investigated as a potential write-ordering bug or storage corruption.
- WAL-replay drill. Restore a recent base snapshot from the operator's own efs-basebackup stream (read directly from the volumes that hold its parts), replay the WAL segments from the operator's own efs-wal stream forward to a chosen point in time, verify that the resulting database matches the operator's online state at that point in time (or a known historical state if a past point is chosen).
- Snapshot-only recovery drill. On a clean machine, restore the most recent efs-basebackup snapshot into a fresh PostgreSQL instance with no WAL replay, verify the operator can serve retrieval verification queries against the restored snapshot for the AssetIDs covered, and confirm the snapshot is internally consistent.
- Log-as-recovery-floor drill. On a clean machine with no PostgreSQL stack present, read a window of plain-text log files directly from the operator's log drive (or from a recent rolling-rotation cartridge if testing tape-based recovery), parse the JSON-Lines entries, verify signatures on a sample of records, and confirm the log's contents agree with the WAL-replay result for the same window.
- Two-rotation-retention drill. Sample a set of WAL segments and log files removed from disk in the previous 30 days; confirm each was retained until at least two rotation events of the rolling-rotation tape had completed (through catch-up) since its first append (§16a.2.4), by cross-referencing the operator's rotation-event log against the deletion timestamps, and confirm by registry lookup that each removed entry sits on all three rolling-slot cartridges at the moment of deletion.
- Revoked-state-hygiene check. Confirm that the active EFS database contains no wrapped-key envelope bytes for any record whose lifecycle state is revoked (§16a.5a), by sampling rows and checking the envelope fields are zeroized. Confirm separately that the on-disk plain-text log retention does not exceed the §16a.2.4 minimum by more than the operator's stated configured extension.
- Log-drive-failure drill. On a non-production replica, simulate a log-drive failure; confirm the EFS service enters write-rejection mode and returns "log_drive_unavailable, intake suspended" to clients, that read-only retrieval against existing records continues from the PG primary, and that recovery on a replacement log drive cleanly exits write-rejection mode.
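As an illustration of the first drill in the list, the log/WAL cross-check amounts to a three-way comparison per sampled record. The sketch below assumes the canonical bytes of each record have already been extracted from the plain-text log, the WAL, and the active database into dictionaries keyed by a record identifier; the function name, the dictionary shape, and the use of SHA-256 for comparison are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def cross_check(
    sampled_ids: List[str],
    log_entries: Dict[str, bytes],
    wal_entries: Dict[str, bytes],
    pg_rows: Dict[str, bytes],
) -> List[Tuple[str, str]]:
    """Return (record_id, problem) pairs for sampled records that disagree."""
    stores = {"log": log_entries, "wal": wal_entries, "pg": pg_rows}
    findings: List[Tuple[str, str]] = []
    for record_id in sampled_ids:
        # A record must be present in all three stores.
        missing = [name for name, store in stores.items() if record_id not in store]
        if missing:
            findings.append((record_id, "missing in " + ",".join(missing)))
            continue
        # All three copies must carry identical canonical bytes.
        digests = {
            hashlib.sha256(store[record_id]).hexdigest() for store in stores.values()
        }
        if len(digests) != 1:
            findings.append((record_id, "canonical bytes differ"))
    return findings


# Usage: one consistent record, one corrupted in PG, one absent from the WAL.
log = {"a": b"rec-a", "b": b"rec-b", "c": b"rec-c"}
wal = {"a": b"rec-a", "b": b"rec-b"}
pg = {"a": b"rec-a", "b": b"rec-X", "c": b"rec-c"}
findings = cross_check(["a", "b", "c"], log, wal, pg)
```

A real drill would presumably need to treat revoked records separately, since their envelope fields in the active database are zeroized by design (§16a.5a) and would otherwise show up as a byte mismatch.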
Drill results are recorded in the operator's audit log and anchored to CPR alongside the operator's other periodic anchoring events (§16a.9).
16a.9 Registry anchoring and tamper-evidence
The preservation streams themselves are operator-internal (§16a.3): no party outside the operator can read their contents. What is public is the chain of anchoring commitments the operator publishes to CPR: a continuous, sequence-numbered series of cryptographic commitments to the Merkle roots of each preservation interval, anchored through to XRPL.
Each of the two preservation streams is independently anchored using the CPR registry anchoring mechanism (CPR Whitepaper §17). Anchoring runs on a per-stream cadence:
| Stream | System role | Anchoring cadence | What is hashed |
|---|---|---|---|
| WAL | efs-wal | Hourly | Merkle root over the WAL segments closed during the hour, indexed by segment LSN |
| Base snapshot | efs-basebackup | Per snapshot | Merkle root over the snapshot's part contents, indexed by part number |
Each anchoring event is submitted to CPR as a registry_anchor claim whose extensions object names the EFS operator, the stream's system role, the interval boundaries (start/end LSN or snapshot ID), the operation count, and the Merkle root. The claim is included in a CPR block, which is in turn anchored to XRPL, producing the layered timestamp chain EFS preservation interval → Merkle root → CPR claim → CPR block → XRPL.
The per-stream claims are chained using CPR's previous_claim_id and sequence_number extension fields (CPR Whitepaper §17.2). Each operator therefore maintains two append-only chains, one per stream. The chain construction makes it provable that no anchoring interval has been skipped, dropped, or reordered: the absence of a claim at a given sequence number is detectable, and so is the presence of an out-of-order claim.
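A minimal sketch of how such a chain could be checked for gaps, reordering, and broken links follows. Only the field names previous_claim_id and sequence_number come from the text; the AnchorClaim class, the claim_id field, and the rule that consecutive sequence numbers differ by exactly one are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AnchorClaim:
    claim_id: str
    sequence_number: int
    previous_claim_id: Optional[str]  # None for the first claim of a chain


def chain_problems(claims: List[AnchorClaim]) -> List[str]:
    """Inspect claims in the order observed; report gaps, reordering, bad links."""
    problems: List[str] = []
    for prev, cur in zip(claims, claims[1:]):
        if cur.sequence_number <= prev.sequence_number:
            problems.append(f"out-of-order claim at sequence {cur.sequence_number}")
        elif cur.sequence_number != prev.sequence_number + 1:
            problems.append(
                f"gap between {prev.sequence_number} and {cur.sequence_number}"
            )
        # Each claim must point back at the claim immediately before it.
        if cur.previous_claim_id != prev.claim_id:
            problems.append(f"claim {cur.claim_id} does not link to {prev.claim_id}")
    return problems


# Usage: a well-formed chain, then the same chain with claim 2 dropped.
good = [
    AnchorClaim("c1", 1, None),
    AnchorClaim("c2", 2, "c1"),
    AnchorClaim("c3", 3, "c2"),
]
dropped = [good[0], good[2]]
```

Because each claim carries both a sequence number and a back-link, dropping a claim surfaces twice in this sketch: once as a numeric gap and once as a link that no longer resolves to its predecessor.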
What the operator can verify against the chain. During recovery preparedness drills (§16a.8) and during an actual recovery, the operator (or a successor inheriting the volumes under §16a.6) recomputes the Merkle root of a preservation interval from the preserved bytes on its own volumes and compares it to the anchored root in the corresponding registry_anchor claim. A mismatch is unambiguous evidence that the bytes have been altered (whether by storage corruption, by a software bug, or by a malicious party) since the anchoring event.
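The recompute-and-compare step can be sketched as below. The actual Merkle construction is defined by the CPR anchoring mechanism and is not reproduced in this section, so everything about the tree here is an assumption: SHA-256 as the hash, 0x00 and 0x01 domain-separation prefixes for leaves and interior nodes (in the style of RFC 6962), and promotion of an unpaired node to the next level unchanged.

```python
import hashlib
from typing import List


def merkle_root(items: List[bytes]) -> bytes:
    """Compute a binary Merkle root over items, in the order given."""
    if not items:
        raise ValueError("an interval must contain at least one item")
    # Leaf hashes carry a 0x00 prefix, interior nodes a 0x01 prefix.
    level = [hashlib.sha256(b"\x00" + item).digest() for item in items]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is promoted unchanged
        level = nxt
    return level[0]


def interval_matches_anchor(items: List[bytes], anchored_root_hex: str) -> bool:
    """True if the preserved bytes still hash to the anchored Merkle root."""
    return merkle_root(items).hex() == anchored_root_hex


# Usage: anchor three segments, then flip one byte in the second segment.
segments = [b"segment-0001", b"segment-0002", b"segment-0003"]
anchored = merkle_root(segments).hex()
tampered = [segments[0], b"segment-00X2", segments[2]]
```

Any single-byte change in any segment changes the recomputed root, so the comparison fails regardless of whether the cause is corruption, a bug, or deliberate tampering.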
What an external observer can verify without access to the bytes. Anyone with read access to CPR — which is the federation, not the operator's internal substrate — can verify, without ever seeing a single record of operator content:
- Anchoring cadence is honoured. Sequence numbers in the chain are monotonic and gap-free; no anchoring interval was skipped or dropped.
- Anchoring signatures are valid. Each registry_anchor claim is signed by an active key on the operator's Catalog.ID identity; an operator that loses its claim-signing capability cannot quietly stop anchoring.
- The chain is XRPL-timestamped. Every Merkle root has a public timestamp through CPR → XRPL, so the operator cannot post-date or back-date commitments.
- Behavioural divergence between operators is detectable. Two operators in the same mirror set that produce different anchoring cadences for the same AssetID's records, or whose chain rates diverge against the operator's stated SLA, are observable from the outside.
What the external observer cannot do is read the wrapped-key bytes, the WAL contents, the snapshot contents, or any other preserved record. The bytes remain operator-internal. The chain commits the operator to whatever they preserved at each anchoring moment; if a later dispute requires the actual bytes to be produced, that production is governed by the operator's audit and discovery obligations under its operator license, not by an open read surface.
Relation to audit-log anchoring. Audit-log Merkle roots are separately anchored as part of the federation's audit-log discipline (§19, the federation reconciliation regime); the preservation-stream anchoring described here is in addition to, not in place of, audit-log anchoring. The two chains have different scopes: the audit log records operations the EFS service performed (ingestion, retrieval, wrapped-key issuance, wrapped-key revocation, mute application, error) and is publicly queryable for non-private metadata via the EFS API; the preservation streams hold the operator-internal state that those operations produced, and their content is not public.
17. Local Working Copies and Self-Hosted Hubs
EFS has two components. The server component is run by federation operators (§13–§16). The client component is a desktop application that gives a publisher a fully detailed overview of every file it has published and every file it keeps in its personal local repository on disk. The desktop application is the publisher's working surface; the federation is the durability counterpart.
A local working copy is the publisher's own complete encrypted catalog held by the desktop application on their own machine. It is not a network-serving node. It is not announced to the federation. It carries no availability claims. It is simply the publisher's primary copy of their own work, with the federation as the cloud-side preserved counterpart. Local working copies are an expected and supported part of the model.
The desktop application also exposes an opt-in self-hosted hub mode that turns one or more of the user's machines into EFS-speaking nodes that the user can synchronise their own portfolio across. In v1 a hub serves a strictly bounded population of packages and a strictly bounded population of clients (§17.1). The peer-to-peer distribution of encrypted packages from one user's hub to another user's hub is a planned post-v1 feature (§17.6); v1 hubs do not serve as public-distribution endpoints to other Catalog.ID members.
17.1 What a v1 hub may hold
A v1 hub may hold exactly two categories of encrypted package:
- Self-produced packages. Packages whose publisher is the hub operator (the Catalog.ID identity that runs the hub). Federation registration still applies, in that the package must have been ingested to at least one federation operator and paid for under the ingestion fee (§18.1), but no federation fetch is required, because the hub operator already holds the package locally as part of their working copy. For self-produced packages, the hub operator therefore pays only for storage; no retrieval cost is ever incurred for moving the package between the operator's own machines.
- Packages for which the hub operator holds an effective wrapped-key record. A package the hub operator has lawfully retrieved from the federation under a WrappedKey record addressed to one of their efs_encrypt keys (§12). The wrapped-key record is the proof of entitlement; the encrypted ciphertext is the proof of acquisition. The retrieval was paid for once at federation pricing (§18.2); subsequent peer-to-peer movement of the ciphertext between the operator's own hubs incurs no further retrieval cost.
A hub may not hold any other encrypted package. In particular, a hub may not pre-fetch material from the federation and re-serve it to arbitrary requestors, nor accept ciphertext peer-relayed from another operator's hub for re-service. The owned-or-licensed restriction is exhaustive; any acquisition mode that does not fall under one of the two categories above is outside scope.
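The owned-or-licensed rule reduces to a two-branch predicate, sketched below under several assumptions. The Package and WrappedKeyRecord classes are hypothetical simplifications; recipients are modelled as a Catalog.ID identity rather than as individual efs_encrypt keys; "effective" is taken to mean not revoked and not past expires_at; and the federation-registration precondition for self-produced packages is not checked.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Package:
    package_id: str
    publisher_id: str  # Catalog.ID identity of the publisher


@dataclass(frozen=True)
class WrappedKeyRecord:
    package_id: str
    recipient_id: str  # simplification: identity, not a specific efs_encrypt key
    revoked: bool
    expires_at: Optional[datetime]  # None means the record does not expire


def is_effective(record: WrappedKeyRecord, now: datetime) -> bool:
    """Assumed meaning of 'effective': not revoked and not yet expired."""
    if record.revoked:
        return False
    return record.expires_at is None or now < record.expires_at


def hub_may_hold(
    package: Package,
    hub_operator_id: str,
    records: Iterable[WrappedKeyRecord],
    now: datetime,
) -> bool:
    """True only for the two permitted categories; everything else is refused."""
    if package.publisher_id == hub_operator_id:
        return True  # category 1: self-produced
    return any(  # category 2: effective wrapped-key record for this operator
        r.package_id == package.package_id
        and r.recipient_id == hub_operator_id
        and is_effective(r, now)
        for r in records
    )


# Usage: alice runs the hub; she published p1 and holds a record for bob's p2.
now = datetime(2026, 5, 26, tzinfo=timezone.utc)
own = Package("p1", "alice")
licensed = Package("p2", "bob")
other = Package("p3", "carol")
records = [WrappedKeyRecord("p2", "alice", revoked=False, expires_at=None)]
```

The predicate has no third branch, which mirrors the statement that the restriction is exhaustive. How a hub should treat ciphertext it already holds once the matching record is later revoked or expires is not settled by this sketch.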
17.2 Whom a v1 hub may serve
A v1 hub serves exactly the other hubs operated by the same Catalog.ID identity on machines that identity controls. A user with a desktop workstation, a laptop, a NAS at home, and a server in their studio may run a hub on each, and any of those hubs may answer inventory and retrieval requests from any of the others, after mutual authentication of the hub operator's Catalog.ID identity. This is multi-machine portfolio synchronisation, not public distribution.
Two consequences follow directly:
- Synchronisation without repeat retrieval costs. A user who pays once to retrieve a package they hold a wrapped-key record for can replicate it across all of their own hubs over the same-owner peer-to-peer link without paying again. For self-produced packages, they pay only the ingestion fee at the federation and never pay retrieval costs at all for keeping their own machines in sync.
- No public hub distribution in v1. A hub does not answer retrieval requests from Catalog.ID identities other than its operator. Even a requestor who holds a valid WrappedKey record from the publisher must retrieve the ciphertext from the federation, not from a third-party hub, in v1. The federation remains the universal distribution endpoint.
17.3 Exposure modes
Same-owner hub-to-hub communication can run over any reachable network path the operator chooses to configure. The desktop application supports three exposure modes; the choice affects only reachability between the operator's own hubs and is invisible to the federation:
- LAN-only. The hub answers requests on the local network only, suitable for households, studios, or operators whose machines all sit behind the same NAT or VPN.
- Public. The hub is reachable from the internet through the operator's router and nginx configuration, and answers requests from any address — but, per §17.2, only authenticates other hubs under the same Catalog.ID operator. Useful for operators whose machines are geographically distributed.
- IP-allowlist. Public reachability gated to a specified set of source IP addresses, configured in nginx and/or the router. Useful for operators with a small known set of remote machines.
In all three modes the hub authenticates incoming connections against the operator's own Catalog.ID identity using the operator's hybrid signing key pair (Ed25519 + ML-DSA-65). Connections that do not present a matching signature from the same Catalog.ID identity are rejected.
17.4 Hub identity, addressing, and discovery
A hub is identified by its operator's Catalog.ID username; the network address is unconstrained — a hub may be reachable at an IP address, a personal or company domain, a Tor hidden service, or any other URL the operator chooses. There is no Catalog-controlled subdomain; DNS is not in the critical path. The desktop application maintains a local roster of the user's own hubs and synchronises inventory and retrieval among them.
Because v1 hubs only serve the operator's own machines, there is no inter-operator discovery problem to solve at the protocol level: each user manages their own small roster of their own hubs. A small set of signed endpoints supports this synchronisation:
- /efs/inventory. The list of PackageID (PartNr) entries this hub holds, plus the operator's wrapped-key records for non-self-produced packages.
- /efs/ping. Current liveness and status of this hub (online, scheduled-offline, maintenance), optionally including current load.
- /efs/info. Hub metadata: operator display name, machine label.
There is no /efs/peers endpoint: a v1 hub has no cross-operator hub relationships to advertise, only same-owner machines that the desktop application already enumerates locally. A peer-roster endpoint enters the picture under §17.6 if and when peer-to-peer distribution to other Catalog.ID members is added.
17.5 Free transport, gated content
A v1 hub serves transport between the operator's own machines for free; there is no Bitcash settlement on retrieval through a hub. The federation remains the sole BIT-metered retrieval surface (§18.2). Free transport does not mean free content: the hub serves only ciphertext, and decryption still requires the operator's efs_encrypt private keys, exactly as on the federation. The hub layer is a self-synchronisation substrate; the wrapped-key records and the access rights they express remain with the publisher and with EFS.
A hub may host the publisher's own persistent-mode agent (a Catalog.ID agent under the publisher's principal, with a share_signing scope, §12.4) so that automated wrap-on-sale-settlement and pre-authorised share patterns continue when the publisher's primary workstation is offline. The agent posts wrapped-key records directly to federation EFS operators in the AssetID's mirror set; the hub merely hosts the runtime.
17.6 Planned post-v1: peer-to-peer distribution to other Catalog.ID members
A planned feature for a later protocol version extends the hub model so that a hub may also serve encrypted packages peer-to-peer to other Catalog.ID members — specifically, to other members who hold an effective WrappedKey record for the package, addressed to one of their own efs_encrypt keys. In this model the hub becomes a permitted alternative distribution surface for shared content, alongside (not replacing) the federation: requests served by a hub would not be logged by the federation, the hub operator would not collect BIT on retrieval, and a publisher's own hub would become a privacy-respecting distribution path for their own work.
This feature is not available in v1. The v1 hub population is the operator's own machines (§17.2). The post-v1 design must specify, at minimum:
- how a hub verifies a remote requestor's WrappedKey record (and the recipient's Catalog.ID standing) before serving ciphertext;
- how a recipient who has lawfully obtained a wrapped-key record is permitted to relay ciphertext to other recipients without violating the publisher's distribution model;
- the discovery model for cross-operator hub rosters (the /efs/peers endpoint and the privacy implications of publishing one);
- how the planned-feature flow interacts with mute (§20) and with revocation;
- whether the planned flow should support same-recipient re-distribution within an organisation (delegate identities serving each other) before opening to arbitrary Catalog.ID identities.
Until those questions are answered in a later protocol version, the hub layer is an operator-owned multi-machine portfolio mirror, nothing more.
17.7 Durability posture
EFS does not elevate self-hosted hubs to a first-class durability guarantee. A user may operate one for their own resilience or convenience, but availability across the network as a whole remains the responsibility of the institutional federation. A hub may go down, be turned off, lose its disk, or be confiscated; the federation's redundancy floors (§16) are unaffected. Self-hosting may also be sensitive: advertising oneself as a holder of certain material can carry consequences. EFS therefore treats self-hosted hubs as a permitted and supported but unguaranteed mode, distinct from the federation's funded durability commitments.
18. Bitcash Funding Model
EFS uses two payment events: ingestion and retrieval. Both flow through Bitcash.
18.1 Eternal storage on first payment
Ingestion is a one-time payment that funds eternal preservation of the package. The fee covers verification, the initial fan-out across storage states (§15.2), the five-copy full redundancy floor through the five-year idle threshold (§15.3), the same five-copy floor with both disk copies in Standby through the ten-year idle threshold, the three-copy tape-only floor that follows beyond ten years idle, and the two-copy vault-only floor that the operator may contract to under sustained pressure. Once paid, the operator preserves the package for the life of its infrastructure with no further charge for storage. The storage commitment has no expiry and is not renewable.
This is a deliberate inversion of the typical cloud-storage model. It aligns the publisher's incentive (preserve everything I have ever made, for the long term) with the operator's incentive (encourage retrievals, since they fund operations). The publisher is not paying for ownership of a file in the abstract. Ownership is a CPR and CMS concern. The publisher is paying for permanent storage of a defined byte sequence.
18.2 Retrieval
When a package is retrieved, the retrieval fee covers serving the request. Retrieval from Online or Standby is priced uniformly; the spin-up latency from Standby is operationally visible but not separately metered. Retrieval from FAST is priced as Online.
Retrieval from Nearline is priced in three components rather than purely per-byte, because each request consumes scarce library-robot capacity: the robot must fetch the cartridge from its slot, mount it on a drive, and seek to the requested tar chunk (§13.9). A per-tape penalty is charged for each cartridge mounted, a per-chunk penalty for each tar chunk sought and read on each cartridge, and a per-byte component for the bytes delivered. The per-tape and per-chunk penalties put counterpressure on demand for the finite mount-and-seek budget, so requests are shaped to amortise mounts across many parts: parts that share a tape collapse the per-tape penalty, parts that share a chunk on a tape collapse both the per-tape and per-chunk penalty, and the per-byte component scales only with the bytes the requestor takes delivery of. Specific rates are in §18.3.
Retrieval from Vault uses the same three-component structure as Nearline but at higher rates, reflecting the additional physical work of off-site vault collection and return on top of mount and seek (§16.6).
Vault retrieval is requested through a dedicated batch API rather than through the routine retrieval protocol. The requestor submits a list of PackageIDs and PartNrs; the operator returns the per-tape, per-chunk, and per-byte breakdown along with an estimated time to fulfilment based on its current vault-collection schedule. After payment, the operator schedules the vault collection and the requestor either polls the API or receives an asynchronous notification when the parts are available, after which they are retrieved through the regular retrieval protocol.
Coalescing-window byproduct retrieval. Both Nearline and Vault retrievals operate at the granularity of the 100 GiB tar chunk, so a chunk read into INCOMING typically contains parts the requestor did not ask for. Those byproduct parts sit in INCOMING for the operator's coalescing window (default 24 hours, §13.9) and are then released without entering Online state or the package registry. While they are in INCOMING, however, a fresh retrieval request for any of them is served immediately at the Online retrieval rate, with no per-tape or per-chunk penalty. The reasoning is operational: the cartridge has been returned to its library or vault, but the chunk has already been read, so a second retrieval against a chunk-mate is data work and not a fresh mount-and-seek operation. Once the coalescing window expires, a retrieval against a former byproduct part requires a fresh Nearline or Vault operation and pays the full structured price.
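The pricing decision for a byproduct part depends only on whether its chunk was read into INCOMING within the coalescing window. A minimal sketch, assuming the operator tracks the time each chunk was read; the function name, the returned labels, and the treatment of the window as half-open (the part is no longer eligible at exactly 24 hours) are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_COALESCING_WINDOW = timedelta(hours=24)  # default stated in the text


def byproduct_pricing(
    chunk_read_at: Optional[datetime],
    now: datetime,
    window: timedelta = DEFAULT_COALESCING_WINDOW,
) -> str:
    """Return which price applies to a part stored in Nearline or Vault.

    'online_rate'     -> chunk is still in INCOMING; per-byte Online rate only.
    'structured_rate' -> fresh mount-and-seek; per-tape + per-chunk + per-byte.
    """
    if chunk_read_at is not None and chunk_read_at <= now < chunk_read_at + window:
        return "online_rate"
    return "structured_rate"


# Usage: the chunk was read at noon; requests arrive 6 hours and 30 hours later.
read_at = datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc)
within = byproduct_pricing(read_at, read_at + timedelta(hours=6))
expired = byproduct_pricing(read_at, read_at + timedelta(hours=30))
never_read = byproduct_pricing(None, read_at)
```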
An operator running under scheduled online availability (§15.4) serves Online and Nearline retrievals only during its declared windows; outside those windows requests queue. Vault retrieval, being scheduled work, is not subject to those windows but is subject to the operator's vault-collection cadence.
Retrieval is paid through Bitcash from the requestor's wallet at the time of access. Who ultimately bears that cost (the viewer directly, the publisher as a subsidy, or the Asset Market settlement layer through a license) is a Library and Market concern, varying by license model. EFS records the payment event and serves the bytes; the higher-layer attribution of the cost happens above EFS.
18.3 V1 platform pricing
EFS pricing is platform-wide: all operators charge the same rates for the same operations (§18.4). The following table is the v1 schedule.
| Operation | Rate | Approximate per GiB |
|---|---|---|
| Ingestion (eternal storage) | 0.005 BIT per KiB | ~5,243 BIT (~ EUR 1.05) |
| Online, Standby, or FAST retrieval | 0.001 BIT per KiB | ~1,049 BIT (~ EUR 0.21) |
| Nearline retrieval, per-tape penalty | 10,000 BIT per Nearline cartridge mounted | (n/a) |
| Nearline retrieval, per-chunk penalty | 5,000 BIT per tar chunk read | (n/a) |
| Nearline retrieval, per-byte component | 0.001 BIT per KiB | ~1,049 BIT (~ EUR 0.21) |
| Coalescing-window retrieval (chunk already in INCOMING) | 0.001 BIT per KiB | ~1,049 BIT (~ EUR 0.21) |
| Vault retrieval, per-tape penalty | 250,000 BIT per Vault cartridge mounted | (n/a) |
| Vault retrieval, per-chunk penalty | 2,500 BIT per tar chunk read | (n/a) |
| Vault retrieval, per-byte component | 0.0025 BIT per KiB | ~2,621 BIT (~ EUR 0.52) |
The BIT-to-euro reference rate is 1 BIT = EUR 1/5000 = EUR 0.0002 at standard retail. Per-transaction volume discounts on BIT purchases (BIT Issuance and Distribution Addendum §4.3) range from 5% at EUR 25 to 25% at EUR 10,000 and above, so the effective EUR cost of any operation in this schedule may be up to 25% lower for buyers purchasing in bulk.
A minimum billing size of 1 MiB applies to ingestion. Packages smaller than 1 MiB are billed as if they were 1 MiB at the ingestion rate. This covers the per-package operational overhead of acceptance, signing, replication, lifecycle bookkeeping, and tape-volume accounting, which is largely independent of package size and is incurred once per part at ingestion time. No minimum applies to retrievals: every Online, Standby, or FAST retrieval is billed on the per-byte component only, whether the client requests a whole part or a byte-range subrequest (§9.4). Subrange retrieval is not available for parts in Nearline or Vault: those states require a chunk read in any case, and the structured Nearline and Vault rates apply per §18.2.
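The ingestion minimum can be expressed as a floor on the billable size. The sketch below uses the 0.005 BIT per KiB rate and the 1 MiB minimum from the schedule; rounding a package's size up to whole KiB before billing is an assumption, as is the function name.

```python
from decimal import Decimal

INGESTION_RATE_BIT_PER_KIB = Decimal("0.005")  # from the v1 schedule
MIN_BILLABLE_KIB = 1024  # 1 MiB minimum billing size for ingestion


def ingestion_fee_bit(size_bytes: int) -> Decimal:
    """Ingestion fee in BIT for one package, applying the 1 MiB minimum."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    size_kib = -(-size_bytes // 1024)  # ceiling division (assumed rounding rule)
    billable_kib = max(size_kib, MIN_BILLABLE_KIB)
    return INGESTION_RATE_BIT_PER_KIB * billable_kib


# Usage: a 100 KiB package is billed as 1 MiB; a 1 GiB package at its real size.
small_fee = ingestion_fee_bit(100 * 1024)
gib_fee = ingestion_fee_bit(1024 ** 3)
```

Under these assumptions a 100 KiB package costs the same 5.12 BIT as a 1 MiB package, and a 1 GiB package costs 5,242.88 BIT, which the schedule rounds to about 5,243 BIT.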
Both Nearline and Vault rates are composed (§18.2): a batch retrieval that hits multiple parts on the same tape pays the per-tape penalty once for that tape, the per-chunk penalty once per distinct chunk read on that tape, and the per-byte component on the sum of bytes delivered. The data inside a chunk is essentially free once the mount has happened, so a request for one part and a request for twenty parts on the same chunk pay the same per-tape and per-chunk penalties and differ only in the per-byte component. There is no separate "catastrophic restoration" rate: an operator that has lost all higher-state copies and is restoring from Vault pays the same composed Vault rate. The batch-API submission flow (§18.2) is the sole interface for Vault retrieval; ad-hoc per-part Vault requests through the routine retrieval protocol are not supported.
Worked examples. A single 16 GiB part from a single Nearline tape pays 10,000 + 5,000 + 16 x 1,049 = 31,784 BIT (~ EUR 6.36); the same part from Vault pays 250,000 + 2,500 + 16 x 2,621 = 294,436 BIT (~ EUR 58.89). Fifty parts from one chunk on one tape, totalling 80 GiB, pay 10,000 + 5,000 + 80 x 1,049 = 98,920 BIT (~ EUR 19.78) on Nearline or 250,000 + 2,500 + 80 x 2,621 = 462,180 BIT (~ EUR 92.44) on Vault. At institutional scale the schedules amortise heavily: a publisher restoring its entire 20 TiB repository, spread across 40 tapes and roughly 205 tar chunks, pays 40 x 10,000 + 205 x 5,000 + 20,480 x 1,049 = 22,908,520 BIT (~ EUR 4,581.70) via Nearline or 40 x 250,000 + 205 x 2,500 + 20,480 x 2,621 = 64,190,580 BIT (~ EUR 12,838.12) via Vault. Both totals sit well below the original 20 TiB ingestion cost of ~EUR 21,475.33.
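The worked examples can be reproduced with a short calculator. It is a sketch rather than a reference implementation: it uses the rounded per-GiB figures from the schedule (1,049 and 2,621 BIT), as the examples themselves do, and the names StructuredRate, retrieval_cost_bit, and to_eur are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

BIT_TO_EUR = Decimal("0.0002")  # 1 BIT = EUR 1/5000 at standard retail


@dataclass(frozen=True)
class StructuredRate:
    per_tape_bit: int   # charged once per cartridge mounted
    per_chunk_bit: int  # charged once per distinct tar chunk read
    per_gib_bit: int    # rounded per-GiB figure for the per-byte component


NEARLINE = StructuredRate(per_tape_bit=10_000, per_chunk_bit=5_000, per_gib_bit=1_049)
VAULT = StructuredRate(per_tape_bit=250_000, per_chunk_bit=2_500, per_gib_bit=2_621)


def retrieval_cost_bit(rate: StructuredRate, tapes: int, chunks: int, gib: int) -> int:
    """Composed cost: per-tape + per-chunk + per-byte components, in BIT."""
    return (
        tapes * rate.per_tape_bit
        + chunks * rate.per_chunk_bit
        + gib * rate.per_gib_bit
    )


def to_eur(bit: int) -> Decimal:
    """Convert BIT to EUR at standard retail, rounded to cents."""
    return (Decimal(bit) * BIT_TO_EUR).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# The three scenarios from the text: (tapes, chunks, GiB delivered).
scenarios = {
    "16 GiB single part": (1, 1, 16),
    "80 GiB on one chunk": (1, 1, 80),
    "20 TiB / 40 tapes / 205 chunks": (40, 205, 20 * 1024),
}
results = {
    name: (retrieval_cost_bit(NEARLINE, *args), retrieval_cost_bit(VAULT, *args))
    for name, args in scenarios.items()
}
```

Holding tapes and chunks fixed while varying only the GiB count shows the amortisation effect directly: the 16 GiB and 80 GiB scenarios share the same 15,000 BIT of Nearline penalties and differ only in the per-byte component.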
The same three scenarios across all four operations, at standard retail:
| Scenario | Ingestion | Online/Standby/FAST | Nearline | Vault |
|---|---|---|---|---|
| 16 GiB single part | EUR 16.78 | EUR 3.36 | EUR 6.36 | EUR 58.89 |
| 80 GiB on one chunk | EUR 83.89 | EUR 16.78 | EUR 19.78 | EUR 92.44 |
| 20 TiB / 40 tapes / 205 chunks | EUR 21,475.33 | EUR 4,296.70 | EUR 4,581.70 | EUR 12,838.12 |
And the same scenarios assuming the buyer purchases just enough BIT for each operation in a single transaction, with the §4.3 volume discount applied (cells without a tier label fall in Standard, no discount):
| Scenario | Ingestion | Online/Standby/FAST | Nearline | Vault |
|---|---|---|---|---|
| 16 GiB single part | EUR 16.78 | EUR 3.36 | EUR 6.36 | EUR 55.95 (Bronze, 5%) |
| 80 GiB on one chunk | EUR 79.70 (Bronze, 5%) | EUR 16.78 | EUR 19.78 | EUR 87.82 (Bronze, 5%) |
| 20 TiB / 40 tapes / 205 chunks | EUR 16,106.50 (Enterprise, 25%) | EUR 3,437.36 (Platinum, 20%) | EUR 3,665.36 (Platinum, 20%) | EUR 9,628.59 (Enterprise, 25%) |
The schedule is calibrated to leave headroom over the operator's underlying cost of running the full redundancy floor and the tape migration cycle, with hosting and access infrastructure excluded. The exact cost basis depends on operator scale, library utilisation, tape generation, and energy posture. Whether the platform-wide rate can hold across operators in materially different cost-structure jurisdictions is an open question (§21).
The prices are uniform across roles. Source, preservation, preview, access, edition, text, metadata, and submission packages all cost the same per byte to ingest and to retrieve. Roles differ in expected access patterns and in retrieval frequency, not in pricing structure.
18.4 Platform pricing and operator differentiation
EFS pricing is platform-wide, not operator-set. All operators charge the same single ingestion rate and the same retrieval rates at each state, and they all honour the same lifecycle (§15) and redundancy floors (§16). Publishers choose operators on non-price dimensions: jurisdiction, geographic and political risk profile, capacity, reputation, customer service, and the operator's record on audit and migration.
Uniform pricing prevents a race to the bottom that would undermine the durability promise, makes cross-operator placement (§16.4) a clean redundancy decision rather than a price-shopping exercise, and gives publishers predictable costs across the federation. It also reflects the wider Catalog economics, in which BIT is a standard unit and pricing for protocol services is set at the protocol layer rather than by individual operators.
An operator publishes a service contract that confirms its adoption of the platform pricing schedule, names its jurisdiction and facilities, and states its operator-specific commitments on capacity, support, and notice periods. The contract is the binding instrument between publisher and operator, but the prices it carries are the protocol's, not the operator's.
The publisher does not select a storage state for a specific package; state membership is governed by the lifecycle. Publishers do not pay separately for redundancy; the redundancy floors of §16 are part of what the ingestion fee buys.
18.5 Expected access patterns
Most EFS packages are not expected to be retrieved often. The publisher's primary working copy is local: the EFS client on the publisher's own machine holds a complete encrypted copy of their catalog, and routine work (adding new material, browsing, rendering editions for publication to static websites) runs against the local copy without generating EFS retrievals. Edition packages are produced from the local copy and uploaded to EFS as part of publication; they are typically not fetched back from EFS by the publisher who created them.
EFS retrievals after first ingestion are dominated by:
- restoration of source or derivative material when the publisher's local copy is unavailable, lost, or being rebuilt on a new machine;
- access by licensed Catalog.ID viewers through the CMS, where the retrieval cost is allocated by the viewer's license terms;
- audit, repair, or migration.
In practice most parts sit on Online or Standby with infrequent FAST-cache promotions, and the EFS network behaves more like a deep archive with selective online cache than like a content delivery network.
18.6 EFS does not hold funds
EFS does not hold funds. Bitcash holds wallet balances; EFS reads metering data and accepts settlement events from Bitcash to gate ingestion and retrieval. Vault retrieval is settled through Bitcash on the per-tape, per-chunk, and per-byte breakdown returned by the batch API (§18.2), in a single up-front settlement at the time the batch is accepted, rather than as streaming micropayments per part.
18.7 Founder funding of the first operator
The first EFS operator, Stichting Outpapier, is bootstrapped through a single pre-purchase by the founder. The founder pre-purchases 442,000,000 BIT from Stichting Outpapier at the Enterprise volume tier (§4.3 of the BIT Issuance and Distribution Addendum), paying EUR 66,300 against a standard retail price of EUR 88,400. The 25% volume discount is applied as the standard Enterprise rate, not as a bespoke concession. This pre-purchase capitalises Stichting Outpapier and gives the founder a working balance against which the first archive can be ingested.
Of the pre-purchased balance, 172,000,000 BIT is allocated to populating EFS with the founder's personal archive at the v1 ingestion rate. At 0.005 BIT per KiB this funds approximately 32 TiB of eternal storage. The remaining balance is held against future ingestion, retrieval, and sharing operations.
The 25% discount is the standard Enterprise tier available to any purchaser of EUR 10,000 or more in a single transaction. What distinguishes this purchase is its size and timing, not its rate: a single transaction at operator launch that simultaneously capitalises Stichting Outpapier and funds the founder's initial archive. Subsequent ingestion against any operator, including Stichting Outpapier, is priced at the v1 schedule of §18.3 with the same §4.3 volume tiers available to any buyer.
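The figures in this subsection can be reproduced with exact rational arithmetic. This is a sketch under the stated numbers only: the EUR-per-BIT rate is derived from the EUR 88,400 retail price for 442,000,000 BIT, and the function names are illustrative.

```python
from fractions import Fraction

EUR_PER_BIT = Fraction(1, 5000)          # EUR 88,400 / 442,000,000 BIT
INGEST_BIT_PER_KIB = Fraction(5, 1000)   # v1 ingestion rate: 0.005 BIT per KiB
ENTERPRISE_DISCOUNT = Fraction(25, 100)  # standard Enterprise tier
KIB_PER_TIB = 1024 ** 3


def retail_eur(bit: int) -> Fraction:
    """Standard retail price of a BIT purchase."""
    return bit * EUR_PER_BIT


def enterprise_price_eur(bit: int) -> Fraction:
    """Price of the same purchase at the Enterprise volume tier."""
    return retail_eur(bit) * (1 - ENTERPRISE_DISCOUNT)


def tib_funded(bit: int) -> Fraction:
    """TiB of ingestion that a BIT balance funds at the v1 rate."""
    return bit / INGEST_BIT_PER_KIB / KIB_PER_TIB


if __name__ == "__main__":
    purchased, archive = 442_000_000, 172_000_000
    print("retail EUR:", retail_eur(purchased))
    print("paid EUR:", enterprise_price_eur(purchased))
    print("TiB funded:", round(float(tib_funded(archive)), 2))
    print("remaining BIT:", purchased - archive)
```

On these numbers, 270,000,000 BIT remains after the archive allocation.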
19. Audit, Repair, and Availability
Long-term preservation requires more than storing bytes once. EFS operators verify continued possession and integrity, and repair from healthy replicas when corruption is detected.
19.1 Audit
At minimum, an operator can be challenged to provide proof of possession of a stored package part, and the response can be verified against the public ciphertext digest. The whitepaper does not specify a particular proof-of-storage construction; a pragmatic challenge-response scheme is sufficient for v1.
A routine retrieval also serves as a client-side integrity audit: the retrieved bytes can be verified against the canonical ciphertext digest immediately and against plaintext digests in the encrypted manifest after decryption. Publishers and recipients therefore do not need to trust operator-side audit alone; their own access patterns generate independent integrity evidence. Partial retrieval (§9.4) lets this audit operate on a sample of frames from a large package without restoring the whole package.
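The first step of that client-side audit, checking retrieved bytes against the public ciphertext digest, might look like the sketch below. The digest algorithm is not specified in this section; SHA-256 is used here purely as an assumption, and verify_ciphertext is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_ciphertext(retrieved: bytes, expected_digest_hex: str) -> bool:
    """Return True if the retrieved bytes hash to the expected digest.

    Assumption: the canonical ciphertext digest is a SHA-256 hex string.
    """
    actual = hashlib.sha256(retrieved).hexdigest()
    # Constant-time comparison; not strictly needed for public digests,
    # but harmless and a good default.
    return hmac.compare_digest(actual, expected_digest_hex.lower())


if __name__ == "__main__":
    part = b"example ciphertext bytes"
    published = hashlib.sha256(part).hexdigest()
    print(verify_ciphertext(part, published))            # intact part
    print(verify_ciphertext(part + b"\x00", published))  # corrupted part
```

Verification against the plaintext digests in the encrypted manifest would follow the same pattern after decryption and is not shown.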
19.2 Availability
Each EFS node publishes signed claims about its own holdings: which package parts it holds at the full redundancy floor, which it has released to the tape-only floor, which it has contracted to the vault-only floor, and which it has withdrawn. A node also publishes its current online-availability posture, including any scheduled-online-hours regime in force under operator-pressure response (§15.4). A node is authoritative only for claims about itself; there is no global consensus.
Other nodes and clients aggregate these claims into local availability indexes. Availability claims are renewable assertions with an expiry time; if a node does not refresh a claim, peers treat it as stale. This avoids permanent accumulation of outdated information when nodes lose parts, lose funding, or leave the network.
The detailed wire format for availability propagation, including snapshot cadence, delta encoding, and subscription mechanics, is specified in a separate availability protocol document. The principles the whitepaper commits to are:
- nodes sign claims about themselves;
- claims expire and must be renewed to remain effective;
- mute state from CPR overrides availability (§20);
- there is no single canonical global state.
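A local availability index built on these principles could be sketched as follows. The record fields, the floor labels, and the rule that the claim with the latest expiry supersedes earlier ones from the same node are all assumptions for illustration; signature verification and the mute override are omitted, and the actual wire format belongs to the separate availability protocol document.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AvailabilityClaim:
    node_id: str       # the node making a claim about itself
    part_ref: str      # hypothetical opaque reference to a package part
    floor: str         # e.g. "full", "tape-only", "vault-only", "withdrawn"
    expires_at: float  # Unix timestamp after which the claim is stale


def live_holders(claims: Iterable[AvailabilityClaim], part_ref: str, now: float) -> list[str]:
    """Return nodes with an unexpired, non-withdrawn claim for the part."""
    latest: dict[str, AvailabilityClaim] = {}
    for claim in claims:
        if claim.part_ref != part_ref:
            continue
        previous = latest.get(claim.node_id)
        # Assumption: the claim with the latest expiry is the current one.
        if previous is None or claim.expires_at > previous.expires_at:
            latest[claim.node_id] = claim
    return sorted(
        node for node, claim in latest.items()
        if claim.expires_at > now and claim.floor != "withdrawn"
    )


if __name__ == "__main__":
    claims = [
        AvailabilityClaim("node-a", "part-1", "full", 200.0),
        AvailabilityClaim("node-b", "part-1", "full", 50.0),  # not renewed
        AvailabilityClaim("node-c", "part-1", "withdrawn", 200.0),
    ]
    print(live_holders(claims, "part-1", now=100.0))
```

Because each index is computed locally from whatever claims a node has received, two clients can legitimately hold different views at the same moment, which is consistent with there being no canonical global state.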
19.3 Metadata distribution
The metadata that supports lookup and indexing, including the AssetID-to-PackageID-to-PartNr mapping, is replicated across the federation so that any node can answer queries about which packages exist for a given AssetID. In v1 each node carries metadata for the entire network, including parts it does not itself store. Lighter participation modes that maintain metadata only for assets a node stores are an optimisation deferred to a later iteration.
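As a rough illustration of the lookup this metadata supports, the sketch below keeps an in-memory AssetID-to-PackageID-to-PartNr mapping. The class and method names are hypothetical, the identifier strings are placeholders, and the five-digit zero-padded rendering of PartNr follows the convention stated in §22; nothing here describes the actual replicated store.

```python
class MetadataIndex:
    """In-memory sketch of an AssetID -> PackageID -> PartNr mapping."""

    def __init__(self) -> None:
        self._by_asset: dict[str, dict[str, set[int]]] = {}

    def register(self, asset_id: str, package_id: str, part_nr: int) -> None:
        """Record that a part exists, whether or not this node stores it."""
        if not 0 <= part_nr <= 99_999:
            raise ValueError("PartNr must fit in five digits")
        self._by_asset.setdefault(asset_id, {}).setdefault(package_id, set()).add(part_nr)

    def packages_for(self, asset_id: str) -> dict[str, list[str]]:
        """Return the known packages and zero-padded part numbers for an asset."""
        packages = self._by_asset.get(asset_id, {})
        return {
            package_id: [format(n, "05d") for n in sorted(parts)]
            for package_id, parts in sorted(packages.items())
        }


if __name__ == "__main__":
    index = MetadataIndex()
    index.register("asset-1", "pkg-1", 1)
    index.register("asset-1", "pkg-1", 2)
    print(index.packages_for("asset-1"))
```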
19.4 Legal export
Availability metadata circulating across the network is not the same as transferring encrypted bytes to another node. A node learning that a part exists is not a node receiving the part. Storage of a part on a node in another jurisdiction occurs only through an explicit storage instruction, typically the publisher making a Bitcash payment to that node. The instruction together with the payment is the contractual act that places the part with that operator.
20. Muting and Publication Control
CPR can mute an AssetID. A mute is publication control, not deletion of history. The AssetID's claims, licence-ownership records, and sealed packages remain in evidence. What changes is that compliant Catalog services stop publishing, advertising, resolving, or serving the asset.
20.1 Propagation
CPR publishes a signed mute feed. EFS nodes subscribe to that feed, either from a CPR node directly or from another EFS node carrying the feed. When a subscription is interrupted, a node pulls the missed entries on reconnection.
A CPR mute event names an AssetID and carries a signed effective-from timestamp. EFS nodes apply the mute on receipt.
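A subscriber that applies mute events on receipt and catches up after an interruption could be sketched as below. The seq field is a hypothetical feed position introduced only so that missed entries can be identified; the source does not define how feed entries are ordered, and signature verification is omitted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MuteEvent:
    seq: int             # hypothetical feed position, not defined by the source
    asset_id: str        # the muted AssetID
    effective_from: str  # signed effective-from timestamp, kept opaque here


class MuteSubscriber:
    """Tracks which AssetIDs are muted according to the received feed."""

    def __init__(self) -> None:
        self.muted: dict[str, str] = {}
        self._applied: set[int] = set()

    def apply(self, event: MuteEvent) -> bool:
        """Apply one event on receipt; return False if it was already applied."""
        if event.seq in self._applied:
            return False
        self._applied.add(event.seq)
        self.muted[event.asset_id] = event.effective_from
        return True

    def catch_up(self, feed: Iterable[MuteEvent]) -> int:
        """On reconnection, apply any entries missed; return how many were new."""
        return sum(self.apply(event) for event in sorted(feed, key=lambda e: e.seq))


if __name__ == "__main__":
    feed = [MuteEvent(1, "asset-a", "t1"), MuteEvent(2, "asset-b", "t2")]
    subscriber = MuteSubscriber()
    subscriber.apply(feed[0])
    print(subscriber.catch_up(feed), sorted(subscriber.muted))
```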
20.2 EFS suppressions
When an AssetID is muted, EFS suppresses, for every PackageID and part rooted at that AssetID:
- public listing of its packages and parts;
- public availability claims advertising the package;
- public retrieval;
- acceptance of new wrapped-key records (§12) addressed against the AssetID;
- repair and republication workflows that would publicly advertise the package.
EFS may retain internally:
- the signed registration records;
- the encrypted bytes;
- the audit trail;
- payment and accounting records;
- existing wrapped-key records;
- the muted-state mapping itself.
20.3 Override semantics
An active mute state overrides any prior availability claim. A client lookup proceeds in this order:
- Determine the AssetID from the query, the PackageID prefix, or Library state.
- Check the current mute state for that AssetID.
- If muted, suppress the public response.
- If not muted, evaluate availability and retrieval normally.
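The ordering above can be expressed as a small function in which the mute check always precedes availability evaluation. The three callables are placeholders for components the whitepaper describes elsewhere; their names and signatures are assumptions made for this sketch.

```python
from typing import Callable, Optional, Sequence


def public_lookup(
    query: str,
    resolve_asset_id: Callable[[str], str],
    is_muted: Callable[[str], bool],
    evaluate: Callable[[str], Sequence[str]],
) -> Optional[Sequence[str]]:
    """Answer a public lookup, suppressing the response for muted assets."""
    asset_id = resolve_asset_id(query)  # step 1: determine the AssetID
    if is_muted(asset_id):              # step 2: check the current mute state
        return None                     # step 3: suppress the public response
    return evaluate(asset_id)           # step 4: normal availability and retrieval


if __name__ == "__main__":
    muted = {"asset-b"}
    holders = {"asset-a": ["node-1"], "asset-b": ["node-2"]}
    for query in ("asset-a", "asset-b"):
        print(query, public_lookup(query, str, muted.__contains__, lambda a: holders.get(a, [])))
```

Because the mute check runs before availability is consulted, a stale availability claim for a muted asset never reaches the public response.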
20.4 Container assets
Muting a container asset cascades through the CPR-recorded asset hierarchy to its children. EFS receives mute events for the affected AssetIDs from the CPR feed and suppresses each accordingly. EFS does not itself walk the hierarchy.
20.5 What muting cannot reach
Muting suppresses publication by compliant Catalog services. It cannot retract material that has already been retrieved, decrypted, copied to USB, screenshotted, or republished outside the network. The whitepaper acknowledges this limit explicitly: muting is publication control, not recall.
21. Open Questions
The following questions remain open for later iterations of this whitepaper and for the specifications that will accompany implementation:
- the precise replication protocol for wrapped-key records and revocations across the mirror set, including acknowledgement semantics on operator-side forwarding (§12.9) and the local-cache policy for Catalog.ID standing checks on the retrieval verification path (§12.6);
- the wire format of CFC v1 and the introduction of chunked, indexed, or stream-oriented format profiles in later versions;
- the detailed specification of the availability protocol, including snapshot cadence, delta encoding, and subscription semantics;
- the exact interface and validity rules for signed CPR claims presented at write time;
- the specification of set_current records and the rules under which a non-latest serial may be designated current;
- the audit construction beyond pragmatic challenge-response, including whether a future profile adopts proof-of-storage cryptography;
- the rules under which an EFS node may transition a stored part through suppressed, quarantined, or purged states;
- the precise wire-level handoff between Asset Market settlement and EFS when a buyer's wrapped-key record is produced at sale time (§12.4): the order of operations between Catalog-side terms capture, Bitcash settlement, the seller's wrap, and the post to the mirror set;
- the format of the availability-claim signal raised when an operator has contracted to the vault-only floor and the standard client behaviour when a retrieval is requested against such a claim;
- the lighter metadata-distribution mode in which a node carries metadata only for assets it stores, as an alternative to v1's full-network replication;
- the long-term economic model of eternal-storage-on-first-payment, including operator obligations on technology migration over decades, the impact of cost evolution on the v1 platform pricing, and what happens when an operator exits the federation;
- the formal specification of the tape migration protocol: the verification procedures applied during each fifteen-year rewrite, the integrity-audit cadence between migrations, and the contingency plan if an LTO generation falls out of manufacturer support sooner than current trends suggest;
- the formal specification of the disk-volume migration protocol: when an operator brings up a new generation of volume capacity, the verification and retirement procedures applied to the migrated children, and how the parent-pointer rewrite is staged so that no part-lookup ever resolves through a partially-migrated chain;
- whether platform-wide pricing can hold across operators in materially different cost-structure jurisdictions, or whether jurisdiction-aware pricing tiers will be needed in a later protocol version;
- the cross-operator redundancy model: whether the protocol should support coordinated replication across multiple operators (so that a publisher seeking redundancy beyond a single operator's redundancy floor does not need to upload independently to each), and how the ingestion fee composes when a single submission lands at multiple operators;
- who ultimately bears the cost of retrieval (viewer, publisher, Asset Market settlement) and how the routing between Bitcash wallets is resolved at the Library and Market layer;
- whether the canonical lifecycle timings (4 hours to second disk copy, 24 hours to first rolling-rotation cartridge, ~2 weeks to all three rolling-slot cartridges) are protocol-normative or operator-suggestive, and the conditions under which an operator may deviate from them;
- the bucket model for tape writes: bucket size and bucket-flush policy under the rolling-rotation mechanism, and whether any of these become protocol-normative;
- the precise definition of "customer-initiated retrieval" that resets the idle clock, including how to classify edge cases such as recipient access through a Library-mediated bundle, agent-driven retrievals against a wrapped-key record, and bulk-restoration sequences after local-copy loss;
- the operator's storage-unit migration cadence: whether the protocol sets a normative maximum interval between migrations, or leaves the cadence purely operator-local subject to the requirement that the idle thresholds are eventually applied;
- the operator's process for declaring a "new-generation LTO library attached" event under §16.3, including the chosen value of N (informed by the new generation's backward-compatibility window), the announcement to publishers, the scheduling of the migration project, the rate at which the project progresses, and the registry artefacts that record the event and its completion;
- the rolling rotation operational specifics: the precise weekly cadence, the catch-up-before-disk-release sequencing rule under §16a.2.4, the rules governing off-site rolling holding (single vs multiple facilities), and any normative requirements that the three rolling-slot cartridges be split across geographically distinct off-site holding locations;
- the v1 Nearline retrieval rate and whether it should remain pegged to the ingestion rate or float separately as the cost basis of tape rehydration evolves;
- the Vault batch retrieval API: the exact request and response formats, the validity of an issued quote, the cancellation and refund behaviour if the operator misses the estimated time to fulfilment, and the rules for repeat batches against the same parts;
- the precise definition of the conditions under which an operator may invoke vault-only contraction or scheduled online availability under operator-pressure response: what counts as "sustained energy-cost or supply-disruption pressure", how operators declare and announce it, what notice publishers receive, and whether a coordinated federation-level signal is needed when multiple operators face correlated regional pressure;
- the eligibility rules for vault-only contraction (which packages an operator may transition under pressure, in what order, and whether publishers may opt packages into or out of contraction in advance);
- the wire format for scheduled-online-hours availability announcements, the queuing and expected-service-time semantics for retrievals submitted outside an operator's declared windows, and the refund or cancellation behaviour when a window slips;
- whether a future version supports packages that intentionally span multiple AssetIDs without relying on a container-asset construction;
- whether a future version introduces layered encryption with an outer ingress-operator wrapper and what the operational and wrapped-key consequences would be;
- whether the v1 16 GiB part-size limit holds in the face of evolving residential bandwidth and operator hardware, or should be raised in a later protocol version;
- the bound on the volumeID width (4 hex characters support 65,536 volumes per operator) and the migration path if a single operator ever approaches that limit;
- the v1 100 GiB tar-chunk size: whether it is the right balance between mount-amortisation and over-fetch on tape retrieval, and whether a future protocol version should adopt a different size or per-tape-generation sizing;
- the handling of the parts that travel out of tape with a requested chunk but were not themselves requested: whether they are kept available in INCOMING for a 24-hour coalescing window only, whether they may be promoted to Online state if the retriever asks for them within that window, whether a chunk-mate batching discount applies, and whether any of these become protocol-normative;
- the FAST cache path schema and the criteria, hysteresis, and capacity rules governing the cache heuristic, and whether any of these become protocol-normative rather than purely operator-local;
- the second Nearline copy reserved for a future hardware-certification level, and the conditions under which the certification will be activated;
- the precise scope of the certified-platform license under which operators run (operating system, filesystem, registry conventions, directory layout, and the closed list of approved hardware suppliers and models for storage units, tape libraries, and tape drives): what is mandatory, what is recommended, the cadence at which the hardware schedule is revised as new generations supersede earlier ones, the procedure by which an operator on a previous generation migrates forward, and how a candidate operator demonstrates compliance before accepting customer ingestions;
- the wire-level metadata exchange under which an operator's hardware and registries can be transferred to another certified operator on cessation of operation, and the verification procedure under which the receiving operator confirms continuity of every package's redundancy floor;
- whether the five-year and ten-year idle thresholds are the right fixed values, or whether a future protocol version should adjust them as disk-cost economics evolve.
22. Design Principles
- EFS is the storage vertical. It preserves and serves encrypted file bytes. It does not describe, claim, identify, or sell.
- AssetID is the namespace, not the key. Identifiers are public names; encryption uses fresh random per-package keys.
- One package, one asset. Each package refers to exactly one AssetID. Hierarchies live in CPR as container assets.
- Packages are the unit of submission and retrieval. Multi-part packages exist when total ciphertext exceeds the platform part-size limit, or when parts are produced incrementally; parts carry a uniform five-digit zero-padded PartNr counter, and the package's part count is asserted via an optional signed seal marker.
- Packages are immutable. Revisions create new generations; old generations remain as evidence.
- EFS records currentness; Library reads it. The latest serial is current by default; explicit overrides are signed.
- One part, one canonical ciphertext. All replicas store the same encrypted bytes.
- Active Catalog.ID required on both sides. Writers and retrievers are both active Catalog.ID identities at the moment of every operation. Pseudonymous publication remains available within the Catalog.ID model; identity-free publication is not a v1 path.
- Authorisation is rooted at CPR and Catalog.ID. EFS does not maintain its own authorisation model; for every write it asks CPR whether the signing key is authorised for the AssetID and Catalog.ID whether the principal's account is active.
- Federation by signed local truths, not global consensus. Each node is authoritative only for itself.
- Pay once, stored forever. The ingestion fee funds eternal preservation. Storage is not renewed and does not lapse to deletion.
- Storage states, not tiers. Online, Standby, Controlled Offline, Rolling Online, Rolling Offline, Nearline, and Vault are physical states a copy lives in; a package normally has copies in several of them simultaneously, presented in the publisher's UI as state checkboxes. The two Rolling states are transitional tape states under the three-slot rolling rotation (§15.6); Nearline and Vault are the terminal sealed tape states.
- Vault is the unconditional floor. All other states are conditional on the operator's ability to maintain online infrastructure; the air-gapped Vault pair is what carries the eternal-preservation commitment under any pressure short of physical destruction of both vault sites.
- Convenience is best-effort, preservation is unconditional. Online availability is a normal-conditions commitment that may be compressed under sustained pressure. The Vault floor and the fifteen-year migration cycle that maintains it are not.
- Volumes are labelled, registered, and migrated. Every labelled volume has a unique permanent label, a row in the operator's volume registry, and a controlled migration path forward. Buckets inside a volume are write-once on close, with at most one writable bucket per volume at a time. Each storage unit hosts its own writable bucket; multiple storage units produce multiple writable buckets in parallel.
- Operators run a certified platform. XFS, the operating system, the directory layout, the registry conventions, and a closed list of approved hardware suppliers and models for storage units, tape libraries, and tape drives are specified by license, so operator infrastructure is portable between operators on cessation of operations down to controller firmware and library robotics.
- INCOMING and FAST are volume-agnostic. All inbound traffic enters through INCOMING; all outbound tape rehydration also passes through INCOMING. FAST is the operator's frequency-driven cache. Neither counts toward the redundancy floor.
- Tape boundaries do not follow disk boundaries. Tape writes are organised into 100 GiB tar chunks that aggregate parts from multiple buckets across multiple storage units; the tar chunk is the unit of tape work and the unit of Nearline retrieval pricing.
- Self-hosting is permitted, not relied upon. Institutional storage carries the durability guarantee. In v1 a self-hosted hub serves only the hub operator's own machines, holding the operator's self-produced packages plus packages for which the operator holds an effective wrapped-key record; hub-to-other-member peer-to-peer distribution is a planned post-v1 feature (§17.6).
- Muting suppresses publication, not history. Mute propagates from CPR; EFS suppresses listing, availability, retrieval, and the acceptance of new wrapped-key records against the muted AssetID.
- Post-quantum by default. Hybrid classical and post-quantum primitives apply at every authenticated layer.
- Wrapped keys live with the ciphertext. EFS holds each package's wrapped-key records in its own database alongside the ciphertext; one record per (asset_id, package_role, recipient_party_id) triple, recipient-bound from inception, no handshake. The active-Catalog.ID and active-efs_encrypt-key rechecks on retrieval bound the damage of any key compromise to material already retrieved at the moment of revocation.
- EFS's own database is preservation-critical. The wrapped-key records and registries that make ciphertext interpretable are preserved on the same media regime as the ciphertext itself (§16a): two preservation streams (WAL and base snapshots) carried as system packages through the full storage-state lifecycle including the unconditional Vault pair, plus a parallel plain-text log on a separate drive for commit-time redundancy.
23. Closing
Encrypted Filestorage is the storage layer of the Catalog ecosystem.
CPR mints AssetIDs and records the durable claims that bind an asset to its ownership and provenance. The CMS expresses what assets currently mean and how they are organised, described, and licensed. Asset Market brokers offers and records the bilateral agreements through which licenses and decryption grants are exchanged. Bitcash meters and settles the operations that make all of it run. Catalog.ID identifies the people behind the keys; in v1, every writer to EFS and every retriever from EFS is an active Catalog.ID identity, and revocation of that identity's efs_encrypt keys stops further retrievals immediately through EFS's wrapped-key verification path.
Encrypted Filestorage stores the bytes and the wrapped keys. Sealed, addressable, replicated, auditable, retrievable, metered. Recipient-bound from inception, no handshake.
The preservation problem EFS targets is real and structural: digital data has no durable medium with a hundred-year reader story, no equipment line with a hundred-year support window, and no energy or supply outlook stable enough to underwrite an unconditional 24x7 online promise across the relevant horizon. EFS responds by separating the unconditional preservation commitment (rooted in the air-gapped Vault pair and a fifteen-year tape-migration cycle) from the convenience commitment of online retrieval (held under normal conditions, scaled back through named pressure responses when conditions are not normal), and by combining a one-time ingestion payment with metered retrieval rather than a subscription that nobody can credibly price across a century.
By rooting storage identity in AssetIDs, by aligning write authorisation with CPR, by deferring currentness queries to a thin lookup that the CMS consumes, by exposing a single small CFC container format under a clear post-quantum encryption model, by co-resident storage of ciphertext and the wrapped-key records that admit recipients, by stating its storage states and lifecycle plainly, and by documenting the volume, bucket, and database-preservation architecture that the long-term migration story will run on, EFS aims to be one of the simplest Catalog verticals: a paid, encrypted, immutable, append-only file store with a small wrapped-key gate, doing its job and staying out of the way of the modules above it.