Encrypted Filestorage
Encrypted File Storage and Retrieval for the Catalog Ecosystem
Working title: Encrypted Filestorage
Working URL: catalog.org/efs/
Author: Roberto Bourgonjen
Last updated: 2026-05-07
1. Introduction
Encrypted Filestorage (EFS) is the storage service of the Catalog ecosystem. Where the other Catalog modules describe what assets mean, who they belong to, and how they are exchanged, EFS preserves and serves the actual encrypted file bytes.
The Catalog ecosystem comprises:
- Catalog.ID: pseudonymous identity for participants and signed actions.
- Asset Registry: durable signed claims about digital assets, AssetID allocation, and ownership records.
- Asset Library: mutable description, relationships, articles, editions, and active licensing state.
- Asset Market: bilateral sales agreements through which licenses, deliverables, and decryption grants are exchanged.
- Bitcash: prepaid micropayment and metering layer for service consumption.
- Encrypted Filestorage: preservation, replication, retrieval, and key management for encrypted files.
EFS is intentionally narrow. It does not mint AssetIDs, describe assets, broker sales, or define what content is. It accepts encrypted files that an authorised party has registered against an existing AssetID, replicates them across operators and jurisdictions, serves them upon authorised retrieval, and meters its services through Bitcash.
EFS is positioned as long-term preservation infrastructure rather than as a working file share. The expected use case is a publisher (an artist, photographer, musician, writer, archivist, or institution) depositing encrypted copies of their work for preservation and distribution, with the publisher's primary working copy remaining on their own machine. Catalog positions EFS explicitly as a candidate standard for the long-term preservation of digital assets, including assets that should outlive their authors. Section 2 develops the preservation problem this is intended to address; section 3 describes the approach EFS takes to it.
EFS imposes no Catalog.ID requirement. The only credentials it requires are an AssetID registered with Asset Registry, a funded Bitcash wallet, and a signing key that Asset Registry recognises as authorised for the AssetID. This makes anonymous publication possible while leaving identity-bound publication available for those who want it.
2. Problem Statement
EFS is intended for the long-term preservation of digital assets. Long term here means decades to centuries: a horizon over which the author may no longer be alive, the publishing organisation may no longer exist, and the technology landscape will certainly have changed several times. Catalog positions itself explicitly as a new standard for this kind of preservation. The remainder of this section examines why long-term preservation of digital data is genuinely hard, and why the conventional answers are not adequate to the horizon Catalog targets.
2.1 No durable medium
Unlike paper, whose printed contents can be preserved for hundreds of years under reasonable conditions, digital data depends on a continuously evolving stack of media, formats, drives, and software. Every layer of this stack is subject to obsolescence on a timescale much shorter than the preservation horizon.
The most durable optical medium currently available is the 100 GB M-DISC Blu-ray. NIST's digital evidence preservation guidance lists M-DISC as acceptable for archival use, with a manufacturer-claimed longevity in the range of one hundred years. Two structural problems still apply.
First, the medium is not the limiting factor: the reader is. The optical-disc ecosystem has been contracting for years. Sony ended production of Blu-ray Disc media in February 2025 with no successor models. Drive manufacture and recordable-media supply now depend on a small number of vendors. Whether reading hardware will still be manufactured, supported, and serviceable in fifty or one hundred years is unknown, and the trend points the wrong way.
Second, 100 GB is too small for serious archival use. A professional archive of image and video material commonly runs to tens of terabytes. A 30 TB archive needs three hundred discs, which makes routine deposit and retrieval impractical without a robotic library, and three hundred discs is one publisher's body of work, not the volume an archival service must handle.
2.2 Tape: long media life, short equipment availability
Linear Tape-Open (LTO) is the industry's dominant cold-archive medium. Manufacturer-rated archival life of modern LTO cartridges is around thirty years under controlled conditions, and tape stored unpowered consumes no energy.
The medium-life figure understates the actual constraint, which is again the equipment. Each LTO generation has a finite production and support window. The LTO consortium's read-back specification historically extended two generations back (narrowing to one generation from LTO-8 onward), so a cartridge of any given generation has a practical read window bounded not by the medium but by drive supply for itself and for the generations that follow it. LTO-4, released in 2007, has no current production in 2026; a contemporary LTO-4 deployment depends entirely on the refurbished-drive market. The constraint is also tightening: LTO-10, released in 2025, broke with the pattern entirely by reading only LTO-10 cartridges, which means the practical read window for current generations is narrower than for earlier ones. A tape committed to the vault today will outlive several generations of drive availability over a century-scale horizon, on a curve that has recently bent against the publisher.
Cost evolution is also non-monotonic. The capacity ratio between LTO-10 (30 TB native) and LTO-9 (18 TB native) is a factor of about 1.7, but new-tape pricing in 2026 is roughly 250 EUR for an HPE LTO-10 cartridge against around 78 EUR for an HPE LTO-9 cartridge. The two generations are not backward compatible, so adopting LTO-10 means buying both new media and a new drive. Patent disputes have in the past blocked manufacture of certain LTO generations for years, and similar disputes can recur, affecting either tapes or drives.
In summary, tape has the lowest cost per gigabyte for cold storage, the longest unattended lifetime of magnetic media, and effectively zero idle energy. But over a horizon aimed at eternity the total cost is substantial, the cost trajectory is unpredictable, and the supply chain may face disruptions that cannot be foreseen.
For routine retrieval, tape is unsuitable on its own because of mount and seek latency. Tape is part of an answer, not the whole answer.
2.3 Spinning disk and SSD: short life, energy-bound
The next candidate is online disk: spinning hard drives or solid-state drives. These offer immediate retrieval and conventional service models, but for preservation they introduce two problems that grow worse the longer the horizon.
Energy. Powered drives consume electricity continuously. Energy prices in much of Europe have risen sharply in recent years, the supply mix is shifting, and the continuity of cheap and abundant electricity is itself an open question for the coming decades. The cost of keeping a multi-petabyte fleet powered 24x7 over a century is not just high, it is unknown, and committing to that profile in a fixed-fee preservation service is a commitment to costs that cannot be predicted.
Drive availability. The number of independent hard-drive manufacturers has shrunk to three. Manufacturing depends on increasingly complex processes and tightly-held patents. As of 2026, retail channels in the Netherlands restrict drive purchases per customer (one drive per order at Azerty, four at Alternate), and orders that exceed the limit are cancelled. Replacing a sixteen-drive RAID set has become an exercise in working around purchase quotas. SSDs face similar concentration in NAND fabrication. A long-term commitment to disk-based preservation is a commitment to a supply chain that is narrowing, not broadening.
2.4 Cost predictability
A service that promises preservation without recurring fees is a service that must price in all future costs at acceptance time. With tape, that means future migration and equipment cycles. With disk and SSD, that means future energy. Both involve forecasts of decades to a century, against media, equipment, energy, and supply-chain trajectories that have shown they can move in surprising directions.
Even with an absurd price tag, a service that offered a hard guarantee of free 24x7 online availability indefinitely would not in fact be able to keep the promise across a deep recession in supply, a sustained energy crisis, or a regional disruption. The honest design choice is not to make a promise that cannot be kept.
2.5 Distribution versus control
Even with perfect storage media, long-term preservation faces a logical problem that traditional publishing systems compound rather than solve. Survival is a numbers game. The probability that a work survives a century is dominated by how many copies of it exist and how widely they are distributed. A 1750 book printed in 10,000 copies, scattered across libraries, private collections, and used-book shelves, is almost certain to have descendants alive today. A 1750 book printed in 100 copies, kept in one workshop, faces overwhelming odds of being lost: fire, flood, disinterest, a single bad actor in the chain of custody, a war that incinerates the city. Archivists and book historians can read survival rates off issue counts directly. Few copies and narrow custody mean extinction risk; many copies and wide custody mean survival.
Publishing systems that aim to control access to their material are forced to be strict in three directions, all of which fight redundancy:
- Access authorisation. Only paying or licensed readers should reach the content, so wide redistribution undermines the access regime: every copy in unauthorised hands is a leak.
- Copyright. Many works cannot lawfully be redistributed in digital form, so a custodian who forwards bytes to a third party may be infringing on the publisher's behalf.
- Provenance. Readers and licensees need confidence that the bytes they receive really originate from the claimed publisher and have not been altered or substituted, and loose distribution creates substituted-in-transit risk.
To honour these three, traditional systems narrow their custody: a small number of trusted machines, a small number of trusted operators, no redistribution rights, no third-party caching. The unintended consequence is direct and severe. The protections built to keep the work safe make it more likely to be lost. A small custody surface is structurally more vulnerable to single-point catastrophe than a wide one, and the very mechanisms that enforce the three constraints above increase preservation risk. Honouring access control and honouring durability pull in opposite directions, and a long-horizon preservation system that does not resolve this conflict will eventually fail at one or the other.
2.6 Summary
Long-term digital preservation must be designed against the realities of:
- no durable digital medium with a hundred-year reader story;
- tape that is cheap to hold but expensive and risky to migrate forever;
- disk that is fast to retrieve but energy-bound and supply-bound;
- a cost trajectory that cannot be predicted over the relevant horizon;
- a structural conflict between wide distribution (which preservation requires) and access control (which publishers require), where traditional systems have resolved against distribution and therefore against survival.
A serious preservation service has to combine media, has to plan for migration, has to decouple the unconditional preservation commitment from the convenience commitment, has to price its service so that it can survive cost shocks without breaking faith with the publishers who paid into it, and has to resolve the distribution-control conflict structurally rather than by choosing one side. Section 3 describes the approach EFS takes.
3. Catalog's Approach
EFS responds to the preservation problem with seven connected design choices.
1. Encryption at the format layer, with access control separated from storage. A package accepted by EFS is sealed once at ingestion using authenticated symmetric encryption. The ciphertext is the unit of storage and transport throughout the system. Storage operators, peer mirrors, and self-hosted hubs hold and serve ciphertext only; none ever see plaintext or hold keys to recover it. Authorisation, licensing, and revocation live in the Key Registry (§12), which holds wrapped key material that only intended recipients can unlock with their own private keys.
This separation is what allows EFS to chase wide redundancy without breaking the publisher's grip on access. Encrypted bytes carry no information to anyone who lacks the decryption key, so the publisher and the legal regime around the work have no preservation-driven reason to constrain where the ciphertext lives. Federation operators across jurisdictions, self-hosted hubs at the user level (§17), peer mirrors, and archive caches can all hold copies of the same package without breaching the constraints of §2.5. Wide distribution is now compatible with strict control rather than at war with it.
Encryption also delivers cryptographic provenance. Every package is signed at ingestion with the publisher's signing key and timestamped against the ingestion event the federation records. A reader can independently verify that the bytes they hold are the bytes the publisher signed, and that they were sealed no later than the recorded timestamp. Substitution and forgery are detected by signature failure rather than by trust in the storage operator.
The encryption is end-to-end at the user layer, not the application layer. The decryption key is held by the recipient outside the storage system, and EFS never sees it. This distinction matters. Most consumer "end-to-end" services, WhatsApp included, generate and manage keys inside the application itself; the user is never asked to write down the key, and the app holds it. The practical result is an app-to-app channel, secure against external eavesdroppers but defeated by anyone who can access the app's key store on either device, including the app vendor under legal compulsion. EFS asks the user to hold the key, and the protections rest on the user holding it.
A decryption key is also small enough to fit on a piece of paper, and paper, unlike digital media, costs nothing to preserve and survives for centuries under ordinary conditions. A user can print the key as a hex string or QR code on paper, or write it down by hand, then place it in a safe or distribute it as secret shares across trusted parties. Modern inks are highly durable, so the resulting paper inherits the survival horizon of the analog medium for the small artefact that gates access. The expensive durability problem set out in §2 applies to the encrypted bytes, which are large and need active preservation; the key that decrypts them sidesteps that problem entirely, on a medium that has already proven itself across centuries.
The rule is always encrypt, not "encrypt confidential material and skip encryption for public content". Even archives that hold predominantly public material carry caveats: copyright wrappers around third-party items, embargo periods, donor access conditions, redactions for living individuals, jurisdictional clearance differences. An architectural rule that encrypts everything absorbs the entire class of edge cases without forcing the archive to maintain two pipelines. The verifiability benefit alone justifies the small overhead, particularly in an era of trivially generated forgeries: an institution that publishes through EFS gives its readers an instrument they can independently use to verify what was actually published, and when. That assurance is at least as valuable to public archives as it is to commercial ones, and arguably more so, since public archives are exactly the targets attackers most want to forge.
2. Multiple storage states, with offline tape as the unconditional floor. A package accepted by EFS is held on multiple physical copies across multiple media. Online disk is the primary serving copy. Tape inside an automated library serves as an online recovery copy. Offline tape, ejected from the library and stored off-site or in a data-safe vault, is the unconditional preservation copy: physically unreachable from the operator's network, immune to ransomware, credential compromise, malicious or mistaken commands, and replication-cascade errors. Section 14 develops the storage states; section 16 develops the redundancy floors.
3. Separately stated convenience and preservation commitments. Online retrieval is a convenience commitment, offered under normal operating conditions. Preservation on offline tape is an unconditional commitment, kept regardless. When sustained external pressure (energy cost spikes, supply disruption, regional crisis) threatens an operator's ability to keep online infrastructure running, the operator may scale back the convenience commitment, retire online copies into lower-energy states, or in the extreme suspend online availability for the duration of the pressure event, while continuing to honour the offline-tape preservation commitment. The publisher's reasonable expectation is therefore not "always online", but "always preserved, online under normal conditions". Section 15 develops the lifecycle and pressure responses.
4. Pay once for ingestion, pay per use for retrieval. A single ingestion fee funds preservation indefinitely. Retrieval is metered separately. There is no subscription, no renewal, no expiry. The lifecycle is governed by two fixed idle thresholds: a package that has not been retrieved for five years becomes eligible to have its disk-side copies moved to Standby (low-power disk) at the next storage-unit migration, and a package that has not been retrieved for ten years becomes eligible to have its disk-side copies released entirely at the next storage-unit migration, leaving it on tape only. A retrieval rehydrates the package back to full availability and resets the idle clock. Sections 15 and 18 develop the lifecycle and pricing.
5. Best-effort online, guaranteed preserved. Online availability is best-effort: the operator commits to the convenience tier under normal conditions, with named pressure responses for the degraded conditions in which it cannot. The preservation commitment, by contrast, is unconditional and rests on the offline-tape pair. This separation makes it possible to honour the long-horizon promise without a forecast of energy and supply costs that nobody can credibly make.
6. Continuous migration as a planned operational cost. Tape is rewritten on a fifteen-year migration cycle (well inside the manufacturer-rated medium life, well inside the historical drive-availability window). Disk volumes are migrated forward as new generations of capacity arrive. Migration is planned, scheduled, and budgeted from the ingestion fee. Section 13 develops the volume migration model.
7. A certified operator platform for interoperability and handover. EFS operators do not run arbitrary infrastructure. The EFS software is licensed under terms that specify a certified platform end-to-end: the operating system, the filesystem (XFS), the volume labelling and registry conventions, the on-disk directory layout, and a closed list of approved hardware suppliers and models for storage units, tape libraries, and tape drives. Approved hardware is named per-generation in a versioned hardware schedule that accompanies the license; an operator builds new capacity from the schedule current at the time of acquisition and migrates forward as later generations supersede earlier ones.
The certification has two purposes. First, operator infrastructure built today is interoperable with infrastructure built by other operators today, down to controller firmware and library robotics, so cross-operator placement is not complicated by silent format or behaviour differences. Second, if an operator ceases operation, its hardware can be transferred to another certified operator and integrated without on-disk conversion and without firmware-quirk reverse-engineering: the volumes mount, the labels resolve, the registries import, the tape libraries accept the cartridges, and the archive continues to be served.
A class-of-hardware specification (any RAID controller of class X, any tape library of class Y) leaves enough room for compatibility surprises that long-horizon preservation cannot afford; pinning to specific suppliers and models removes that room. Section 13 develops the architectural specifics; the hardware schedule itself is maintained as a versioned annex to the license.
The remainder of this whitepaper specifies these mechanisms and the protocol surfaces that expose them.
4. Scope
Encrypted Filestorage is a paid, encrypted, append-only storage network for digital files associated with AssetIDs registered in Asset Registry.
EFS is concerned with:
- packaging files into immutable encrypted containers;
- assigning stable AssetID-scoped identifiers to those containers;
- storing them across operators, jurisdictions, and storage media;
- replicating, auditing, and repairing them;
- maintaining the key material that authorised parties use to decrypt them;
- propagating availability information across the federation;
- serving them on authorised retrieval;
- metering storage and retrieval through Bitcash.
EFS is not responsible for:
- AssetID minting or registration (Asset Registry);
- ownership records or attribution claims (Asset Registry, Asset Library);
- mutable description, tags, articles, collections, or editorial state (Asset Library);
- offers, acceptance, payment routing, or sales agreements (Asset Market);
- user identity (Catalog.ID, optional);
- adjudicating what content is.
5. Identifiers
EFS uses three identifiers, each with a distinct role.
5.1 AssetID
An AssetID identifies a logical asset registered with Asset Registry. It is the namespace under which all EFS storage for that asset is rooted. AssetIDs are minted by Registry, sold for BIT, and authorise subsequent storage actions. EFS does not allocate AssetIDs and treats them as opaque tokens for namespacing and authorisation.
5.2 PackageID
A PackageID identifies one logical package generation under an AssetID. A package is the unit of submission and retrieval (§6). Its form is:
{assetID}.{operatorID}.{role}.{serial}
Where operatorID identifies the EFS operator that accepted the package, role is the package role (§6.4), and serial is a per-role generation number that increases monotonically.
Example:
qjrm4821xwpa.b4np.source.000001
qjrm4821xwpa.b4np.preview.000003
qjrm4821xwpa.b4np.edition.000002
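For concreteness, a minimal Python sketch of the identifier's mechanical structure. The six-digit zero-padding of the serial follows the examples above and is otherwise an assumption; the normative width is not restated here.

from dataclasses import dataclass

@dataclass(frozen=True)
class PackageID:
    asset_id: str     # AssetID namespace, opaque to EFS (§5.1)
    operator_id: str  # operator that accepted the package
    role: str         # source | preview | edition (§6.4)
    serial: int       # per-role generation number, monotonic

    @classmethod
    def parse(cls, s: str) -> "PackageID":
        asset_id, operator_id, role, serial = s.split(".")
        return cls(asset_id, operator_id, role, int(serial))

    def __str__(self) -> str:
        return f"{self.asset_id}.{self.operator_id}.{self.role}.{self.serial:06d}"

assert str(PackageID.parse("qjrm4821xwpa.b4np.source.000001")) == "qjrm4821xwpa.b4np.source.000001"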
5.3 PartNr
A package whose total ciphertext size exceeds the package part limit (§6.2), or whose parts are produced incrementally rather than as a complete batch, is split into parts. PartNr is a five-digit zero-padded counter, platform-uniform within a protocol version:
p00001 (first part)
p00472 (four-hundred-seventy-second part)
p99999 (last permitted part of the generation)
The fully-qualified address of a part appends the PartNr to the PackageID:
qjrm4821xwpa.b4np.source.000001.p00007
The PackageID alone is the address of the package; the part address requires the PartNr suffix in all cases, including single-part packages where it is always p00001. Higher-level references (Asset Library locators, public package URLs) use the PackageID and resolve to part addresses on retrieval; operator-level operations (storage, replication, audit, retrieval) act on part addresses.
PartNr is a structural attribute of a part inside its package, not a global identifier. Numbering starts at p00001, so the five-digit width caps a package generation at 99,999 parts. Combined with the 16 GiB part-size limit (§6.2), this is roughly 1.5 PiB of ciphertext per generation: about 87 years of continuous capture at a 5 Mbps security-camera bitrate, or about 4.4 years at 100 Mbps high-bitrate cinema capture. A publisher whose package approaches the limit seals the current generation and continues under the next serial (§7).
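The capacity arithmetic, restated as an executable check (Python; the year figures assume continuous capture at the named bitrates):

PART_LIMIT = 16 * 2**30        # 16 GiB part-size limit (§6.2), in bytes
MAX_PARTS = 99_999             # p00001 through p99999
CAP = PART_LIMIT * MAX_PARTS   # ciphertext cap per generation

def years_at(bitrate_bps: float) -> float:
    # Continuous-capture years until a generation reaches the part cap.
    return CAP * 8 / bitrate_bps / (365.25 * 24 * 3600)

print(f"{CAP / 2**50:.2f} PiB")                    # ~1.53 PiB
print(f"{years_at(5e6):.0f} years at 5 Mbps")      # ~87 years
print(f"{years_at(100e6):.1f} years at 100 Mbps")  # ~4.4 years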
The current part count of a package is the number of parts EFS has accepted under its PackageID. A publisher may submit an optional signed package-level marker that asserts a final part count, sealing the generation against further parts. Submitting the marker at first-part time gives a batch package a fixed, advertised part count from upload onward; deferring it (or never submitting it) leaves the package open to further parts. Each part binds to exactly one canonical ciphertext digest (§10) under the same rules that apply to single-part packages.
5.4 Identity rules
Each package refers to exactly one AssetID. Multi-asset packages are not supported. Assets that group other assets are recorded as container assets in Asset Registry (§6.4) rather than as packages spanning multiple AssetIDs.
The AssetID portion of an identifier is a namespace, not a cryptographic key. Encryption uses fresh random per-package keys (§11), wrapped through the Key Registry (§12). The identifier names the object; the cryptographic key protects it.
5.5 Identifiers and digests
EFS distinguishes between naming and verification:
- PackageID and PartNr name the object.
- Ciphertext digest verifies the encrypted bytes.
- Plaintext digests in the encrypted manifest verify individual files after decryption.
Any party with the encrypted bytes can verify the ciphertext digest. Plaintext-level verification requires possession of the decryption material.
5.6 On-disk filenames
The bare address is the canonical identifier of a part. When parts are materialised in a general-purpose filesystem (self-hosted hubs §17, client-side download caches, legal exports §19.4), the on-disk layout is a directory per package and a file per part:
qjrm4821xwpa.b4np.source.000001/
├── p00001.cfc
├── p00002.cfc
└── p00003.cfc
The package directory name is the PackageID; each part file is named by its PartNr with a .cfc extension. A signed seal marker (§5.3), if submitted, lives alongside the parts in the same directory. Single-part packages get a directory containing one p00001.cfc file; the directory is retained as a uniform shape rather than collapsed to a single file at the package's path.
This convention applies only to general-purpose filesystem materialisation. Operators store parts internally on volumes, buckets, and tar chunks under their own conventions (§13).
6. Packages
The unit of submission and retrieval in EFS is a package. A package is a sealed encrypted container of files associated with a single AssetID under a single role (§6.4). Files travel into and out of EFS as packages. A package is what the publisher pays to ingest and what a retriever receives on retrieval.
6.1 Why packages
A package is the right unit of operation for several reasons. Most preservation submissions are not single files: a source deposit is a master file plus its sidecars, an edition deposit is a browser-ready bundle with payloads and signatures, a preview deposit is a small set of derivatives. Packaging them together gives them a single ciphertext digest, a single signature, a single Key Registry entry, a single audit unit, and a single price. It also keeps the encryption boundary tight: the public header carries identifiers and digests, and everything else (file names, file structure, file count, plaintext) is inside the encryption.
6.2 Single-part and multi-part packages
A package whose total ciphertext fits within the platform part-size limit is a single-part package (p00001). A package whose total ciphertext exceeds the limit is split into parts, each addressable independently for replication, audit, and transport, but only the complete package can be decrypted: the encrypted manifest binds all parts together.
The v1 part-size limit is 16 GiB (17,179,869,184 bytes) per part. The choice balances several considerations:
- Client reliability. A part is the granularity at which a retry happens. At 16 GiB, a residential 1 Gbps link transfers a part in roughly 140 seconds, a 100 Mbps link in roughly 22 minutes. Both are short enough that a transient network failure costs at most one part-retry rather than a multi-hour upload.
- Server-side buffering. Each part is received into a temporary location before being moved into a bucket (§13.4). 16 GiB is comfortable for typical operator hardware without requiring large dedicated upload buffers.
- Tape efficiency. An LTO-9 cartridge holds 18 TB native, an LTO-10 cartridge 30 TB. At 16 GiB per part, more than a thousand parts fit on one tape, leaving ample room for the bucket structure (§13.4) and for batched writes.
- Round-number practicality. 16 GiB is a power-of-two byte count, which sits cleanly in filesystem allocation, in network transfers, and in part counts: a 1 TiB package is 64 parts, a 30 TB archive is roughly 1750 parts.
The boundary is configurable per protocol version. A future version may raise it as residential bandwidth and operator hardware improve, or lower it if reliability data argues for finer granularity. It is platform-uniform within a version: operators do not set their own part-size limit.
A package may also be published incrementally, with parts produced over time rather than as a single batch. Long desktop or task recordings, security-camera streams, and live event captures fit this pattern. A streaming-published package shares the same per-part format and per-part operational rules as a batch multi-part package; what differs is that the part count is not known when the first part is written and the seal marker (§5.3) is deferred or omitted. The streaming TOC arrangement (§9.5) supports navigation while the part list is still growing. EFS treats a streaming package as the same logical object as a batch package: a single PackageID, a single package key (§11), parts arriving and replicating under the same lifecycle (§15) and redundancy floors (§16) regardless of whether they arrived in one upload session or over months.
6.3 What a package contains
A package contains:
- A public sealed header (visible to operators without decryption): PackageID, AssetID, role, serial, format version, encryption parameters, ciphertext digest, public ciphertext size, part count, part number, and the issuer's signature over the header.
- An encrypted index and an encrypted framed body, encrypted with a fresh per-package symmetric key. The body holds the file contents as one or more authenticated frames; the index lists those frames and the files they hold.
- The internal manifest (inside the encrypted index): per-file entries giving the file number, plaintext digest, plaintext size, optional original path or name, optional media type, and the frame indices that hold each file's bytes. The manifest is what a recipient uses to verify and extract individual files after decryption.
- A signature block: the publisher's signature over the public header and ciphertext digest, using a key authorised under §8.
Section 9 specifies the on-the-wire and on-disk container format.
6.4 Package roles
Files belonging to an AssetID are grouped into packages by role. The v1 role vocabulary is closed and small:
source. A source package contains the original or master files associated with an asset: RAW image files, project files, master video or audio renders, manuscript files, high-resolution scans, and their sidecars. Source packages are optimised for preservation rather than for browsing; they are typically large and the body usually has few frames since partial retrieval of a master file is rarely meaningful.
preview. A preview package contains derivative material intended for browsing: thumbnails, low-resolution previews, poster frames, contact sheets. Preview packages are typically much smaller than source packages and are optimised for low-cost retrieval.
edition. An edition package is a publication artifact generated by Asset Library's publishing engine from an Edition Spec (Asset Library §9). It contains a browser-ready bundle: structured data, thumbnails, image tiles, media chunks, browser runtime files, payment markers, and signatures sufficient for the bundle to be opened in a Catalog client or unpacked to a static web host.
Container assets. Asset Registry records the parent-child relationships that compose hierarchical works (a music album of tracks, a book of pages, an encyclopedia of volumes). An asset whose role is to group child assets, with no source files of its own, is a container asset. A container asset has no source package. It may have a preview package containing thumbnails and structural overview material, and an edition package representing a published view of its contained assets. EFS treats container assets identically to other assets; they simply happen to lack a source role.
The role registry is closed and protocol-versioned. Operators may not invent operator-local roles. Additional roles may be introduced in future protocol versions when concrete need is established, so that clients and indexers can validate and route packages on role without operator-specific knowledge.
7. Generations and Currentness
Each PackageID carries a serial that orders generations of the same role under an AssetID. Serials are monotonically increasing and never reused. Once published, a PackageID is immutable; revisions create a new serial.
By default the latest published serial of a role is the current one for that role. A publisher who wishes to override this default, for example to keep an older generation current while staging a successor, or to roll back to an earlier generation after a bad upload, does so by submitting a signed set_current record naming the role and the chosen serial. Absent such a record, the latest serial wins.
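The resolution rule is mechanical. A minimal sketch (Python; the set_current record is reduced here to the serial it names):

def current_serial(published: set[int], set_current: int | None) -> int:
    # A signed set_current record, if present and naming a published serial,
    # overrides the default; otherwise the latest serial wins.
    if set_current is not None and set_current in published:
        return set_current
    return max(published)

assert current_serial({1, 2, 3}, None) == 3
assert current_serial({1, 2, 3}, 2) == 2   # rollback after a bad upload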
EFS is the source of truth for what generations exist and which generation is current. Asset Library queries EFS to resolve current generations when constructing public views. Version-agnostic locators of the form {assetID}/preview resolve through EFS to the current preview generation for that asset.
EFS does not interpret what a generation contains. It records existence, ordering, and currentness; the meaning of a particular generation belongs to Asset Library.
A streaming-published generation (§6.2, §9.5) accumulates parts over time under a single PackageID and serial. Each new part adds to the same generation rather than creating a new serial; serials still order generations, not part appends. The publisher may close the generation at any time by submitting the signed package-level marker described in §5.3, which fixes the part count and rejects further parts. EFS records and serves the marker alongside the part list.
8. Authorisation for Writes
EFS accepts a write against an AssetID only when the writing key is recognised by Asset Registry as authorised to act on that AssetID.
8.1 Authorised writer
The authorised writer is the keypair that Asset Registry recognises as currently holding the right to file actions against the AssetID. By default this is the buyer key bound to the AssetID at purchase. If the asset has subsequently been bound to a Catalog.ID owner, the authorised writer is the owner key resolved through Catalog.ID.
EFS does not maintain this authorisation mapping itself. It poses the question to Registry: is this signing key authorised to write against this AssetID? Registry answers, consulting Catalog.ID where the asset has been bound to a Catalog.ID owner.
8.2 Optimisation: presenting a signed Registry claim
Registry issues a signed registration claim to the client at AssetID registration time. To avoid an EFS-to-Registry round trip on every write, the client may present a copy of this claim alongside the upload. The claim is signed by the Registry operator and names the authorised key. EFS verifies the operator signature and accepts the claim as evidence of the binding for the duration of its declared validity.
The Registry query path remains available as a fallback and as the canonical source of truth when current ownership state is in question. Operators may set local policy on how stale a presented claim may be before a fresh Registry consultation is required.
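A sketch of the two-path check under stated assumptions: verify_registry_signature and registry_confirms are hypothetical helpers, the claim fields mirror the description above, and the staleness constant is illustrative local policy.

import time

MAX_CLAIM_AGE = 24 * 3600   # local staleness policy (§8.2); the value is illustrative

def authorised_writer(claim, signing_key: bytes, asset_id: str) -> bool:
    # Fast path: a Registry-signed claim presented alongside the upload.
    if (claim is not None
            and claim.asset_id == asset_id
            and claim.authorised_key == signing_key
            and verify_registry_signature(claim)        # hypothetical: checks the operator signature
            and time.time() < claim.expires_at          # within the claim's declared validity
            and time.time() - claim.issued_at < MAX_CLAIM_AGE):
        return True
    # Canonical fallback: consult Asset Registry directly.
    return registry_confirms(asset_id, signing_key)     # hypothetical round-trip query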
8.3 Anonymous publication
Because the only requirements are an AssetID and a funded Bitcash wallet, a publisher may operate without a Catalog.ID account. The signing key in this case is the buyer's keypair, and the AssetID purchase receipt held by Registry establishes the binding. Identity-bound publication is available for participants who want it; identity-free publication is available for participants who do not.
8.4 Bitcash and authorisation
Bitcash funds storage and retrieval (§18) but does not authorise writes. A funded wallet without an authorised signing key cannot publish to an AssetID. An authorised signing key without a funded wallet cannot pay for storage. Both are required.
9. The Catalog File Container (CFC)
EFS packages use a single container format: the Catalog File Container (CFC), version 1. CFC is intentionally minimal. It defines what an EFS package part looks like on the wire and on disk; it does not define application-level packaging conventions.
A CFC v1 part has the following on-disk layout:
[ public sealed header ] fixed size
[ encrypted index ] variable size, declared in header
[ encrypted framed body ] variable size, declared in header
[ signature block ] fixed size
Public sealed header. Visible to operators without decryption. Carries:
- PackageID, AssetID, role, serial, PartNr (§5.3)
- format version
- encryption parameters
- index_size: length in bytes of the encrypted index region
- body_size: length in bytes of the encrypted framed body region
- index_profile: flags describing the cumulative TOC window (zero, the full known part count for part 1 of a batch package, or a fixed sliding-window value for streaming)
- canonical ciphertext digest, public ciphertext size
- the issuer's signature over the header
The header is fixed size so that a client can fetch it with a single ranged GET of the first N bytes of the part.
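For illustration, a client-side fetch of the header with a single ranged GET (Python, using the requests library). HEADER_SIZE and the numeric field offsets are invented for the sketch; this section does not publish the layout.

import struct
import requests

HEADER_SIZE = 4096   # illustrative; the normative fixed size is set per format version

def fetch_header(part_url: str) -> bytes:
    # One ranged GET of the first HEADER_SIZE bytes returns the whole public header.
    r = requests.get(part_url, headers={"Range": f"bytes=0-{HEADER_SIZE - 1}"})
    r.raise_for_status()
    return r.content

def region_sizes(header: bytes) -> tuple[int, int]:
    # The byte offsets here are invented for illustration; v1's numeric
    # layout is not restated in this section.
    index_size, body_size = struct.unpack_from("<QQ", header, 128)
    return index_size, body_size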
Encrypted index. A single AEAD frame, encrypted with K_index = HKDF(K_pkg, "index"), containing:
- Local frame index. Per-frame entries for the frames in this part: ciphertext offset within the body, ciphertext length, plaintext digest, frame nonce.
- Local file metadata. Per-file entries for files (or file fragments) held in this part: file number, plaintext path or name, plaintext size, plaintext digest, optional media type, and the local frame indices that hold the file's bytes.
- Cumulative part-range TOC, optional, declared in index_profile. For each part covered by this part's TOC window: part number, frame range, timestamp range (if applicable), and that part's canonical ciphertext digest. Approximately 40 bytes per entry.
Encrypted framed body. A concatenation of AEAD frames. Each frame is one chunk of plaintext content encrypted under K_pkg (§11) with a deterministic per-frame nonce derived from the package key and the frame's local index. Frames are independent: a client that has the package key and one frame's ciphertext can decrypt and verify that frame without reading neighbours. A small archival package may have a single frame containing all plaintext bytes; the format does not require multiple frames, only that frame boundaries exist where partial retrieval is meaningful.
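As an illustration of the frame discipline, a minimal sketch using the pyca cryptography package. §11 fixes the primitive (AES-256-GCM); the HKDF-based nonce derivation and its label are assumptions standing in for the normative construction.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def frame_nonce(k_pkg: bytes, frame_idx: int) -> bytes:
    # Deterministic 96-bit nonce from the package key and local frame index.
    # Safe because K_pkg is fresh per package and never reused (§11).
    info = b"frame-nonce:" + frame_idx.to_bytes(8, "big")
    return HKDF(algorithm=hashes.SHA256(), length=12, salt=None, info=info).derive(k_pkg)

def seal_frame(k_pkg: bytes, frame_idx: int, plaintext: bytes) -> bytes:
    return AESGCM(k_pkg).encrypt(frame_nonce(k_pkg, frame_idx), plaintext, None)

def open_frame(k_pkg: bytes, frame_idx: int, ciphertext: bytes) -> bytes:
    # Raises InvalidTag if the frame was corrupted or substituted.
    return AESGCM(k_pkg).decrypt(frame_nonce(k_pkg, frame_idx), ciphertext, None)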
Signature block. The publisher's signature over the public header and ciphertext digest, using a key authorised under §8.
The public header does not expose plaintext file names, file count, file sizes, frame count, or frame sizes. Operators see only what they need to identify, store, and audit the part.
9.1 Multi-part packages
A multi-part package consists of multiple CFC parts. Each part is a complete CFC object with its own public header, ciphertext digest, signature, and replication state. Parts share the PackageID prefix and are linked by their PartNr in their public headers (§5.3). A single part may be retrieved, audited, and replicated independently of its siblings; each part is decryptable for the frames it carries, without requiring any sibling. The package-level navigation index, mapping files to the parts that hold them, lives in part 1 for batch packages and in cumulative form across parts for streaming packages (§9.5).
9.2 Forward compatibility
The CFC format is versioned. Version 1 specifies the layout above. Future versions may introduce media-stream profiles, alternative cipher suites, or other format-level changes. The format version is declared in the public header of each part. Operators that do not support a given format version refuse storage of packages in that version rather than store them opaquely.
9.3 Edition packages
An edition package is a CFC package like any other; its role is edition and its content is the resolved output of an Asset Library Edition Spec. The publishing engine that resolves the spec is responsible for producing the CFC; EFS accepts and stores it under the same rules as any other package.
9.4 Partial retrieval
A client can retrieve any subset of a part's encrypted bytes without decrypting the whole part. This is what makes CFC suitable for workloads where most content may never be retrieved at all: edition packages with image tiles or media chunks (§6.4), long task or session recordings, security-camera streams. Archival packages where partial retrieval is not meaningful simply produce a single body frame and the same flow degenerates to a whole-part fetch.
Client retrieval flow (a code sketch follows the list):
- Issue a ranged GET for the first header_size bytes of the part. Parse the header and learn index_size.
- Issue a ranged GET for the encrypted index region. Decrypt using K_index = HKDF(K_pkg, "index"), derived from the package key obtained from the Key Registry (§12).
- From the local file metadata (and the cumulative TOC, if present), identify the frame ranges that hold the wanted content.
- Issue a ranged GET for each needed frame and decrypt with K_pkg. Verify each plaintext block against the plaintext digest in the index.
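A sketch of the flow end to end, under stated assumptions: requests for ranged HTTP GETs, the open_frame helper from the §9 frame sketch, hypothetical parse_header and decode_index decoders standing in for the unspecified wire encodings, and a prepended nonce assumed for the single index frame.

import requests
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

HEADER_SIZE = 4096   # illustrative, as in the header sketch above

def ranged_get(url: str, start: int, length: int) -> bytes:
    r = requests.get(url, headers={"Range": f"bytes={start}-{start + length - 1}"})
    r.raise_for_status()
    return r.content

def fetch_file(part_url: str, k_pkg: bytes, file_number: int) -> bytes:
    header = parse_header(ranged_get(part_url, 0, HEADER_SIZE))    # hypothetical decoder
    k_index = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                   info=b"index").derive(k_pkg)                    # K_index = HKDF(K_pkg, "index")
    index_ct = ranged_get(part_url, HEADER_SIZE, header.index_size)
    index = decode_index(AESGCM(k_index).decrypt(index_ct[:12],
                                                 index_ct[12:], None))  # hypothetical decoder
    body_offset = HEADER_SIZE + header.index_size
    plaintext = b""
    for frame in index.frames_for(file_number):    # local frame entries for this file
        ct = ranged_get(part_url, body_offset + frame.offset, frame.length)
        plaintext += open_frame(k_pkg, frame.local_idx, ct)   # AEAD helper from the §9 sketch
    return plaintext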
Once the encrypted index has been fetched and decrypted for a part, it is cached client-side for the lifetime of the part: the part is immutable (§10), so the index never changes.
Package-level TOC in a batch multi-part package. Part 1 carries the package-level table of contents in its encrypted index: total file count, total ciphertext size, total frame count, and the global file map file_number -> [(part_nr, local_frame_idx), ...]. Parts 2..N carry only their local frame index and local file metadata. A client fetches part 1's header and index once per session, caches the TOC, and from then on issues targeted byte-range retrievals against the parts that actually hold the content of interest. Streaming packages distribute the TOC differently; see §9.5.
Canonical ciphertext digest. Computed over the concatenation of header input, encrypted index, and encrypted body. Replication, audit, and repair operate on the part as a whole; partial retrieval changes nothing about how operators verify a part's bytes.
Subrange retrieval and operator pricing. A partial retrieval is one or more byte-range GETs against a part already resident on the operator's storage. The per-package operational overhead that motivates the §18.3 minimum-billing floor (acceptance, signing, replication, lifecycle bookkeeping) is incurred at part ingestion, not at each subrange retrieval. Subrange retrievals are billed on the per-byte component only, with no minimum.
9.5 Streaming-published packages
A streaming-published package is a package whose parts are produced and ingested incrementally rather than as a complete batch. Examples include long desktop or task recordings, security-camera streams, live event captures, and any workload that produces ciphertext at a steady rate over hours, days, or years.
The packaging differences from a batch multi-part package are minimal:
- Cumulative TOC carriage. Each part's encrypted index carries a cumulative part-range TOC covering the latest W parts ending at and including this one. W is fixed for the package and declared in each part's public header (index_profile). Bounded streaming may set W to the expected lifetime part count; unbounded streaming sets W to a sliding-window size such as 1000.
- Ingestion cadence. Parts arrive over time. Each arriving part follows the same authorisation (§8), redundancy fan-out (§15.2), and ingestion payment (§18.1) rules as any other part. The single ingestion fee per part, the eternal-storage commitment, and the lifecycle treatment are unchanged.
- Deferred or omitted seal marker. A streaming publisher submits the §5.3 seal marker only when ready to close the generation, or never. Until then, the package's part list is open and operators continue to accept further parts under the same PackageID and serial.
Client navigation, cold start.
- Ask EFS for the part list of the PackageID. EFS returns the highest accepted PartNr (and the full list of part numbers if requested).
- Fetch the highest-numbered part's header and encrypted index. The index includes the cumulative TOC for the latest W parts.
- Decrypt the TOC. Use it to map a desired chunk, timecode, or file to a (part_nr, local_frame_idx) pair.
- Fetch that part's header and encrypted index, then the targeted frame range, following the retrieval flow in §9.4.
Reaching parts older than the TOC window. In an unbounded stream where the desired part is older than W parts back from the head, the client either walks back through earlier parts' cumulative TOCs in W-sized hops, or estimates the target part from elapsed time and binary-searches via ranged header GETs. EFS itself remains structurally blind: it serves bytes and counts parts, and does not need to know the frame or timecode boundaries that live inside the encryption.
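A sketch of the walk-back (Python). read_toc is a hypothetical helper that fetches and decrypts one part's cumulative TOC, following the §9.4 mechanics; its oldest_timestamp and part_for_timestamp accessors are likewise assumed.

def locate_part(head_part: int, target_ts: float, w: int, read_toc) -> int:
    # Hop back through cumulative TOCs in W-sized steps until the window
    # covers the target timestamp, then resolve within that window.
    part = head_part
    while part > 1:
        toc = read_toc(part)                  # covers the w parts ending at `part`
        if toc.oldest_timestamp <= target_ts:
            return toc.part_for_timestamp(target_ts)
        part = max(1, part - w)               # hop one TOC window further back
    return 1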
EFS service surface. EFS adds two small operations to support streaming-published packages:
- List parts. Given a PackageID, return the set of accepted PartNr values and the highest one. Does not require decryption.
- Subscribe to part arrivals. Optional. A consumer that wants to follow a live stream may subscribe to receive notifications as new parts arrive. The notification carries only the PartNr and the part's canonical ciphertext digest; the consumer fetches and decrypts on its own.
Neither operation gives EFS visibility into the encrypted index or the package contents.
10. Canonical Ciphertext
A package part binds to exactly one canonical ciphertext digest. All replicas of a part across the network store the same encrypted byte sequence. Operators may apply local at-rest encryption to their own disks, but this internal protection must not change the public bytes returned on retrieval.
Canonical ciphertext is the foundation for replication, repair, audit, self-hosted mirroring, and verification. A client that retrieves a part from any operator can verify it against the public digest registered at the original ingress and detect corruption, substitution, or partial damage without access to decryption keys.
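The verification itself is a single digest comparison. A sketch, assuming SHA-256 as the canonical digest algorithm, which this section does not restate:

import hashlib

def verify_part(part_bytes: bytes, registered_digest: str) -> bool:
    # part_bytes is the full canonical ciphertext: header input, encrypted
    # index, and encrypted body concatenated (§9.4).
    return hashlib.sha256(part_bytes).hexdigest() == registered_digest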
11. Encryption Model
EFS stores encrypted bytes. Plaintext access requires possession of decryption material; EFS itself never sees plaintext.
Each package is encrypted with a fresh symmetric key, the package key, generated at packaging time and never reused. The package key is unique to the package and immutable for the lifetime of the ciphertext. AES-256-GCM is the v1 symmetric primitive; a 256-bit symmetric key retains 128 bits of effective security against a quantum adversary under Grover's algorithm, so bulk encryption at this layer is independent of any asymmetric primitive whose hardness might later be revised.
Access to a package key is mediated by the Key Registry (§12). The package key itself never appears in plaintext outside the publisher's and authorised recipients' decryption boundaries; what travels through EFS is wrapped key material, addressed either to a scope or to a recipient.
12. Key Registry
EFS maintains a Key Registry that holds the wrapped key material required for package access. Keeping the registry inside EFS keeps the storage service operationally self-contained: a publisher can upload, retain a recoverable copy of decryption material, compose access scopes, and grant access to others without depending on any other Catalog service.
EFS holds no decryption power and no authority to grant access. The Key Registry stores key material only in wrapped form: each record is itself encrypted and can only be opened by its intended recipient, using a private key that the recipient holds and EFS never sees. EFS cannot decrypt any package it stores, cannot read the Key Registry records it propagates, and cannot issue, alter, or substitute a grant on a publisher's behalf. New recipients are admitted only through signed records produced by the publisher (or, in commercial flows, the seller named in an Asset Market agreement, §12.4), using a private signing key that likewise never leaves the issuer's control; existing recipients access content only by unwrapping their own grant with their own private key. EFS's role is operational rather than custodial: it stores ciphertext, propagates the signed wrap records that publishers and sellers produce, verifies their signatures, and refuses access to revoked records. This is zero-knowledge brokerage. An EFS operator that is compromised, coerced, or acting in bad faith can refuse to serve, lose data, or expose the wrapped records it holds, but it cannot read those records, cannot read the content they unlock, and cannot fabricate a grant to a chosen recipient.
Subject to further definition. This section describes the Key Registry at the level of purpose and primitives. Detailed wire formats, signature constructions, propagation semantics, and revocation algorithms belong to a separate Key Registry specification that will accompany implementation. Whether the Key Registry remains a permanent EFS responsibility or migrates to a separate module as the licensing path matures is an open design question (§21).
12.1 Purpose
The Key Registry is the mechanism by which EFS supports scope-based licensing without re-encrypting packages when licensing state changes. A package is sealed once and never re-encrypted. Joining it to a new licensing scope is the addition of one wrap record. Removing it from a scope is the revocation of that record. Granting access to a recipient is the issuance of one record addressed to their public key. Revoking a recipient is the revocation of that record. None of these operations disturb the underlying ciphertext.
12.2 Primitives
Three categories of record sit in the registry:
- Package key wrap records. A package key wrapped under a scope's session key. Records that this package participates in this scope.
- Session key wrap records. One scope's session key wrapped under another's, composing scopes into hierarchies. Records that this child scope is unlocked through this parent scope.
- Recipient grants. A scope's session key, or in some cases a package key directly, wrapped under a specific recipient's public key. Records that this recipient is authorised for this scope.
A recipient who holds a grant on a scope's session key transitively reaches every package whose key is wrapped under that scope. A single license on a collection therefore resolves to a single recipient grant, regardless of how many packages lie beneath it. This is the EFS session-key hierarchy that Asset Library §15.1 refers to when it speaks of scope-based licensing.
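A toy composition of these primitives (Python, using AES-256-GCM per §12.3). The record framing, nonce handling, and the hybrid-KEM recipient grant are simplified or omitted; this is a sketch of the transitivity, not a wire format.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def wrap(kek: bytes, key: bytes) -> bytes:
    # AES-256-GCM wrap with a random nonce prepended; the framing is illustrative.
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, key, None)

def unwrap(kek: bytes, record: bytes) -> bytes:
    return AESGCM(kek).decrypt(record[:12], record[12:], None)

k_pkg    = os.urandom(32)   # package key (§11)
k_scope  = os.urandom(32)   # child scope session key
k_parent = os.urandom(32)   # parent scope session key

pkg_record     = wrap(k_scope, k_pkg)      # package key wrap record
session_record = wrap(k_parent, k_scope)   # session key wrap record
# A recipient granted k_parent (via a hybrid-KEM recipient grant, not shown)
# transitively reaches every package key wrapped beneath it:
assert unwrap(unwrap(k_parent, session_record), pkg_record) == k_pkg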
12.3 Cryptographic stance
Symmetric wrapping uses AES-256-GCM. Recipient grants use a hybrid Key Encapsulation Mechanism combining a classical primitive (X25519) and a post-quantum primitive (ML-KEM-1024); an adversary must break both to recover the wrapped key. Authenticated records carry a hybrid signature combining Ed25519 and ML-DSA-65; verifiers accept a record only if both signatures verify.
These choices align with the Catalog post-quantum stance defined in Asset Registry §2.2.
12.4 Issuers
Two categories of party issue recipient grants in v1:
- the publisher, for self-grants, free shares with named recipients, group grants, or bulk grants to a subscriber list;
- the seller in an Asset Market agreement, for commercial grants issued at settlement of a sale. The Market agreement names a wrapped_key deliverable that resolves to a freshly issued grant in the Key Registry.
Layered encryption involving an outer ingress operator wrapper is not part of v1 (§21).
12.5 Revocation and rotation
An issuer may revoke any of its own records by publishing a signed revocation. A revoked record cannot be used to retrieve its wrapped key through compliant nodes. Revocation affects future retrieval; it does not invalidate past access and does not retract material that a recipient has already decapsulated.
When a scope must be retained but a single recipient must be cut off, the issuer mints a fresh session key, re-issues the downstream wrap chain under it, and re-issues recipient grants under the new session key for retained recipients. No package is re-encrypted. Revocation cost scales with the scope's wrap-record population, not with the number of recipients multiplied by the number of packages.
12.6 Privacy
A recipient grant is addressed to a specific public key. An observer of the registry can see which public keys have been granted access to which scopes, but cannot recover any wrapped key without the corresponding private keys. Public keys may be long-lived identity keys, linkable across grants, or ephemeral per-grant keys, unlinkable at the cost of complicating recovery. The choice is the recipient's.
13. Storage Architecture
This section specifies how an EFS operator organises stored package parts on its own infrastructure: how filesystem volumes are labelled, registered, and migrated; how parts are placed into bucket folders; and how one writable bucket per volume coordinates concurrent writes. The architecture is the substrate that the storage states (§14) and the storage lifecycle (§15) operate over.
13.1 Volumes and storage units
The unit of storage allocation inside an operator is a volume. A volume is an XFS-formatted filesystem on a RAID set, on a single drive, on an LTO tape, or on any other block device the operator manages. XFS is mandatory under the certified-platform commitment introduced in §3 (design choice 7), which also fixes the operating system, the labelling and registry conventions of this section, and the directory layout used by buckets (§13.4) and migrated volumes (§13.3).
The hardware unit underlying a typical Online or Standby volume is a storage unit: a drive enclosure with a controller, presenting one volume to the operating system. A server may host several storage units; large operators host many. Each storage unit is independently addressable, independently powerable, and independently mountable, so the failure of one storage unit does not affect the others. Storage units are sourced from the closed list of approved suppliers and models in the licensed hardware schedule; tape libraries and tape drives are likewise sourced from the schedule. Other operators receiving an operator's hardware on cessation rely both on the on-disk specifics covered in this section and on the model-level identity of the storage units, libraries, and drives, so that no firmware-quirk surprise sits between the receiving operator and a working archive.
A volume is identified by a volume label of the form:
EFS{operatorID}{volumeID}
where operatorID is the operator's identifier as it appears in PackageIDs (§5.2), and volumeID is a four-character lowercase hexadecimal string assigned in monotonically increasing order at the operator. Four hex digits accommodate 65,536 volumes per operator, sufficient for very large fleets at present-generation drive capacities. Examples:
EFSb4np0001
EFSb4np00a3
EFSb4npffff
A volume label is never reused, even after the underlying physical media has been retired or destroyed. A retired label remains in the operator's volume registry as evidence of past existence and migration history.
13.2 The volume registry
The operator maintains a PostgreSQL volume registry with one row per volume. Each row carries at minimum:
- volume label (primary key);
- nominal capacity in bytes;
- current package-part count (starting at zero, incremented when a part is written, never decremented in routine operation);
- writable flag (boolean): true while the volume can accept new writes, false once it has reached its planned fill level or been frozen for migration;
- state: one of the storage states defined in §14;
- mount point or device path when the volume is currently online;
- parent volume label, if the volume has been migrated into another volume (§13.3);
- creation timestamp, last-state-change timestamp, last-audit timestamp.
The volume registry is operator-internal and is not part of the federation's published metadata. The package registry, separately, records which volume(s) hold which parts (§13.8) and can resolve a part to its current physical location through the chain of parent-volume entries.
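As an illustration of the row shape, the registry could be declared with DDL along the following lines, held here in a Python string; the column names and types are illustrative, since the protocol fixes the fields, not their spellings:

```python
# Illustrative PostgreSQL DDL for the §13.2 volume registry.
VOLUME_REGISTRY_DDL = """
CREATE TABLE IF NOT EXISTS volume_registry (
    volume_label      text PRIMARY KEY,           -- EFS{operatorID}{volumeID}
    nominal_capacity  bigint NOT NULL,            -- bytes
    part_count        bigint NOT NULL DEFAULT 0,  -- never decremented routinely
    writable          boolean NOT NULL DEFAULT true,
    state             text NOT NULL,              -- one of the §14 states
    mount_point       text,                       -- NULL while not online
    parent_volume     text REFERENCES volume_registry (volume_label),
    created_at        timestamptz NOT NULL DEFAULT now(),
    state_changed_at  timestamptz,
    last_audit_at     timestamptz
);
"""

# With a psycopg2 connection `conn` (assumed), applying it would be:
#   with conn.cursor() as cur:
#       cur.execute(VOLUME_REGISTRY_DDL)
#   conn.commit()
```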
13.3 Volume migration
Storage technology evolves. The operator's first volumes might be 20 TB RAID arrays; later it will run 60 TB volumes, then larger. To carry a long-horizon archive forward across this evolution, EFS treats migration as a first-class operation rather than as an ad-hoc copy.
A volume can hold either:
- package parts (the leaf case), or
- migrated child volumes, with each child appearing as a subdirectory whose name is the child's volume label, containing the child's complete on-disk structure.
The migration hierarchy is always exactly one level deep at any given time. When three 20 TB volumes are migrated into a single 60 TB volume, the new volume has three subdirectories named after the three children; the children's volume registry rows now point to the parent. When that 60 TB volume is later migrated into a 100 TB volume alongside others, the chain is flattened: the child volumes' rows are rewritten to point directly at the new parent, and the intermediate volume is retired. The parent-pointer field always names the volume that physically holds the part on disk now, never an intermediate.
Package-part lookups resolve through the parent pointer transparently: the package registry records the volume label that originally received the part; the volume registry tells the lookup which mounted path corresponds to that label, whether through a direct mount or through a parent volume's subdirectory.
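A sketch of that resolution, with the two registries reduced to an in-memory stand-in (a dict mapping volume label to mount point and parent label; all names, including the part filename, are illustrative):

```python
from pathlib import Path

# Stand-in for the registries: label -> (mount point, parent label).
Volumes = dict[str, tuple[str | None, str | None]]

def resolve_part_path(part_volume: str, bucket_rel_path: str,
                      volumes: Volumes) -> Path:
    """Resolve a part to the path that physically holds it now (§13.3)."""
    mount, parent = volumes[part_volume]
    if parent is None:
        return Path(mount) / bucket_rel_path        # leaf: direct mount
    # Migrated: the flattened chain guarantees the parent is a mounted leaf.
    parent_mount, grandparent = volumes[parent]
    assert grandparent is None, "parent pointers are flattened, never chained"
    return Path(parent_mount) / part_volume / bucket_rel_path

vols: Volumes = {
    "EFSb4np0001": (None, "EFSb4np0051"),           # 20 TB child, migrated
    "EFSb4np0051": ("/mnt/EFSb4np0051", None),      # 60 TB parent
}
print(resolve_part_path("EFSb4np0001", "2026-05-06/00001/somepart", vols))
# -> /mnt/EFSb4np0051/EFSb4np0001/2026-05-06/00001/somepart
```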
13.4 Buckets
Inside a volume, package parts live in bucket folders:
{volume root}/{YYYY-MM-DD}/{bucket-seq}/
Where YYYY-MM-DD is the calendar date the bucket was created (operator local time), and bucket-seq is a five-digit zero-padded sequence number that increases within the date. Examples:
/mnt/EFSb4np00a3/2026-05-06/00001/
/mnt/EFSb4np00a3/2026-05-06/00002/
/mnt/EFSb4np00a3/2026-05-07/00001/
A bucket has:
- a maximum size, set by operator policy. The v1 default is 1 TiB. A bucket is full when the next part to be written would push it over the limit.
- a writable flag: at most one bucket per volume is writable at any moment.
When the writable bucket on a volume is closed (because adding the next part would exceed its size limit, or because the operator schedules a flush), it is set read-only at the filesystem level and a new bucket is created with the next sequence number.
A server with several storage units (§13.1) has several volumes, and therefore several writable buckets in parallel, one per volume. The bucket folders on different volumes may carry the same date-prefixed name without conflict, because each folder lives under its own volume root. A given calendar date can therefore produce, for example, /mnt/EFSb4np00a3/2026-05-06/00001/ and /mnt/EFSb4np00b1/2026-05-06/00001/ simultaneously, each receiving parts independently of the other under its own coordinator.
Bucket boundaries are organisational, not structural. They group parts on disk into a chronological browsing order, limit write concurrency to a single coordinator per volume, eliminate filesystem-level contention, and make the read-only/writable distinction explicit at the directory level so that operator scripts and audits can reason about it without consulting the volume registry. They do not dictate the boundaries of tape writes: tape writes (§13.9) are organised into tar-chunk batches that may aggregate parts from multiple buckets across multiple storage units.
13.5 INCOMING and FAST: volume-agnostic stores
Two volumes inside an operator are special: they carry no permanent data, do not participate in the labelled-volume scheme of §13.1, and serve transient or cache roles. Both are SSD-backed.
INCOMING. The INCOMING volume is the entry and exit point for byte traffic. Every part that the operator receives over the network lands in INCOMING first, where it is verified (signature, ciphertext digest, format) before any labelled volume ever sees it. Every part that the operator restores from tape is extracted into INCOMING (§13.9), from where it is treated as a fresh intake into a new bucket on a labelled volume. INCOMING also holds the transient state of parts that have been copied to one labelled volume but not yet to a second: a part stays catalogued in INCOMING until two-medium redundancy across distinct storage units has been confirmed, at which point its INCOMING copy is released. INCOMING therefore tracks two populations:
- parts in the process of being uploaded by clients or restored from tape;
- parts uploaded and copied to one labelled volume but not yet to a second.
INCOMING is operator-internal and volume-agnostic: clients and the federation never reference an INCOMING path, and the package registry does not bind a part's permanent location to INCOMING.
FAST. The FAST volume is an SSD-backed cache for frequently retrieved packages. The operator's caching heuristic copies a package into FAST when access frequency justifies it, and reclaims FAST entries when they cool. The path schema inside FAST is operator-local; the package registry row for a cached package carries the FAST path so that the retrieval pipeline can locate the cached copy without searching FAST itself. FAST is volume-agnostic in the same sense as INCOMING: it is not part of the labelled-volume scheme, and a missing FAST entry is not a repair condition.
Neither INCOMING nor FAST counts toward the redundancy floor (§16). Their presence does not raise the floor, and their absence does not lower it.
13.6 The write coordinator
A write coordinator owns the writable-bucket lock for each labelled volume. Disk-side ingest of a part goes through it. The flow for placing a verified part on disk:
1. The part has been received over the network and verified into INCOMING (§13.5). The INCOMING entry remains during the entire flow below.
2. The ingestion pipeline asks the coordinator on the first target volume for a target path. The pipeline picks a volume on a different storage unit from any subsequent target it intends to use, so the two disk-side copies live on physically distinct storage units (§13.1).
3. The coordinator acquires the volume's writable-bucket lock.
4. The coordinator inspects the volume registry and the writable bucket: if writing this part would push the bucket over its maximum size, it closes the current bucket (sets read-only at the filesystem level), opens a new bucket with the next sequence number, and updates the volume registry.
5. The coordinator returns the target path inside the writable bucket.
6. The pipeline copies (does not move) the verified part from INCOMING to the target path. The INCOMING source is preserved, so two-medium redundancy is in place from the moment the disk write completes: SSD (INCOMING) plus the labelled disk volume.
7. The pipeline increments the volume's package-part count in the registry, and writes the part's volume binding into the package registry.
8. The coordinator releases the lock.
9. The pipeline now repeats steps 2 through 8 against a second labelled volume on a different storage unit. The second copy may land on an Online volume or a Standby volume, depending on operator policy.
10. Once the second labelled-volume copy is registered, the INCOMING entry for the part is released.
If a volume reaches its overall planned fill level (a configured fraction of nominal capacity, leaving headroom for filesystem metadata and for migration target space), the coordinator sets the volume's writable flag to false, declines further part requests for that volume, and routes new ingests to the next writable volume.
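A condensed sketch of a per-volume coordinator, assuming the 1 TiB v1 bucket default and illustrative names; sealing the old bucket read-only and the registry writes are reduced to comments, and the real flow holds the lock through the INCOMING-to-bucket copy (steps 3 through 8) rather than releasing it at path hand-out as this compression does:

```python
import datetime
import threading
from pathlib import Path

BUCKET_LIMIT = 1 << 40   # 1 TiB, the v1 bucket default (§13.4)

class WriteCoordinator:
    """Per-volume coordinator sketch; volume-registry updates are elided."""

    def __init__(self, volume_root: Path):
        self.lock = threading.Lock()          # the writable-bucket lock
        self.root = volume_root
        self.bucket: Path | None = None
        self.date: str | None = None
        self.seq = 0
        self.used = 0

    def _open_next_bucket(self) -> None:
        # Closing the old bucket (filesystem read-only) is elided here.
        today = datetime.date.today().isoformat()
        self.seq = self.seq + 1 if self.date == today else 1
        self.date, self.used = today, 0
        self.bucket = self.root / today / f"{self.seq:05d}"
        self.bucket.mkdir(parents=True)

    def target_path(self, part_name: str, part_size: int) -> Path:
        with self.lock:
            if self.bucket is None or self.used + part_size > BUCKET_LIMIT:
                self._open_next_bucket()
            self.used += part_size            # registry counter stand-in
            return self.bucket / part_name
```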
13.7 Read-only volumes and hardware enforcement
A volume that has been filled to its planned capacity is set read-only in the volume registry, mounted read-only at the operating-system level, and where the operator's RAID controller supports it, marked read-only at the controller level so that even a misbehaving operating-system command cannot write to it. This is the disk-tier complement of the air-gap discipline applied to vault tapes (§14): once a volume's writable life has ended, the operator's online infrastructure cannot accidentally or maliciously rewrite it.
Read-only does not mean immutable forever: when the volume reaches the end of its operational life, it is migrated into a successor volume (§13.3) and the original is retired. Until then, however, the contents are stable bytes.
13.8 The package registry
Separately from the volume registry, the operator maintains a package registry that records, per package part:
- the part's address (PackageID and PartNr);
- the canonical ciphertext digest;
- the labelled volume(s) currently holding the part, with the storage state (§14) of each;
- the tape barcode and tar-chunk index of any tape copies (§13.9);
- the FAST path, if a copy is currently held in FAST;
- audit history (last challenge, last successful verification);
- the part's lifecycle position (§15).
The package registry is the operator's primary lookup structure for retrieval. It does not duplicate the federation-wide availability claims (§16); those are derived from it.
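As a row-shape illustration, with field names that are illustrative rather than normative:

```python
from dataclasses import dataclass, field

# Sketch of a §13.8 package-registry row; the protocol fixes the content,
# not this schema.
@dataclass
class PackagePartRow:
    package_id: str
    part_nr: int
    ciphertext_digest: str                        # canonical digest
    disk_copies: dict[str, str] = field(default_factory=dict)       # volume label -> §14 state
    tape_copies: list[tuple[str, int]] = field(default_factory=list)  # (tape barcode, tar-chunk index)
    fast_path: str | None = None                  # set while cached in FAST
    last_challenge: str | None = None             # audit history
    last_verified: str | None = None
    lifecycle: str = "full-redundancy-floor"      # §15 position
```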
13.9 Tape volumes and tar-chunk writes
A tape cartridge (LTO) is a volume with the same registry obligations as a disk volume: it has a row in the volume registry, a unique label, a recorded capacity, and a parent-pointer if it has been migrated to a successor cartridge. Tape volumes are physically labelled with LTO barcodes that the tape library reads on every mount; the registry row binds the operator's volume label to the barcode.
Tape boundaries do not follow disk-volume or bucket boundaries. Tape writes are organised into tar chunks: a tar chunk is a TAR archive of fixed nominal size that aggregates parts collected from one or more buckets across one or more storage units. The v1 nominal tar-chunk size is 100 GiB. A tape volume holds many tar chunks written sequentially, so an LTO-9 cartridge (18 TB native) holds roughly 170 tar chunks, an LTO-10 cartridge (30 TB native) roughly 280.
When the operator schedules a tape write, the writer assembles the next tar chunk by drawing parts from sealed (read-only) buckets on disk, packs them into the TAR archive, and streams the archive to the next sequential position on the target tape. The chunk header records the parts it contains, with their PackageIDs, PartNrs, sizes, and ciphertext digests. The package registry is updated, per part, with the tape barcode and the tar-chunk index of its new tape copy.
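A sketch of chunk assembly using Python's tarfile module, writing to a local file in place of the sequential tape position; the manifest member name and layout are assumptions, since the chunk-header format is not spelled out here:

```python
import io
import json
import tarfile
from pathlib import Path

def write_tar_chunk(parts: list[Path], manifest: list[dict], out: Path) -> None:
    """Pack parts drawn from sealed buckets into one tar chunk (§13.9)."""
    with tarfile.open(out, "w") as tar:
        # Leading member records the parts the chunk contains: PackageID,
        # PartNr, size, and ciphertext digest per entry.
        header = json.dumps(manifest).encode()
        info = tarfile.TarInfo("CHUNK-MANIFEST.json")
        info.size = len(header)
        tar.addfile(info, io.BytesIO(header))
        for part in parts:
            tar.add(part, arcname=part.name)
```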
A tape is identified by its barcode for physical purposes (mount, eject, audit, migration) and by its volume-registry label for logical purposes (lookup, parent-pointer chains). The two are bound by the registry row.
Retrieval from tape works on whole tar chunks rather than on individual parts. To retrieve a part the library mounts the tape, the head seeks to the chunk index, the entire 100 GiB chunk is read in one streaming operation, and the chunk is extracted to INCOMING (§13.5) as if it were a fresh intake. From there the requested part flows into the regular intake pipeline (§13.6), landing in a fresh bucket on a writable disk volume and getting its Online state recorded in the package registry. A retrieval triggered for service to a customer therefore returns the package to higher-state availability automatically, restarting the idle clock (§15.3).
A tar chunk normally contains parts other than the one a given retrieval requested, since 100 GiB holds many parts. The handling of those incidental parts is operator-local: typically they are kept available in INCOMING for a short coalescing window (a default 24 hours) so that further requests against parts in the same chunk are served from INCOMING without a second tape mount, and are then released. Section 18 describes how this affects retrieval pricing; the open protocol question of incidental promotion is recorded in §21.
Tape volumes used for Nearline and Vault copies are treated as if they were WORM (Write Once Read Many) media, regardless of whether the underlying cartridge is mechanically WORM-only. Genuine WORM cartridges exist (LTO supports a WORM variant) but are substantially more expensive and less universally available than standard cartridges, so the protocol does not require them. The operational discipline is the equivalent of WORM in any case: a Nearline or Vault tape accumulates tar chunks by sequential append during its writing phase, but once finalised (its fill threshold reached or its maximum age elapsed, §15.2), it is sealed: no further chunks are appended, no chunk is ever edited or rewritten, no part is selectively deleted, the cartridge is never wiped, and the cartridge is never returned to the writable pool. The only path to retiring its contents is the fifteen-year migration cycle (§16.3) under which the tape's tar chunks are read out, repacked, and rewritten to a fresh-generation cartridge; the source cartridge is then physically destroyed rather than wiped. Tape volumes go through Nearline and Vault states under §14.
At the moment a Nearline or Vault tape is finalised, the operator additionally slides the cartridge's hardware write-protect tab to the ON position before the cartridge is moved to its long-term location (back into the library shelf for Nearline, or out to the off-site vault for Vault). The hardware tab is a mechanical safeguard on the cartridge itself, independent of the operator's software, the drive firmware, and the library's control plane: a drive that detects the tab in the protected position will refuse to write, regardless of what any higher layer instructs. This is the physical companion to the operational WORM discipline. Combined with the air-gap of Vault cold storage and the destruction-not-wipe rule at migration, it removes every routine and accidental path by which a finalised tape could lose data.
The single exception to this discipline is the daily-backup tape rotation under §15.6, which is explicitly reusable: daily-backup tapes accumulate chunks incrementally, are ejected weekly, are retained off-site for fourteen months, and are then returned to the operator, wiped, and rotated back into the pool. The daily-backup rotation is operator infrastructure, not a per-package preservation copy, and is the only context in which an EFS tape is wiped and reused.
14. Storage States
A copy of a package part lives in one of five storage states, distinguished by the medium, by whether the medium is powered, and by whether the medium is reachable from the operator's online control plane. Storage states are states, not tiers: a single part normally has multiple copies, each in its own state, simultaneously. The operator's user interface presents a part's storage profile as a row of state checkboxes, one box per copy, so that a publisher can see at a glance that, say, two copies are Online, two are Nearline, and two are Vault.
14.1 The five states
Online. The copy is on a powered, mounted disk or SSD volume reachable from the operator's online control plane. Retrieval is served immediately, in milliseconds for SSD, in tens of milliseconds to seconds for spinning disk. Online is the routine serving state.
Standby. The copy is on a disk volume whose spindle or device has been powered down, but the volume remains catalogued by the operator's online control plane. On a retrieval request the volume is automatically powered up; once spun up (typically tens of seconds for a single drive, around a minute for a RAID array), every part on it can be served at Online speed. Standby reduces idle energy without surrendering reachability or retrievability. Standby is the energy-aware sibling of Online and is the state into which Online copies migrate when their corresponding packages have not been accessed for a long time.
Controlled Offline. The copy is on a disk volume that has been removed from the operator's online control plane. The drive may be physically disconnected, parked in a separate rack, or held in a controlled-access cabinet; the common property is that the operator's online software cannot reach it. A retrieval from Controlled Offline requires a documented operator action (mount the drive into a recovery host, run the verification script, copy the part out). Controlled Offline serves two functions: as a hot-spare replacement for an Online or Standby volume that has failed or degraded (allowing the operator to substitute a healthy disk-resident copy without waiting on tape rehydration), and as a last-resort defence against compromise of the online control plane.
Nearline. The copy is on an LTO tape cartridge held inside the operator's robotic tape library. Retrieval requires the library to mount the tape and read the tar chunk containing the part (typically minutes per mount). Nearline is the operator's online tape backstop: it is reachable through automation, sits in a controlled environment, and is the source from which Online copies are repaired when a disk volume fails or fails an audit.
Vault. The copy is on an LTO tape cartridge that has been ejected from the operator's robotic tape library and physically moved off-site or into a data-safe vault. Vault is air-gapped: the cartridge is physically unreachable from the operator's network, and any retrieval requires a documented manual workflow (operator collects the cartridge from the vault, mounts it in a recovery host, reads the tar chunk, copies the part out, returns the cartridge to the vault). Vault is the unconditional preservation state; sections 15 and 16 discuss what this means for the durability commitment.
The five states span a continuous range from "powered, mounted, instantly served" to "ejected, off-site, manual recovery". Energy consumption falls roughly monotonically along this range; expected retrieval latency rises along it; reachability from the online control plane disappears at Controlled Offline.
14.2 Many copies, many states
A package part normally exists in several states simultaneously, one row in the package registry per state. The default initial profile after the ingestion fan-out (§15.2) is:
- 1 copy on a labelled disk volume in Online state (powered, mounted, serving immediately).
- 1 copy on a labelled disk volume on a different storage unit, in Online or Standby state per operator policy (the second storage unit may be in the same facility as the first or in a different facility).
- 1 copy in Nearline state (one tape cartridge inside the operator's robotic library). A second Nearline cartridge is reserved for a future hardware-certification level and is not part of the v1 floor.
- 2 copies in Vault state (two tape cartridges, in two separate off-site vaults).
That is five copies across two media types. The choice between Online and Standby for the second disk copy is a state question, not a copy-count question: the copy is always present. Operator pressure responses (§15.4) move copies between states without dropping below the floor commitment of §16.
The Vault tapes are written incrementally as new tar chunks accumulate, and finalised (ejected from the library and moved off-site) when the cartridge fills or after a maximum age from first-write of twelve months, whichever comes first. During the accumulation window before a Vault tape is finalised, the daily backup rotation (§15.6) provides the offsite coverage; the daily-backup tape is not counted toward the redundancy floor because its retention is bounded and its purpose is transitional.
14.3 What state transitions are routine, what are bespoke
Some state transitions are part of normal operation:
- Online to Standby and Standby to Online: driven by access frequency and operator energy policy. Standby-to-Online is automatic on retrieval. Online-to-Standby for a specific package happens at storage-unit migration time (§15.3); operator-wide Standby moves under pressure response are also possible (§15.4).
- Disk to tape-only release: a package whose idle time has crossed the ten-year threshold (§15.3) is no longer copied to a disk volume on the next storage-unit migration, leaving it on tape only.
- Nearline to Online (rehydration): triggered by retrieval of a tape-only-floor package (§15.3). The chunk containing the part is read into INCOMING and the part flows back to disk through the regular intake pipeline.
Other transitions are bespoke:
- Vault retrieval: requires the operator to physically retrieve a cartridge from the vault. Section 16.6 describes this flow.
- Controlled Offline restoration: requires the operator to mount the disconnected drive into a recovery host. This is used either as repair of a failed Online or Standby copy, or as a controlled disaster-recovery event.
14.4 The vault is the unconditional state
Of the five states, only Vault is structurally defended against the failure modes that can take down the operator's online infrastructure as a whole. Online, Standby, Nearline, and to a lesser extent Controlled Offline all sit close enough to the operator's control plane that a single sufficiently bad event (ransomware that crosses replication boundaries, a credential compromise paired with a malicious automation, a fat-fingered bulk command, a regional energy or hardware crisis) can in principle reach them. A cartridge that has been ejected from the library and placed off-site cannot be reached by any of these. Section 16 develops the durability argument that rests on this property.
15. Storage Lifecycle
This section describes how a package part moves through the storage states (§14) over its preservation life. The lifecycle has four parts: ingestion fan-out, the steady state, idle-driven release, and operator pressure response.
15.1 Lifecycle terminology
Three terms denote points in the lifecycle without renaming the storage states:
- Full redundancy floor: the v1 five-copy state that follows ingestion fan-out. Two disk-side copies on different storage units (the first Online, the second Online or Standby per operator policy), one Nearline, two Vault. Whether the second disk copy is in Online or Standby state is a question of state, not of copy count: the copy is present in either case.
- Tape-only floor: the three-copy state that remains after the disk-side copies have been released at a storage-unit migration following the ten-year idle threshold (§15.3). One Nearline plus two Vault.
- Vault-only floor: the two-copy state that remains when the operator has released the Nearline copy as well, under sustained pressure (§15.4). Two Vault.
These terms describe redundancy floors, not service tiers. The operator's user interface continues to present the storage profile as state checkboxes, not as a tier label.
15.2 Ingestion fan-out
When an EFS operator accepts a package, each part follows a deterministic fan-out across the storage states. The fan-out is firm: once complete, the five-copy full redundancy floor is held for at least the next storage-unit migration cycle (§15.3). Tape writes use the tar-chunk batch model (§13.9): parts accumulate from buckets across multiple storage units into a 100 GiB tar chunk, and the tar chunk is written to tape as a single sequential stream. Effective fan-out times are often much shorter than the bounds below.
| Time after acceptance (maximum) | Action |
|---|---|
| T+0 | Part received and verified into INCOMING (SSD). |
| Within 1 hour | Part copied from INCOMING to a writable bucket on the first labelled disk volume (first disk copy, Online). The INCOMING source is preserved, so the part is on two media (SSD and disk) from this point. |
| Within 4 hours | Part copied to a writable bucket on a second labelled disk volume on a different storage unit. Second disk copy is held Online or in Standby per operator policy. INCOMING entry is released. |
| Within 24 hours | Part written to the operator's current daily-backup tape inside the library (§15.6). |
| Within 7 days | Tar chunk containing the part is written to a Nearline tape inside the operator's robotic library. |
| Within 8 days | The daily-backup tape covering the part is ejected from the library and moved off-site under the weekly rotation (§15.6). The part is now on at least one off-site tape. |
| Within 30 days, or up to 12 months in low-volume operation | Tar chunks containing the part are written to two Vault tapes. Each Vault tape is finalised (ejected from the library and physically moved off-site) when it reaches its fill threshold or after a maximum of twelve months from first write, whichever comes first. The 30-day target applies to high-volume operation where Vault tapes fill quickly; low-volume operators may take up to twelve months to finalise a Vault tape pair, with the daily-backup rotation providing offsite coverage in the meantime (§15.6). |
The bucket size, the tar-chunk size, the Vault fill threshold, and the daily-backup tape rotation specifics are operator-local within the bounds set out at the protocol level (§21).
Once both Vault tapes are finalised, the part is fully archived in the steady state: five copies, three states, four physical locations.
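Because the fan-out bounds are fixed maxima, they lend themselves to machine checking. A minimal sketch, with illustrative milestone names, that flags any part whose fan-out has slipped a deadline:

```python
# Illustrative milestone names; hours are the maxima from the table.
FANOUT_DEADLINES_H = {
    "first_disk_copy":      1,
    "second_disk_copy":     4,
    "daily_backup_tape":    24,
    "nearline_tape":        7 * 24,
    "daily_backup_offsite": 8 * 24,
    "vault_pair_written":   30 * 24,   # high-volume target; low-volume
                                       # operation may take up to 12 months
}

def overdue(events_h: dict[str, float]) -> list[str]:
    """Return the milestones a part has missed, given the hours after
    acceptance at which each milestone was actually reached (missing
    milestones count as not yet reached)."""
    return [step for step, limit in FANOUT_DEADLINES_H.items()
            if events_h.get(step, float("inf")) > limit]

# A part whose Nearline write has slipped past day seven:
print(overdue({"first_disk_copy": 0.5, "second_disk_copy": 3,
               "daily_backup_tape": 20, "nearline_tape": 200}))
# -> ['nearline_tape', 'daily_backup_offsite', 'vault_pair_written']
```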
15.3 Idle thresholds and storage-unit migration
The lifecycle is governed by two fixed idle thresholds, applied at storage-unit migration time. The thresholds are protocol-defined and are not configurable per package or per operator.
- Five years idle without customer-initiated retrieval makes a package eligible to have its disk-side copies moved to Standby state at the next storage-unit migration. Up to five years, the first disk copy is required to be in Online state (immediately serving); the second disk copy may already be in Standby per operator policy under §14.2.
- Ten years idle without customer-initiated retrieval makes a package eligible to be released from disk entirely at the next storage-unit migration. Its disk-side copies are not carried over to the new storage unit; the package drops to the three-copy tape-only floor (one Nearline plus two Vault).
A customer-initiated retrieval of a package resets the idle clock from the time of retrieval, regardless of which state served the retrieval. Operator-internal activities (audits, fixity checks, repair, tape migrations, RAID rebuilds) do not count as customer access and do not reset the timer.
Append-only volumes and migration-time release
EFS labelled volumes are append-only. A part written to a volume is not selectively deleted; the part stays on its volume until the entire volume is migrated and retired (§13.3). Idle thresholds therefore do not trigger an immediate state change for a specific package. They define eligibility, not action; the action happens at the next storage-unit migration.
A storage-unit migration is the operational event in which the operator moves the contents of an old storage unit onto a new storage unit. Migrations are triggered by capacity expansion (a new generation of larger drives lets the operator consolidate several old units into a new one) or by enclosure end-of-life. Storage-unit enclosures are typically good for around twenty years (with continuous drive replacement on failure), but operators commonly run a faster capacity-driven migration cycle of every five to ten years as new disk generations arrive. The exact cadence is operator-local and depends on hardware availability and economics.
At each storage-unit migration the operator reapplies the idle thresholds:
- packages with less than five years idle keep both disk copies in their current Online states on the new storage unit;
- packages with five to ten years idle have their disk copies placed in Standby on the new storage unit;
- packages with more than ten years idle are not carried over to the new storage unit; they drop to the tape-only floor and the FAST cache, if it held them, is reclaimed.
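The eligibility cuts reduce to a small pure function of idle time; the returned labels are illustrative:

```python
def migration_disposition(idle_years: float) -> str:
    """§15.3 eligibility as applied at storage-unit migration time."""
    if idle_years < 5:
        return "carry-over-online"     # both disk copies keep current states
    if idle_years <= 10:
        return "carry-over-standby"    # disk copies placed in Standby
    return "tape-only-floor"           # disk copies not carried over

assert migration_disposition(3) == "carry-over-online"
assert migration_disposition(7) == "carry-over-standby"
assert migration_disposition(12) == "tape-only-floor"
```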
Because the actual transition happens at migration time rather than at the exact idle anniversary, a package's effective Online retention depends on the operator's migration cadence. A package eligible for Standby at year five may remain Online until year seven if that is when the next migration runs; a package eligible for tape-only release at year ten may remain on Standby disk until year twelve. Operator pressure response (§15.4) can shorten this in either direction.
A subsequent customer retrieval of a tape-only-floor package rehydrates it. The operator reads the chunk containing the part from a Nearline tape into INCOMING, the part flows through the regular intake pipeline into a fresh bucket on a writable disk volume (§13.6), it acquires Online state again, and the idle clock resets. The retrieval price reflects the chunk-read cost (§18.3).
15.4 Operator pressure response
Sustained external pressure can threaten an operator's ability to keep its full online infrastructure running: order-of-magnitude electricity-price spikes, prolonged grid instability, regional fuel or hardware disruption, an episode that materially affects staffing. The operator has a graduated set of responses, none of which compromises the unconditional Vault floor.
Energy moves: Online to Standby. The operator may move Online copies into Standby. Idle energy drops sharply; reachability from the online control plane is preserved; retrieval picks up a one-minute spin-up latency. This is the lightest response and is largely invisible to publishers.
Lifecycle compression. Under more sustained pressure the operator may transition idle packages out of the disk states ahead of the five-year and ten-year thresholds, scheduling an out-of-band storage-unit migration that applies tighter eligibility cuts (for example three years to Standby, six years to tape-only release). The protocol specifies how much compression an operator may apply and how it must announce it (§21).
Scheduled online availability. Under heavier pressure the operator may power its online tier (disk volumes and tape-library frame) only during declared windows. Outside the windows, retrieval requests queue and are served when the next window opens. The redundancy floor is not changed by scheduled availability; what changes is when retrievals can be served.
Vault-only contraction. Under the most severe pressure the operator may release the Nearline tape copies as well, leaving only the two Vault tapes. The Vault tapes continue under the fifteen-year migration cycle (§16.3). Retrieval becomes a §16.6 manual operation. The operator's service contract identifies under what conditions this contraction may occur, how publishers are notified, and how Vault retrievals are priced.
The four responses compose. The operator chooses the rung that matches the severity of the pressure event and the duration of its expected persistence, and steps back down the ladder when the pressure eases. Movement back up the ladder is likewise at operator discretion.
15.5 What pressure response can affect, and what it cannot
Pressure response can compress the five-year and ten-year idle thresholds, schedule out-of-band storage-unit migrations to apply those tighter thresholds sooner, narrow or close the windows in which online retrieval is offered, contract packages to a smaller set of states ahead of the lifecycle schedule, and in the extreme suspend online retrieval entirely while a pressure event runs its course.
What pressure response cannot affect is the Vault pair: the two air-gapped vault tape copies and their fifteen-year migration cycle (§16.3) remain in force regardless. The eternal-preservation commitment rests on the Vault pair and is therefore unconditional; everything above the Vault pair, including the idle thresholds, the routine 24x7 online retrieval path, and the published convenience commitments, is a normal-conditions optimisation that the operator may scale back to keep the floor intact.
15.6 Daily backup tape rotation
The ingestion fan-out (§15.2) finalises a Vault tape pair when the cartridges fill or after up to twelve months from first write, whichever comes first. During that window, the part lives on disk and inside the operator's tape library, but its dedicated off-site Vault copies do not yet exist. The operator runs a separate daily backup tape rotation that closes this window without forcing wasteful tape consumption at low ingestion volumes.
The daily backup rotation is incremental, not per-day-per-tape:
- Every day, the chunks containing that day's new ingestions are appended to the operator's current daily-backup tape inside the library. The same tape continues to receive new chunks day after day until it fills.
- When a daily-backup tape fills, a new one starts. Tapes are rotated through the library's on-site pool.
- Each week, on a fixed schedule, the daily-backup tapes that have accumulated since the previous weekly eject are removed from the library and physically moved to an off-site location. A part is therefore written to a daily-backup tape within 24 hours of ingestion, and that daily-backup tape is off-site within at most a further seven days, so within at most eight days from ingestion every part is covered by an off-site daily-backup tape.
- Daily-backup tapes are retained off-site for at least fourteen months, which covers the twelve-month Vault-finalisation maximum (§15.2) plus a two-month safety margin. After fourteen months, daily-backup tapes are returned to the operator, wiped, and rotated back into the pool. By the time a daily-backup tape is wiped, the dedicated Vault tapes for every part it carried have long been finalised and their off-site placement confirmed, so the wiping does not reduce any package's redundancy floor. Daily-backup tapes are the only tapes in the operator's fleet that are wiped and reused; Nearline and Vault tapes are treated as WORM media (§13.9) and are physically destroyed at the end of their fifteen-year migration cycle rather than wiped.
The daily backup rotation is operator infrastructure, not a per-package commitment, and does not appear in availability claims at the part level. Its purpose is to ensure that even if the operator's on-site infrastructure is comprehensively compromised before a part's dedicated Vault tapes have been finalised, at least one off-site tape copy of that part exists at the operator's vault. Section 16.7 records this as part of the operator's disaster-recovery posture.
16. Permanence and Redundancy
Every part accepted by an operator is preserved indefinitely. The redundancy floor depends on the package's lifecycle position (§15) and on whether the operator is responding to sustained pressure, but the Vault pair is always present.
16.1 The three floors
The full redundancy floor consists of two disk-side copies on different storage units (the first Online, the second Online or Standby per operator policy; both copies may be in the same facility), one Nearline copy on a tape inside the operator's robotic library, and two Vault copies on separate tapes in two separate off-site vaults. Five copies, three states, two media types. The two disk copies on different storage units protect against the failure of any single storage unit; the off-site Vault pair protects against any single-facility loss and against compromise of the operator's online infrastructure (§16.2). This floor maps to NDSA Level 2 on the National Digital Stewardship Alliance's preservation scale: at least three complete copies, at least two storage media, copies in geographically distinct locations.
EFS does not promise that the two disk-side copies live in separate facilities. Multi-facility redundancy at the disk tier is not part of the within-operator floor; a publisher who wants disk-side copies in separate facilities should place the package at two operators (§16.4), where the disk copies of the second operator are by construction in a different facility, under different staffing, and under a different operating jurisdiction.
The tape-only floor consists of the three LTO tape copies (one Nearline, two Vault) with the disk copies released. Three copies, three locations (one in the library, two in vaults), one medium type. This still maps to NDSA Level 2.
The vault-only floor consists of the two Vault copies. Two copies, two locations, one medium type. This maps to NDSA Level 1, the absolute durability minimum, and is reached only under sustained operator pressure (§15.4).
In every floor, the off-site locations of the Vault pair are chosen so that no single fire, flood, facility loss, or online compromise can destroy both Vault copies, and the air-gap defences of the Vault pair (§16.2) remain in force.
16.2 The Vault pair as the unconditional floor
A tape that has been written, ejected from the library, and physically removed from the operator's network is unreachable by:
- operator mistakes: a fat-fingered command, a misconfigured script, a botched migration;
- bad automation: a process that propagates a single bad write across replicas;
- ransomware on the operator's online systems;
- malicious insiders with credentials but no physical access to the off-site location;
- cascading deletion or overwrite through the operator's replication topology;
- compromised credentials or stolen administrative tokens;
- prolonged loss of cheap electricity or of network reachability to the operator's facility.
Online, Standby, Nearline, and Controlled Offline copies, however well managed, share enough of the operator's control plane and energy posture to share its failure modes. A cartridge in a vault does not.
The lifecycle in §15.2 brings a part to its Vault protection in stages. The dedicated Vault tapes for a part are finalised when their cartridges fill or after a maximum of twelve months from first write, whichever comes first; for high-volume operators this is typically within 30 days, for low-volume operators it may take up to a year. During the accumulation window, the part is protected by its Online and Nearline copies inside the operator's infrastructure and by the daily-backup tape rotation (§15.6) off-site. Once both dedicated Vault tapes are off-site, the air-gap defences listed above engage in their full per-package form. If a fan-out is interrupted, finalising the dedicated Vault tapes takes precedence over rotating tapes that are merely older.
16.3 Tape migration
Tape volumes are write-once and selective deletion of a single part from a tape volume is impractical; operators do not undertake it. Tapes are migrated to fresh media every fifteen years. This sits inside the manufacturer-rated 30-year archival life of LTO media. The harder constraint is reader hardware: each LTO generation is manufactured for a finite production window, and the practical read window for cartridges of any given generation is bounded by drive supply rather than by media life. The LTO consortium's read-back specification historically extended two generations back, giving cartridges of any generation a roughly fifteen-year practical read window between release and effective obsolescence. The pattern has since tightened: LTO-8 and LTO-9 drives read only one generation back, and LTO-10, released in 2025, reads only LTO-10, so the read window for current generations is narrower than for earlier ones. As a concrete benchmark, LTO-4 (released 2007) is no longer in manufacture: a 2026 LTO-4 deployment depends entirely on the refurbished-drive market. The 15-year migration cadence is set so that no tape volume risks either medium decay or drive unavailability before being rewritten, against either the historic or the tightening pattern.
A tape migration writes the contents of an old-generation cartridge into a new-generation cartridge through verified read-and-rewrite; the new cartridge gets a fresh volume label, the old cartridge is retired, and the volume registry records the migration. Vault tapes are migrated in place at the vault, or temporarily brought back to a controlled migration host and returned to the vault, never through the operator's online tape library. The package registry's volume bindings are rewritten to point at the new volume.
16.4 Cross-operator redundancy
The redundancy floors above are within-operator commitments. Placing a package at two operators independently doubles every floor: ten copies at the full-redundancy floor, six at the tape-only floor, four at the vault-only floor, across four to eight locations depending on the floor, under at least two organisational and legal regimes. Cross-operator placement is also the only path to disk-side multi-facility redundancy (§16.1) and the only protection against operator failure and against correlated regional pressure, neither of which within-operator redundancy covers. The choice of how many operators, in which jurisdictions, is the publisher's.
16.5 Permanence is not contingent on continuing payment
A part's permanence is a property of its having been accepted, not of any continuing payment. There is no expiry, no renewal, no lapse to deletion. A part enters EFS once and stays.
Mute (§20) removes a part from public service but does not erase its archival copies. Publication is revocable; history is not.
16.6 Retrieval from Vault
Retrieval from Vault is scheduled physical work, submitted through a dedicated batch retrieval API rather than through the routine retrieval protocol. It is the path used for catastrophic recovery (when an operator has lost all higher-state copies) and for routine retrieval of vault-only-floor packages (when the operator has contracted to that floor under sustained pressure, §15.4). The flow is:
- The requestor submits a batch of PackageIDs and PartNrs to the Vault retrieval API. Retrieval is not restricted to the publisher; any party with a funded Bitcash wallet may submit a Vault retrieval batch. (What the requestor receives is the encrypted ciphertext; decryption requires a Key Registry grant from the publisher under §12, which is a separate matter from the storage-side retrieval gated here.)
- The operator looks up the per-tape, per-chunk, and per-byte breakdown for the batch from the package registry (§13.8) and returns a quote against the structured pricing of §18.3, together with an estimated time to fulfilment based on its current vault-collection cadence.
- The requestor pays the quote through Bitcash (§18.6).
- The operator schedules the vault collection. When the cartridges have been collected, mounted, the relevant tar chunks read into INCOMING, and the requested parts verified, the operator notifies the requestor that the parts are available.
- The requestor retrieves the parts through the regular retrieval protocol. The requested parts are returned to higher-state availability as part of what the retrieval fee paid for: they flow through the regular intake pipeline into a fresh bucket on a writable disk volume, acquire Online state, and have their idle clock reset under §15.3. The non-requested parts that were extracted from the same tar chunks as a byproduct (§13.9) are kept available in INCOMING for a short coalescing window (operator-local, default 24 hours) and are then removed. They are not promoted to Online state and the package registry is not updated to reflect them, because their retrieval was not paid for. While they sit in INCOMING, however, a fresh retrieval request for any of them can be served immediately at the Online retrieval rate (0.001 BIT per KiB, §18.3), without the per-tape and per-chunk penalties: the cartridge is back in the vault, but the chunk has already been read, so a follow-up retrieval is data work, not physical work. After the coalescing window expires, a retrieval against a byproduct part requires a fresh Vault batch and pays the full structured price.
EFS marks a vault-only-floor package in availability claims (§19.2) so that clients know the only retrieval path is the batch API. Routine retrieval requests against vault-only-floor packages are rejected with a pointer to the batch API.
16.7 Operator disaster recovery
The operator's disaster-recovery posture combines two infrastructure-level elements that supplement, rather than replace, the per-package redundancy floor.
The first is the daily backup tape rotation (§15.6): every day's new ingestions are appended to the operator's current daily-backup tape, the daily-backup tapes are moved off-site under a weekly eject schedule, and they are retained off-site for at least fourteen months before being wiped and rotated back into the pool. A part is on a daily-backup tape inside the library within 24 hours of ingestion, and on an off-site daily-backup tape within at most a further seven days. This rotation closes the up-to-twelve-month gap during which a part's dedicated Vault tapes are still accumulating chunks inside the library and have not yet been finalised.
The second is system-level operator backup, in which an operator writes its volume registry, package registry, and other operational metadata to additional media for catastrophe recovery. This is operator-internal infrastructure: the metadata is what makes the operator's tapes interpretable, and an operator that lost its registries while keeping its tapes would be unable to resolve customer queries against them.
Neither element appears as availability claims at the per-part level. Both are part of the operator's certified-platform commitment under §13.1.
17. Local Working Copies and Self-Hosted Hubs
EFS has two components. The server component is run by federation operators (§13–§16). The client component is a desktop application that gives a publisher a fully detailed overview of every file they have published and every file they keep in their personal local repository on disk. The desktop application is the publisher's working surface; the federation is the durability counterpart.
A local working copy is the publisher's own complete encrypted catalog held by the desktop application on their own machine. It is not a network-serving node. It is not announced to the federation. It carries no availability claims. It is simply the publisher's primary copy of their own work, with the federation as the cloud-side preserved counterpart. Local working copies are an expected and supported part of the model.
The desktop application also exposes an opt-in self-hosted hub mode that turns the user's machine into an EFS-speaking node accessible to other clients. The user picks one of three exposure modes:
- LAN-only. The hub answers requests on the local network only, suitable for households, studios, or institutional intranets.
- Public. The hub is reachable from the internet through the user's router configuration, with the user's local nginx acting as reverse proxy, and answers any request.
- IP-allowlist. Public reachability gated to a specified set of source IP addresses, configured in nginx and/or the router.
The hub feature is restricted to Catalog.ID members, but only as the identity anchor: clients verify signed inventories and peer rosters against the operator's Catalog.ID public key, while the network address itself is unconstrained. A hub may be reachable at an IP address, a personal or company domain, a Tor hidden service, or any other URL the operator chooses to advertise. There is no fixed Catalog-controlled subdomain, which keeps DNS out of the connection path and avoids any central record of which clients access which hubs. Each hub answers EFS retrieval requests in the same wire format as a federation node, and differs from federation nodes in two ways: it does not maintain the global federation registry (it only lists its own holdings), and it serves transport for free, with no Bitcash settlement on retrieval through a hub.
Free transport does not mean free content. The hub serves only ciphertext, and a requestor still needs a Key Registry grant from the publisher (or the seller in a commercial flow) to decrypt what the hub delivered (§12). Like a federation operator, a hub holds no decryption power and cannot issue grants on a publisher's behalf. The hub layer is a distribution and privacy substrate; licensing and access rights remain with the publisher.
A hub may only serve packages that have been registered with at least one federation operator. This precondition ties every hub-served file to an AssetID, a publisher signature, and a paid ingestion event, anchoring provenance and preventing the hub layer from drifting into an unregulated filesharing network. Files that have never been ingested into the federation cannot be hub-served, even by their author. Subject to that precondition, a hub may serve any encrypted file the operator has acquired, including files for which the operator does not hold a decryption key. Three acquisition modes are supported:
- Self-published. The operator is also the publisher and serves the file from their own local working copy. Federation registration still applies, but no federation fetch is required because the operator already holds the package locally.
- Pre-fetched from the federation. The operator retrieved the file through normal federation channels (paying BIT for that fetch, §18.2) and now caches it for downstream service through the hub.
- Peer relay. The operator obtained the file from another hub.
Discovery. The federation does not maintain a registry of which hubs hold which files; in v1 hub discovery is a client-side concern. The desktop application keeps a local roster of hubs the user has chosen to follow, polls each tracked hub for its inventory, and pings each one for liveness. A hub serves a small set of endpoints, each signed by the operator's Catalog.ID key so the client can verify them end-to-end:
- /efs/inventory. The list of PackageID (PartNr) entries the hub holds.
- /efs/peers. The operator's peer roster: other hubs this operator follows, given as URLs in whatever addressable form the peer operator advertises (IP address, domain, hidden service, etc.) alongside their Catalog.ID-key fingerprints. Optional and off by default; a privacy-conscious operator can omit it entirely.
- /efs/ping. Current liveness and status (online, scheduled-offline, maintenance), optionally including current load.
- /efs/info. Hub metadata: operator display name, declared topics, terms of service.
Because each hub may publish its peer roster, hub discovery becomes an organic graph traversal: a user follows one trusted hub, browses that hub's peer roster to find further hubs, and decides which to add to their own roster, mirroring how fediverse clients discover instances. Initial entry is out-of-band, by publishers promoting their hub URL through their own channels (website, Catalog.ID profile, mailing list, asset listings), or seeded by a small curated default list bundled with the desktop application that users can keep, prune, or extend.
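A sketch of the polling loop, using the endpoint paths above; the response handling and the verify_signature callable are assumptions, since the signed wire format is defined by the hub protocol rather than here:

```python
import requests

def poll_hub(base_url: str, hub_pubkey: bytes, verify_signature) -> dict:
    """Poll one tracked hub for liveness, inventory, and peer roster."""
    results = {}
    for endpoint in ("/efs/ping", "/efs/inventory", "/efs/peers"):
        try:
            resp = requests.get(base_url + endpoint, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            results[endpoint] = None      # hub offline, or roster omitted
            continue
        if not verify_signature(hub_pubkey, resp.content):
            # Responses are signed by the operator's Catalog.ID key;
            # reject anything that fails end-to-end verification.
            raise ValueError(f"bad signature from {base_url}{endpoint}")
        results[endpoint] = resp.content
    return results
```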
The privacy benefit sits at the consumption layer: a request served by a hub is not logged by the federation, and the requestor incurs no Bitcash transaction that would tie the retrieval to a wallet. Privacy at the bootstrap layer depends on how the hub itself acquired the file. For a self-published file, the only federation contact is the original ingestion, and subsequent hub service runs entirely from the local working copy. A pre-fetched file involves one federation fetch, logged against the hub operator's wallet, that amortises across many downstream anonymous retrievals. Peer-relayed files inherit whatever bootstrap path the upstream hub used. The hub layer therefore enables decentralised distribution alongside the federated catalog.org network and gives privacy-sensitive readers a path that does not pass through federation logs or wallet-level traceability.
EFS does not elevate self-hosted hubs as a first-class durability guarantee. A user may operate one for their own resilience, for their community, or as a privacy substrate, but availability across the network as a whole remains the responsibility of the institutional federation. Self-hosting may also be sensitive: advertising oneself as a holder of certain material can carry consequences. EFS therefore treats self-hosted hubs as a permitted and supported but unguaranteed mode, distinct from the federation's funded durability commitments.
18. Bitcash Funding Model
EFS uses two payment events: ingestion and retrieval. Both flow through Bitcash.
18.1 Eternal storage on first payment
Ingestion is a one-time payment that funds eternal preservation of the package. The fee covers verification, the initial fan-out across storage states (§15.2), the five-copy full redundancy floor through the five-year idle threshold (§15.3), the same five-copy floor with both disk copies in Standby through the ten-year idle threshold, the three-copy tape-only floor that follows beyond ten years idle, and the two-copy vault-only floor that the operator may contract to under sustained pressure. Once paid, the operator preserves the package for the life of its infrastructure with no further charge for storage. The storage commitment has no expiry and is not renewable.
This is a deliberate inversion of the typical cloud-storage model. It aligns the publisher's incentive (preserve everything I have ever made, for the long term) with the operator's incentive (encourage retrievals, since they fund operations). The publisher is not paying for ownership of a file in the abstract. Ownership is a Registry and Library concern. The publisher is paying for permanent storage of a defined byte sequence.
18.2 Retrieval
When a package is retrieved, the retrieval fee covers serving the request. Retrieval from Online or Standby is priced uniformly; the spin-up latency from Standby is operationally visible but not separately metered. Retrieval from FAST is priced as Online.
Retrieval from Nearline is priced in three components rather than purely per-byte, because each request consumes scarce library-robot capacity: the robot must fetch the cartridge from its slot, mount it on a drive, and seek to the requested tar chunk (§13.9). A per-tape penalty is charged for each cartridge mounted, a per-chunk penalty for each tar chunk seeked-to and read on each cartridge, and a per-byte component for the bytes delivered. The per-tape and per-chunk penalties put counterpressure on demand for the finite mount-and-seek budget, so requests are shaped to amortise mounts across many parts: parts that share a tape collapse the per-tape penalty, parts that share a chunk on a tape collapse both the per-tape and per-chunk penalty, and the per-byte component scales only with the bytes the requestor takes delivery of. Specific rates are in §18.3.
Retrieval from Vault uses the same three-component structure as Nearline but at higher rates, reflecting the additional physical work of off-site vault collection and return on top of mount and seek (§16.6).
Vault retrieval is requested through a dedicated batch API rather than through the routine retrieval protocol. The requestor submits a list of PackageIDs and PartNrs; the operator returns the per-tape, per-chunk, and per-byte breakdown along with an estimated time to fulfilment based on its current vault-collection schedule. After payment, the operator schedules the vault collection and the requestor either polls the API or receives an asynchronous notification when the parts are available, after which they are retrieved through the regular retrieval protocol.
Coalescing-window byproduct retrieval. Both Nearline and Vault retrievals operate at the granularity of the 100 GiB tar chunk, so a chunk read into INCOMING typically contains parts the requestor did not ask for. Those byproduct parts sit in INCOMING for the operator's coalescing window (default 24 hours, §13.9) and are then released without entering Online state or the package registry. While they are in INCOMING, however, a fresh retrieval request for any of them is served immediately at the Online retrieval rate, with no per-tape or per-chunk penalty. The reasoning is operational: the cartridge has been returned to its library or vault, but the chunk has already been read, so a second retrieval against a chunk-mate is data work and not a fresh mount-and-seek operation. Once the coalescing window expires, a retrieval against a former byproduct part requires a fresh Nearline or Vault operation and pays the full structured price.
An operator running under scheduled online availability (§15.4) serves Online and Nearline retrievals only during its declared windows; outside those windows requests queue. Vault retrieval, being scheduled work, is not subject to those windows but is subject to the operator's vault-collection cadence.
Retrieval is paid through Bitcash from the requestor's wallet at the time of access. Who ultimately bears that cost (the viewer directly, the publisher as a subsidy, or the Asset Market settlement layer through a license) is a Library and Market concern, varying by license model. EFS records the payment event and serves the bytes; the higher-layer attribution of the cost happens above EFS.
18.3 V1 platform pricing
EFS pricing is platform-wide: all operators charge the same rates for the same operations (§18.4). The following table is the v1 schedule.
| Operation | Rate | Approximate per GiB |
|---|---|---|
| Ingestion (eternal storage) | 0.005 BIT per KiB | ~5,243 BIT (~ EUR 1.05) |
| Online, Standby, or FAST retrieval | 0.001 BIT per KiB | ~1,049 BIT (~ EUR 0.21) |
| Nearline retrieval, per-tape penalty | 10,000 BIT per Nearline cartridge mounted | (n/a) |
| Nearline retrieval, per-chunk penalty | 5,000 BIT per tar chunk read | (n/a) |
| Nearline retrieval, per-byte component | 0.001 BIT per KiB | ~1,049 BIT (~ EUR 0.21) |
| Coalescing-window retrieval (chunk already in INCOMING) | 0.001 BIT per KiB | ~1,049 BIT (~ EUR 0.21) |
| Vault retrieval, per-tape penalty | 250,000 BIT per Vault cartridge mounted | (n/a) |
| Vault retrieval, per-chunk penalty | 2,500 BIT per tar chunk read | (n/a) |
| Vault retrieval, per-byte component | 0.0025 BIT per KiB | ~2,621 BIT (~ EUR 0.52) |
The BIT-to-euro reference rate is 1 BIT = EUR 1/5000 = EUR 0.0002 at standard retail. Per-transaction volume discounts on BIT purchases (BIT Issuance and Distribution Addendum §4.3) range from 5% at EUR 25 to 25% at EUR 10,000 and above, so the effective EUR cost of any operation in this schedule may be up to 25% lower for buyers purchasing in bulk.
A minimum billing size of 1 MiB applies to ingestion. Packages smaller than 1 MiB are billed as if they were 1 MiB at the ingestion rate. This covers the per-package operational overhead of acceptance, signing, replication, lifecycle bookkeeping, and tape-volume accounting, which is largely independent of package size and is incurred once per part at ingestion time. No minimum applies to retrievals: every Online, Standby, or FAST retrieval is billed on the per-byte component only, whether the client requests a whole part or a byte-range subrequest (§9.4). Byte-range retrieval is not available for parts in Nearline or Vault: those states require a chunk read in any case, and the structured Nearline and Vault rates apply per §18.2.
Both Nearline and Vault rates are composed (§18.2): a batch retrieval that hits multiple parts on the same tape pays the per-tape penalty once for that tape, the per-chunk penalty once per distinct chunk read on that tape, and the per-byte component on the sum of bytes delivered. The data inside a chunk is essentially free once the mount has happened, so a request for one part and a request for twenty parts on the same chunk pay the same per-tape and per-chunk penalties and differ only in the per-byte component. There is no separate "catastrophic restoration" rate: an operator that has lost all higher-state copies and is restoring from Vault pays the same composed Vault rate. The batch-API submission flow (§18.2) is the sole interface for Vault retrieval; ad-hoc per-part Vault requests through the routine retrieval protocol are not supported.
Worked examples. A single 16 GiB part from a single Nearline tape pays 10,000 + 5,000 + 16 × 1,049 = 31,784 BIT (~ EUR 6.36); the same part from Vault pays 250,000 + 2,500 + 16 × 2,621 = 294,436 BIT (~ EUR 58.89). Fifty parts from one chunk on one tape, totalling 80 GiB, pay 10,000 + 5,000 + 80 × 1,049 = 98,920 BIT (~ EUR 19.78) on Nearline or 250,000 + 2,500 + 80 × 2,621 = 462,180 BIT (~ EUR 92.44) on Vault. At institutional scale the schedules amortise heavily: a publisher restoring its entire 20 TiB repository, spread across 40 tapes and roughly 205 tar chunks, pays 40 × 10,000 + 205 × 5,000 + 20,480 × 1,049 = 22,908,520 BIT (~ EUR 4,581.70) via Nearline or 40 × 250,000 + 205 × 2,500 + 20,480 × 2,621 = 64,190,580 BIT (~ EUR 12,838.12) via Vault. Both totals sit well below the original 20 TiB ingestion cost of ~EUR 21,475.33.
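The arithmetic behind these examples is mechanical; a minimal sketch, using the rounded per-GiB figures from the §18.3 table and including the 1 MiB ingestion minimum:

```python
EUR_PER_BIT = 0.0002  # 1 BIT = EUR 1/5000 at standard retail

def ingestion_bit(size_kib: int) -> float:
    # The 1 MiB minimum billing size applies to ingestion (§18.3).
    return 0.005 * max(size_kib, 1024)

def composed_retrieval_bit(tapes: int, chunks: int, gib: float,
                           vault: bool = False) -> float:
    """Composed Nearline/Vault price: per-tape + per-chunk + per-byte."""
    per_tape = 250_000 if vault else 10_000
    per_chunk = 2_500 if vault else 5_000
    per_gib = 2_621 if vault else 1_049  # rounded per-byte component per GiB
    return tapes * per_tape + chunks * per_chunk + gib * per_gib

# Reproduces the worked examples above:
assert composed_retrieval_bit(1, 1, 16) == 31_784
assert composed_retrieval_bit(1, 1, 16, vault=True) == 294_436
assert composed_retrieval_bit(40, 205, 20_480) == 22_908_520
assert round(composed_retrieval_bit(40, 205, 20_480, vault=True)
             * EUR_PER_BIT, 2) == 12_838.12
```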
The same three scenarios across all four operations, at standard retail:
| Scenario | Ingestion | Online/Standby/FAST | Nearline | Vault |
|---|---|---|---|---|
| 16 GiB single part | EUR 16.78 | EUR 3.36 | EUR 6.36 | EUR 58.89 |
| 80 GiB on one chunk | EUR 83.89 | EUR 16.78 | EUR 19.78 | EUR 92.44 |
| 20 TiB / 40 tapes / 205 chunks | EUR 21,475.33 | EUR 4,296.70 | EUR 4,581.70 | EUR 12,838.12 |
And the same scenarios assuming the buyer purchases just enough BIT for each operation in a single transaction, with the §4.3 volume discount applied (cells without a tier label fall in Standard, no discount):
| Scenario | Ingestion | Online/Standby/FAST | Nearline | Vault |
|---|---|---|---|---|
| 16 GiB single part | EUR 16.78 | EUR 3.36 | EUR 6.36 | EUR 55.95 (Bronze, 5%) |
| 80 GiB on one chunk | EUR 79.70 (Bronze, 5%) | EUR 16.78 | EUR 19.78 | EUR 87.82 (Bronze, 5%) |
| 20 TiB / 40 tapes / 205 chunks | EUR 16,106.50 (Enterprise, 25%) | EUR 3,437.36 (Platinum, 20%) | EUR 3,665.36 (Platinum, 20%) | EUR 9,628.59 (Enterprise, 25%) |
The schedule is calibrated to leave headroom over the operator's underlying cost of running the full redundancy floor and the tape migration cycle, with hosting and access infrastructure excluded. The exact cost basis depends on operator scale, library utilisation, tape generation, and energy posture. Whether the platform-wide rate can hold across operators in materially different cost-structure jurisdictions is an open question (§21).
The prices are uniform across roles. Source, preview, and edition packages cost the same per byte to ingest and to retrieve. Roles differ in expected access patterns, not in pricing structure.
18.4 Platform pricing and operator differentiation
EFS pricing is platform-wide, not operator-set. All operators charge the same single ingestion rate and the same retrieval rates at each state, and they all honour the same lifecycle (§15) and redundancy floors (§16). Publishers choose operators on non-price dimensions: jurisdiction, geographic and political risk profile, capacity, reputation, customer service, and the operator's record on audit and migration.
Uniform pricing prevents a race to the bottom that would undermine the durability promise, makes cross-operator placement (§16.4) a clean redundancy decision rather than a price-shopping exercise, and gives publishers predictable costs across the federation. It also reflects the wider Catalog economics, in which BIT is a standard unit and pricing for protocol services is set at the protocol layer rather than by individual operators.
An operator publishes a service contract that confirms its adoption of the platform pricing schedule, names its jurisdiction and facilities, and states its operator-specific commitments on capacity, support, and notice periods. The contract is the binding instrument between publisher and operator, but the prices it carries are the protocol's, not the operator's.
The publisher does not select a storage state for a specific package; state membership is governed by the lifecycle. Publishers do not pay separately for redundancy; the redundancy floors of §16 are part of what the ingestion fee buys.
18.5 Expected access patterns
Most EFS packages are not expected to be retrieved often. The publisher's primary working copy is local: the EFS client on the publisher's own machine holds a complete encrypted copy of their catalog, and routine work (adding new material, browsing, rendering editions for publication to static websites) runs against the local copy without generating EFS retrievals. Edition packages are produced from the local copy and uploaded to EFS as part of publication; they are typically not fetched back from EFS by the publisher who created them.
EFS retrievals after first ingestion are dominated by:
- restoration of source or derivative material when the publisher's local copy is unavailable, lost, or being rebuilt on a new machine;
- access by licensed Catalog.ID viewers through Asset Library, where retrieval cost is borne by the viewer's license terms;
- audit, repair, or migration.
In practice most parts sit on Online or Standby with infrequent FAST-cache promotions, and the EFS network behaves more like a deep archive with selective online cache than like a content delivery network.
18.6 EFS does not hold funds
EFS does not hold funds. Bitcash holds wallet balances; EFS reads metering data and accepts settlement events from Bitcash to gate ingestion and retrieval. Vault retrieval is settled through Bitcash on the per-tape, per-chunk, and per-byte breakdown returned by the batch API (§18.2), in a single up-front settlement at the time the batch is accepted, rather than as streaming micropayments per part.
18.7 Founder funding of the first operator
The first EFS operator, Stichting Outpapier, is bootstrapped through a single pre-purchase by the founder. The founder pre-purchases 442,000,000 BIT from Stichting Outpapier at the Enterprise volume tier (§4.3 of the BIT Issuance and Distribution Addendum), paying EUR 66,300 against a standard retail price of EUR 88,400. The 25% volume discount is applied as the standard Enterprise rate, not as a bespoke concession. This pre-purchase capitalises Stichting Outpapier and gives the founder a working balance against which the first archive can be ingested.
Of the pre-purchased balance, 172,000,000 BIT is allocated to populating EFS with the founder's personal archive at the v1 ingestion rate. At 0.005 BIT per KiB (~5,243 BIT per GiB), this funds approximately 32 TiB of eternal storage: 172,000,000 / 5,243 ≈ 32,800 GiB. The remaining balance is held against future ingestion, retrieval, and key-grant operations.
The 25% discount is the standard Enterprise tier available to any purchaser of EUR 10,000 or more in a single transaction. What distinguishes this purchase is its size and timing, not its rate: a single transaction at operator launch that simultaneously capitalises Stichting Outpapier and funds the founder's initial archive. Subsequent ingestion against any operator, including Stichting Outpapier, is priced at the v1 schedule of §18.3 with the same §4.3 volume tiers available to any buyer.
19. Audit, Repair, and Availability
Long-term preservation requires more than storing bytes once. EFS operators verify continued possession and integrity, and repair from healthy replicas when corruption is detected.
19.1 Audit
At minimum, an operator can be challenged to provide proof of possession of a stored package part, and the response can be verified against the public ciphertext digest. The whitepaper does not specify a particular proof-of-storage construction; a pragmatic challenge-response scheme is sufficient for v1.
A routine retrieval also serves as a client-side integrity audit: the retrieved bytes can be verified against the canonical ciphertext digest immediately and against plaintext digests in the encrypted manifest after decryption. Publishers and licensed viewers therefore do not need to trust operator-side audit alone; their own access patterns generate independent integrity evidence. Partial retrieval (§9.4) lets this audit operate on a sample of frames from a large package without restoring the whole package.
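A minimal sketch of the ciphertext-side check, assuming the canonical ciphertext digest is published as a SHA-256 hex digest (the whitepaper's canonical digest definition governs in practice):

```python
import hashlib

def verify_retrieved(retrieved: bytes, canonical_digest_hex: str) -> bool:
    """Client-side integrity audit on a routine retrieval.

    A partial retrieval (§9.4) would instead verify sampled frames against
    the plaintext digests in the encrypted manifest after decryption,
    rather than the whole-part ciphertext digest checked here.
    """
    return hashlib.sha256(retrieved).hexdigest() == canonical_digest_hex
```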
19.2 Availability
Each EFS node publishes signed claims about its own holdings: which package parts it holds at the full redundancy floor, which it has released to the tape-only floor, which it has contracted to the vault-only floor, and which it has withdrawn. A node also publishes its current online-availability posture, including any scheduled-online-hours regime in force under operator-pressure response (§15.4). A node is authoritative only for claims about itself; there is no global consensus.
Other nodes and clients aggregate these claims into local availability indexes. Availability claims are renewable assertions with an expiry time; if a node does not refresh a claim, peers treat it as stale. This avoids permanent accumulation of outdated information when nodes lose parts, lose funding, or leave the network.
The detailed wire format for availability propagation, including snapshot cadence, delta encoding, and subscription mechanics, is specified in a separate availability protocol document. The principles the whitepaper commits to are:
- nodes sign claims about themselves;
- claims expire and must be renewed to remain effective;
- mute state from Asset Registry overrides availability (§20);
- there is no single canonical global state.
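A sketch of how a peer might apply these principles when aggregating claims (field names are illustrative; the wire format lives in the availability protocol document):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AvailabilityClaim:
    node_id: str           # the node the claim is about, and its signer
    part_id: str
    floor: str             # e.g. "full", "tape-only", "vault-only", "withdrawn"
    expires_at: datetime   # claims must be renewed to remain effective

def effective_claims(claims: list[AvailabilityClaim],
                     muted_parts: set[str]) -> list[AvailabilityClaim]:
    """Drop stale claims and suppress anything under an active mute (§20)."""
    now = datetime.now(timezone.utc)
    return [c for c in claims
            if c.expires_at > now and c.part_id not in muted_parts]
```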
19.3 Metadata distribution
The metadata that supports lookup and indexing, including the AssetID-to-PackageID-to-PartNr mapping, is replicated across the federation so that any node can answer queries about which packages exist for a given AssetID. In v1 each node carries metadata for the entire network, including parts it does not itself store. Lighter participation modes that maintain metadata only for assets a node stores are an optimisation deferred to a later iteration.
19.4 Legal export
Availability metadata circulating across the network is not the same as transferring encrypted bytes to another node. A node learning that a part exists is not a node receiving the part. Storage of a part on a node in another jurisdiction occurs only through an explicit storage instruction, typically the publisher making a Bitcash payment to that node. The instruction together with the payment is the contractual act that places the part with that operator.
20. Muting and Publication Control
Asset Registry can mute an AssetID. A mute is publication control, not deletion of history. The AssetID's claims, ownership records, and sealed packages remain in evidence. What changes is that compliant Catalog services stop publishing, advertising, resolving, or serving the asset.
20.1 Propagation
Asset Registry publishes a signed mute feed. EFS nodes subscribe to that feed, either from a Registry node directly or from another EFS node carrying the feed. When a subscription is interrupted, a node pulls the missed entries on reconnection.
A Registry mute event names an AssetID and carries a signed effective-from timestamp. EFS nodes apply the mute on receipt.
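A sketch of feed application on an EFS node; the feed format is Registry-defined, and the idempotent replay handling here (needed when missed entries are pulled after a subscription interruption) is an assumption:

```python
from datetime import datetime

# AssetID -> effective-from timestamp of the active mute
mute_state: dict[str, datetime] = {}

def apply_mute_event(asset_id: str, effective_from: datetime) -> None:
    # Applied on receipt. Replayed entries pulled after a subscription
    # interruption must not move an existing mute later, so the earliest
    # effective-from timestamp wins (assumed semantics).
    current = mute_state.get(asset_id)
    if current is None or effective_from < current:
        mute_state[asset_id] = effective_from
```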
20.2 EFS suppressions
When an AssetID is muted, EFS suppresses, for every PackageID and part rooted at that AssetID:
- public listing of its packages and parts;
- public availability claims advertising the package;
- public retrieval;
- issuance of new key grants;
- repair and republication workflows that would publicly advertise the package.
EFS may retain internally:
- the signed registration records;
- the encrypted bytes;
- the audit trail;
- payment and accounting records;
- existing key-grant records;
- the muted-state mapping itself.
20.3 Override semantics
An active mute state overrides any prior availability claim. A client lookup proceeds in this order, sketched in code after the list:
- Determine the AssetID from the query, the PackageID prefix, or Library state.
- Check the current mute state for that AssetID.
- If muted, suppress the public response.
- If not muted, evaluate availability and retrieval normally.
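A minimal sketch of that ordering (the resolver and the responses are stubbed; names are hypothetical):

```python
def resolve_asset_id(query: str) -> str:
    # Stub: derive the AssetID from the query, a PackageID prefix,
    # or Library state.
    return query.split("/")[0]

def lookup(query: str, muted: set[str]) -> str:
    asset_id = resolve_asset_id(query)   # step 1: determine the AssetID
    if asset_id in muted:                # step 2: check current mute state
        return "suppressed"              # step 3: muted => no public response
    return "serve"                       # step 4: evaluate availability normally
```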
20.4 Container assets
Muting a container asset cascades through the Registry-recorded asset hierarchy to its children. EFS receives mute events for the affected AssetIDs from the Registry feed and suppresses each accordingly. EFS does not itself walk the hierarchy.
20.5 What muting cannot reach
Muting suppresses publication by compliant Catalog services. It cannot retract material that has already been retrieved, decrypted, copied to USB, screenshotted, or republished outside the network. The whitepaper acknowledges this limit explicitly: muting is publication control, not recall.
21. Open Questions
The following questions remain open for later iterations of this whitepaper and for the specifications that will accompany implementation:
- whether the Key Registry remains a permanent EFS responsibility or migrates to a separate Catalog module as the licensing and sales paths around it mature;
- the wire format of CFC v1 and the introduction of chunked, indexed, or stream-oriented format profiles in later versions;
- the detailed specification of the availability protocol, including snapshot cadence, delta encoding, and subscription semantics;
- the exact interface and validity rules for signed Registry claims presented at write time;
- the specification of `set_current` records and the rules under which a non-latest serial may be designated current;
- the audit construction beyond pragmatic challenge-response, including whether a future profile adopts proof-of-storage cryptography;
- the rules under which an EFS node may transition a stored part through suppressed, quarantined, or purged states;
- the precise relationship between Key Registry records and Asset Market settlement events when a buyer's grant is issued;
- the format of the availability-claim signal raised when an operator has contracted to the vault-only floor and the standard client behaviour when a retrieval is requested against such a claim;
- the lighter metadata-distribution mode in which a node carries metadata only for assets it stores, as an alternative to v1's full-network replication;
- the long-term economic model of eternal-storage-on-first-payment, including operator obligations on technology migration over decades, the impact of cost evolution on the v1 platform pricing, and what happens when an operator exits the federation;
- the formal specification of the tape migration protocol: the verification procedures applied during each fifteen-year rewrite, the integrity-audit cadence between migrations, and the contingency plan if an LTO generation falls out of manufacturer support sooner than current trends suggest;
- the formal specification of the disk-volume migration protocol: when an operator brings up a new generation of volume capacity, the verification and retirement procedures applied to the migrated children, and how the parent-pointer rewrite is staged so that no part-lookup ever resolves through a partially-migrated chain;
- whether platform-wide pricing can hold across operators in materially different cost-structure jurisdictions, or whether jurisdiction-aware pricing tiers will be needed in a later protocol version;
- the cross-operator redundancy model: whether the protocol should support coordinated replication across multiple operators (so that a publisher seeking redundancy beyond a single operator's redundancy floor does not need to upload independently to each), and how the ingestion fee composes when a single submission lands at multiple operators;
- who ultimately bears the cost of retrieval (viewer, publisher, Asset Market settlement) and how the routing between Bitcash wallets is resolved at the Library and Market layer;
- whether the canonical lifecycle timings (4 hours to second disk copy, 7 days to Nearline, 30-day target with 12-month maximum to Vault) are protocol-normative or operator-suggestive, and the conditions under which an operator may deviate from them;
- the bucket model for tape writes: bucket size, bucket-flush policy, and the early-ejection trigger under which a Vault tape leaves the library before its 14-day or 30-day deadline (a "full tape" threshold, an aging trigger short of the deadline, or a combination), and whether any of these become protocol-normative;
- the precise definition of "customer-initiated retrieval" that resets the idle clock, including how to classify edge cases such as licensed-viewer access through a Library-mediated bundle, key-grant verification reads, and bulk-restoration sequences after local-copy loss;
- the operator's storage-unit migration cadence: whether the protocol sets a normative maximum interval between migrations, or leaves the cadence purely operator-local subject to the requirement that the idle thresholds are eventually applied;
- the Vault-tape finalisation rule: whether the 80% fill threshold and 12-month maximum age are protocol-normative, or operator-local within bounds the protocol sets;
- the daily-backup tape rotation: tape labelling and retention semantics, the exact wipe procedure, and the relationship between daily-backup retention and the Vault-tape maximum age;
- the v1 Nearline retrieval rate and whether it should remain pegged to the ingestion rate or float separately as the cost basis of tape rehydration evolves;
- the Vault batch retrieval API: the exact request and response formats, the validity of an issued quote, the cancellation and refund behaviour if the operator misses the estimated time to fulfilment, and the rules for repeat batches against the same parts;
- the precise definition of the conditions under which an operator may invoke vault-only contraction or scheduled online availability under operator-pressure response: what counts as "sustained energy-cost or supply-disruption pressure", how operators declare and announce it, what notice publishers receive, and whether a coordinated federation-level signal is needed when multiple operators face correlated regional pressure;
- the eligibility rules for vault-only contraction (which packages an operator may transition under pressure, in what order, and whether publishers may opt packages into or out of contraction in advance);
- the wire format for scheduled-online-hours availability announcements, the queuing and expected-service-time semantics for retrievals submitted outside an operator's declared windows, and the refund or cancellation behaviour when a window slips;
- whether a future version supports packages that intentionally span multiple AssetIDs without relying on a container-asset construction;
- whether a future version introduces layered encryption with an outer ingress-operator wrapper and what the operational and licensing consequences would be;
- whether the v1 16 GiB part-size limit holds in the face of evolving residential bandwidth and operator hardware, or should be raised in a later protocol version;
- the bound on the volumeID width (4 hex characters supports 65,536 volumes per operator) and the migration path if a single operator ever approaches that limit;
- the v1 100 GiB tar-chunk size: whether it is the right balance between mount-amortisation and over-fetch on tape retrieval, and whether a future protocol version should adopt a different size or per-tape-generation sizing;
- the handling of the parts that come off tape with a requested chunk but were not themselves requested: whether they are kept available in INCOMING for a 24-hour coalescing window only, whether they may be promoted to Online state if the retriever asks for them within that window, whether a chunk-mate batching discount applies, and whether any of these become protocol-normative;
- the FAST cache path schema and the criteria, hysteresis, and capacity rules governing the operator's FAST cache heuristic, and whether any of these become protocol-normative rather than purely operator-local;
- the second Nearline copy reserved for a future hardware-certification level, and the conditions under which the certification will be activated;
- the precise scope of the certified-platform license under which operators run (operating system, filesystem, registry conventions, directory layout, and the closed list of approved hardware suppliers and models for storage units, tape libraries, and tape drives): what is mandatory, what is recommended, the cadence at which the hardware schedule is revised as new generations supersede earlier ones, the procedure by which an operator on a previous generation migrates forward, and how a candidate operator demonstrates compliance before accepting customer ingestions;
- the wire-level metadata exchange under which an operator's hardware and registries can be transferred to another certified operator on cessation of operation, and the verification procedure under which the receiving operator confirms continuity of every package's redundancy floor;
- whether the five-year and ten-year idle thresholds are the right fixed values, or whether a future protocol version should adjust them as disk-cost economics evolve.
22. Design Principles
- EFS is the storage vertical. It preserves and serves encrypted file bytes. It does not describe, claim, identify, or sell.
- AssetID is the namespace, not the key. Identifiers are public names; encryption uses fresh random per-package keys.
- One package, one asset. Each package refers to exactly one AssetID. Hierarchies live in Registry as container assets.
- Packages are the unit of submission and retrieval. Multi-part packages exist when total ciphertext exceeds the platform part-size limit, or when parts are produced incrementally; parts carry a uniform five-digit zero-padded PartNr counter, and the package's part count is asserted via an optional signed seal marker.
- Packages are immutable. Revisions create new generations; old generations remain as evidence.
- EFS records currentness; Library reads it. The latest serial is current by default; explicit overrides are signed.
- One part, one canonical ciphertext. All replicas store the same encrypted bytes.
- Anonymous publication is a first-class case. Catalog.ID is optional; an AssetID and a funded Bitcash wallet suffice.
- Authorisation is rooted at Registry. EFS does not maintain its own authorisation model; it asks Registry.
- Federation by signed local truths, not global consensus. Each node is authoritative only for itself.
- Pay once, stored forever. The ingestion fee funds eternal preservation. Storage is not renewed and does not lapse to deletion.
- Storage states, not tiers. Online, Standby, Controlled Offline, Nearline, and Vault are physical states a copy lives in; a package normally has copies in several of them simultaneously, presented in the publisher's UI as state checkboxes.
- Vault is the unconditional floor. All other states are conditional on the operator's ability to maintain online infrastructure; the air-gapped Vault pair is what carries the eternal-preservation commitment under any pressure short of physical destruction of both vault sites.
- Convenience is best-effort, preservation is unconditional. Online availability is a normal-conditions commitment that may be compressed under sustained pressure. The Vault floor and the fifteen-year migration cycle that maintains it are not.
- Volumes are labelled, registered, and migrated. Every labelled volume has a unique permanent label, a row in the operator's volume registry, and a controlled migration path forward. Buckets inside a volume are write-once on close, with at most one writable bucket per volume at a time. Each storage unit hosts its own writable bucket; multiple storage units produce multiple writable buckets in parallel.
- Operators run a certified platform. XFS, the operating system, the directory layout, the registry conventions, and a closed list of approved hardware suppliers and models for storage units, tape libraries, and tape drives are specified by license, so operator infrastructure is portable between operators on cessation of operations down to controller firmware and library robotics.
- INCOMING and FAST are volume-agnostic. All inbound traffic enters through INCOMING; all outbound tape rehydration also passes through INCOMING. FAST is the operator's frequency-driven cache. Neither counts toward the redundancy floor.
- Tape boundaries do not follow disk boundaries. Tape writes are organised into 100 GiB tar chunks that aggregate parts from multiple buckets across multiple storage units; the tar chunk is the unit of tape work and the unit of Nearline retrieval pricing.
- Self-hosting is permitted, not relied upon. Institutional storage carries the durability guarantee.
- Muting suppresses publication, not history. Mute propagates from Registry; EFS suppresses listing, availability, retrieval, and grants.
- Post-quantum by default. Hybrid classical and post-quantum primitives apply at every authenticated layer.
23. Closing
Encrypted Filestorage is the storage layer of the Catalog ecosystem.
Asset Registry mints AssetIDs and records the durable claims that bind an asset to its ownership and provenance. Asset Library expresses what assets currently mean and how they are organised, described, and licensed. Asset Market brokers offers and records the bilateral agreements through which licenses and decryption grants are exchanged. Bitcash meters and settles the operations that make all of it run. Catalog.ID, optionally, identifies the people behind the keys.
Encrypted Filestorage stores the bytes. Sealed, addressable, replicated, auditable, retrievable, metered.
The preservation problem EFS targets is real and structural: digital data has no durable medium with a hundred-year reader story, no equipment line with a hundred-year support window, and no energy or supply outlook stable enough to underwrite an unconditional 24x7 online promise across the relevant horizon. EFS responds by separating the unconditional preservation commitment (rooted in the air-gapped Vault pair and a fifteen-year tape-migration cycle) from the convenience commitment of online retrieval (held under normal conditions, scaled back through named pressure responses when conditions are not normal), and by combining a one-time ingestion payment with metered retrieval rather than a subscription that nobody can credibly price across a century.
By rooting storage identity in AssetIDs, by aligning write authorisation with Registry, by deferring currentness queries to a thin lookup that Library consumes, by exposing a single small CFC container format under a clear post-quantum encryption model, by stating its storage states and lifecycle plainly, and by documenting the volume and bucket architecture that the long-term migration story will run on, EFS aims to be the simplest of the Catalog verticals: a paid, encrypted, immutable, append-only file store that does its job and stays out of the way of the modules above it.