Helix: The .dna Strand Format
Status: Draft v1 · Last updated: 2026-06-18 · Related: TSD §7 · Security · ADR-008
The .dna strand is the portable artifact that makes "take your memory anywhere" real. It is
signed, encrypted, versioned, and mergeable — a memory you can move like a
file and version like code, without a server or a blockchain.
Implementation status (Phase 4, shipped). The codec is built on PyNaCl/libsodium: XChaCha20-Poly1305 encryption, Ed25519 signatures, and an Argon2id KDF, matching this spec. Current deviations (ADR-032): the Merkle hash is BLAKE2b (stdlib; BLAKE3 is a future upgrade) and the container is a zip (not tar+zstd). Encryption uses 64 KiB XChaCha20-Poly1305 secretstream chunks (truncation-resistant), with a back-compatible enc_mode field so any legacy single-blob strands still import. Export/import/verify/merge/diff/push/pull are implemented and tested (round-trip, tamper, truncation, wrong-passphrase). log reads history; rollback restores from a prior .dna.
1. Container
A .dna file is a single archive (tar, optionally zstd-compressed) containing:
my-brain.dna
├── manifest.json # plaintext metadata + integrity root (signed)
├── manifest.sig # detached Ed25519 signature over manifest.json
└── strand.db.enc # the SQLite strand, encrypted (XChaCha20-Poly1305)
manifest.json is plaintext so a recipient can inspect what a strand is (schema, model,
counts, author key) and verify integrity before decrypting anything.
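As a rough illustration of inspecting a strand before decrypting it, the sketch below reads only manifest.json from an in-memory archive. It assumes the zip container noted under implementation status and a manifest stored as strict JSON (no comments); `read_manifest` and `build_demo_archive` are hypothetical helper names, not part of Helix.

```python
import io
import json
import zipfile


def read_manifest(archive_bytes: bytes) -> dict:
    """Return the plaintext manifest without touching strand.db.enc."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return json.loads(archive.read("manifest.json"))


def build_demo_archive() -> bytes:
    """Build a tiny in-memory stand-in for a .dna file (demo data only)."""
    manifest = {
        "format": "helix.dna",
        "format_version": 1,
        "counts": {"memories": 2, "edges": 1},
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.json", json.dumps(manifest))
        archive.writestr("manifest.sig", b"\x00" * 64)  # placeholder, not a real signature
        archive.writestr("strand.db.enc", b"opaque ciphertext")
    return buffer.getvalue()


manifest = read_manifest(build_demo_archive())
print(manifest["format"], manifest["counts"])
```

Only the manifest entry is read, so no key material is needed at this stage.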
2. Manifest
{
"format": "helix.dna",
"format_version": 1,
"strand_id": "01J...", // stable identity across versions
"version": 7, // monotonically increasing
"created_at": "2026-06-18T...Z",
"created_by": {
"pubkey": "ed25519:base64...", // author's signing public key
"label": "abhay@laptop"
},
"schema_version": 1, // memory-model schema
"embedding": { // pin the embedding space
"provider": "local",
"model": "BAAI/bge-small-en-v1.5",
"dim": 384,
"normalized": true
},
"counts": { "memories": 1243, "edges": 880 },
"encryption": {
"cipher": "xchacha20poly1305",
"kdf": "argon2id",
"kdf_params": { "mem_kib": 65536, "iters": 3, "parallelism": 1, "salt": "base64..." },
"nonce": "base64..."
},
"integrity": {
"merkle_root": "blake3:hex...", // root over per-row hashes (see §5)
"db_sha256": "hex..." // hash of the ciphertext blob
},
"history_head": "blake3:hex...", // head of the op-history chain
"parents": ["blake3:..."] // prior version hash(es); enables diff/merge/rollback
}
The signature (manifest.sig) covers the entire manifest, including the integrity root,
so any tampering with content or metadata is detectable.
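Because the signed manifest records db_sha256 over the ciphertext, a recipient can check the encrypted blob against it before any decryption. The following is a minimal sketch of that one check, assuming the manifest signature has already been verified elsewhere (Ed25519 needs a library such as PyNaCl and is not shown); `ciphertext_matches_manifest` is a hypothetical name.

```python
import hashlib
import hmac


def ciphertext_matches_manifest(manifest: dict, ciphertext: bytes) -> bool:
    """Compare the blob's SHA-256 with the value recorded in the manifest."""
    expected = manifest["integrity"]["db_sha256"]
    actual = hashlib.sha256(ciphertext).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected)


blob = b"opaque ciphertext"  # demo bytes standing in for strand.db.enc
demo_manifest = {"integrity": {"db_sha256": hashlib.sha256(blob).hexdigest()}}

intact = ciphertext_matches_manifest(demo_manifest, blob)
tampered = ciphertext_matches_manifest(demo_manifest, blob + b"!")
print(intact, tampered)
```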
3. Encryption (ADR-019)
- Cipher: XChaCha20-Poly1305 (AEAD) via libsodium secretstream over 64 KiB chunks: authenticated, seekable, and truncation-resistant (the age STREAM model), not a single monolithic blob.
- Wrap-don't-encrypt: a random data key encrypts the payload; that data key is itself wrapped by the user's unlock factor(s). So re-keying never re-encrypts the whole strand, and multiple factors can unlock the same file (see §4 / key management).
- Key derivation: Argon2id from the passphrase (HELIX_PASSPHRASE), with desktop params (start m=64 MiB, t=3, p=1); or the data key is wrapped by a device-keychain key when no passphrase is set. An optional recovery code / Shamir / hardware key wraps the same data key (ADR-020).
- Nonces: 192-bit (XChaCha); random nonces are safe at this width.
- Threat posture: the disk/USB/cloud-drive holding the .dna is treated as untrusted; the plaintext strand exists only in memory on a trusted device. See Security Model.
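The truncation resistance of chunked encryption comes from marking the last chunk explicitly, so a reader that never sees that marker knows the stream was cut short. The sketch below shows only that framing idea with the standard library; it performs no encryption, and a real codec would hand each chunk and tag to libsodium's secretstream. The tag constants and function names are illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple

CHUNK_SIZE = 64 * 1024  # 64 KiB, as in the spec
TAG_MESSAGE = 0         # illustrative tag values for this sketch
TAG_FINAL = 1


def frame_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Split data into chunks and mark the last one as final."""
    if not data:
        yield TAG_FINAL, b""
        return
    for offset in range(0, len(data), chunk_size):
        is_last = offset + chunk_size >= len(data)
        yield (TAG_FINAL if is_last else TAG_MESSAGE), data[offset:offset + chunk_size]


def reassemble(frames: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the payload, failing closed if the final marker is missing."""
    output = bytearray()
    saw_final = False
    for tag, chunk in frames:
        if saw_final:
            raise ValueError("data found after the final chunk")
        output += chunk
        saw_final = tag == TAG_FINAL
    if not saw_final:
        raise ValueError("truncated stream: final chunk missing")
    return bytes(output)


payload = b"x" * (2 * CHUNK_SIZE + 5)
frames = list(frame_chunks(payload))
roundtrip = reassemble(frames)

try:
    reassemble(frames[:-1])  # simulate a file cut short
    truncation_detected = False
except ValueError:
    truncation_detected = True
```

In an authenticated stream the tag is covered by each chunk's MAC, so an attacker cannot forge a final marker on an earlier chunk; this sketch leaves that part out.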
4. Signing & verification
- Each user has an Ed25519 identity keypair (generated on helix init, stored in the OS keychain / helix-identity/, never committed, never inside a strand).
- export signs the manifest; import verifies the signature against the embedded pubkey and warns/blocks on mismatch or on an untrusted author (for shared strands).
- This gives Walrus-style verifiable integrity without a blockchain or network.
5. Integrity & content addressing
- Each memory/edge row hashes to a stable digest (BLAKE3 over canonicalized fields).
- A Merkle tree over those digests yields merkle_root, recorded and signed.
- Two strands can be diffed cheaply by comparing subtrees instead of full contents: the basis for fast diff/merge and for detecting exactly which facts changed.
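The row-digest and Merkle-root idea can be sketched with the standard library alone. This version uses BLAKE2b, as the shipped implementation is said to, and makes several assumptions that the spec does not state: rows are canonicalized as sorted-key compact JSON, leaf and interior hashes get different prefix bytes, and an unpaired node is promoted unchanged. `row_digest` and `merkle_root` are hypothetical names.

```python
import hashlib
import json
from typing import List


def row_digest(row: dict) -> bytes:
    """Hash a row over a canonical encoding so key order does not matter."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    return hashlib.blake2b(b"\x00" + canonical, digest_size=32).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Fold leaf digests pairwise until a single root remains."""
    if not leaves:
        return hashlib.blake2b(b"", digest_size=32).digest()
    level = list(leaves)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                pair = b"\x01" + level[i] + level[i + 1]
                next_level.append(hashlib.blake2b(pair, digest_size=32).digest())
            else:
                next_level.append(level[i])  # odd node is promoted unchanged
        level = next_level
    return level[0]


rows = [
    {"id": "m1", "text": "likes tea"},
    {"id": "m2", "text": "lives in Pune"},
    {"id": "m3", "text": "prefers dark mode"},
]
root = merkle_root([row_digest(row) for row in rows])
print("blake2b:" + root.hex())
```

Changing any single row changes its leaf and every hash on the path to the root, which is what lets two strands be compared subtree by subtree.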
6. Versioning & history
- version increments on every committed change; parents records prior version hash(es).
- An append-only history table logs operations (op type, affected ids, before/after hashes, provenance), chained via history_head.
- This enables:
    - helix log: the evolution of your memory, git-style.
    - helix diff vA vB: what changed between versions/strands.
    - helix rollback <version>: restore a prior state (e.g., undo a wrong learning).
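One common way to chain an append-only log is to hash each entry together with the previous head, so the latest head commits to the whole history. The sketch below illustrates that pattern under stated assumptions: BLAKE2b as the hash, sorted-key JSON as the entry encoding, and an all-zero genesis value. The op fields and function names are illustrative, not the Helix schema.

```python
import copy
import hashlib
import json
from typing import Dict, List

GENESIS = b"\x00" * 32  # assumed starting head for an empty history


def entry_hash(previous_head: bytes, op: Dict) -> bytes:
    """Hash an operation together with the head that preceded it."""
    payload = json.dumps(op, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.blake2b(previous_head + payload, digest_size=32).digest()


def append_op(history: List[Dict], op: Dict) -> bytes:
    """Append an operation and return the new history head."""
    previous_head = history[-1]["hash"] if history else GENESIS
    head = entry_hash(previous_head, op)
    history.append({"op": op, "hash": head})
    return head


def verify_chain(history: List[Dict], expected_head: bytes) -> bool:
    """Recompute every link and compare the result with the recorded head."""
    head = GENESIS
    for entry in history:
        head = entry_hash(head, entry["op"])
        if head != entry["hash"]:
            return False
    return head == expected_head


history: List[Dict] = []
append_op(history, {"type": "add", "ids": ["m1"], "after": "ab12"})
history_head = append_op(history, {"type": "update", "ids": ["m1"],
                                   "before": "ab12", "after": "cd34"})

rewritten = copy.deepcopy(history)
rewritten[0]["op"]["type"] = "delete"  # tamper with an early entry

print(verify_chain(history, history_head), verify_chain(rewritten, history_head))
```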
7. Transfer operations
export / clone
Snapshot strand.db → compute row/Merkle hashes → write manifest → sign → encrypt → archive.
Atomic (temp file + rename); never overwrites the source strand.
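The temp-file-plus-rename pattern can be sketched as follows. It writes the finished archive bytes to a temporary file in the destination directory and then renames it into place, so readers see either the old file or the complete new one. `atomic_write` is a hypothetical helper, and the source-path guard is one simple way to honor "never overwrites the source strand".

```python
import os
import tempfile
from pathlib import Path


def atomic_write(destination: Path, payload: bytes, source: Path) -> None:
    """Write payload to destination atomically, never clobbering source."""
    if destination.resolve() == source.resolve():
        raise ValueError("refusing to overwrite the source strand")
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent,
                                    prefix=destination.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, destination)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # leave no partial file behind
        raise


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "strand.db"
    source.write_bytes(b"live strand")
    target = Path(workdir) / "my-brain.dna"
    atomic_write(target, b"archive bytes", source)
    exported = target.read_bytes()
    leftovers = sorted(p.name for p in Path(workdir).iterdir())
```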
import
- Read manifest.json; check format_version / schema_version compatibility.
- Verify manifest.sig. Reject on failure (fail closed).
- Decrypt strand.db.enc (passphrase/keychain).
- If the embedding space differs from the importing install, re-embed content into the local space (tracked operation) rather than mixing vector spaces.
- Open as a new local strand (or stage for merge).
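The ordering of the import steps matters: nothing is decrypted until the manifest has been checked and its signature verified. The sketch below shows that fail-closed ordering with the cryptography injected as callables, since signature verification and decryption need a library such as PyNaCl. `import_strand`, `ImportRejected`, and the stand-in callables are hypothetical names for illustration.

```python
import json
from typing import Callable, Dict


class ImportRejected(Exception):
    """Raised when a strand fails a check; nothing is decrypted afterwards."""


def import_strand(manifest_bytes: bytes, signature: bytes, ciphertext: bytes, *,
                  verify: Callable[[bytes, bytes, str], bool],
                  decrypt: Callable[[bytes, Dict], bytes],
                  local_embedding: Dict,
                  supported_format_version: int = 1) -> Dict:
    manifest = json.loads(manifest_bytes)
    # 1. Compatibility check on plaintext metadata only.
    if manifest["format_version"] > supported_format_version:
        raise ImportRejected("strand is newer than this install")
    # 2. Verify the signature over the exact manifest bytes; fail closed.
    if not verify(manifest_bytes, signature, manifest["created_by"]["pubkey"]):
        raise ImportRejected("manifest signature did not verify")
    # 3. Only now touch the ciphertext.
    database = decrypt(ciphertext, manifest["encryption"])
    # 4. Flag a re-embed instead of mixing vector spaces.
    needs_reembed = manifest["embedding"] != local_embedding
    return {"database": database, "needs_reembed": needs_reembed}


# Stand-ins for real crypto, used only to exercise the control flow.
decrypt_calls = []

def fake_verify(manifest_bytes: bytes, signature: bytes, pubkey: str) -> bool:
    return signature == b"valid"

def fake_decrypt(ciphertext: bytes, params: Dict) -> bytes:
    decrypt_calls.append(params["cipher"])
    return ciphertext

LOCAL = {"provider": "local", "model": "BAAI/bge-small-en-v1.5",
         "dim": 384, "normalized": True}
demo_manifest = json.dumps({
    "format_version": 1,
    "created_by": {"pubkey": "ed25519:demo"},
    "encryption": {"cipher": "xchacha20poly1305"},
    "embedding": LOCAL,
}).encode("utf-8")

result = import_strand(demo_manifest, b"valid", b"rows",
                       verify=fake_verify, decrypt=fake_decrypt, local_embedding=LOCAL)
try:
    import_strand(demo_manifest, b"forged", b"rows",
                  verify=fake_verify, decrypt=fake_decrypt, local_embedding=LOCAL)
    rejected = False
except ImportRejected:
    rejected = True
```

The forged import never reaches the decrypt callable, which is the property "fail closed" is meant to guarantee.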
merge (A ⊕ B)
The hard, valuable operation — and it reuses the same engine as everyday learning. Full spec in
Sync & Merge (ADR-021):
1. Align nodes/edges by content hash + semantic match (content-addressed Prolly/Merkle store
makes this cheap).
2. CRDT convergence (Automerge-style, op-based with history) for concurrent edits, then
git-style 3-way semantic merge at the fact/field level (against the commit-DAG
merge-base) for contradictions — resolved with the bi-temporal model (close valid_to,
never delete). Not last-write-wins.
3. Preserve both provenances; never silently drop a contributor's fact.
4. Enforce the redaction invariant: secrets are never present to merge in the first place.
5. Produce a new version with both parents (reversible via rollback).
Merge is conflict-aware and reversible by construction; "two facts meet" has exactly one code path whether they meet over time (one user) or at once (two users/teammates).
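To make the "close valid_to, never delete" rule concrete, here is a deliberately small sketch of the bi-temporal contradiction step only. It is not the CRDT or 3-way merge engine described in Sync & Merge; it assumes facts are keyed by a (subject, attribute) pair and carry ISO-8601 timestamps. The `Fact` fields other than valid_to, and the `merge_facts` name, are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    source: str                      # provenance of the contributor
    valid_from: str                  # ISO-8601 UTC; sorts correctly as text
    valid_to: Optional[str] = None   # None means "still current"


def merge_facts(ours: List[Fact], theirs: List[Fact]) -> List[Fact]:
    """Union both sides, then close superseded facts instead of deleting them."""
    combined = list(ours)
    for fact in theirs:
        if fact not in combined:
            combined.append(fact)

    result: List[Fact] = []
    open_by_slot: Dict[Tuple[str, str], List[Fact]] = {}
    for fact in combined:
        if fact.valid_to is None:
            open_by_slot.setdefault((fact.subject, fact.attribute), []).append(fact)
        else:
            result.append(fact)  # already-closed history is kept as is

    for facts in open_by_slot.values():
        facts.sort(key=lambda f: f.valid_from)
        for older, newer in zip(facts, facts[1:]):
            if older.value == newer.value:
                result.append(older)  # agreement: keep both provenances open
            else:
                result.append(replace(older, valid_to=newer.valid_from))
        result.append(facts[-1])
    return result


ours = [Fact("abhay", "city", "Pune", "abhay@laptop", "2026-01-10T00:00:00Z")]
theirs = [Fact("abhay", "city", "Berlin", "teammate@desktop", "2026-05-02T00:00:00Z")]
merged = merge_facts(ours, theirs)
for fact in merged:
    print(fact.value, fact.source, fact.valid_to)
```

Both contributors' facts survive the merge; the older one is simply bounded in time by the newer one.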
rollback
Restore a prior version from history; the rollback is itself a new version (you can undo
an undo). No history is destroyed.
8. Compatibility rules
| Situation | Behavior |
|---|---|
| Newer format_version than installed | Refuse with upgrade guidance (fail closed) |
| Older format_version | Migrate forward on import |
| Different schema_version | Run schema migration |
| Different embedding space | Re-embed locally (tracked); never mix dims silently |
| Signature invalid / author untrusted | Block import (or warn for explicitly trusted shares) |
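The table can be read as a small decision function. The sketch below is one possible interpretation, not the shipped logic: an invalid signature always blocks, an untrusted author blocks unless the share was explicitly trusted (then it only warns), and the remaining rows become ordered actions. `plan_import`, the action strings, and the `explicitly_trusted_share` flag are hypothetical.

```python
from typing import Dict, List


def plan_import(manifest: Dict, installed: Dict, *, signature_valid: bool,
                author_trusted: bool, explicitly_trusted_share: bool = False) -> List[str]:
    """Map the compatibility table onto an ordered list of actions."""
    if not signature_valid:
        return ["block"]
    actions: List[str] = []
    if not author_trusted:
        if not explicitly_trusted_share:
            return ["block"]
        actions.append("warn-untrusted-author")
    if manifest["format_version"] > installed["format_version"]:
        return ["refuse-upgrade-required"]  # fail closed on newer formats
    if manifest["format_version"] < installed["format_version"]:
        actions.append("migrate-format")
    if manifest["schema_version"] != installed["schema_version"]:
        actions.append("migrate-schema")
    if manifest["embedding"] != installed["embedding"]:
        actions.append("re-embed")
    return actions


installed = {"format_version": 2, "schema_version": 1,
             "embedding": {"model": "BAAI/bge-small-en-v1.5", "dim": 384}}
older_strand = {"format_version": 1, "schema_version": 1,
                "embedding": {"model": "other-model", "dim": 768}}

plan = plan_import(older_strand, installed, signature_valid=True, author_trusted=True)
print(plan)
```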
9. Why not a blockchain / decentralized store (for v1)
Walrus achieves portability + verifiability via decentralized, verifiable storage. Helix gets the same user-facing guarantees — portable, integrity-verified, owner-controlled — with a signed encrypted file: simpler, free, offline, and zero-infra. A decentralized/verifiable backend remains a pluggable option for users who want it (ADR-010), not a requirement for everyone.