
Helix — API Reference (MCP · Daemon · SDK)

Status: Draft v1 · Last updated: 2026-06-18 · Related: TSD · MCP Integration · Decisions

Helix is a local-first, coding-agent-first, portable, $0-by-default AI memory layer. It exposes memory to agents over the Model Context Protocol (MCP), ships a CLI and Python/TS SDKs, and runs as a single local daemon. This document is the canonical reference for the MCP tool surface, the local Daemon REST API, and the SDKs.

Design authority: ADR-023 (MCP architecture — daemon + stdio shim, ~5 tools, token budget) and ADR-024 (security posture). See Decisions.


1. Architecture

Helix runs one local daemon and exposes it to agents through two MCP transports plus a REST surface. The daemon owns the single source of truth: one embedded store, one cache, one consolidation worker — shared safely across all concurrent agents.

            ┌─────────────────────────────────────────────────────────┐
            │                    Agents / Clients                     │
            │  Claude Code   Cursor   Windsurf   VS Code   Gemini CLI │
            └───────┬─────────────┬──────────────────────┬────────────┘
                    │ stdio        │ stdio                 │ Streamable HTTP
                    ▼              ▼                       ▼
            ┌───────────────┐ ┌───────────────┐   (direct, multi-client)
            │ helix-mcp     │ │ helix-mcp     │           │
            │ stdio shim    │ │ stdio shim    │           │
            └──────┬────────┘ └──────┬────────┘           │
                   │ proxy (HTTP)    │ proxy (HTTP)        │
                   ▼                 ▼                     ▼
            ┌─────────────────────────────────────────────────────────┐
            │   helixd  —  local daemon  (127.0.0.1:7878)             │
            │   • Streamable HTTP MCP endpoint  (multi-client)        │
            │   • REST API (/remember /recall /forget /graph ...)     │
            │   • Origin validation (DNS-rebinding defense)           │
            ├─────────────────────────────────────────────────────────┤
            │   ONE store · ONE cache · ONE consolidation worker      │
            │   SQLite/Lance/pgvector/Qdrant  +  embedding cache       │
            └─────────────────────────────────────────────────────────┘

Why one daemon

  • Shared state, no contention. Streamable HTTP can serve multiple clients from one process, so every agent reads/writes the same memory and shares the embedding cache and consolidation results. Running an MCP server per-agent would fork state and re-embed redundantly. (https://modelcontextprotocol.io/specification/2025-06-18/basic/transports)
  • stdio for portability. Many coding agents only speak stdio today (and the spec says clients SHOULD support stdio). Helix ships a thin stdio shim (helix-mcp) that does nothing but proxy framed JSON-RPC to the daemon's HTTP endpoint. The shim holds no state.
  • SSE is gone. The standalone HTTP+SSE transport was deprecated in the 2025-03-26 spec revision, which replaced it with Streamable HTTP; Helix implements only stdio and Streamable HTTP.
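
The stateless shim described in the bullets above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual helix-mcp: the /mcp endpoint path and the names forward_line, http_post, and main are hypothetical, and the post callable is injectable only so the proxy logic can be exercised without a running daemon.

```python
import json
import sys
import urllib.request
from typing import Callable, Optional

DAEMON_URL = "http://127.0.0.1:7878/mcp"  # assumed path; this document does not name it


def http_post(body: bytes) -> bytes:
    """POST one JSON-RPC message to the daemon and return the raw response body."""
    req = urllib.request.Request(
        DAEMON_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
    # A real shim must also handle text/event-stream replies; omitted here.
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def forward_line(line: str, post: Callable[[bytes], bytes] = http_post) -> Optional[str]:
    """Proxy a single newline-delimited JSON-RPC message; keep no state between calls."""
    line = line.strip()
    if not line:
        return None
    message = json.loads(line)  # fail fast on non-JSON input
    reply = post(json.dumps(message).encode("utf-8"))
    if "id" not in message or not reply:  # notifications get no response
        return None
    return reply.decode("utf-8")


def main() -> None:
    """Read messages from stdin, write replies to stdout, one per line."""
    for line in sys.stdin:
        out = forward_line(line)
        if out is not None:
            sys.stdout.write(out.rstrip("\n") + "\n")
            sys.stdout.flush()
```

Calling main() turns the process into the proxy; because forward_line carries nothing from one message to the next, any number of shim processes can point at the same daemon.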

Transport security (local)

The daemon binds 127.0.0.1 only and validates the Origin header on every Streamable HTTP request to defend against DNS-rebinding attacks (a browser tricked into POSTing to localhost). Requests with an unexpected/absent Origin are rejected. See ADR-024 and Security Model.
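
As a rough sketch of the Origin check described above: the allow-list, the function name origin_allowed, and the requirement that the Origin port match the daemon port are all assumptions, since this document does not say which origins helixd accepts.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed allow-list; not taken from the Helix source.
ALLOWED_HOSTS = {"127.0.0.1", "localhost"}


def origin_allowed(origin: Optional[str], port: int = 7878) -> bool:
    """Return True only for an http Origin on loopback at the daemon's port."""
    if not origin:  # an absent Origin is rejected, as the text above states
        return False
    parts = urlsplit(origin)
    if parts.scheme != "http":
        return False
    try:
        origin_port = parts.port
    except ValueError:  # malformed port such as "http://127.0.0.1:abc"
        return False
    return parts.hostname in ALLOWED_HOSTS and origin_port == port
```

A page served from another site sends its own origin (for example http://evil.example:7878 after a rebinding trick), so the hostname comparison fails even though the request reaches loopback.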


2. MCP Tool Surface

Helix exposes a deliberately small tool set. Anthropic's guidance is explicit: too many tools distract the model and bloat context. Existing memory servers ship either 4 flat tools (OpenMemory) or 9 graph tools (the reference KG server), and neither exposes a token budget. That is the gap Helix fills. (https://www.anthropic.com/engineering/writing-tools-for-agents, https://mem0.ai/blog/introducing-openmemory-mcp, https://github.com/modelcontextprotocol/servers/blob/main/src/memory/README.md)

| Tool | Purpose | Mutating | Key budget controls |
|---|---|---|---|
| memory.search | Semantic + keyword recall | No | response_format, limit, max_tokens |
| memory.context | Assemble a token-budgeted context pack for a task | No | response_format, max_tokens |
| memory.write (alias memory.add) | Persist a memory (dedup/supersede, idempotent) | Yes | idempotency_key |
| memory.get | Fetch one memory by human-readable ID | No | response_format |
| memory.forget | Delete / tombstone a memory | Yes | idempotency_key |
| memory.relate (optional) | Create a typed edge between memories | Yes | idempotency_key |

Cross-cutting conventions

  • Human-readable IDs, not UUIDs: mem_2026-06-18_auth-retry-policy_a1b2. Easier for the model to reference, dedupe, and cite. (Anthropic tool-design guidance.)
  • response_format: concise | detailed on every read tool. concise returns the minimum needed to act (on the order of 72 tokens); detailed adds provenance, scores, and edges (on the order of 206 tokens). concise is the default for memory.search and memory.context; memory.get defaults to detailed.
  • max_tokens / limit truncate and paginate. Claude Code caps a single tool response at ~25,000 tokens; Helix never emits more than the caller's max_tokens and defaults well under the cap.
  • Errors: business/tool errors (not found, budget exceeded, validation) are returned as a successful JSON-RPC result with isError: true so the model can see and recover from them; only protocol failures (malformed request, transport) become JSON-RPC errors. (https://modelcontextprotocol.io/specification/2025-11-25/server/tools)
  • tools/list_changed: tool names are stable; capability changes are announced via the tools/list_changed notification rather than renaming.
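
The error convention in the list above comes down to two different envelope shapes. The sketch below builds both; the helper names tool_error and protocol_error are hypothetical, while the content/isError result shape and the -32700 parse-error code follow the MCP tools specification and JSON-RPC 2.0 respectively.

```python
from typing import Any


def tool_error(request_id: Any, message: str) -> dict:
    """Business/tool failure: a successful JSON-RPC result flagged with isError."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [{"type": "text", "text": message}],
            "isError": True,
        },
    }


def protocol_error(request_id: Any, code: int, message: str) -> dict:
    """Protocol failure: a JSON-RPC error object and no result."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": code, "message": message},
    }


# A lookup miss is something the model can read and recover from.
not_found = tool_error(7, "No memory with id mem_2026-06-18_auth-retry_a1b2")
# An unparseable request never reached a tool, so it is a protocol error.
bad_request = protocol_error(None, -32700, "Parse error")
```

The practical difference is visibility: the first shape lands in the model's context as tool output, while the second is typically handled by the client before the model sees anything.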

2.1 memory.search

Semantic + lexical recall over the store, re-ranked, gated, and token-budgeted.

Input

{
  "type": "object",
  "required": ["query"],
  "properties": {
    "query":          { "type": "string", "description": "Natural-language or keyword query" },
    "scope":          { "type": "string", "enum": ["project", "global", "session"], "default": "project" },
    "filters":        { "type": "object", "additionalProperties": { "type": "string" } },
    "limit":          { "type": "integer", "minimum": 1, "maximum": 50, "default": 8 },
    "max_tokens":     { "type": "integer", "minimum": 64, "maximum": 25000, "default": 1500 },
    "response_format":{ "type": "string", "enum": ["concise", "detailed"], "default": "concise" }
  }
}

Output

{
  "type": "object",
  "properties": {
    "results": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id":       { "type": "string", "example": "mem_2026-06-12_db-pool_7c4e" },
          "text":     { "type": "string" },
          "score":    { "type": "number" },
          "source":   { "type": "string", "description": "detailed only" },
          "created":  { "type": "string", "format": "date-time", "description": "detailed only" }
        }
      }
    },
    "truncated":     { "type": "boolean" },
    "tokens_used":   { "type": "integer" },
    "next_cursor":   { "type": "string", "nullable": true }
  }
}
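
How limit, max_tokens, truncated, tokens_used, and next_cursor might interact can be sketched as follows. Everything here is an assumption made for illustration: the whitespace token estimate stands in for a real tokenizer, the cursor is a plain offset encoded as a string, and budget_results is a hypothetical name.

```python
from typing import Optional


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return max(1, len(text.split()))


def budget_results(ranked: list, limit: int = 8, max_tokens: int = 1500,
                   offset: int = 0) -> dict:
    """Emit ranked hits until either limit or max_tokens would be exceeded."""
    results, used = [], 0
    for hit in ranked[offset:]:
        cost = estimate_tokens(hit["text"])
        if len(results) >= limit or used + cost > max_tokens:
            break
        results.append(hit)
        used += cost
    consumed = offset + len(results)
    truncated = consumed < len(ranked)
    # Caveat: a single hit larger than max_tokens yields an empty page here;
    # a real server would need to shorten that hit's text instead.
    next_cursor: Optional[str] = str(consumed) if truncated else None
    return {"results": results, "truncated": truncated,
            "tokens_used": used, "next_cursor": next_cursor}
```

With three hits costing 3, 2, and 4 tokens and max_tokens=5, the first page holds two hits with truncated set and next_cursor "2"; resuming from offset 2 returns the third hit with next_cursor null.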

2.2 memory.context

Assembles a ready-to-inject context pack for the current task — the recall results already gated, deduped, ordered, and packed to fit a budget. This is the tool agents call most.

Input

{
  "type": "object",
  "required": ["task"],
  "properties": {
    "task":           { "type": "string", "description": "What the agent is about to do" },
    "scope":          { "type": "string", "enum": ["project", "global", "session"], "default": "project" },
    "max_tokens":     { "type": "integer", "minimum": 128, "maximum": 25000, "default": 4000 },
    "response_format":{ "type": "string", "enum": ["concise", "detailed"], "default": "concise" }
  }
}

Output

{
  "type": "object",
  "properties": {
    "context":      { "type": "string", "description": "Packed memory text, budget-fit" },
    "citations":    { "type": "array", "items": { "type": "string" }, "description": "Memory IDs used" },
    "tokens_used":  { "type": "integer" },
    "dropped":      { "type": "integer", "description": "Memories gated/dropped to fit budget" }
  }
}
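
The gate, dedupe, order, and pack steps can be pictured with a small sketch. It assumes a score threshold as the gate, exact match on normalized text as the dedupe rule, and a whitespace token estimate; the real pipeline is not specified in this document, and pack_context is a hypothetical name.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return max(1, len(text.split()))


def pack_context(candidates: list, max_tokens: int = 4000,
                 min_score: float = 0.0) -> dict:
    """Gate by score, drop duplicate text, order by score, pack to the budget."""
    seen, kept, dropped, used = set(), [], 0, 0
    for mem in sorted(candidates, key=lambda m: m["score"], reverse=True):
        key = " ".join(mem["text"].lower().split())  # case/whitespace-insensitive
        line = f"[{mem['id']}] {mem['text']}"
        cost = estimate_tokens(line)
        if mem["score"] < min_score or key in seen or used + cost > max_tokens:
            dropped += 1
            continue
        seen.add(key)
        kept.append((mem["id"], line))
        used += cost
    return {
        "context": "\n".join(line for _, line in kept),
        "citations": [mem_id for mem_id, _ in kept],
        "tokens_used": used,
        "dropped": dropped,
    }
```

Because a memory that does not fit is skipped rather than ending the loop, a smaller lower-ranked memory can still fill the remaining budget.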

2.3 memory.write (alias memory.add)

Persists a memory. Writes are idempotent (via idempotency_key) and dedup/supersede-aware: a write whose content is a near-duplicate of an existing memory is merged into it, and a write the extractor judges to supersede a prior memory tombstones the old one and links the new memory to it with a supersedes edge.

Input

{
  "type": "object",
  "required": ["text"],
  "properties": {
    "text":            { "type": "string" },
    "kind":            { "type": "string", "enum": ["fact", "preference", "decision", "snippet", "task"], "default": "fact" },
    "scope":           { "type": "string", "enum": ["project", "global", "session"], "default": "project" },
    "tags":            { "type": "array", "items": { "type": "string" } },
    "idempotency_key": { "type": "string", "description": "Retries with same key are no-ops" },
    "supersede_hint":  { "type": "string", "description": "Optional ID this write replaces" }
  }
}

Output

{
  "type": "object",
  "properties": {
    "id":         { "type": "string", "example": "mem_2026-06-18_auth-retry_a1b2" },
    "status":     { "type": "string", "enum": ["created", "merged", "superseded", "noop"] },
    "supersedes": { "type": "array", "items": { "type": "string" } }
  }
}

Dedup/supersede semantics. created = novel. merged = folded into an existing near-duplicate (returns that ID). superseded = replaced one or more prior memories (returned in supersedes). noop = idempotency key already seen.
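
The four statuses can be read as a decision order: idempotency key first, then near-duplicate check, then supersede, else create. The in-memory sketch below follows that order under loud assumptions: near-duplicate detection is reduced to normalized-text equality, supersession is driven only by supersede_hint (no extractor), the ID is a placeholder hash rather than the human-readable scheme, and WriteStore is a hypothetical name.

```python
import hashlib


def normalize(text: str) -> str:
    """Collapse case and whitespace; a stand-in for real near-duplicate detection."""
    return " ".join(text.lower().split())


class WriteStore:
    def __init__(self) -> None:
        self.live: dict = {}        # id -> text
        self.tombstoned: dict = {}  # id -> text of superseded memories
        self.seen_keys: dict = {}   # idempotency_key -> id

    def write(self, text, idempotency_key=None, supersede_hint=None) -> dict:
        # 1. A retry with a known key changes nothing.
        if idempotency_key is not None and idempotency_key in self.seen_keys:
            return self._result(self.seen_keys[idempotency_key], "noop")
        # 2. A near-duplicate folds into the existing memory.
        duplicate = next((i for i, t in self.live.items()
                          if normalize(t) == normalize(text)), None)
        if duplicate is not None:
            result = self._result(duplicate, "merged")
        else:
            # 3. Otherwise store it, tombstoning whatever it replaces.
            new_id = "mem_" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
            replaced = []
            if supersede_hint in self.live:
                self.tombstoned[supersede_hint] = self.live.pop(supersede_hint)
                replaced = [supersede_hint]
            self.live[new_id] = text
            status = "superseded" if replaced else "created"
            result = self._result(new_id, status, replaced)
        if idempotency_key is not None:
            self.seen_keys[idempotency_key] = result["id"]
        return result

    @staticmethod
    def _result(mem_id, status, supersedes=None) -> dict:
        return {"id": mem_id, "status": status, "supersedes": supersedes or []}
```

Checking the idempotency key before anything else is what makes a retried write safe even if the store changed between the two attempts.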

2.4 memory.get

Input

{
  "type": "object",
  "required": ["id"],
  "properties": {
    "id":             { "type": "string" },
    "response_format":{ "type": "string", "enum": ["concise", "detailed"], "default": "detailed" }
  }
}

Output (detailed)

{
  "type": "object",
  "properties": {
    "id":        { "type": "string" },
    "text":      { "type": "string" },
    "kind":      { "type": "string" },
    "scope":     { "type": "string" },
    "tags":      { "type": "array", "items": { "type": "string" } },
    "edges":     { "type": "array", "items": { "type": "object",
                    "properties": { "rel": {"type":"string"}, "to": {"type":"string"} } } },
    "created":   { "type": "string", "format": "date-time" },
    "updated":   { "type": "string", "format": "date-time" }
  }
}

A miss returns a successful result with isError: true and a message, not a JSON-RPC error.
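
Because memory.get takes a human-readable ID, a caller can check the shape of an ID before spending a tool call on it. The pattern below is inferred from the example IDs in this document (mem_, an ISO date, a hyphenated slug, a short suffix); treating the suffix as exactly four hex characters is an assumption, and parse_mem_id is a hypothetical helper.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Inferred from examples such as mem_2026-06-18_auth-retry-policy_a1b2.
MEM_ID = re.compile(
    r"^mem_(\d{4}-\d{2}-\d{2})_([a-z0-9]+(?:-[a-z0-9]+)*)_([0-9a-f]{4})$"
)


class MemId(NamedTuple):
    created: date
    slug: str
    suffix: str


def parse_mem_id(raw: str) -> Optional[MemId]:
    """Split an ID into date, slug, and suffix; return None if it does not fit."""
    match = MEM_ID.match(raw)
    if match is None:
        return None
    try:
        created = date.fromisoformat(match.group(1))
    except ValueError:  # well-formed digits but not a real date, e.g. month 13
        return None
    return MemId(created, match.group(2), match.group(3))
```

A UUID such as 550e8400-e29b-41d4-a716-446655440000 fails the pattern, which gives the caller a cheap way to catch an ID the model invented rather than copied.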

2.5 memory.forget

Input

{
  "type": "object",
  "required": ["id"],
  "properties": {
    "id":              { "type": "string" },
    "mode":            { "type": "string", "enum": ["tombstone", "hard"], "default": "tombstone" },
    "idempotency_key": { "type": "string" }
  }
}

Output

{ "type": "object", "properties": {
    "id":     { "type": "string" },
    "status": { "type": "string", "enum": ["forgotten", "noop"] } } }

2.6 memory.relate (optional)

Creates a typed edge in the memory graph.

Input

{
  "type": "object",
  "required": ["from", "to", "rel"],
  "properties": {
    "from":            { "type": "string" },
    "to":              { "type": "string" },
    "rel":             { "type": "string", "enum": ["supersedes", "depends_on", "contradicts", "refines", "relates_to"] },
    "idempotency_key": { "type": "string" }
  }
}

Output

{ "type": "object", "properties": {
    "edge_id": { "type": "string" },
    "status":  { "type": "string", "enum": ["created", "noop"] } } }

3. MCP Resources

Helix exposes read-only MCP Resources for clients that browse rather than call tools:

| URI | Description | MIME |
|---|---|---|
| helix://graph | The memory graph (nodes + typed edges) for the active scope | application/json |
| helix://strand/manifest | The "strand" manifest: the portable export descriptor (memories, embeddings provider, schema version) used for backup/transfer | application/json |

Resources are budget-aware: large graphs paginate via ?cursor=.


4. Local Daemon REST API

The daemon mirrors the MCP surface over plain HTTP for the CLI, SDKs, and non-MCP integrations. Same store, same gate, same budgeting.

Method & Path Mirrors Notes
POST /remember memory.write Body = write input; honors Idempotency-Key header
GET /recall?q=&limit=&max_tokens=&format= memory.search
POST /context memory.context
GET /memory/{id}?format= memory.get
POST /forget memory.forget
GET /graph?scope=&cursor= helix://graph
GET /healthz Liveness/readiness; no auth
GET /metrics Local Prometheus-style metrics (see Observability)

Example

# Recall: GET with URL-encoded query parameters
curl -s 127.0.0.1:7878/recall \
  --get --data-urlencode "q=retry policy for the auth client" \
  --data "limit=5&format=concise&max_tokens=1200"

# Remember: POST a JSON body with an idempotency key
curl -s 127.0.0.1:7878/remember \
  -H "Idempotency-Key: write-2026-06-18-001" \
  -H "Content-Type: application/json" \
  -d '{"text":"Auth client retries 3x with jitter","kind":"decision","scope":"project"}'

All GET/POST requests are subject to the same Origin validation as the MCP HTTP endpoint.
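
For callers that want the REST surface without the SDK, the two curl calls translate to the standard library roughly as follows. The helper names recall_request, remember_request, and send are hypothetical. The sketch sends no Origin header, mirroring the curl example; if the daemon's Origin validation rejects requests without one, add it to the headers.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:7878"


def recall_request(q: str, limit: int = 8, max_tokens: int = 1500,
                   fmt: str = "concise") -> urllib.request.Request:
    """Build GET /recall with URL-encoded query parameters."""
    query = urlencode({"q": q, "limit": limit,
                       "max_tokens": max_tokens, "format": fmt})
    return urllib.request.Request(f"{BASE_URL}/recall?{query}", method="GET")


def remember_request(text: str, idempotency_key: str, kind: str = "fact",
                     scope: str = "project") -> urllib.request.Request:
    """Build POST /remember with a JSON body and an Idempotency-Key header."""
    body = json.dumps({"text": text, "kind": kind, "scope": scope}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/remember",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Idempotency-Key": idempotency_key},
    )


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request to a running daemon and decode the JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a daemon running, send(recall_request("retry policy for the auth client", limit=5, max_tokens=1200)) issues the same request as the first curl command.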


5. SDKs

5.1 Python SDK

from helix import Helix

mem = Helix()  # connects to local daemon at 127.0.0.1:7878 (or HELIX_URL)

mem.write("Auth client retries 3x with jitter", kind="decision",
          idempotency_key="write-001")

hits = mem.search("retry policy", limit=5,
                  response_format="concise", max_tokens=1200)

pack = mem.context("debug the auth retry storm", max_tokens=4000)
print(pack.context, pack.citations)

m = mem.get("mem_2026-06-18_auth-retry_a1b2", response_format="detailed")
mem.forget(m.id, mode="tombstone")
mem.relate(m.id, "mem_..._b3", rel="supersedes")

Selected signatures:

class Helix:
    def __init__(self, url: str | None = None, *, scope: str = "project") -> None: ...
    def search(self, query: str, *, limit: int = 8, max_tokens: int = 1500,
               response_format: Literal["concise","detailed"] = "concise",
               scope: str | None = None, filters: dict | None = None) -> SearchResult: ...
    def context(self, task: str, *, max_tokens: int = 4000,
                response_format: Literal["concise","detailed"] = "concise") -> ContextPack: ...
    def write(self, text: str, *, kind: str = "fact", scope: str | None = None,
              tags: list[str] | None = None, idempotency_key: str | None = None,
              supersede_hint: str | None = None) -> WriteResult: ...
    def get(self, id: str, *, response_format: str = "detailed") -> Memory: ...
    def forget(self, id: str, *, mode: Literal["tombstone","hard"] = "tombstone") -> ForgetResult: ...
    def relate(self, frm: str, to: str, *, rel: str) -> RelateResult: ...

5.2 TypeScript SDK

import { Helix } from "@helix/sdk";

const mem = new Helix();                       // HELIX_URL or 127.0.0.1:7878

await mem.write("Auth client retries 3x with jitter",
  { kind: "decision", idempotencyKey: "write-001" });

const hits = await mem.search("retry policy",
  { limit: 5, responseFormat: "concise", maxTokens: 1200 });

const pack = await mem.context("debug the auth retry storm", { maxTokens: 4000 });

Selected signatures:

declare class Helix {
  constructor(opts?: { url?: string; scope?: "project" | "global" | "session" });
  search(query: string, opts?: SearchOpts): Promise<SearchResult>;
  context(task: string, opts?: ContextOpts): Promise<ContextPack>;
  write(text: string, opts?: WriteOpts): Promise<WriteResult>;
  get(id: string, opts?: { responseFormat?: "concise" | "detailed" }): Promise<Memory>;
  forget(id: string, opts?: { mode?: "tombstone" | "hard" }): Promise<ForgetResult>;
  relate(from: string, to: string, opts: { rel: EdgeRel }): Promise<RelateResult>;
}

6. Authentication & Authorization

Helix's posture is local = trivial, remote = strict (ADR-024, Security Model).

| Mode | Transport | Auth |
|---|---|---|
| Local (default) | stdio shim / 127.0.0.1 HTTP | Env credentials, no OAuth. The MCP authorization spec explicitly scopes OAuth to remote servers; local stdio servers retrieve credentials from the environment. (https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization) |
| Remote (optional) | Streamable HTTP over the network | OAuth 2.1 + PKCE, RFC 8707 resource indicators with audience validation; token passthrough is forbidden: Helix never forwards a client's token to an upstream API. |

Local mode never opens a browser, never runs an OAuth dance, and never listens off-loopback. Remote mode (self-hosted team daemon) is opt-in and binds an authorization server per the MCP spec.
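
Audience validation in remote mode means a token is accepted only if it was issued for this server. A minimal sketch of that one check follows; it assumes the token's claims have already been signature-verified and decoded elsewhere, and the resource URL and the name audience_ok are hypothetical.

```python
def audience_ok(claims: dict, resource: str) -> bool:
    """Accept a token only if its aud claim names this server's resource URI."""
    aud = claims.get("aud")
    if aud is None:
        return False
    # The aud claim may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return resource in audiences


# Hypothetical resource identifier for a self-hosted team daemon.
RESOURCE = "https://helix.example.com/mcp"
```

A token minted for a different API fails this check even if its signature is valid, which is also why forwarding a client's token upstream is ruled out: that token's audience is Helix, not the upstream service.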


7. Versioning & Token-Budget Discipline

  • Stable tool names. memory.search etc. never get renamed; behavior changes are additive and signaled with tools/list_changed. (https://modelcontextprotocol.io/specification/2025-11-25/server/tools)
  • Token budget is a first-class contract. memory.search and memory.context take max_tokens and report tokens_used; search also reports truncated, and context reports how many memories it dropped. Helix targets responses well under Claude Code's ~25,000-token tool-output cap, and response_format: concise is the default on both tools precisely to stay cheap. (https://www.anthropic.com/engineering/writing-tools-for-agents)
  • Pagination over truncation-blindness. memory.search returns next_cursor, and helix://graph paginates via ?cursor=, so the model can page deliberately instead of silently losing data.

Sources

  • MCP Transports (stdio + Streamable HTTP; SSE deprecated; multi-client; Origin validation) — https://modelcontextprotocol.io/specification/2025-06-18/basic/transports
  • MCP Authorization (OAuth 2.1 + PKCE, RFC 8707, no passthrough; local = env creds) — https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization
  • MCP Server Tools (isError-in-result vs JSON-RPC error; tools/list_changed) — https://modelcontextprotocol.io/specification/2025-11-25/server/tools
  • Writing tools for agents (token budget ~25k, response_format, human IDs, fewer tools) — https://www.anthropic.com/engineering/writing-tools-for-agents
  • OpenMemory MCP (4-tool baseline) — https://mem0.ai/blog/introducing-openmemory-mcp
  • Reference KG memory server (9 tools) — https://github.com/modelcontextprotocol/servers/blob/main/src/memory/README.md

See also: TSD · MCP Integration · Plugins · Observability · Security Model · Decisions