Helix — Observability & Cost Telemetry¶
Status: Draft v1 · Last updated: 2026-06-18 · Related: TSD · MCP Integration · Decisions · Cost · Security
Helix's observability story exists to prove two claims: that it costs $0 by default, and that it never leaks. Telemetry is off by default, local-only when on, and secrets are never logged.
Design authority: ADR-007 (telemetry posture) and ADR-025 (cost dashboard / budget guardrail). See Decisions.
1. Principles¶
- OFF by default. No metrics, no logs leave the machine — and by default nothing is even collected beyond what
--verboseprints. Telemetry is an explicit opt-in. - Local-only when on. When telemetry is enabled it writes to a local store and a local
/metricsendpoint. There is no default phone-home. - Secrets never logged. The
Redactorruns before anything is persisted or printed. API keys, tokens, and detected PII are replaced with‹redacted›. (See Security Model.) - Redacted logs everywhere. Even at
DEBUG, memory content is summarized/hashed, not dumped, unless--unsafe-log-contentis set explicitly. - Cost is observable, not estimated after the fact. Every LLM/embedding call is metered at call time so the dashboard is always current.
2. Local Metrics¶
Collected in-process and exposed locally. Nothing here requires a network.
┌──────────────────────── helixd ────────────────────────┐
│ recall path write path consolidation worker │
│ p50/p95 lat gate drop-rate op-mix (merge/super…) │
│ cache hits tokens/cost store sizes │
└───────────────┬─────────────────────────────────────────┘
▼
local metrics registry ──▶ GET /metrics (Prometheus text)
│
▼
helix cost (terminal dashboard)
What's measured
- Recall latency —
p50/p95ofmemory.search+memory.context. - Gate drop-rate — fraction of candidate memories dropped by the relevance/budget gate (the lever that keeps token spend near zero).
- LLM usage — calls, prompt/completion tokens, estimated cost per provider/model.
- Embedding — calls, tokens, and cache hit rate (a cache hit is a
$0embed). - Store sizes — memory count, vector bytes, graph edges, on-disk bytes per store.
- Consolidation op mix — counts of
merge/supersede/tombstone/noopfrom the worker.
3. The Cost Dashboard (proving "$0")¶
helix cost renders the numbers that back the $0-default claim and a monthly token-budget guardrail (ADR-025, Cost Optimization).
$ helix cost --period month
HELIX COST · 2026-06 · provider: null (local fastembed)
────────────────────────────────────────────────────────────
LLM calls ............ 0 tokens ........ 0
Embedding calls ...... 1,284 tokens .. 412,909 (local, $0)
Embedding cache hits . 71% → 3,140 embeds avoided
Gate drop-rate ....... 63% → tokens kept out of context
Estimated spend ...... $0.00 budget: $0.00 / mo ✔ under
────────────────────────────────────────────────────────────
Guardrail: token budget 0 / 2,000,000 paid tokens ▱▱▱▱▱ 0%
When a paid provider is enabled, the same view shows real spend and trips the guardrail:
Estimated spend ...... $1.84 budget: $5.00 / mo ✔ under
Guardrail: token budget 730k / 2,000,000 paid tokens ▰▰▱▱▱ 37%
The gate drop-rate is the headline cost-control metric: a high drop-rate means Helix is aggressively keeping irrelevant memory out of the token budget, which is why $0/cheap is achievable. The monthly token-budget guardrail warns and can hard-stop paid calls when exceeded.
4. Structured Logging¶
JSON lines, leveled, redacted.
| Level | Use |
|---|---|
ERROR |
failed writes/recalls, plugin load failures, store errors |
WARN |
budget guardrail tripped, gate dropped unusually high, slow recall |
INFO |
daemon lifecycle, migrations, connector writes, consolidation summaries |
DEBUG |
per-call timings, gate decisions, cache hits (content redacted/hashed) |
TRACE |
wire-level JSON-RPC framing (loopback only; never includes secrets) |
{"ts":"2026-06-18T10:12:04Z","level":"INFO","evt":"recall",
"scope":"project","took_ms":18,"p":"p50","hits":8,"dropped":14,
"tokens_used":1180,"query":"‹redacted›"}
query and memory content are redacted by default; set --unsafe-log-content only for local debugging.
5. Metrics Catalog¶
| Metric | Type | Labels | Meaning |
|---|---|---|---|
helix_recall_latency_seconds |
histogram | op{search,context} |
recall p50/p95 |
helix_gate_dropped_total |
counter | scope |
memories dropped by the gate |
helix_gate_candidates_total |
counter | scope |
candidates considered (drop-rate = dropped/candidates) |
helix_llm_calls_total |
counter | provider,model |
LLM invocations |
helix_llm_tokens_total |
counter | provider,model,dir{prompt,completion} |
LLM tokens |
helix_llm_cost_usd_total |
counter | provider,model |
estimated spend |
helix_embed_calls_total |
counter | provider |
embedding calls |
helix_embed_cache_hits_total |
counter | provider |
cache hits (= $0 embeds) |
helix_store_bytes |
gauge | store,kind{vector,graph} |
on-disk size |
helix_memories_total |
gauge | scope |
live memory count |
helix_consolidation_ops_total |
counter | op{merge,supersede,tombstone,noop} |
worker op mix |
helix_budget_tokens_used |
gauge | period |
paid tokens vs monthly guardrail |
6. The /metrics Endpoint¶
The daemon optionally serves a Prometheus-style text endpoint — local only, bound to 127.0.0.1, subject to the same Origin validation as the rest of the daemon (see API Reference §1).
curl -s 127.0.0.1:7878/metrics | head
# HELP helix_recall_latency_seconds Recall latency
# TYPE helix_recall_latency_seconds histogram
helix_recall_latency_seconds_bucket{op="search",le="0.01"} 402
helix_gate_dropped_total{scope="project"} 9381
helix_embed_cache_hits_total{provider="fastembed"} 3140
helix_llm_cost_usd_total{provider="null",model="none"} 0
Disabled unless telemetry.metrics = true. There is no remote scrape target by default; you opt in by pointing your own local Prometheus at loopback.
7. Opt-in Aggregate Sharing¶
Helix never exfiltrates by default. If — and only if — a user runs helix telemetry share --aggregate, Helix may send coarse, non-content aggregates (e.g., recall p95 bucket, gate drop-rate band, store-size class). Rules:
- No memory content, no queries, no IDs, no keys ever leave the machine.
- Payloads are redacted and bucketed before send; reviewable via
helix telemetry preview. - Revocable at any time (
helix telemetry share --off); default remains off.
8. Opinionated Decisions¶
| # | Decision | Rationale |
|---|---|---|
| 1 | Telemetry OFF by default | Local-first trust; nothing leaks unless asked (ADR-007) |
| 2 | Metrics & logs are local-only | No default phone-home; you own the data |
| 3 | Redactor runs before persist & log | Secrets/PII never reach disk or stdout (ADR-024) |
| 4 | Gate drop-rate is a first-class metric | It's the lever that proves $0/cheap |
| 5 | Cost metered at call time, not estimated later | Dashboard is always current; guardrail can hard-stop |
| 6 | Monthly token-budget guardrail with hard-stop option | Bounds worst-case spend for paid providers (ADR-025) |
| 7 | Embedding cache hit rate surfaced | Cache hits are $0 embeds — the default-cost story |
| 8 | /metrics is opt-in, loopback, Origin-validated |
Observability without opening an attack surface |
Sources¶
- MCP Transports — local bind
127.0.0.1+ Origin validation (DNS-rebinding defense) — https://modelcontextprotocol.io/specification/2025-06-18/basic/transports - Writing tools for agents — token-budget discipline informs the cost dashboard — https://www.anthropic.com/engineering/writing-tools-for-agents
See also: API Reference · Plugins · TSD · MCP Integration · Cost Optimization · Security Model · Decisions