Documentation · v0.4 · Patent pending

Build with the memory layer.

MEMStorage routes every AI query through a three-tier decision before it reaches the model — instant memory hits, lightweight confirmations, or full inference with storage. This page walks through getting started in 5 minutes, the full HTTP API, and how to integrate at enterprise scale.

Last updated: May 2026 · Stable API · patrick@memstorage.com

Quick start

From zero to first memory hit in three steps. You can get started without writing any code via the Chrome extension, or drop the API in front of any existing OpenAI / Anthropic / Gemini call.

01

Get an API key

Sign up free at memstorage.com/#pricing. Your dashboard issues a key starting with mem_live_…. Free tier includes 10,000 routed queries/month and 1 GB of memory.

02

Point your app at MEMStorage

Either change one line — swap your OpenAI base URL for https://api.memstorage.com/v1 — or call the routing endpoint directly. Your existing model API key stays in your environment; we proxy the model call only when no memory hit exists.

03

Ship traffic

Watch your dashboard fill with hit-rate, savings, and latency curves. Most production workloads see a meaningful hit rate within the first 1,000 queries.

Drop-in proxy (curl)

The fastest way to test routing on your own traffic — no SDK required.

# Replace OpenAI's URL with MEMStorage. Same request shape.
curl https://api.memstorage.com/v1/chat/completions \
  -H "Authorization: Bearer $MEMSTORAGE_API_KEY" \
  -H "X-Provider-Key: $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"What is the rent escalation clause?"}],
    "memory": { "namespace": "lease-abstraction", "ttl_days": 90 }
  }'

# Response includes routing telemetry
{
  "id": "chatcmpl_…",
  "choices": [...],
  "memstorage": {
    "tier": "memory_hit",
    "confidence": 0.94,
    "tokens_saved": 3420,
    "cost_saved_usd": 0.0084
  }
}

Heads up

The memory.namespace isolates memories per workflow. Use one namespace per use case (support-tier1, lease-abstraction, etc.) to keep semantic matches tight.
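
For applications that already run Python, the same proxy call can be sketched with the standard library alone. This is a minimal illustration rather than an official client: the helper names build_proxy_request and send are hypothetical, and the request shape is copied from the curl example above.

```python
import json
import urllib.request

MEMSTORAGE_URL = "https://api.memstorage.com/v1/chat/completions"


def build_proxy_request(mem_key, provider_key, question, namespace, ttl_days=90):
    """Build (but do not send) the same request as the curl example.

    Hypothetical helper: only the URL, headers, and body fields come
    from the documentation; the function itself is illustrative.
    """
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": question}],
        "memory": {"namespace": namespace, "ttl_days": ttl_days},
    }
    return urllib.request.Request(
        MEMSTORAGE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {mem_key}",
            "X-Provider-Key": provider_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a real network call, so it is left commented out):
# import os
# req = build_proxy_request(os.environ["MEMSTORAGE_API_KEY"],
#                           os.environ["OPENAI_API_KEY"],
#                           "What is the rent escalation clause?",
#                           "lease-abstraction")
# print(send(req)["memstorage"]["tier"])
```

Keeping request construction separate from sending makes it easy to inspect headers and body in a unit test before any traffic is routed.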

How the three tiers work

Every query you send is scored against your namespace's memory store. Based on the similarity score and your configured thresholds, the router takes one of three paths.

Tier 1 — Memory hit (zero tokens)

If the incoming query scores above your hit_threshold (default 0.92 cosine similarity on a 768-dim embedding), the stored answer is returned immediately. No model API call. No tokens consumed. Latency: typically 8–25 ms.

Tier 2 — Confirmation call (~20 tokens)

When the score lands in the uncertainty band (0.78 ≤ score < 0.92 by default), the router issues a 20-token confirmation prompt to a small model: "Is question A semantically equivalent to question B? Answer yes/no." If yes, the stored answer is returned. If no, the query escalates to Tier 3.

Tier 3 — Full inference + storage

Below the uncertainty band, the query is sent to your configured model with full context. The response is embedded and stored in the namespace, indexed for future similarity search. Subsequent semantically equivalent queries can then be served as Tier 1 hits.
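
The three paths can be summarized in a few lines of Python. This is a sketch of the decision logic as described above, not the router's actual implementation: the function names are hypothetical, the confirmation call is represented by a caller-supplied callable, and treating a score exactly equal to hit_threshold as a hit is an assumption taken from the stated band (0.78 ≤ score < 0.92).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def choose_path(score, hit_threshold=0.92, confirm_threshold=0.78):
    """Map a similarity score to the path the router would take."""
    if score >= hit_threshold:      # assumption: boundary counts as a hit
        return "memory_hit"
    if score >= confirm_threshold:  # uncertainty band
        return "confirm"
    return "inference"


def resolve_tier(score, is_equivalent, hit_threshold=0.92, confirm_threshold=0.78):
    """Return the final tier label: memory_hit, confirmed, or inference.

    is_equivalent is a zero-argument callable standing in for the
    ~20-token confirmation call; it runs only inside the band.
    """
    path = choose_path(score, hit_threshold, confirm_threshold)
    if path == "confirm":
        return "confirmed" if is_equivalent() else "inference"
    return path
```

Passing the confirmation step as a callable keeps the sketch honest about cost: the small-model call happens only when the score falls in the uncertainty band.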

Configurable thresholds

You can tune the routing thresholds per namespace. Higher hit_threshold means stricter matches and higher accuracy at the cost of lower hit rate. We publish recommended starting values per use case in the dashboard.

| Threshold | Default | Range | Effect |
| --- | --- | --- | --- |
| hit_threshold | 0.92 | 0.85–0.98 | Above this score, return cached answer immediately. |
| confirm_threshold | 0.78 | 0.65–0.90 | Above this score, run a 20-token confirmation call. |
| ttl_days | 90 | 1–730 | How long memories live before re-validation. |
| max_tokens_saved | unlimited | | Cap accounting field; set to 0 to disable. |
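
Before sending overrides, a client can check them against the documented ranges. The sketch below assumes a hypothetical validate_thresholds helper. The ranges and defaults come from the table above, while the rule that confirm_threshold must stay below hit_threshold is an assumption (the two ranges overlap between 0.85 and 0.90, so the ordering is not guaranteed by the ranges alone).

```python
# Documented ranges and defaults for per-namespace routing settings.
RANGES = {
    "hit_threshold": (0.85, 0.98),
    "confirm_threshold": (0.65, 0.90),
    "ttl_days": (1, 730),
}
DEFAULTS = {"hit_threshold": 0.92, "confirm_threshold": 0.78, "ttl_days": 90}


def validate_thresholds(overrides):
    """Merge overrides onto the defaults and check the documented ranges.

    Hypothetical client-side check; the service may apply its own rules.
    """
    config = {**DEFAULTS, **overrides}
    for name, (low, high) in RANGES.items():
        if not low <= config[name] <= high:
            raise ValueError(f"{name}={config[name]} is outside {low}-{high}")
    # Assumption: the uncertainty band must sit below the hit threshold.
    if config["confirm_threshold"] >= config["hit_threshold"]:
        raise ValueError("confirm_threshold must be below hit_threshold")
    return config
```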

API reference

Base URL: https://api.memstorage.com/v1. All endpoints accept and return JSON. All times are UTC ISO-8601.

Authentication

Send your MEMStorage API key as a Bearer token. If you use the OpenAI-compatible /v1/chat/completions proxy, also pass your model provider key as X-Provider-Key. We never store provider keys at rest.

Authorization: Bearer mem_live_…
X-Provider-Key: sk-…            # only when using the proxy endpoint
Content-Type: application/json

POST /v1/route
Score a query against a namespace and return the routing decision and answer.

Request body

| Field | Type | Description |
| --- | --- | --- |
| query (required) | string | The user question or prompt. |
| namespace (required) | string | Memory namespace to search. Created on first write. |
| model (optional) | string | Model used on Tier 3 escalation. Defaults to gpt-4o-mini. |
| thresholds (optional) | object | Override hit_threshold / confirm_threshold for this call. |
| context (optional) | string | Document text or system prompt to ground Tier 3 inference. |

Response

{
  "id": "route_2YlzKx…",
  "answer": "3% annual escalation, compounding.",
  "tier": "memory_hit",          // memory_hit | confirmed | inference
  "confidence": 0.94,
  "matched_memory_id": "mem_a8K…",
  "tokens_used": 0,
  "tokens_saved": 3420,
  "cost_saved_usd": 0.0084,
  "latency_ms": 14
}

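
A client will usually want the routing telemetry in typed form. The following sketch parses a subset of the response fields shown above into a dataclass and aggregates savings across calls. RouteResult, parse_route_response, and summarize are hypothetical names, and the sample payload simply mirrors the documented example without the inline comment, which is not valid JSON.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class RouteResult:
    """Subset of the /v1/route response fields (illustrative only)."""
    answer: str
    tier: str
    confidence: float
    tokens_used: int
    tokens_saved: int
    cost_saved_usd: float
    latency_ms: int


SAMPLE = """{
  "id": "route_example",
  "answer": "3% annual escalation, compounding.",
  "tier": "memory_hit",
  "confidence": 0.94,
  "tokens_used": 0,
  "tokens_saved": 3420,
  "cost_saved_usd": 0.0084,
  "latency_ms": 14
}"""


def parse_route_response(raw):
    """Parse a JSON response body, keeping only the modeled fields."""
    data = json.loads(raw)
    return RouteResult(**{f.name: data[f.name] for f in fields(RouteResult)})


def summarize(results):
    """Aggregate hit rate and savings over a list of RouteResult objects."""
    hits = sum(1 for r in results if r.tier == "memory_hit")
    return {
        "hit_rate": hits / len(results) if results else 0.0,
        "tokens_saved": sum(r.tokens_saved for r in results),
        "cost_saved_usd": sum(r.cost_saved_usd for r in results),
    }
```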
POST /v1/memorize
Manually store a (question, answer) pair. Useful for seeding a namespace from an existing knowledge base.
| Field | Type | Description |
| --- | --- | --- |
| question (required) | string | Canonical question text. |
| answer (required) | string | Stored response returned on memory hit. |
| namespace (required) | string | Target namespace. |
| tags (optional) | string[] | Free-form labels for filtering and analytics. |
| source (optional) | string | Provenance string (URL, doc ID, ticket ID). |
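
Seeding a namespace from an existing knowledge base amounts to turning each entry into a /v1/memorize body. The sketch below assumes a hypothetical export format (dicts with q, a, and an optional id key) and only builds payloads. Sending them is left to whichever HTTP client you already use.

```python
def memorize_payload(question, answer, namespace, tags=None, source=None):
    """Build one /v1/memorize request body from the documented fields."""
    if not (question and answer and namespace):
        raise ValueError("question, answer, and namespace are required")
    payload = {"question": question, "answer": answer, "namespace": namespace}
    if tags:
        payload["tags"] = list(tags)
    if source:
        payload["source"] = source
    return payload


def seed_payloads(rows, namespace, tags=None):
    """Convert knowledge-base rows into memorize payloads.

    rows is an iterable of dicts with 'q', 'a', and optional 'id' keys;
    this export shape is an assumption made for illustration.
    """
    return [
        memorize_payload(row["q"], row["a"], namespace, tags=tags, source=row.get("id"))
        for row in rows
    ]
```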
GET /v1/memories?namespace=NAME&limit=50&cursor=…
List stored memories in a namespace, ordered by last hit time. Cursor-paginated.
DELETE /v1/memories/:id
Permanently remove a memory. Subsequent matching queries fall through to Tier 3.
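
Cursor pagination over GET /v1/memories can be sketched as a generator. The response envelope is not specified on this page, so the field names data and next_cursor below are assumptions, as is the fetch_page callable that stands in for your HTTP client. Adjust both to match the actual response.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.memstorage.com/v1"


def memories_url(namespace, limit=50, cursor=None):
    """Build the list URL with the documented query parameters."""
    params = {"namespace": namespace, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    return f"{BASE_URL}/memories?{urlencode(params)}"


def iter_memories(fetch_page, namespace, limit=50):
    """Yield every memory in a namespace, following cursors.

    fetch_page(url) must return a dict; the keys 'data' and
    'next_cursor' are assumed names, not documented ones.
    """
    cursor = None
    while True:
        page = fetch_page(memories_url(namespace, limit, cursor))
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Injecting fetch_page keeps authentication and retries out of the loop and lets the pagination logic be tested without network access.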

Errors & rate limits

Standard HTTP status codes. Errors return a JSON body with code, message, and request_id.

| Status | Code | Meaning |
| --- | --- | --- |
| 400 | invalid_request | Missing or malformed field. message describes which. |
| 401 | auth_required | Missing or invalid API key. |
| 402 | quota_exceeded | Plan quota reached. Upgrade or wait for reset. |
| 404 | namespace_not_found | Namespace doesn't exist. Create one with the first /memorize or /route. |
| 429 | rate_limited | Free: 60 req/min. Pro: 600 req/min. Enterprise: custom. |
| 502 | provider_error | Tier 3 escalation failed at the model provider. Retry with backoff. |
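
One way to handle the retryable statuses is exponential backoff with jitter. The sketch below is an illustration under stated assumptions: ApiError and call_with_backoff are hypothetical names, the delay schedule is arbitrary, and retrying on 429 is an assumption (the table only recommends backoff explicitly for 502).

```python
import random
import time

RETRYABLE = {429, 502}  # assumption: 429 is retried the same way as 502


class ApiError(Exception):
    """Carries the status, code, message, and request_id of an error body."""

    def __init__(self, status, code, message, request_id=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.request_id = request_id


def call_with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying retryable ApiErrors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            is_last = attempt == max_attempts - 1
            if err.status not in RETRYABLE or is_last:
                raise
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Errors such as 400, 401, 402, and 404 are raised immediately, since repeating the same request would not change the outcome.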

Chrome extension install guide

The MEMStorage extension routes your ChatGPT, Claude, and Gemini conversations through the memory layer — no code changes, just install and sign in.

01

Install from the Chrome Web Store

Search "MEMStorage" in the Chrome Web Store, or visit our extension page directly. Click Add to Chrome. The extension requests permission only for chat.openai.com, claude.ai, and gemini.google.com.

02

Sign in

Click the MEMStorage icon in your toolbar and sign in with your dashboard account. Your free tier covers 10,000 routed queries per month.

03

Open ChatGPT, Claude, or Gemini

You'll see a small MEMStorage badge in the corner of the conversation. When a memory hit is found, the answer appears instantly with a green pulse — no model call made. Click the badge to inspect tier, confidence, and tokens saved.

04

(Optional) Developer mode

Engineers running the open-source extension build can clone github.com/MEMStorage/memstorage-extension and load the unpacked /extension folder via chrome://extensions → Load unpacked (Developer mode must be enabled on that page). Set your dashboard API key in Options.

Privacy

The extension only sees prompts you send and responses returned. Conversations are stored in your namespace, not pooled with other users. Delete any memory at any time from the dashboard or via DELETE /v1/memories/:id.

Enterprise integration guide

For teams running production AI workloads at scale, MEMStorage offers three deployment patterns. All three preserve full provider compatibility — your application code does not need to change.

Pattern A — OpenAI-compatible proxy

Change one environment variable. Point your existing OpenAI / Anthropic SDK at our base URL. We forward to the original provider on Tier 3 escalation using the provider key you pass in X-Provider-Key. Zero refactor.

# before
OPENAI_BASE_URL=https://api.openai.com/v1

# after
OPENAI_BASE_URL=https://api.memstorage.com/v1
MEMSTORAGE_API_KEY=mem_live_…

Pattern B — Native SDK

For finer control over thresholds, namespaces, and retrieval-augmented generation, use the MEMStorage SDK. Available in Python, TypeScript, and Go.

# pip install memstorage
from memstorage import MemStorage

mem = MemStorage(api_key="mem_live_…")

# lease_text is assumed to hold the lease document as a string.
result = mem.route(
    query="What is the security deposit?",
    namespace="lease-abstraction",
    context=lease_text,
    thresholds={"hit_threshold": 0.94},
)
print(result.answer, result.tier, result.cost_saved_usd)

Pattern C — Self-hosted (VPC / on-prem)

For regulated workloads (financial services, healthcare, government), MEMStorage ships as a single container deployable in your VPC. Memories never leave your network. Encrypted at rest with a customer-managed KMS key. SOC 2 Type II report available under NDA.

| Capability | Proxy | SDK | Self-hosted |
| --- | --- | --- | --- |
| Setup time | ~5 min | ~30 min | ~1 day |
| Code changes | None | Minimal | None |
| Data residency | US (multi-region) | US (multi-region) | Your VPC |
| Custom thresholds | Per-call header | Full | Full |
| SSO / SCIM | | | Yes |
| BYO embedding model | | Yes | Yes |
| Audit log export | 30 days | 30 days | Unlimited |

Security & compliance

SOC 2 Type II audit in progress. GDPR-ready DPA available on request. All traffic in transit uses TLS 1.3. All stored memories are encrypted with AES-256 at rest. Full data export and deletion APIs. Penetration test report from Cure53 available under NDA.

Pilot program

We run a no-fee 30-day pilot for enterprise teams with at least 1M monthly inference calls. You bring the workflow and your provider key; we instrument the routing layer and produce your actual savings number. Request a pilot →

Support

Email patrick@memstorage.com for technical questions or to request the SDK preview. The founder reads every message; a typical reply arrives within one business day.

For enterprise procurement (SOC 2 letter, MSA, security review), use the contact form and select Enterprise integration.