MEMStorage Docs — Quick start, API reference, integration guides

Section 01

Quick start

From zero to first memory hit in three steps. You can get started without writing any code via the Chrome extension, or drop the API in front of any existing OpenAI / Anthropic / Gemini call.

01

Get an API key

Sign up free at memstorage.com/#pricing. Your dashboard issues a key starting with mem_live_…. Free tier includes 10,000 routed queries/month and 1 GB of memory.

02

Point your app at MEMStorage

Either change one line (swap your OpenAI base URL for https://api.memstorage.com/v1) or call the routing endpoint, POST /v1/route, directly. Your existing model API key stays in your environment and is sent with each proxied request in the X-Provider-Key header; we forward the call to the model provider only when no memory hit exists.

03

Ship traffic

Watch your dashboard fill with hit-rate, savings, and latency curves. Most production workloads see a meaningful hit rate within the first 1,000 queries.

Drop-in proxy (curl)

The fastest way to test routing on your own traffic — no SDK required.

# Replace OpenAI's URL with MEMStorage. Same request shape.
curl https://api.memstorage.com/v1/chat/completions \
  -H "Authorization: Bearer $MEMSTORAGE_API_KEY" \
  -H "X-Provider-Key: $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"What is the rent escalation clause?"}],
    "memory": { "namespace": "lease-abstraction", "ttl_days": 90 }
  }'

# Response includes routing telemetry
{
  "id": "chatcmpl_…",
  "choices": [...],
  "memstorage": {
    "tier": "memory_hit",
    "confidence": 0.94,
    "tokens_saved": 3420,
    "cost_saved_usd": 0.0084
  }
}
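
The same request can be assembled from Python with only the standard library. This is a minimal sketch of the curl call above, not an official client: the helper names build_chat_request and routing_telemetry are illustrative, and the request is built but not sent.

```python
import json
import urllib.request

MEMSTORAGE_URL = "https://api.memstorage.com/v1/chat/completions"


def build_chat_request(mem_key, provider_key, question, namespace, ttl_days=90):
    """Build (but do not send) the proxy request shown in the curl example."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": question}],
        # The "memory" object is the only addition to the OpenAI request shape.
        "memory": {"namespace": namespace, "ttl_days": ttl_days},
    }
    return urllib.request.Request(
        MEMSTORAGE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {mem_key}",
            "X-Provider-Key": provider_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def routing_telemetry(response_json):
    """Return the "memstorage" telemetry object, or {} if it is absent."""
    return response_json.get("memstorage", {})
```

To send it, pass the request to urllib.request.urlopen and decode the JSON body; routing_telemetry(...)["tier"] then tells you which path the router took.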

Heads up

The memory.namespace isolates memories per workflow. Use one namespace per use case (support-tier1, lease-abstraction, etc.) to keep semantic matches tight.

Section 02

How the three tiers work

Every query you send is scored against your namespace's memory store. Based on the similarity score and your configured thresholds, the router takes one of three paths.

Tier 1 — Memory hit (zero tokens)

If the incoming query scores at or above your hit_threshold (default 0.92 cosine similarity on a 768-dim embedding), the stored answer is returned immediately. No model API call is made and no tokens are consumed. Latency is typically 8–25 ms.
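
For readers unfamiliar with the score itself, cosine similarity between two embedding vectors can be computed as below. This is a generic illustration of the metric named above, assuming plain Python lists as vectors; it says nothing about how MEMStorage computes or indexes embeddings internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1.0 to 1.0)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)
```

A score of 1.0 means the two vectors point in the same direction; the default hit_threshold of 0.92 therefore demands a close match.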

Tier 2 — Confirmation call (~20 tokens)

When the score lands in the uncertainty band (confirm_threshold ≤ score < hit_threshold, which is 0.78 ≤ score < 0.92 by default), the router issues a roughly 20-token confirmation prompt to a small model: "Is question A semantically equivalent to question B? Answer yes/no." If the answer is yes, the stored answer is returned. If no, the query escalates to Tier 3.

Tier 3 — Full inference + storage

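Below the uncertainty band, or after a Tier 2 confirmation that answers no, the query is sent to your configured model with full context. The response is embedded and stored in the namespace, indexed for future similarity search. Subsequent semantically equivalent queries can then be served from memory (Tier 1, or Tier 2 after confirmation) until the memory comes up for re-validation after ttl_days.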
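
The three paths can be summarized as a small decision function. This is a sketch of the logic as described in this section, not the router's actual implementation: the confirm callback is a hypothetical stand-in for the small-model yes/no call, and the returned strings reuse the tier values from the API response (memory_hit, confirmed, inference).

```python
def route_tier(score, confirm, hit_threshold=0.92, confirm_threshold=0.78):
    """Pick a tier from a similarity score.

    `confirm` is a zero-argument callable that returns True when the small
    model judges the two questions equivalent. It is only invoked when the
    score falls inside the uncertainty band.
    """
    if score >= hit_threshold:
        return "memory_hit"  # Tier 1: return the stored answer, no model call
    if score >= confirm_threshold:
        # Tier 2: one short confirmation call decides hit versus escalation.
        return "confirmed" if confirm() else "inference"
    return "inference"  # Tier 3: full model call, then store the answer
```

For example, route_tier(0.85, lambda: True) yields "confirmed", while the same score with a confirmation that returns False escalates to "inference".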

Configurable thresholds

You can tune the routing thresholds per namespace. Higher hit_threshold means stricter matches and higher accuracy at the cost of lower hit rate. We publish recommended starting values per use case in the dashboard.

Threshold	Default	Range	Effect
hit_threshold	0.92	0.85–0.98	At or above this score, return the stored answer immediately.
confirm_threshold	0.78	0.65–0.90	At or above this score (and below hit_threshold), run a roughly 20-token confirmation call.
ttl_days	90	1–730	How long memories live before re-validation.
max_tokens_saved	unlimited	—	Cap accounting field; set to 0 to disable.
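
Before sending overrides, it can help to check them against the ranges in the table. The sketch below is a client-side convenience only; the rule that confirm_threshold must stay below hit_threshold is an assumption drawn from the tier descriptions, and the server may validate differently.

```python
# Allowed ranges copied from the threshold table.
RANGES = {
    "hit_threshold": (0.85, 0.98),
    "confirm_threshold": (0.65, 0.90),
    "ttl_days": (1, 730),
}
DEFAULTS = {"hit_threshold": 0.92, "confirm_threshold": 0.78}


def validate_thresholds(overrides):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for name, value in overrides.items():
        if name not in RANGES:
            errors.append(f"unknown setting: {name}")
            continue
        low, high = RANGES[name]
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low}-{high}")
    # Assumption: the uncertainty band must be non-empty.
    hit = overrides.get("hit_threshold", DEFAULTS["hit_threshold"])
    confirm = overrides.get("confirm_threshold", DEFAULTS["confirm_threshold"])
    if confirm >= hit:
        errors.append("confirm_threshold must be below hit_threshold")
    return errors
```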

Section 03

API reference

Base URL: https://api.memstorage.com/v1. All endpoints accept and return JSON. All times are UTC ISO-8601.

Authentication

Send your MEMStorage API key as a Bearer token. If you use the OpenAI-compatible /v1/chat/completions proxy, also pass your model provider key as X-Provider-Key. We never store provider keys at rest.

Authorization: Bearer mem_live_…
X-Provider-Key: sk-…              # only when using the proxy endpoint
Content-Type: application/json

POST /v1/route

Score a query against a namespace and return the routing decision and answer.

Request body

Field	Required	Type	Description
query	required	string	The user question or prompt.
namespace	required	string	Memory namespace to search. Created on first write.
model	optional	string	Model used on Tier 3 escalation. Defaults to gpt-4o-mini.
thresholds	optional	object	Override hit_threshold / confirm_threshold for this call.
context	optional	string	Document text or system prompt to ground Tier 3 inference.

Response

{
  "id": "route_2YlzKx…",
  "answer": "3% annual escalation, compounding.",
  "tier": "memory_hit",         // memory_hit | confirmed | inference
  "confidence": 0.94,
  "matched_memory_id": "mem_a8K…",
  "tokens_used": 0,
  "tokens_saved": 3420,
  "cost_saved_usd": 0.0084,
  "latency_ms": 14
}
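
A minimal sketch of building a /v1/route request body and reading the response, using only the fields documented above. The helper names and the RouteResult class are illustrative and not part of any SDK; sending the request is left to whichever HTTP client you already use.

```python
import json
from dataclasses import dataclass, fields


def build_route_payload(query, namespace, model=None, thresholds=None, context=None):
    """Assemble the request body, omitting optional fields that are unset."""
    if not query or not namespace:
        raise ValueError("query and namespace are required")
    payload = {"query": query, "namespace": namespace}
    optional = {"model": model, "thresholds": thresholds, "context": context}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


@dataclass
class RouteResult:
    answer: str
    tier: str  # memory_hit | confirmed | inference
    confidence: float
    tokens_used: int
    tokens_saved: int
    cost_saved_usd: float
    latency_ms: int


def parse_route_response(raw):
    """Parse a JSON response string, keeping only the fields modeled above."""
    data = json.loads(raw)
    return RouteResult(**{f.name: data[f.name] for f in fields(RouteResult)})
```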

POST /v1/memorize

Manually store a (question, answer) pair. Useful for seeding a namespace from an existing knowledge base.

Field	Required	Type	Description
question	required	string	Canonical question text.
answer	required	string	Stored response returned on memory hit.
namespace	required	string	Target namespace.
tags	optional	string[]	Free-form labels for filtering and analytics.
source	optional	string	Provenance string (URL, doc ID, ticket ID).
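
When seeding from an existing knowledge base, one approach is to turn each FAQ row into a /v1/memorize body first and send the bodies afterwards. The sketch below assumes the rows are dictionaries with question, answer, and an optional source key; that input shape and the function name are hypothetical.

```python
def memorize_payloads(rows, namespace, tags=None):
    """Yield one /v1/memorize request body per usable knowledge-base row."""
    for row in rows:
        question = row.get("question", "").strip()
        answer = row.get("answer", "").strip()
        if not question or not answer:
            continue  # skip incomplete rows rather than store empty memories
        payload = {"question": question, "answer": answer, "namespace": namespace}
        if tags:
            payload["tags"] = list(tags)
        if row.get("source"):
            payload["source"] = row["source"]
        yield payload
```

Each yielded dictionary can be posted as JSON to /v1/memorize with the authentication headers described earlier.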

GET /v1/memories?namespace=NAME&limit=50&cursor=…

List stored memories in a namespace, ordered by last hit time. Cursor-paginated.
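
Cursor pagination usually follows the loop sketched below. The response field names memories and next_cursor are assumptions, since the response shape for this endpoint is not documented here; fetch_page is a caller-supplied function that performs the GET with the given query parameters and returns the decoded JSON.

```python
def iter_memories(fetch_page, namespace, limit=50):
    """Yield every memory in a namespace by following cursors page by page."""
    cursor = None
    while True:
        params = {"namespace": namespace, "limit": limit}
        if cursor:
            params["cursor"] = cursor
        page = fetch_page(params)
        # Assumed response shape: {"memories": [...], "next_cursor": "..."}
        yield from page.get("memories", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break  # no further pages
```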

DELETE /v1/memories/:id

Permanently remove a memory. Subsequent matching queries fall through to Tier 3.

Errors & rate limits

Standard HTTP status codes. Errors return a JSON body with code, message, and request_id.

Status	Code	Meaning
400	invalid_request	Missing or malformed field. message describes which.
401	auth_required	Missing or invalid API key.
402	quota_exceeded	Plan quota reached. Upgrade or wait for reset.
404	namespace_not_found	Namespace doesn't exist. A namespace is created by the first write through /memorize or /route.
429	rate_limited	Rate limit exceeded. Limits: Free 60 req/min, Pro 600 req/min, Enterprise custom.
502	provider_error	Tier 3 escalation failed at the model provider. Retry with backoff.
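
A minimal retry sketch for the transient errors in the table. The table recommends backoff for 502; retrying 429 the same way is an assumption of this example, as are the delay values. The send callable is supplied by the caller and returns a (status, body) pair.

```python
import time

RETRYABLE = {429, 502}  # rate_limited and provider_error


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts and returns the
    last (status, body) pair seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)
```

Errors such as 400, 401, and 402 are returned immediately, since repeating the same request would not change the outcome.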

Section 04

Chrome extension install guide

The MEMStorage extension routes your ChatGPT, Claude, and Gemini conversations through the memory layer — no code changes, just install and sign in.

01

Install from the Chrome Web Store

Search "MEMStorage" in the Chrome Web Store, or visit our extension page directly. Click Add to Chrome. The extension requests permission only for chat.openai.com, claude.ai, and gemini.google.com.

02

Sign in

Click the MEMStorage icon in your toolbar and sign in with your dashboard account. Your free tier covers 10,000 routed queries per month.

03

Open ChatGPT, Claude, or Gemini

You'll see a small MEMStorage badge in the corner of the conversation. When a memory hit is found, the answer appears instantly with a green pulse — no model call made. Click the badge to inspect tier, confidence, and tokens saved.

04

(Optional) Developer mode

Engineers running the open-source extension build can clone github.com/MEMStorage/memstorage-extension and load the unpacked /extension folder via chrome://extensions → Load unpacked. Set your dashboard API key in Options.

Privacy

The extension only sees prompts you send and responses returned. Conversations are stored in your namespace, not pooled with other users. Delete any memory at any time from the dashboard or via DELETE /v1/memories/:id.

Section 05

Enterprise integration guide

For teams running production AI workloads at scale, MEMStorage offers three deployment patterns. All three preserve full provider compatibility — your application code does not need to change.

Pattern A — OpenAI-compatible proxy

Point your existing OpenAI / Anthropic SDK at our base URL by changing the base-URL environment variable and adding your MEMStorage key. On Tier 3 escalation we forward to the original provider using the provider key you pass in X-Provider-Key. No application refactor is needed.

# before
OPENAI_BASE_URL=https://api.openai.com/v1

# after
OPENAI_BASE_URL=https://api.memstorage.com/v1
MEMSTORAGE_API_KEY=mem_live_…
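
Because the proxy expects the MEMStorage key as the Bearer token and the provider key in X-Provider-Key, the client needs both headers once the base URL is switched. The sketch below derives them from the environment variables shown above. The function name is illustrative, OPENAI_API_KEY is assumed to hold your provider key, and how you attach the headers depends on your HTTP client or SDK.

```python
import os


def proxy_settings(env=os.environ):
    """Return the base URL and auth headers implied by the environment."""
    base_url = env.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
    provider_key = env.get("OPENAI_API_KEY", "")
    if "api.memstorage.com" not in base_url:
        # Not routed through MEMStorage: authenticate to the provider directly.
        return {"base_url": base_url,
                "headers": {"Authorization": f"Bearer {provider_key}"}}
    mem_key = env.get("MEMSTORAGE_API_KEY", "")
    if not mem_key.startswith("mem_live_"):
        raise ValueError("MEMSTORAGE_API_KEY is missing or not a mem_live_ key")
    return {
        "base_url": base_url,
        "headers": {
            "Authorization": f"Bearer {mem_key}",
            "X-Provider-Key": provider_key,
        },
    }
```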

Pattern B — Native SDK

For finer control over thresholds, namespaces, and retrieval-augmented generation, use the MEMStorage SDK. Available in Python, TypeScript, and Go.

# pip install memstorage
from memstorage import MemStorage

mem = MemStorage(api_key="mem_live_…")

result = mem.route(
    query="What is the security deposit?",
    namespace="lease-abstraction",
    context=lease_text,
    thresholds={"hit_threshold": 0.94}
)
print(result.answer, result.tier, result.cost_saved_usd)

Pattern C — Self-hosted (VPC / on-prem)

For regulated workloads (financial services, healthcare, government), MEMStorage ships as a single container deployable in your VPC. Memories never leave your network and are encrypted at rest with a customer-managed KMS key. The SOC 2 Type II audit is in progress; see Security & compliance for current status.

Capability	Proxy	SDK	Self-hosted
Setup time	~5 min	~30 min	~1 day
Code changes	None	Minimal	None
Data residency	US (multi-region)	US (multi-region)	Your VPC
Custom thresholds	Per-call header	Full	Full
SSO / SCIM	—	—	Yes
BYO embedding model	—	Yes	Yes
Audit log export	30 days	30 days	Unlimited

Security & compliance

The SOC 2 Type II audit is in progress. A GDPR-ready DPA is available on request. All traffic in transit uses TLS 1.3, and all stored memories are encrypted with AES-256 at rest. Full data export and deletion APIs are provided. A penetration test report from Cure53 is available under NDA.

Pilot program

We run a no-fee 30-day pilot for enterprise teams with at least 1M monthly inference calls. You bring the workflow and your provider key; we instrument the routing layer and produce your actual savings number. Request a pilot →

Section 06

Support

Email patrick@memstorage.com for technical questions or to request the SDK preview. The founder reads every message; expect a reply within one business day in most cases.

For enterprise procurement (SOC 2 letter, MSA, security review), use the contact form and select Enterprise integration.