Build with the memory layer.
MEMStorage routes every AI query through a three-tier decision before it reaches the model — instant memory hits, lightweight confirmations, or full inference with storage. This page walks through getting started in 5 minutes, the full HTTP API, and how to integrate at enterprise scale.
Quick start
From zero to first memory hit in three steps. You can get started without writing any code via the Chrome extension, or drop the API in front of any existing OpenAI / Anthropic / Gemini call.
Get an API key
Sign up free at memstorage.com/#pricing. Your dashboard issues a key starting with mem_live_…. Free tier includes 10,000 routed queries/month and 1 GB of memory.
Point your app at MEMStorage
Either change one line — swap your OpenAI base URL for https://api.memstorage.com/v1 — or call the routing endpoint directly. Your existing model API key stays in your environment; we proxy the model call only when no memory hit exists.
Ship traffic
Watch your dashboard fill with hit-rate, savings, and latency curves. Most production workloads see meaningful hit rate within the first 1,000 queries.
Drop-in proxy (curl)
The fastest way to test routing on your own traffic — no SDK required.
The memory.namespace isolates memories per workflow. Use one namespace per use case (support-tier1, lease-abstraction, etc.) to keep semantic matches tight.
How the three tiers work
Every query you send is scored against your namespace's memory store. Based on the similarity score and your configured thresholds, the router takes one of three paths.
Tier 1 — Memory hit (zero tokens)
If the incoming query scores above your hit_threshold (default 0.92 cosine similarity on a 768-dim embedding), the stored answer is returned immediately. No model API call. No tokens consumed. Latency: typically 8–25 ms.
Tier 2 — Confirmation call (~20 tokens)
When the score lands in the uncertainty band (0.78 ≤ score < 0.92 by default), the router issues a 20-token confirmation prompt to a small model — "Is question A semantically equivalent to question B? Answer yes/no." If yes, the stored answer is returned. If no, escalate to Tier 3.
Tier 3 — Full inference + storage
Below the uncertainty band, the query is sent to your configured model with full context. The response is embedded and stored in the namespace, indexed for future similarity search. Every subsequent semantically-equivalent query becomes a Tier 1 hit.
Configurable thresholds
You can tune the routing thresholds per namespace. Higher hit_threshold means stricter matches and higher accuracy at the cost of lower hit rate. We publish recommended starting values per use case in the dashboard.
| Threshold | Default | Range | Effect |
|---|---|---|---|
| hit_threshold | 0.92 | 0.85–0.98 | Above this score, return cached answer immediately. |
| confirm_threshold | 0.78 | 0.65–0.90 | Above this score, run a 20-token confirmation call. |
| ttl_days | 90 | 1–730 | How long memories live before re-validation. |
| max_tokens_saved | unlimited | — | Cap accounting field; set to 0 to disable. |
API reference
Base URL: https://api.memstorage.com/v1. All endpoints accept and return JSON. All times are UTC ISO-8601.
Authentication
Send your MEMStorage API key as a Bearer token. If you use the OpenAI-compatible /v1/chat/completions proxy, also pass your model provider key as X-Provider-Key. We never store provider keys at rest.
Request body
| Field | Type | Description |
|---|---|---|
| queryrequired | string | The user question or prompt. |
| namespacerequired | string | Memory namespace to search. Created on first write. |
| modeloptional | string | Model used on Tier 3 escalation. Defaults to gpt-4o-mini. |
| thresholdsoptional | object | Override hit_threshold / confirm_threshold for this call. |
| contextoptional | string | Document text or system prompt to ground Tier 3 inference. |
Response
| Field | Type | Description |
|---|---|---|
| questionrequired | string | Canonical question text. |
| answerrequired | string | Stored response returned on memory hit. |
| namespacerequired | string | Target namespace. |
| tagsoptional | string[] | Free-form labels for filtering and analytics. |
| sourceoptional | string | Provenance string (URL, doc ID, ticket ID). |
Errors & rate limits
Standard HTTP status codes. Errors return a JSON body with code, message, and request_id.
| Status | Code | Meaning |
|---|---|---|
| 400 | invalid_request | Missing or malformed field. message describes which. |
| 401 | auth_required | Missing or invalid API key. |
| 402 | quota_exceeded | Plan quota reached. Upgrade or wait for reset. |
| 404 | namespace_not_found | Namespace doesn't exist. Create one with the first /memorize or /route. |
| 429 | rate_limited | Free: 60 req/min. Pro: 600 req/min. Enterprise: custom. |
| 502 | provider_error | Tier 3 escalation failed at the model provider. Retry with backoff. |
Chrome extension install guide
The MEMStorage extension routes your ChatGPT, Claude, and Gemini conversations through the memory layer — no code changes, just install and sign in.
Install from the Chrome Web Store
Search "MEMStorage" in the Chrome Web Store, or visit our extension page directly. Click Add to Chrome. The extension requests permission only for chat.openai.com, claude.ai, and gemini.google.com.
Sign in
Click the MEMStorage icon in your toolbar and sign in with your dashboard account. Your free tier covers 10,000 routed queries per month.
Open ChatGPT, Claude, or Gemini
You'll see a small MEMStorage badge in the corner of the conversation. When a memory hit is found, the answer appears instantly with a green pulse — no model call made. Click the badge to inspect tier, confidence, and tokens saved.
(Optional) Developer mode
Engineers running the open-source extension build can clone github.com/MEMStorage/memstorage-extension and load the unpacked /extension folder via chrome://extensions → Load unpacked. Set your dashboard API key in Options.
The extension only sees prompts you send and responses returned. Conversations are stored in your namespace, not pooled with other users. Delete any memory at any time from the dashboard or via DELETE /v1/memories/:id.
Enterprise integration guide
For teams running production AI workloads at scale, MEMStorage offers three deployment patterns. All three preserve full provider compatibility — your application code does not need to change.
Pattern A — OpenAI-compatible proxy
Change one environment variable. Point your existing OpenAI / Anthropic SDK at our base URL. We forward to the original provider on Tier 3 escalation using the provider key you pass in X-Provider-Key. Zero refactor.
Pattern B — Native SDK
For finer control over thresholds, namespaces, and retrieval-augmented generation, use the MEMStorage SDK. Available in Python, TypeScript, and Go.
Pattern C — Self-hosted (VPC / on-prem)
For regulated workloads (financial services, healthcare, government), MEMStorage ships as a single container deployable in your VPC. Memories never leave your network. Encrypted at rest with a customer-managed KMS key. SOC 2 Type II report available under NDA.
| Capability | Proxy | SDK | Self-hosted |
|---|---|---|---|
| Setup time | ~5 min | ~30 min | ~1 day |
| Code changes | None | Minimal | None |
| Data residency | US (multi-region) | US (multi-region) | Your VPC |
| Custom thresholds | Per-call header | Full | Full |
| SSO / SCIM | — | — | Yes |
| BYO embedding model | — | Yes | Yes |
| Audit log export | 30 days | 30 days | Unlimited |
Security & compliance
SOC 2 Type II in flight. GDPR-ready DPA available on request. All inflight traffic TLS 1.3. All stored memories AES-256 at rest. Full data export and deletion APIs. Penetration test report from Cure53 available under NDA.
We run a no-fee 30-day pilot for enterprise teams with at least 1M monthly inference calls. You bring the workflow and your provider key; we instrument the routing layer and produce your actual savings number. Request a pilot →
Support
Email patrick@memstorage.com for technical questions or to request the SDK preview. Founder reads every message; typical reply within one business day.
For enterprise procurement (SOC 2 letter, MSA, security review), use the contact form and select Enterprise integration.