FAQ — MEMStorage

What MEMStorage Is

MEMStorage is the Inference Control Layer that sits upstream of your model stack. Using memory routing, it decides — before a request reaches the model — whether the request can be resolved from memory, needs lightweight validation, or requires full inference.

The goal is simple: stop paying for full inference on requests that don't need it.

No. MEMStorage sits above your existing models — you keep OpenAI, Anthropic, Gemini, or your own. It adds a routing layer that decides when a model needs to run at all. You don't switch providers; you add a layer.

RAG retrieves context and then runs full inference to generate an answer. MEMStorage decides whether full inference should happen at all — resolving high-confidence repeated requests without re-running the model. They are complementary: RAG improves what the model sees; MEMStorage reduces how often the model runs.

Prompt caching works within a single session and only for identical prefixes. MEMStorage works across sessions and across users, scores incoming requests against existing memory by confidence, and validates uncertain matches before serving them. It is decision infrastructure, not a key-value cache.

Most inference gateways, routers, and control planes assume a model call will happen and focus on routing, security, observability, or optimization. MEMStorage operates before that assumption. It determines whether fresh inference is required at all, or whether trusted state or lightweight validation can satisfy the request before compute is consumed.

Most AI systems recompute by default — the same questions hit the model again and again at full cost. A large share of enterprise AI queries are repeats. Routing those requests away from full inference is where the cost, latency, and governance gains come from.

The problem is largest where AI runs at scale on repetitive, document-heavy workflows — and where cost, auditability, and data isolation are budget-line concerns. Enterprises also need routing decisions that finance and compliance can verify. That is the environment MEMStorage is built for.

Yes. MEMStorage works above any model — OpenAI, Anthropic, Gemini, open models, or your private deployments — with no changes to your stack. You add the layer; whatever you already run becomes cheaper to operate.

The cheapest token is the one you never generate. By routing repeated requests away from full inference, MEMStorage targets the structural problem behind rising AI bills: paying to recompute answers you already have. Every routing decision is logged, so the impact is measurable rather than taken on faith.
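As an illustration of what a logged routing decision could look like, here is a minimal sketch. The field names (`request_id`, `score`, `decision`, `timestamp`) and both helper functions are hypothetical and are not MEMStorage's actual log schema; the point is only that one structured line per decision is enough to count how many requests skipped full inference.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def routing_log_line(request_id: str, score: float, decision: str, at: datetime) -> str:
    """Serialize one routing decision as a JSON line (hypothetical schema)."""
    record = {
        "request_id": request_id,
        "score": round(score, 4),
        "decision": decision,
        "timestamp": at.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)


def summarize(lines: Iterable[str]) -> Counter:
    """Count decisions by type so the routing impact can be audited."""
    return Counter(json.loads(line)["decision"] for line in lines)
```

Feeding a day's log lines to `summarize` gives a per-decision tally that finance or compliance could check against the inference bill.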

Product

Prompt caching only works within a single session and only for identical prefixes. MEMStorage works across sessions, across users, and matches by confidence rather than exact text. v1 uses confidence-scored matching; embedding-based retrieval is on the roadmap.

If someone asks the same question in a different way, we still catch it. That's the layer OpenAI and Anthropic don't cover — and the fact that they price cached inputs dramatically lower tells you they know a huge share of usage is repeated queries.

We use a three-tier scoring system. A score of 75% or higher is a memory hit, returned immediately at zero inference cost. A score from 28% up to (but not including) 75% triggers a lightweight 20-token confirmation call. A score below 28% is treated as a novel query and goes to full inference.

The thresholds are configurable per enterprise client based on their risk tolerance for false positives.
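The three tiers and their configurable thresholds can be sketched as follows. This is a minimal illustration, not the production router: the names `Route`, `Thresholds`, and `route` are hypothetical, scores are assumed to be normalized to the range 0 to 1, and only the 75% and 28% defaults come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    MEMORY_HIT = "memory_hit"          # serve the stored answer
    CONFIRM = "confirm"                # lightweight confirmation call
    FULL_INFERENCE = "full_inference"  # novel query, run the model


@dataclass(frozen=True)
class Thresholds:
    hit: float = 0.75      # default from the text: 75% and above
    confirm: float = 0.28  # default from the text: 28% and above

    def __post_init__(self) -> None:
        # Reject configurations where the tiers would overlap or invert.
        if not 0.0 <= self.confirm <= self.hit <= 1.0:
            raise ValueError("expected 0 <= confirm <= hit <= 1")


def route(score: float, thresholds: Thresholds = Thresholds()) -> Route:
    """Map a confidence score in [0, 1] to one of the three tiers."""
    if score >= thresholds.hit:
        return Route.MEMORY_HIT
    if score >= thresholds.confirm:
        return Route.CONFIRM
    return Route.FULL_INFERENCE
```

A client with low tolerance for false positives could pass something like `Thresholds(hit=0.90, confirm=0.40)`, which sends more traffic to confirmation or full inference.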

The confirmation tier exists exactly for this. When we're uncertain, we don't just serve memory — we validate it with a minimal model call first.
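A rough sketch of the confirmation step, under stated assumptions: `confirm` and `full_inference` are hypothetical callables standing in for real model calls, and falling back to full inference when confirmation rejects the stored answer is our assumption for the example, since the text above does not spell out that branch. Only the 20-token budget comes from the source.

```python
from typing import Callable


def resolve_uncertain(
    query: str,
    stored_answer: str,
    confirm: Callable[[str, str, int], bool],
    full_inference: Callable[[str], str],
    max_confirm_tokens: int = 20,  # confirmation budget stated in the text
) -> str:
    """Serve the stored answer only if a minimal model call confirms it."""
    if confirm(query, stored_answer, max_confirm_tokens):
        return stored_answer
    # Assumed fallback: a rejected match is treated like a novel query.
    return full_inference(query)
```

Passing the model calls in as callables keeps the sketch independent of any provider SDK.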

Enterprises can also set their own confidence thresholds and flag certain query types for mandatory full inference regardless of match score.
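One way to picture a per-client policy with mandatory full inference for flagged query types is the sketch below. `ClientPolicy`, `decide`, and the `"legal_advice"` query type are hypothetical names chosen for illustration; how query types are actually classified is outside the scope of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientPolicy:
    hit_threshold: float = 0.75
    confirm_threshold: float = 0.28
    # Query types that must always run the model, whatever the match score.
    always_full_inference: frozenset = frozenset()


def decide(score: float, query_type: str, policy: ClientPolicy) -> str:
    """Apply the mandatory-inference override first, then the thresholds."""
    if query_type in policy.always_full_inference:
        return "full_inference"
    if score >= policy.hit_threshold:
        return "memory_hit"
    if score >= policy.confirm_threshold:
        return "confirm"
    return "full_inference"


# Example: a cautious client raises the hit threshold and pins one query type.
strict = ClientPolicy(
    hit_threshold=0.90,
    always_full_inference=frozenset({"legal_advice"}),
)
```

Checking the override before the score means even a near-perfect match on a flagged type is never served from memory.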

They can. Some do. But building it means solving confidence scoring, threshold tuning, cross-session persistence, security isolation between users, and ongoing maintenance.

Our three-tier routing is live today, using confidence-scored matching, with embedding-based retrieval on the roadmap. And it's patent pending: our specific approach to three-tier routing is covered by a filed provisional application.

We only store answers that came from successful full inference calls. We don't store uncertain or flagged responses.

Enterprises can also set expiry windows on stored memories so stale answers get refreshed automatically.
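The write policy and expiry window described above can be sketched as a small in-memory store. This is an illustrative sketch only: `MemoryStore`, its method names, and the `full_inference_ok` and `flagged` parameters are hypothetical, and a real deployment would use persistent storage rather than a Python dictionary.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class MemoryStore:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, key: str, answer: str, *, full_inference_ok: bool, flagged: bool = False) -> bool:
        """Store only answers from successful full inference that were not flagged."""
        if not full_inference_ok or flagged:
            return False
        self._entries[key] = (answer, self._clock())
        return True

    def get(self, key: str) -> Optional[str]:
        """Return the stored answer, or None if it is missing or has expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        answer, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # stale: drop it so the next request refreshes it
            return None
        return answer
```

When `get` returns `None` for an expired entry, the caller would route the request to full inference and store the fresh answer, which is how a stale memory gets refreshed in this sketch.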

Security & Data

Each enterprise client has isolated memory storage with no cross-client data sharing. v1 uses confidence-scored matching; embedding-based retrieval is on the roadmap.
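To make the isolation property concrete, here is a minimal sketch in which every read and write is scoped to a client identifier. `IsolatedMemory` and `client_id` are hypothetical names, and partitioning a dictionary inside one process is only an illustration of the lookup rule; it is not a description of how MEMStorage physically separates client data.

```python
from collections import defaultdict
from typing import Dict, Optional


class IsolatedMemory:
    def __init__(self) -> None:
        # One independent key space per client; no shared entries.
        self._by_client: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, client_id: str, key: str, answer: str) -> None:
        self._by_client[client_id][key] = answer

    def get(self, client_id: str, key: str) -> Optional[str]:
        # Lookups never fall through to another client's partition.
        return self._by_client.get(client_id, {}).get(key)
```

Because there is no code path that reads across partitions, a memory stored by one client can never be served to another in this model.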

We can also deploy on-premise for clients who require it.

IP & Legal

We filed the provisional patent application. A provisional establishes our priority date and gives us 12 months to file the nonprovisional.

The raise funds the nonprovisional filing among other milestones.

Traction & Business

Enterprise: an API layer for companies spending $30K+ per month on inference. We sell through direct outreach and 30-day pilots, with money back if there is no measurable ROI.

Consumer: Chrome extension for individuals using ChatGPT or Claude. Bottom-up adoption that creates enterprise pull when teams start using it.

For Investors

The capital deploys against four pillars: engineering (vector embeddings, Chrome extension, persistent infrastructure), go-to-market (enterprise pilots, beta launch, first sales hire), IP & legal (nonprovisional patent filing, trademark, corporate counsel), and operations (infrastructure, tooling, runway).

Round target: 3–5 paying enterprise pilots, 1,000 beta users, and Chrome Web Store launch.

Raise size and terms shared privately on request — patrick@memstorage.com.

Once we have 3–5 signed enterprise pilots and 1,000 beta users, we raise a Series A to build the full team and expand the enterprise sales motion.

The memory flywheel compounds with scale — the unit economics improve as usage grows.

Questions we get every time.

Still have questions?
