Retrieval
How Memax finds the right context — semantic search, hybrid ranking, reranking, and the retrieval pipeline.
Retrieval is the core of Memax. When an agent asks "how does auth work?", Memax needs to find the most relevant pieces of knowledge from potentially thousands of memories.
Retrieval pipeline
Query → Embed → Vector Search → Rerank → Filter → Return
- Embed — the query is embedded into a vector using Voyage AI
- Vector Search — pgvector finds the top-N most similar chunks using cosine similarity
- Rerank — Cohere Rerank scores each candidate for relevance to the original query
- Filter — boundary enforcement, deduplication, and hub scoping
- Return — ranked results with relevance scores
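The stages above can be sketched end to end in a few lines of Python. This is an illustrative toy, not Memax's implementation: `Chunk`, `toy_embed`, `toy_rerank`, and `retrieve` are hypothetical names, the embedder and reranker are trivial stand-ins for Voyage AI and Cohere Rerank, an in-memory list stands in for pgvector, and the filter step assumes that hub scoping means "same hub" and deduplication means "same text".

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    id: str
    hub: str
    text: str
    vector: list


def toy_embed(text):
    # Hypothetical 2-d "embedding": counts of two keywords (stand-in for a real model).
    words = text.lower().split()
    return [float(words.count("auth")), float(words.count("deploy"))]


def toy_rerank(query, text):
    # Hypothetical reranker: fraction of query words that also appear in the text.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, chunks, top_n):
    # Stand-in for the pgvector step: top-N chunks by cosine similarity.
    scored = [(cosine_similarity(query_vec, c.vector), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]


def retrieve(query, chunks, hub, top_n=3):
    query_vec = toy_embed(query)                           # Embed
    candidates = vector_search(query_vec, chunks, top_n)   # Vector Search
    scored = [(toy_rerank(query, c.text), c) for c in candidates]  # Rerank
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results, seen = [], set()
    for score, chunk in scored:                            # Filter
        if chunk.hub != hub or chunk.text in seen:
            continue  # outside the requested hub, or a duplicate
        seen.add(chunk.text)
        results.append((chunk.id, score))
    return results                                         # Return


texts = [
    ("a1", "app", "auth uses jwt tokens"),
    ("a2", "app", "auth uses jwt tokens"),    # duplicate of a1
    ("d1", "app", "deploy with docker"),
    ("x1", "other", "auth for other hub"),    # belongs to a different hub
]
chunks = [Chunk(i, hub, text, toy_embed(text)) for i, hub, text in texts]
results = retrieve("how does auth work", chunks, hub="app")
print(results)
```

With this toy data the call returns `[('a1', 0.25)]`: the duplicate and the chunk from the other hub are removed in the filter step, and the unrelated deploy chunk never makes it into the top-N candidates.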
Precision over recall
Memax optimizes for precision over recall. Returning irrelevant context is worse than returning nothing — it wastes the agent's context window and can lead to hallucinations.
This means:
- Results below a relevance threshold are dropped, not returned
- A query that doesn't match anything returns an empty set, not "close enough" results
- Fewer, highly relevant results beat many loosely related ones
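A minimal sketch of the thresholding behaviour described above, assuming results carry a relevance score between 0 and 1. The names `Scored` and `apply_threshold`, the `0.5` cutoff, and the `limit` of 5 are illustrative assumptions; the actual threshold Memax uses is not stated here.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    id: str
    relevance: float


def apply_threshold(results, min_relevance=0.5, limit=5):
    # Drop anything below the cutoff instead of padding the result set.
    kept = [r for r in results if r.relevance >= min_relevance]
    # Highest relevance first, capped at a small number of results.
    kept.sort(key=lambda r: r.relevance, reverse=True)
    return kept[:limit]


strong = apply_threshold([Scored("auth-doc", 0.9), Scored("changelog", 0.3)])
weak = apply_threshold([Scored("changelog", 0.3)])
```

Here `strong` contains only `auth-doc`, and `weak` is an empty list: when nothing clears the cutoff, the caller gets nothing rather than the least-bad match.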
Semantic vs. keyword search
memax recall uses semantic search — it finds content by meaning, not just keyword matching.
# These all find the same architecture doc:
memax recall "how does the system architecture work?"
memax recall "what's the tech stack?"
memax recall "describe the infrastructure"
memax search uses structured filters — categories, tags, dates, and other metadata.
# Find by category
memax search --category decisions
# Find by tag
memax search --tags "auth,security"
# Find recent
memax search --since 7d
Context hints
Improve retrieval accuracy by providing context alongside your query:
memax recall "how does auth work?" --hint "working on the login flow"
In MCP, agents can pass project_context and hint parameters:
{
  "tool": "memax_recall",
  "arguments": {
    "query": "how does auth work?",
    "hint": "debugging a token refresh issue",
    "project_context": {
      "repo": "github.com/org/app",
      "branch": "fix/auth-refresh"
    }
  }
}
Performance
| Metric | Target |
|---|---|
| Recall latency (p95) | < 500ms |
| Embedding time | ~50ms |
| Vector search | ~20ms |
| Reranking | ~200ms |
The retrieval pipeline is designed to never block the user's agent. If any step is slow, it degrades gracefully — returning cached or partial results rather than timing out.
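One way to picture this kind of graceful degradation is a time-boxed search with a cache fallback. The sketch below is an assumption about the general pattern, not Memax's code: `recall_with_fallback`, the dictionary cache, and the search callables are hypothetical, and the 0.5 s default is borrowed from the p95 target in the table above. It covers only the cached-results case, not partial results.

```python
import asyncio


async def recall_with_fallback(query, search, cache, timeout_s=0.5):
    # Returns (results, degraded). `degraded` is True when the live search
    # exceeded its time budget and cached results (possibly none) were used.
    try:
        results = await asyncio.wait_for(search(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        return cache.get(query, []), True
    cache[query] = results  # remember the latest good answer for this query
    return results, False


async def fast_search(query):
    return ["auth-doc"]


async def slow_search(query):
    await asyncio.sleep(0.2)  # simulates a slow pipeline stage
    return ["auth-doc"]


cache = {}
fresh = asyncio.run(recall_with_fallback("auth", fast_search, cache))
stale = asyncio.run(recall_with_fallback("auth", slow_search, cache, timeout_s=0.01))
empty = asyncio.run(recall_with_fallback("billing", slow_search, cache, timeout_s=0.01))
```

The first call populates the cache, the second falls back to that cached answer when the search is too slow, and the third returns an empty list because nothing is cached for that query, which matches the precision-first behaviour of returning nothing over returning noise.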