
Retrieval

How Memax finds the right context — semantic search, hybrid ranking, reranking, and the retrieval pipeline.

Retrieval is the core of Memax. When an agent asks "how does auth work?", Memax needs to find the most relevant pieces of knowledge from potentially thousands of memories.

Retrieval pipeline

Query → Embed → Vector Search → Rerank → Filter → Return
  1. Embed — the query is embedded into a vector using Voyage AI
  2. Vector Search — pgvector finds the top-N most similar chunks using cosine similarity
  3. Rerank — Cohere Rerank scores each candidate for relevance to the original query
  4. Filter — boundary enforcement, deduplication, and hub scoping
  5. Return — ranked results with relevance scores
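The pipeline stages can be sketched end to end. This is an illustrative toy, not Memax's internals: the hashed bag-of-words `embed` stands in for Voyage AI embeddings, and the cosine rescoring stands in for Cohere Rerank.

```python
def embed(text):
    # Stand-in for a real embedding model (Memax uses Voyage AI):
    # a tiny deterministic bag-of-words hashed into 8 buckets.
    vec = [0.0] * 8
    for word in text.lower().split():
        vec[sum(ord(ch) for ch in word) % 8] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_n=5, threshold=0.2):
    qvec = embed(query)
    # Vector search: top-N candidates by cosine similarity (pgvector's role).
    candidates = sorted(chunks, key=lambda c: cosine(qvec, embed(c)),
                        reverse=True)[:top_n]
    # Rerank: a real deployment rescores candidates against the original
    # query (Cohere Rerank); this sketch reuses cosine as the stand-in.
    scored = sorted(((cosine(qvec, embed(c)), c) for c in candidates),
                    reverse=True)
    # Filter: enforce the relevance threshold and deduplicate.
    seen, results = set(), []
    for score, chunk in scored:
        if score >= threshold and chunk not in seen:
            seen.add(chunk)
            results.append((round(score, 2), chunk))
    return results
```

Chunks with no meaningful overlap score below the threshold and are dropped rather than padding out the result set.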

Precision over recall

Memax optimizes for precision over recall. Returning irrelevant context is worse than returning nothing — it wastes the agent's context window and can lead to hallucinations.

This means:

  • Results below a relevance threshold are dropped, not returned
  • A query that doesn't match anything returns an empty set, not "close enough" results
  • Fewer, highly relevant results beat many loosely related ones
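The threshold rule above amounts to a hard cutoff rather than a soft ranking. A minimal sketch (the threshold value here is illustrative, not Memax's actual cutoff):

```python
RELEVANCE_THRESHOLD = 0.6  # illustrative value, not Memax's actual cutoff

def filter_results(scored_results, threshold=RELEVANCE_THRESHOLD):
    """scored_results: list of (score, chunk). Keep only confident hits."""
    kept = [(s, c) for s, c in scored_results if s >= threshold]
    # An empty list is a valid answer: better than wasting the agent's
    # context window on loosely related chunks.
    return sorted(kept, reverse=True)
```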

memax recall uses semantic search — it finds content by meaning, not just keyword matching.

# These all find the same architecture doc:
memax recall "how does the system architecture work?"
memax recall "what's the tech stack?"
memax recall "describe the infrastructure"

memax search uses structured filters — categories, tags, dates, and other metadata.

# Find by category
memax search --category decisions

# Find by tag
memax search --tags "auth,security"

# Find recent
memax search --since 7d
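Conceptually, these filters are predicates over memory metadata rather than embeddings. A sketch under assumed field names (`category`, `tags`, `created` are illustrative, not Memax's storage schema; tags are treated as match-any here):

```python
from datetime import datetime, timedelta

def search(memories, category=None, tags=None, since_days=None):
    now = datetime(2024, 6, 1)  # fixed "now" so the example is deterministic
    out = []
    for m in memories:
        if category and m["category"] != category:
            continue  # exact category match
        if tags and not set(tags) & set(m["tags"]):
            continue  # at least one tag must overlap
        if since_days and m["created"] < now - timedelta(days=since_days):
            continue  # older than the recency window
        out.append(m)
    return out
```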

Context hints

Improve retrieval accuracy by providing context alongside your query:

memax recall "how does auth work?" --hint "working on the login flow"

In MCP, agents can pass project_context and hint parameters:

{
  "tool": "memax_recall",
  "arguments": {
    "query": "how does auth work?",
    "hint": "debugging a token refresh issue",
    "project_context": {
      "repo": "github.com/org/app",
      "branch": "fix/auth-refresh"
    }
  }
}
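One way hints and project context can sharpen retrieval is by folding them into the text that gets embedded alongside the query. This is an assumption about the mechanism, not Memax's documented behavior:

```python
def build_retrieval_text(query, hint=None, project_context=None):
    # Illustrative: concatenate query, hint, and project context so the
    # embedding reflects the caller's working situation, not just the query.
    parts = [query]
    if hint:
        parts.append(f"Context: {hint}")
    if project_context:
        ctx = ", ".join(f"{k}={v}" for k, v in sorted(project_context.items()))
        parts.append(f"Project: {ctx}")
    return "\n".join(parts)
```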

Performance

Metric                  Target
--------------------    -------
Recall latency (p95)    < 500ms
Embedding time          ~50ms
Vector search           ~20ms
Reranking               ~200ms

The retrieval pipeline is designed to never block the user's agent. If any step is slow, it degrades gracefully — returning cached or partial results rather than timing out.
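Graceful degradation can be sketched for the rerank stage: if the reranker fails or blows its time budget, fall back to the vector-search ordering. The reranker and budget here are stand-ins, not Cohere's API or Memax's actual fallback logic:

```python
import time

def rerank_with_fallback(query, candidates, reranker, budget_s=0.2):
    # candidates arrive in vector-search order, which is the fallback.
    start = time.monotonic()
    try:
        reranked = reranker(query, candidates)
        if time.monotonic() - start <= budget_s:
            return reranked
    except Exception:
        pass  # reranker unavailable: degrade rather than propagate
    # Partial results (vector-search order) beat a timeout.
    return candidates
```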