Retrieval
How Memax finds the right memories — semantic search, reranking, and the retrieval pipeline.
Retrieval turns a natural-language question into a ranked set of
relevant memory chunks. memax recall and POST /v1/recall both
use this pipeline.
Pipeline
Query → Understand → Retrieve (hybrid) → Rerank → Access filter → Return- Understand — the query is normalized into retrieval signals (semantic, lexical, and contextual) that feed the next stage.
- Retrieve — a hybrid retrieval layer combines dense semantic matching with complementary signals for exact-phrase and metadata hits, ranked together.
- Rerank — a rerank stage rescores the top candidates against
the original query for precision. Pass
no_rerank: true/--no-rerankto skip this stage for lower latency. - Access filter — results you can't access are dropped. This happens at the data layer, not the handler (see Boundaries).
- Return — ranked
RecalledMemory[]withrelevance_score, source metadata, and the matchedchunk_content(not the full memory — usememax show <id>orGET /v1/memories/{id}for that).
The specific models and index backing each stage are implementation details and subject to change.
Semantic, not keyword
Recall finds by meaning — paraphrases and synonymous queries hit the same memories.
# All three should find the same architecture memory:
memax recall "how does the system architecture work?"
memax recall "what's the tech stack?"
memax recall "describe the infrastructure"For metadata-based listing (tags, topic, recent), use
memax list — there is no memax search command.
Filtering
Recall accepts explicit filters that narrow the candidate set before scoring:
| Filter | CLI flag | API field |
|---|---|---|
| Limit | --limit | limit |
| Specific hubs | --hub <slug> (boosts ranking) | hub_ids[] |
| Tags | --tags foo,bar | tags[] |
| A specific topic | --topic-id <id> | topic_id |
| A memory kind | — | kind |
| Include archived | --include-archived | include_archived |
| Skip reranker | --no-rerank | no_rerank |
| Temporal window | — | created_after, created_before |
Recall does not take a --hint or threshold flag today.
Context passing
MCP-based recall accepts a project_context object (repo / project /
branch) that the server uses as a soft relevance boost. This is
how hooks and the CLI pass the current project state without
polluting the query text:
{
"tool": "memax_recall",
"arguments": {
"query": "how does auth work?",
"project_context": {
"repo": "github.com/org/app",
"branch": "fix/auth-refresh"
}
}
}Performance
Recall is interactive — it's expected to complete in well under a second on normal corpora. Exact timing depends on:
- Query understanding (may involve an external model call).
- Corpus size + index state.
- Whether the rerank stage ran (rerank is the dominant component when on).
- Cold vs. warm pools (first call per machine after deploy is slower).
The server returns query_metadata.latency_ms on every recall
response so you can track end-to-end time directly.
If rerank latency is the bottleneck for your use case, pass no_rerank: true
— the result set is less precisely ordered but returns noticeably faster. For
hooks in the Claude Code prompt-submit path, you may want this tradeoff.