Discovery & Search
Catalog Architecture
When MetaMCP first connects to a child server, it calls tools/list on that server and caches the result. The catalog maps tool names to their JSON schemas and owning servers.
Cache TTL: 1 hour (CATALOG_TTL_MS = 3_600_000 from src/catalog.ts). After expiry, the next request to that server triggers a fresh tools/list call.
Cleanup interval: Every 5 minutes (CATALOG_CLEANUP_MS = 300_000), the catalog removes entries for servers that are no longer connected.
The catalog is built lazily. Servers that have never been contacted have no catalog entries. When mcp_discover or mcp_provision runs a search, any server not yet in the catalog is spawned, its tools are fetched, and the results are cached.
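The cache behaviour above can be sketched in TypeScript as follows. This is a minimal sketch that assumes a Map keyed by server name. Only CATALOG_TTL_MS and CATALOG_CLEANUP_MS come from src/catalog.ts. ToolInfo, ToolCatalog, listTools, ownerOf, and startCleanup are hypothetical names used for illustration, not MetaMCP's actual API.

```typescript
// Hypothetical shapes; the real types in src/catalog.ts may differ.
interface ToolInfo {
  name: string;
  description: string;
  inputSchema: unknown; // JSON schema as returned by tools/list
}

interface CatalogEntry {
  tools: ToolInfo[];
  fetchedAt: number; // epoch ms of the last successful tools/list call
}

const CATALOG_TTL_MS = 3_600_000; // 1 hour (documented value)
const CATALOG_CLEANUP_MS = 300_000; // 5 minutes (documented value)

class ToolCatalog {
  private readonly entries = new Map<string, CatalogEntry>();

  // The clock is injectable so expiry can be tested without waiting an hour.
  constructor(private readonly now: () => number = Date.now) {}

  store(server: string, tools: ToolInfo[]): void {
    this.entries.set(server, { tools, fetchedAt: this.now() });
  }

  // Never-contacted servers and expired entries both need a fresh tools/list.
  isStale(server: string): boolean {
    const entry = this.entries.get(server);
    return entry === undefined || this.now() - entry.fetchedAt >= CATALOG_TTL_MS;
  }

  // Lazy population: listTools runs only when the entry is missing or stale.
  async getTools(
    server: string,
    listTools: (server: string) => Promise<ToolInfo[]>,
  ): Promise<ToolInfo[]> {
    if (this.isStale(server)) {
      this.store(server, await listTools(server));
    }
    return this.entries.get(server)!.tools;
  }

  // Look up the server that owns a tool (first match wins in this sketch).
  ownerOf(toolName: string): string | undefined {
    for (const [server, entry] of this.entries) {
      if (entry.tools.some((t) => t.name === toolName)) return server;
    }
    return undefined;
  }

  // Drop entries for servers that are no longer connected.
  cleanup(connected: ReadonlySet<string>): void {
    for (const server of [...this.entries.keys()]) {
      if (!connected.has(server)) this.entries.delete(server);
    }
  }

  // Schedule cleanup every CATALOG_CLEANUP_MS; the caller clears the handle.
  startCleanup(connected: () => ReadonlySet<string>): ReturnType<typeof setInterval> {
    return setInterval(() => this.cleanup(connected()), CATALOG_CLEANUP_MS);
  }
}
```

In this sketch a search would call getTools for every configured server, so only servers that are missing or expired pay the cost of spawning and a tools/list round trip.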
Keyword Scoring
Every search query is matched against tool names and descriptions using a simple scoring system.
| Match Type | Score | Constant |
|---|---|---|
| Exact name match | 10 | SCORE_EXACT_NAME |
| Name contains query | 5 | SCORE_NAME_CONTAINS |
| Description contains query | 2 | SCORE_DESC_CONTAINS |
Matching is case-insensitive. A tool can accumulate points from multiple match types. For example, a tool named take_screenshot with a description containing "screenshot" would score 5 (name contains) + 2 (description contains) = 7 for the query "screenshot".
Results are sorted by score in descending order. The default limit is the top 20 results (DEFAULT_TOP_N).
Keyword scoring is always available, with no external dependencies.
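A minimal TypeScript sketch of the keyword scorer follows. The score constants and DEFAULT_TOP_N come from this page; scoreTool and rankTools are hypothetical names. Two details are assumptions, flagged in the comments: an exact name match does not also earn the "name contains" points, and tools scoring zero are dropped.

```typescript
const SCORE_EXACT_NAME = 10;
const SCORE_NAME_CONTAINS = 5;
const SCORE_DESC_CONTAINS = 2;
const DEFAULT_TOP_N = 20;

interface Tool {
  name: string;
  description: string;
}

// Case-insensitive scoring; points from name and description accumulate.
function scoreTool(query: string, tool: Tool): number {
  const q = query.toLowerCase();
  const name = tool.name.toLowerCase();
  let score = 0;
  // Assumption: exact and "contains" name matches are mutually exclusive.
  if (name === q) score += SCORE_EXACT_NAME;
  else if (name.includes(q)) score += SCORE_NAME_CONTAINS;
  if (tool.description.toLowerCase().includes(q)) score += SCORE_DESC_CONTAINS;
  return score;
}

// Sort by score, highest first, and keep the top N results.
function rankTools(query: string, tools: Tool[], topN = DEFAULT_TOP_N) {
  return tools
    .map((tool) => ({ tool, score: scoreTool(query, tool) }))
    .filter((r) => r.score > 0) // assumption: non-matching tools are dropped
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```

With this sketch, scoreTool("screenshot", { name: "take_screenshot", description: "Take a screenshot of the current page" }) reproduces the 5 + 2 = 7 example above.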
Semantic Search
When a Voyage AI API key is configured, MetaMCP generates vector embeddings for tool descriptions and enables semantic search.
Configuration: Set either the VOYAGE_API_KEY or the ANTHROPIC_API_KEY environment variable. In both cases, MetaMCP calls the Voyage AI embedding API.
Model: voyage-3-lite. Embeddings are generated for each tool's description when the catalog is first populated.
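The configuration step can be sketched as below. Only the two variable names and the model name come from this page. The precedence (VOYAGE_API_KEY first), the function names, and the request shape are assumptions modelled on Voyage AI's public embeddings endpoint, not MetaMCP's code.

```typescript
const EMBEDDING_MODEL = "voyage-3-lite";
const VOYAGE_EMBEDDINGS_URL = "https://api.voyageai.com/v1/embeddings";

type Env = Record<string, string | undefined>;

// Assumed precedence: VOYAGE_API_KEY wins when both are set.
// undefined means no key, so search falls back to keyword-only ranking.
function resolveEmbeddingKey(env: Env): string | undefined {
  return env.VOYAGE_API_KEY || env.ANTHROPIC_API_KEY || undefined;
}

interface EmbeddingRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) a request that embeds a batch of texts.
function buildEmbeddingRequest(apiKey: string, texts: string[]): EmbeddingRequest {
  return {
    url: VOYAGE_EMBEDDINGS_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ input: texts, model: EMBEDDING_MODEL }),
    },
  };
}
```

A caller would pass url and init to fetch and read the vectors from the JSON response, where Voyage returns them under data[i].embedding. Keeping the request builder pure makes it testable without network access.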
Storage: Embeddings are stored as raw Float32Array blobs in SQLite. The embedding dimensions are determined by the model.
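Storing an embedding as a raw Float32Array blob amounts to reinterpreting its bytes. The sketch below shows one way to convert in both directions; the function names are hypothetical, and the bytes use the platform's byte order (little-endian on common hardware), which is an assumption worth checking if a database file moves between machines.

```typescript
// Encode an embedding as the raw bytes of a Float32Array (4 bytes per value).
function embeddingToBlob(embedding: number[]): Uint8Array {
  const floats = Float32Array.from(embedding);
  return new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Decode a BLOB back into floats; the dimension count follows from the byte length.
function blobToEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % Float32Array.BYTES_PER_ELEMENT !== 0) {
    throw new Error("blob length is not a multiple of 4 bytes");
  }
  // Copy into a fresh buffer so the Float32Array view starts at a 4-byte-aligned offset.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, 0, copy.byteLength / Float32Array.BYTES_PER_ELEMENT);
}
```

The copy matters because SQLite drivers may return blobs as views into a larger pooled buffer, and a Float32Array cannot be created at an unaligned offset.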
Query flow:
- The search query is embedded using the same model.
- Candidate tools are retrieved (up to VECTOR_CANDIDATE_LIMIT = 50).
- Cosine similarity is computed between the query embedding and each candidate's embedding.
- Results are scored on a [0, 1] scale.
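The query flow can be sketched as a brute-force pass over stored embeddings, as the Vector Store section describes. VECTOR_CANDIDATE_LIMIT comes from this page; StoredEmbedding and semanticCandidates are hypothetical names. This page does not say how cosine similarity (which ranges over [-1, 1]) is mapped onto [0, 1], so the clamping below is an assumption.

```typescript
const VECTOR_CANDIDATE_LIMIT = 50;

interface StoredEmbedding {
  server: string;
  toolName: string;
  embedding: Float32Array;
}

// Standard cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored tool against the query and keep the best candidates.
function semanticCandidates(
  query: ArrayLike<number>,
  rows: StoredEmbedding[],
  limit = VECTOR_CANDIDATE_LIMIT,
) {
  return rows
    .map((row) => ({
      server: row.server,
      toolName: row.toolName,
      // Assumption: clamp into [0, 1]; negative similarity counts as no match.
      score: Math.max(0, Math.min(1, cosineSimilarity(query, row.embedding))),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```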
Semantic search excels at finding tools when the query uses different terminology than the tool name. For example, searching "take a picture of the page" can match browser_screenshot even though the words are different.
Hybrid Scoring Formula
When semantic search is available, results are ranked using a weighted combination of both signals.
finalScore = 0.6 * semanticScore + 0.4 * keywordScore
| Weight | Source | Constant |
|---|---|---|
| 0.6 | Semantic similarity | SEMANTIC_WEIGHT |
| 0.4 | Keyword match | KEYWORD_WEIGHT |
Both scores are normalized to [0, 1] before combining. Keyword scores are normalized by dividing by 10, the score for an exact name match.
When semantic search is unavailable (no API key configured), the system falls back to keyword-only ranking.
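A worked example: if take_screenshot has a keyword score of 7 (normalized to 0.7) and a hypothetical semantic score of 0.5, then finalScore = 0.6 × 0.5 + 0.4 × 0.7 = 0.30 + 0.28 = 0.58. The sketch below expresses the same combination, including the keyword-only fallback. The weights and the divisor of 10 come from this page; the function name is hypothetical, and clamping the normalized keyword score at 1 is an assumption, because accumulated keyword points can exceed 10.

```typescript
const SEMANTIC_WEIGHT = 0.6;
const KEYWORD_WEIGHT = 0.4;
const KEYWORD_NORMALIZER = 10; // exact-name-match score

// semanticScore is undefined when no embedding API key is configured.
function hybridScore(keywordScore: number, semanticScore?: number): number {
  // Assumption: clamp so the normalized keyword score stays within [0, 1].
  const keywordNorm = Math.min(1, keywordScore / KEYWORD_NORMALIZER);
  if (semanticScore === undefined) return keywordNorm; // keyword-only fallback
  return SEMANTIC_WEIGHT * semanticScore + KEYWORD_WEIGHT * keywordNorm;
}
```

Because the fallback only rescales keyword scores, it preserves the keyword-only ranking order.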
Vector Store
Embeddings are persisted in a SQLite database for fast retrieval across sessions.
Location: ~/.metamcp/catalog.db
Table schema:
| Column | Type | Description |
|---|---|---|
| server | TEXT | Name of the owning server |
| tool_name | TEXT | Name of the tool |
| description | TEXT | Tool description text |
| embedding | BLOB | Raw Float32Array bytes |
| updated_at | INTEGER | Timestamp of last update |
The database uses WAL (Write-Ahead Logging) journal mode for concurrent read performance. Search uses brute-force cosine similarity, which is adequate at MetaMCP's scale (typically hundreds to low thousands of tools, not millions).
The schema is versioned with automatic migration when the format changes.
Deleting ~/.metamcp/catalog.db is safe. MetaMCP regenerates embeddings on the next search. The only cost is a brief delay while embeddings are recomputed.
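Because embeddings can always be regenerated, one plausible migration strategy is to drop and rebuild the table whenever the stored schema version is older than expected. The sketch below assumes SQLite's PRAGMA user_version holds that version. The SqlExecutor interface, the table name tool_embeddings, the primary key, and SCHEMA_VERSION are all hypothetical; a real SQLite binding would supply equivalent exec and pragma calls.

```typescript
// Hypothetical minimal driver interface.
interface SqlExecutor {
  exec(sql: string): void;
  userVersion(): number; // current value of PRAGMA user_version
}

const SCHEMA_VERSION = 1; // hypothetical

// Columns follow the table schema above; table name and key are assumptions.
const CREATE_TOOL_EMBEDDINGS = `
CREATE TABLE IF NOT EXISTS tool_embeddings (
  server      TEXT    NOT NULL,
  tool_name   TEXT    NOT NULL,
  description TEXT    NOT NULL,
  embedding   BLOB    NOT NULL,
  updated_at  INTEGER NOT NULL,
  PRIMARY KEY (server, tool_name)
)`;

function openCatalogDb(db: SqlExecutor): void {
  db.exec("PRAGMA journal_mode = WAL"); // readers are not blocked by a writer
  const outdated = db.userVersion() < SCHEMA_VERSION;
  // Embeddings are derived data, so an outdated table is rebuilt, not converted.
  if (outdated) db.exec("DROP TABLE IF EXISTS tool_embeddings");
  db.exec(CREATE_TOOL_EMBEDDINGS);
  // Record the new version only after the table exists.
  if (outdated) db.exec(`PRAGMA user_version = ${SCHEMA_VERSION}`);
}
```

After such a rebuild, the next search repopulates the embeddings, which matches the regeneration behaviour described above.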
Registry Search
When mcp_provision cannot find a matching capability among local servers, it searches the public MCP server registry for published servers.
Registry endpoint: https://registry.modelcontextprotocol.io/servers
Cache TTL: 24 hours. Registry results are cached locally to avoid repeated network calls.
Fallback list: If the registry is unreachable, MetaMCP includes a built-in list of 20 well-known servers from 7 organizations:
| Namespace | Servers |
|---|---|
| @modelcontextprotocol | filesystem, github, gitlab, google-maps, memory, postgres, slack, sqlite, brave-search, puppeteer, fetch, everything |
| @anthropic | sequential-thinking |
| @playwright | mcp |
| @stripe | mcp |
| @sentry | mcp-server |
| Community | mcp-server-docker, mcp-server-kubernetes, mcp-server-git, mcp-server-linear |
Registry search is only triggered by mcp_provision, not by mcp_discover. Discovery searches local catalogs only.
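The registry behaviour (24-hour cache, built-in fallback) can be sketched as follows. The URL, TTL, and fallback entries come from this page. RegistryServer, RegistryClient, fetchServers, and matchServers are hypothetical names. Parsing the registry response is left to the injected fetchServers function because the response format is not documented here.

```typescript
const REGISTRY_URL = "https://registry.modelcontextprotocol.io/servers";
const REGISTRY_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

interface RegistryServer {
  namespace: string; // "community" for unscoped packages in this sketch
  name: string;
}

const group = (namespace: string, names: string[]): RegistryServer[] =>
  names.map((name) => ({ namespace, name }));

// Built-in fallback list, transcribed from the table above.
const FALLBACK_SERVERS: RegistryServer[] = [
  ...group("@modelcontextprotocol", ["filesystem", "github", "gitlab", "google-maps",
    "memory", "postgres", "slack", "sqlite", "brave-search", "puppeteer", "fetch", "everything"]),
  ...group("@anthropic", ["sequential-thinking"]),
  ...group("@playwright", ["mcp"]),
  ...group("@stripe", ["mcp"]),
  ...group("@sentry", ["mcp-server"]),
  ...group("community", ["mcp-server-docker", "mcp-server-kubernetes", "mcp-server-git", "mcp-server-linear"]),
];

class RegistryClient {
  private cache?: { servers: RegistryServer[]; fetchedAt: number };

  constructor(
    // Hypothetical: would GET REGISTRY_URL and parse the response.
    private readonly fetchServers: () => Promise<RegistryServer[]>,
    private readonly now: () => number = Date.now,
  ) {}

  async list(): Promise<RegistryServer[]> {
    if (this.cache && this.now() - this.cache.fetchedAt < REGISTRY_TTL_MS) {
      return this.cache.servers; // cached result still fresh: no network call
    }
    try {
      const servers = await this.fetchServers();
      this.cache = { servers, fetchedAt: this.now() };
      return servers;
    } catch {
      return FALLBACK_SERVERS; // registry unreachable
    }
  }
}

// Simple case-insensitive substring match on "namespace/name".
function matchServers(query: string, servers: RegistryServer[]): RegistryServer[] {
  const q = query.toLowerCase();
  return servers.filter((s) => `${s.namespace}/${s.name}`.toLowerCase().includes(q));
}
```

matchServers is a plain substring filter chosen for brevity; the actual provisioning flow may rank registry hits differently.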
Next Steps
- Auto-Provisioning for how mcp_provision installs and starts matched servers
- The Four Tools for how discovery fits into the overall tool surface