Semantic Tool Selection at Scale
At >20 tools, use embedding-based selection. Pick ~12 per turn with cosine similarity + prepareStep hook.
Summary
Passing 50+ tool definitions to the model per request wastes tokens and dilutes focus. Semantic tool selection uses embeddings to pick ~12 relevant tools per turn based on the user's message. The "toolpick" pattern: embed tool descriptions once at startup, cache them, then at each step use cosine similarity to rank tools and pass only the top candidates.
- 20-tool threshold: Beyond ~20 tools, the token cost of listing every tool (plus the context dilution) outweighs the benefit of having them all pre-loaded.
- ~12 tools per turn: Empirically optimal; balances discovery vs. context window.
- "Always active" set: Small, critical tools (web_search, search_tools, meta-tools) always available.
- Embedding cache: Compute embeddings once at startup; reuse across requests (Redis or file cache).
- prepareStep hook: Vercel AI SDK's integration point; called before model receives tools.
The Problem
// ❌ Don't do this at scale
const agent = new ToolLoopAgent({
model: openai("gpt-4o-mini"),
instructions: systemPrompt,
tools: {
// 50+ tools, all passed to every request
customers_list: {...},
customers_get: {...},
customers_create: {...},
orders_list: {...},
orders_get: {...},
orders_update: {...},
invoices_list: {...},
// ... 40 more ...
},
});
Cost: ~500 tokens for the tool definitions alone. The model wastes context parsing irrelevant tools (e.g., order tools when the user is asking about customers).
The Pattern: Embedding-Based Selection
The production app uses the toolpick library with OpenAI embeddings:
// chat/tools.ts
import { createToolIndex, fileCache, type ToolIndex } from "toolpick";
import { openai } from "@ai-sdk/openai";
import type { PrepareStepFunction } from "ai";
let cachedIndex: ToolIndex | null = null;
export async function ensureToolIndex(ctx: McpContext) {
if (cachedIndex) return cachedIndex;
// Step 1: Get all tool definitions from MCP
const toolDefinitions = await getMcpToolDefinitions();
// Step 2: Embed tool descriptions (one-time cost)
const index = await createToolIndex(toolDefinitions, {
embeddingModel: openai.embeddingModel("text-embedding-3-small"),
// Cache embeddings to disk; reuse across restarts
embeddingCache: fileCache(".toolpick-cache.json"),
// Cross-domain dependency graph: when tool A is selected,
// its related tools are pre-loaded even if embeddings wouldn't
// have picked them. Prevents mid-workflow discovery gaps.
relatedTools: {
invoices_create: ["customers_list"], // Need customer to invoice
invoices_create_from_tracker: ["customers_list"],
invoices_recurring_create: ["customers_list"],
tracker_timer_start: ["tracker_projects_list"], // Need project to track time
tracker_entries_create: ["tracker_projects_list"],
tracker_entries_list: ["tracker_projects_list"],
tracker_projects_list: ["tracker_entries_list"],
transactions_update: ["categories_list"], // Need category to categorize
},
});
// Step 3: Warm up (fetch embeddings)
await index.warmUp();
cachedIndex = index;
return index;
}
export function buildPrepareStep(options: {
maxTools: number;
alwaysActive?: string[];
}): PrepareStepFunction {
if (!cachedIndex) {
throw new Error("Tool index not initialized");
}
const base = cachedIndex.prepareStep({ maxTools: options.maxTools });
const always = options.alwaysActive ?? [];
return async (stepOptions: any) => {
// Let toolpick select top N tools by cosine similarity
const step = await base(stepOptions);
// Append always-active tools (they don't get filtered by embeddings)
if (step?.activeTools) {
for (const name of always) {
if (!step.activeTools.includes(name)) {
step.activeTools.push(name);
}
}
}
return step;
};
}
How it works:
- At startup, createToolIndex embeds all tool descriptions using OpenAI embeddings.
- Embeddings are cached (.toolpick-cache.json); subsequent restarts use the cached values.
- When a user message arrives, prepareStep is called.
- toolpick computes the similarity between the user's message and all cached embeddings.
- Top-N tools (e.g., 12) are selected and passed to the model.
- Always-active tools (web_search, search_tools, meta-tools) are appended regardless of similarity.
Integration with ToolLoopAgent
// chat/assistant-runtime.ts
import { ToolLoopAgent, stepCountIs, smoothStream } from "ai";
import type { ModelMessage, Tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { ensureToolIndex, buildPrepareStep } from "./tools";
export async function streamAssistant(params: {
systemPrompt: string;
messages: ModelMessage[];
tools: Record<string, Tool>;
ctx: McpContext;
}) {
// Ensure tool index is warm
await ensureToolIndex(params.ctx);
// Get all tools (needed for execution)
const allTools = params.tools;
// Build the prepareStep hook
const prepareStep = buildPrepareStep({
maxTools: 12,
// Always expose critical discovery tools
alwaysActive: ["web_search", "search_tools", "composio_search_tools", "composio_multi_execute"],
});
const agent = new ToolLoopAgent({
model: openai("gpt-4o-mini"),
instructions: params.systemPrompt,
tools: allTools,
prepareStep, // ← Filter tools per turn
stopWhen: stepCountIs(10),
});
return agent.stream({
messages: params.messages,
experimental_transform: smoothStream(),
});
}
What happens:
- User sends a message.
- Model receives system prompt + message + the ~12 most relevant tools (selected by prepareStep).
- Model reads tool descriptions and decides which (if any) to call.
- If model calls a tool, framework executes it.
- Result appended to conversation; loop continues.
- At the next turn, prepareStep re-runs with the updated conversation; different tools may be selected.
Why relatedTools Matters
Without relatedTools, the agent hits a common failure mode: it starts creating an invoice, then discovers mid-workflow that it needs a customer ID but customers_list wasn't in the top-12 selection. It either halts or wastes a step calling search_tools.
The relatedTools map is, in effect, dependency injection for multi-step workflows. When invoices_create is selected by embeddings, customers_list is automatically pre-loaded — even if the user's message ("create an invoice for $500") has zero semantic similarity to "list customers."
Guidelines for building the map:
- Map write tools to the read tools they depend on (create invoice → list customers)
- Map bidirectional relationships where either side needs the other (tracker_projects_list ↔ tracker_entries_list above — listing projects often leads to listing entries, and vice versa)
- Keep the map minimal — only add dependencies you've observed agents needing in practice
- Don't add transitive dependencies (if A→B and B→C, don't add A→C unless agents actually need C when calling A)
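The one-hop behavior implied by these guidelines can be sketched as follows; expandWithRelated is a hypothetical helper, not part of toolpick's API:

```typescript
// Expand an embedding-selected tool list with its declared dependencies.
// One hop only: dependencies of the *originally* selected tools are added,
// but dependencies-of-dependencies are not (no transitive closure).
function expandWithRelated(
  selected: string[],
  relatedTools: Record<string, string[]>,
): string[] {
  const result = [...selected];
  for (const name of selected) {
    for (const dep of relatedTools[name] ?? []) {
      if (!result.includes(dep)) result.push(dep);
    }
  }
  return result;
}
```

Selecting invoices_create pulls in customers_list, but not anything customers_list itself depends on — matching the "don't add transitive dependencies" rule.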
Trade-offs
Pros
- Token savings: Tool list shrinks from ~500 tokens (50 tools) to ~120 tokens (12 tools) per request.
- Model clarity: Model focuses on relevant tools; less distraction.
- Discovery: Related tools are pre-loaded via the relatedTools config.
Cons
- Embedding latency: ~50–100ms per request (slight increase).
- Cold start: First request after server restart pays embedding cost (~2–5s).
- Coverage: If a tool isn't in the top 12, the model can't use it directly. Mitigation: the search_tools meta-tool lets the model search for tools dynamically.
Using search_tools for Discovery
If the top-12 selection misses a tool, the model can call search_tools to find it:
server.registerTool(
"search_tools",
{
title: "Search Available Tools",
description: "Search for a tool by name or capability. Use this when you can't find the tool you need.",
inputSchema: z.object({
query: z.string().describe("What do you want to do? (e.g., 'list reports', 'send email', 'delete invoice')"),
}),
outputSchema: z.object({
tools: z.array(z.object({
name: z.string(),
description: z.string(),
})),
}),
},
async (params) => {
// Search the tool index by keyword + similarity
if (!cachedIndex) throw new Error("Tool index not initialized");
const results = await cachedIndex.search(params.query, { maxResults: 5 });
return {
content: [{
type: "text",
text: JSON.stringify(results),
}],
structuredContent: { tools: results },
};
}
);
Pattern:
- User asks for something the model doesn't immediately recognize.
- Model (before calling a tool) calls search_tools with the user's intent.
- search_tools returns matching tools.
- Model picks the best match and uses it.
Example:
- User: "Send an email to alice@example.com"
- Model (the email tool isn't in the top 12, so the model doesn't see it) calls search_tools with "send email"
- search_tools returns [composio_gmail_send_message, composio_sendgrid_send_email]
- Model calls one of them
Warm-Up and Lifecycle
// apps/api/src/index.ts (server startup)
import { ensureToolIndex } from "@api/chat/tools";
// On server startup, pre-warm the tool index
ensureToolIndex(createStubMcpContext()).catch((err) => {
logger.warn("Tool index warm-up failed (will retry on first request)", { error: err.message });
});
app.listen(3000, () => {
logger.info("Server started");
});
What the warm-up does:
- Creates a stub MCP context.
- Calls ensureToolIndex to trigger embedding computation.
- If embeddings are cached, returns immediately (~10ms).
- If not cached, computes and saves to disk (~3–5s).
Result: First real user request gets instant semantic selection; no 3s latency on cold start.
Tuning maxTools
Different use cases have different optimal values:
// Simple assistant (few domains)
buildPrepareStep({ maxTools: 8, alwaysActive: ["web_search"] })
// Moderate (internal + external tools)
buildPrepareStep({ maxTools: 12, alwaysActive: ["web_search", "search_tools"] })
// Complex (internal + Composio + external + search)
buildPrepareStep({
maxTools: 15,
alwaysActive: ["web_search", "search_tools", "composio_search_tools", "composio_multi_execute"]
})
Guidelines:
- Too low (5–8): Model misses relevant tools; forced to use search_tools repeatedly.
- Optimal (10–15): Balance between context efficiency and coverage.
- Too high (>20): Defeats the purpose; tokens saved are negligible.
Start at 12; measure token usage and model accuracy; adjust ±3 based on results.
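As a rough sanity check on the token numbers quoted earlier (~500 tokens for 50 tools, ~120 for 12), assuming ~10 tokens per tool definition — an assumption for illustration; real schemas vary widely:

```typescript
// Back-of-envelope estimate of the tool-list token cost per request.
// TOKENS_PER_TOOL is an assumption, not a measured value.
const TOKENS_PER_TOOL = 10;

function estimateToolListTokens(toolCount: number): number {
  return toolCount * TOKENS_PER_TOOL;
}

const before = estimateToolListTokens(50); // all tools, every request
const after = estimateToolListTokens(12);  // top-12 via prepareStep
```

Measure your actual per-tool cost (schema size dominates) before trusting any estimate like this.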
Caching Strategy
The embedding cache is critical for performance:
// .toolpick-cache.json (auto-managed by fileCache)
{
"customers_list": [0.123, -0.456, ..., 0.789],
"customers_get": [0.234, -0.567, ..., 0.890],
// ... one vector per tool
}
Key points:
- Cache is built once; reused across requests and server restarts.
- If you add or remove tools, toolpick automatically recomputes the affected embeddings.
- Cache is keyed by tool name; renaming a tool invalidates its embedding.
- For production, consider caching in Redis instead of on disk:
import { createClient } from "redis";
import { redisCache } from "toolpick";

const redis = createClient();
await redis.connect();
const embeddingCache = redisCache(redis, "toolpick:embeddings");
const index = await createToolIndex(toolDefinitions, {
embeddingModel: openai.embeddingModel("text-embedding-3-small"),
embeddingCache,
});
Monitoring and Observability
Add logging inside buildPrepareStep (from the earlier code) to debug selection misses:
// Inside the returned function from buildPrepareStep, after `const step = await base(stepOptions)`:
logger.debug("[toolpick] Selected tools", {
userMessage: stepOptions.messages[stepOptions.messages.length - 1]?.content?.slice(0, 100),
selectedTools: step.activeTools,
count: step.activeTools.length,
});
Metrics to track:
- Average tools selected per request.
- Frequency of search_tools calls (high frequency = selection misses).
- Token usage before/after (should drop significantly).
- Model accuracy (did the model pick the right tool?).
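A minimal in-process sketch of the search_tools frequency metric; SelectionMetrics is a hypothetical class, and in production you would wire this into your metrics backend instead:

```typescript
// Tracks how often the model falls back to search_tools.
// A high miss rate suggests maxTools is too low or tool descriptions
// need better wording (embedding quality follows description quality).
class SelectionMetrics {
  private totalRequests = 0;
  private searchToolsCalls = 0;

  recordRequest(): void {
    this.totalRequests++;
  }

  recordToolCall(toolName: string): void {
    if (toolName === "search_tools") this.searchToolsCalls++;
  }

  // search_tools calls per request; 0 when nothing recorded yet.
  missRate(): number {
    return this.totalRequests === 0
      ? 0
      : this.searchToolsCalls / this.totalRequests;
  }
}
```

Call recordRequest once per user turn and recordToolCall for every executed tool; a sustained miss rate above a threshold you choose is the signal to bump maxTools.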
Checklist
- Add the toolpick library: npm install toolpick.
- Create ensureToolIndex with an embedding cache (file or Redis).
- Build the buildPrepareStep hook; append always-active tools.
- Integrate with ToolLoopAgent via the prepareStep option.
- Add the search_tools meta-tool for dynamic discovery.
- Warm up the tool index on server startup.
- Measure token usage; confirm significant savings.
- Monitor search_tools call frequency; adjust maxTools if it is too high.
- Cache embeddings in Redis for production.
- Test tool selection with diverse user queries.
- Test tool selection with diverse user queries.
See Also
- Agentic Loop — prepareStep integration.
- External-App Routing — meta-tools for discovery.
- Naming and Descriptions — tool description best practices (embedding quality depends on description quality).
Platform-Agnostic Agent Core + Adapters
One agent core, N platform adapters. Use getPlatformInstructions() to inject formatting, not branching logic.
System Prompt as Configuration Layer
Compose prompts with buildSystemPrompt(context). Identity, safety, routing, formatting, platform rules — all in one immutable string.