Chunking Strategies
Fixed, semantic, late, contextual (Anthropic); overlap; evaluation
Chunking splits documents into embedding-friendly units. Poor chunking (no context, no overlap) kills retrieval quality. Anthropic's Contextual Retrieval is the 2026 standard.
Summary
Chunking patterns:
- Fixed-size: 512–1024 tokens, 20% overlap. Baseline, predictable.
- Semantic: Split on boundaries (paragraph, section). Preserves structure.
- Late: Embed full doc, chunk after. Avoids boundary loss; expensive.
- Contextual (Anthropic): Prepend chunk-specific summary via Claude before embedding. 49–67% failure reduction.
2026 recommendation: Contextual Retrieval + 20% overlap.
Fixed-size chunking
function tokenize(text: string): string[] {
  // Rough approximation: treat whitespace-separated words as tokens
  // (estimateTokens below applies the ~1.3 tokens/word correction)
  return text.split(/\s+/);
}
function fixedChunk(text: string, chunkSize: number = 512, overlap: number = 0.2) {
  const tokens = tokenize(text);
  const overlapTokens = Math.floor(chunkSize * overlap);
  const chunks: string[] = [];
  for (let i = 0; i < tokens.length; i += chunkSize - overlapTokens) {
    const chunk = tokens.slice(i, i + chunkSize).join(' ');
    if (chunk.trim()) chunks.push(chunk);
  }
  return chunks;
}
const doc = 'Lorem ipsum dolor...';
const chunks = fixedChunk(doc, 512, 0.2);
Semantic chunking
async function semanticChunk(text: string) {
  // Split on paragraph boundaries
  const paragraphs = text.split(/\n\n+/);
  // Group paragraphs into chunks (target ~1000 tokens)
  const chunks: string[] = [];
  let current = '';
  for (const para of paragraphs) {
    const combined = current ? current + '\n\n' + para : para;
    if (current && estimateTokens(combined) > 1000) {
      chunks.push(current);
      current = para;
    } else {
      current = combined;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
function estimateTokens(text: string): number {
  return Math.ceil(text.split(/\s+/).length * 1.3);
}
Contextual Retrieval (Anthropic, 2026 STANDARD)
Prepend chunk context via Claude Haiku before embedding:
import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';
async function contextualEmbed(
  chunk: string,
  fullDoc: string,
  chunkIndex: number
) {
  const anthropic = new Anthropic();
  // Generate chunk context via Haiku (fast, cheap)
  const contextResponse = await anthropic.messages.create({
    model: 'claude-3-5-haiku-20241022',
    max_tokens: 150,
    messages: [
      {
        role: 'user',
        content: `Document (for context only):
${fullDoc.substring(0, 2000)}
Chunk #${chunkIndex}:
${chunk}
Generate a concise 1-2 sentence summary of what this chunk is about, referencing the document context (company, time period, domain, etc.).`,
      },
    ],
  });
  const contextText =
    contextResponse.content[0].type === 'text' ? contextResponse.content[0].text : '';
  // Prepend context to chunk
  const contextualChunk = `${contextText}\n\n${chunk}`;
  // Embed the contextual text
  const openai = new OpenAI();
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-large',
    input: contextualChunk,
  });
  return {
    chunkIndex,
    originalChunk: chunk,
    contextualChunk,
    embedding: embedding.data[0].embedding,
  };
}
Cost optimization with prompt caching:
// Cache the full document, reuse for multiple chunks
async function contextualEmbedWithCache(chunks: string[], fullDoc: string) {
  const anthropic = new Anthropic();
  const openai = new OpenAI();
  const results = [];
  for (let i = 0; i < chunks.length; i++) {
    const cached = await anthropic.messages.create({
      model: 'claude-3-5-haiku-20241022',
      max_tokens: 150,
      system: [
        {
          type: 'text',
          text: 'You are a document analyst.',
        },
        {
          type: 'text',
          text: `Full Document:\n${fullDoc}`, // Cached after first use
          cache_control: { type: 'ephemeral' },
        },
      ],
      messages: [
        {
          role: 'user',
          content: `Chunk #${i}: ${chunks[i]}\n\nSummarize this chunk's context.`,
        },
      ],
    });
    const contextText =
      cached.content[0].type === 'text' ? cached.content[0].text : '';
    // Embed the context-prepended chunk (same process as above)
    const embedding = await openai.embeddings.create({
      model: 'text-embedding-3-large',
      input: `${contextText}\n\n${chunks[i]}`,
    });
    results.push({ chunkIndex: i, embedding: embedding.data[0].embedding });
  }
  return results;
}
Results: Contextual Retrieval reduces the retrieval failure rate by 49% on its own, and by 67% when combined with reranking.
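Putting the summary recommendation into practice, here is a minimal sketch that combines fixedChunk (20% overlap) with contextualEmbed, both defined above; batching, rate limiting, and error handling are omitted, and the vector-store upsert is left to the caller:
async function indexDocument(doc: string) {
  // 1. Fixed-size chunks with 20% overlap (baseline splitter above)
  const chunks = fixedChunk(doc, 512, 0.2);
  // 2. Contextualize + embed each chunk (sequential here; batch in production)
  const records = [];
  for (let i = 0; i < chunks.length; i++) {
    records.push(await contextualEmbed(chunks[i], doc, i));
  }
  return records; // ready to upsert into a vector store alongside metadata
}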
Late chunking
Embed full document first, chunk after:
async function lateChunk(doc: string, chunkSize: number = 512) {
  const openai = new OpenAI();
  // Embed the full document once
  const docEmbedding = await openai.embeddings.create({
    model: 'text-embedding-3-large',
    input: doc,
  });
  // Split into chunks
  const tokens = tokenize(doc);
  const chunks: string[] = [];
  for (let i = 0; i < tokens.length; i += chunkSize) {
    const chunk = tokens.slice(i, i + chunkSize).join(' ');
    if (chunk.trim()) chunks.push(chunk);
  }
  // All chunks inherit the document embedding (no per-chunk re-embedding)
  return chunks.map((chunk) => ({
    chunk,
    embedding: docEmbedding.data[0].embedding, // Reuse doc embedding
  }));
}
Pro: No context loss at chunk boundaries.
Con: Coarse granularity; one embedding represents the entire document.
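A usage sketch under the same assumptions (tokenize and lateChunk as defined in this section): because every chunk shares the document vector, similarity search effectively ranks whole documents, and the chunk text is mainly useful for snippeting after a document is retrieved.
const report = 'Quarterly report text...'; // placeholder document
const lateChunks = await lateChunk(report, 512);
// Assuming the document produced at least two chunks: all entries point at the
// same embedding array, so vector search cannot distinguish chunks within a doc.
console.log(lateChunks[0].embedding === lateChunks[1].embedding); // true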
Chunking evaluation
async function evaluateChunkingStrategy(
  strategy: 'fixed' | 'semantic' | 'contextual',
  docs: { text: string }[],
  queries: { text: string }[]
) {
  const openai = new OpenAI();
  let recall = 0;
  let avgChunkCount = 0;
  for (const doc of docs) {
    let chunks: string[];
    if (strategy === 'fixed') {
      chunks = fixedChunk(doc.text, 512, 0.2);
    } else if (strategy === 'semantic') {
      chunks = await semanticChunk(doc.text);
    } else {
      // contextualChunk: assumed helper that prepends Haiku-generated context
      // to each chunk before embedding (see contextualEmbed above)
      chunks = await contextualChunk(doc.text);
    }
    avgChunkCount += chunks.length;
    // Embed and store
    const embeddings = await Promise.all(
      chunks.map((chunk) =>
        openai.embeddings.create({
          model: 'text-embedding-3-large',
          input: chunk,
        })
      )
    );
    // Test retrieval: count a hit if any chunk clears the similarity threshold
    for (const query of queries) {
      const queryEmb = await openai.embeddings.create({
        model: 'text-embedding-3-large',
        input: query.text,
      });
      const relevant = embeddings.some(
        (emb) =>
          cosineSimilarity(queryEmb.data[0].embedding, emb.data[0].embedding) >
          0.75
      );
      if (relevant) recall++;
    }
  }
  console.log(`Strategy: ${strategy}`);
  console.log(
    `Recall: ${((recall / (docs.length * queries.length)) * 100).toFixed(2)}%`
  );
  console.log(
    `Avg chunks per doc: ${(avgChunkCount / docs.length).toFixed(1)}`
  );
}
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
Anti-patterns
- No chunking at all. Embedding entire documents kills granularity.
- Chunks with no overlap. Context that bridges chunk boundaries is lost; retrieval fails.
- Inconsistent chunk sizes. Mixing fixed and semantic splitting in one index degrades results.
- No overlap tuning. 10–30% overlap works for most domains.
- Ignoring chunk metadata. Store source, page, and section so you can filter later (see the sketch below).
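A minimal sketch of chunk metadata, assuming a hypothetical ChunkRecord shape; field names are illustrative and the upsert into a vector store is left out:
interface ChunkRecord {
  id: string;
  text: string; // the (contextualized) chunk text that was embedded
  embedding: number[];
  source: string; // e.g. file path or URL
  page?: number;
  section?: string;
  chunkIndex: number;
}
// Attach metadata at chunking time so it can drive filters at query time
function toRecord(
  chunk: string,
  embedding: number[],
  meta: { source: string; page?: number; section?: string; chunkIndex: number }
): ChunkRecord {
  return { id: `${meta.source}#${meta.chunkIndex}`, text: chunk, embedding, ...meta };
}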