Multi-Turn Conversations: Building B2B Product AI That Remembers Context
Single-turn Q&A is the easy part. Learn how to architect stateful, multi-turn conversations for product AI — handling follow-ups, pronoun resolution, cart-building, and ambiguity across a complete buying session.
Most product AI demos look the same: a single question, a single crisp answer. "What is the load capacity of the HD-500 shelf bracket?" → "The HD-500 supports up to 200 kg per bracket." Clean. Impressive. Done.
But real buyers don't work that way.
They start broad: "What shelf brackets do you have for heavy loads?" Then narrow: "Which of those come in stainless?" Then get specific: "How many do I need for a 2-meter shelf?" Then ask a follow-up: "And can I get those with same-week delivery?" And if the AI doesn't remember that "those" refers to the M12-grade stainless bracket they landed on two turns ago — the conversation falls apart.
Multi-turn conversation handling is the gap between a product AI that demos well and one that actually closes sales. This article covers how to architect it properly.
Why Multi-Turn Is Hard (It's Not Just Context Windows)
The naive solution — dump the entire conversation history into the LLM context — breaks down faster than you'd expect, and for reasons beyond token limits.
The Retrieval Problem
In a single-turn RAG system, retrieval is simple: embed the user's query, find the nearest chunks, stuff them into the prompt. In a multi-turn system, the retrieval query is no longer obvious.
When a buyer types "What about the stainless version?", the message is not a standalone question. To retrieve relevant chunks, you need to understand that "stainless version" refers to the shelf bracket from two turns back. The retrieval query should probably be "stainless steel shelf bracket HD-500 specifications" — but the raw message gives you none of that.
This is the coreference resolution problem, and it affects both retrieval and response quality.
Conversation State Drift
Over a long session, the buyer's intent may shift. They start looking at shelf brackets, discover they also need mounting hardware, then pivot to a different product category entirely. If your system treats the entire history as equally relevant, old context pollutes new retrieval. If it ignores history, it loses continuity.
You need selective state management — not just "remember everything" but "remember the right things at the right time."
Compounding Errors
In single-turn RAG, a retrieval miss produces one bad answer. In multi-turn, a retrieval miss in turn 3 can corrupt the context that's referenced in turns 4, 5, and 6. Errors compound.
The Architecture: Three Layers of State
A well-designed multi-turn product AI maintains state at three distinct levels:
┌─────────────────────────────────────────────┐
│ Layer 1: Conversation Buffer │
│ (recent N turns, verbatim) │
├─────────────────────────────────────────────┤
│ Layer 2: Session Context │
│ (structured state: entities, intent, cart) │
├─────────────────────────────────────────────┤
│ Layer 3: Long-Term Knowledge │
│ (your product catalog, RAG retrieval) │
└─────────────────────────────────────────────┘
Let's walk through each layer.
Layer 1: Conversation Buffer
The simplest layer. Keep the last N turns of conversation verbatim and include them in every prompt. This handles the easy cases — follow-ups that reference the immediately preceding exchange.
interface Turn {
role: 'user' | 'assistant'
content: string
timestamp: number
}
class ConversationBuffer {
private turns: Turn[] = []
private maxTurns: number
constructor(maxTurns = 6) {
this.maxTurns = maxTurns
}
add(role: Turn['role'], content: string) {
this.turns.push({ role, content, timestamp: Date.now() })
if (this.turns.length > this.maxTurns * 2) {
// maxTurns pairs (user + assistant)
this.turns = this.turns.slice(-this.maxTurns * 2)
}
}
format(): string {
return this.turns
.map((t) => `${t.role === 'user' ? 'Buyer' : 'Assistant'}: ${t.content}`)
.join('\n')
}
}

What to keep in the buffer: 4–8 turns is usually enough for pronoun resolution and immediate follow-ups. Beyond that, you're adding tokens without proportional benefit — and older turns may actively mislead retrieval.
What NOT to do: Don't feed the entire session history to every retrieval step. The buyer's first question about shelf brackets shouldn't pollute retrieval for their later questions about cable management.
Layer 2: Session Context (The Crucial Layer)
This is where most implementations fall short. The conversation buffer handles what was said; the session context tracks what matters about the session in structured form.
interface SessionContext {
// Products currently in focus
focusedProducts: ProductRef[]
// Attributes the buyer has established as requirements
activeFilters: {
material?: string[]
category?: string
priceRange?: { min?: number; max?: number }
deliveryRequirement?: string
quantity?: number
}
// Buyer's stated intent / stage
intent: 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
// Items added to a notional cart or shortlist
shortlist: ProductRef[]
// Open questions / ambiguities we need to resolve
pendingClarifications: string[]
}

You update this context after every turn using a lightweight extraction step — either a small LLM call or structured parsing of the response.
Extracting Entities from Each Turn
After generating a response, run a quick extraction pass to update session context:
async function extractSessionUpdates(
userMessage: string,
assistantResponse: string,
currentContext: SessionContext
): Promise<Partial<SessionContext>> {
const prompt = `
You are extracting structured information from a B2B product conversation turn.
Current session context: ${JSON.stringify(currentContext)}
Latest exchange:
Buyer: ${userMessage}
Assistant: ${assistantResponse}
Extract any updates to the session context. Return a JSON object with only the fields that changed:
- focusedProducts: products now in focus (with id and name)
- activeFilters: requirements the buyer stated or confirmed
- intent: buyer's current stage (exploring/comparing/specifying/ready-to-order)
- shortlist: products the buyer expressed interest in keeping
- pendingClarifications: unresolved ambiguities we should address
Return only changed fields. Return null for no changes.
`
const result = await llm.complete({ user: prompt, format: 'json' })
return result ?? {}
}

This structured context is what enables intent-aware retrieval — instead of using just the raw user message to query the vector store, you augment it with what you know about the session.
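The extraction step returns a partial update. Here is a minimal sketch of folding that update into the running session context; the types are trimmed for brevity, and the shallow-merge strategy is one assumption among several reasonable options:

```typescript
interface ProductRef { id: string; name: string }

interface SessionContext {
  focusedProducts: ProductRef[]
  activeFilters: { material?: string[]; category?: string }
  intent: 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
  shortlist: ProductRef[]
  pendingClarifications: string[]
}

// Shallow-merge the extracted updates into the session context.
// Filters merge field-by-field so earlier requirements survive
// unless the new turn explicitly overwrites them.
function applySessionUpdates(
  ctx: SessionContext,
  updates: Partial<SessionContext>
): SessionContext {
  return {
    ...ctx,
    ...updates,
    activeFilters: { ...ctx.activeFilters, ...updates.activeFilters },
  }
}
```

If the buyer pivots to a new product category entirely, a reset may serve better than a merge; detecting that pivot is part of intent tracking.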
Solving the Retrieval Problem: Query Rewriting
The core technique for multi-turn RAG retrieval is query rewriting — transforming the raw user message into a standalone, context-aware query before hitting the vector store.
async function rewriteQuery(
userMessage: string,
buffer: ConversationBuffer,
sessionContext: SessionContext
): Promise<string> {
const prompt = `
You are preparing a search query for a B2B product catalog.
Recent conversation:
${buffer.format()}
Session context:
- Currently discussing: ${sessionContext.focusedProducts.map((p) => p.name).join(', ') || 'nothing specific'}
- Active requirements: ${JSON.stringify(sessionContext.activeFilters)}
Latest user message: "${userMessage}"
Rewrite this message as a complete, standalone search query that includes all relevant context.
The query will be used to search a product catalog — be specific and include technical terms.
Return only the rewritten query, nothing else.
`
return llm.complete({ user: prompt })
}Example rewrites:
| Raw message | Rewritten query |
|---|---|
| "What about the stainless version?" | "stainless steel shelf bracket HD-500 specifications load capacity" |
| "How many do I need for 3 meters?" | "shelf bracket spacing requirements 3 meter shelf load calculation" |
| "Is there a cheaper option?" | "shelf brackets heavy duty lower cost alternative HD-500 substitute" |
| "Can those be powder coated?" | "shelf bracket powder coating finish options custom colors" |
This single technique — rewriting the query before retrieval — is responsible for the majority of the quality improvement in multi-turn systems. It's also lightweight: a small, fast model can do this well without needing your most capable (and expensive) LLM.
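Because the rewriter sits on the critical path of every turn, it is worth guarding. Here is one sketch of a guarded rewrite; the timeout value and the concatenation fallback are assumptions, not fixed recommendations:

```typescript
interface ProductRef { id: string; name: string }

// Cheap deterministic fallback: raw message plus focused product names,
// so retrieval still gets some anchoring even without the LLM rewrite.
function fallbackQuery(userMessage: string, focused: ProductRef[]): string {
  return [userMessage, ...focused.map((p) => p.name)].join(' ')
}

async function safeRewrite(
  userMessage: string,
  focused: ProductRef[],
  rewrite: () => Promise<string>,
  timeoutMs = 1500
): Promise<string> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('rewrite timeout')), timeoutMs)
  )
  try {
    // Use the LLM rewrite if it answers in time, otherwise fall back
    return await Promise.race([rewrite(), timeout])
  } catch {
    return fallbackQuery(userMessage, focused)
  }
}
```

The fallback is crude, but a query like "what about the stainless version HD-500 Bracket" still anchors retrieval better than the raw pronoun-laden message alone.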
Handling Product Focus: The "Those" Problem
In B2B product conversations, buyers frequently refer to products by pronoun or partial description: "those", "the ones you mentioned", "the cheaper variant", "that bracket".
Your system needs to resolve these references to actual product identifiers before retrieval. The session context's focusedProducts field handles this:
async function resolveProductRefs(
userMessage: string,
sessionContext: SessionContext
): Promise<ProductRef[]> {
// If no pronouns/vague references, no resolution needed.
// Match on word boundaries so short terms like 'it' don't fire
// inside words like 'quantity' or 'items'.
const vagueTerms = ['it', 'those', 'them', 'that', 'these', 'the ones', 'the same']
const hasVagueTerm = vagueTerms.some((t) =>
new RegExp(`\\b${t}\\b`).test(userMessage.toLowerCase())
)
if (!hasVagueTerm || sessionContext.focusedProducts.length === 0) {
return []
}
// Return currently focused products as the resolution
return sessionContext.focusedProducts
}
async function buildRetrievalQuery(
userMessage: string,
sessionContext: SessionContext,
buffer: ConversationBuffer
): Promise<{ query: string; filters?: Record<string, unknown> }> {
const resolvedProducts = await resolveProductRefs(userMessage, sessionContext)
// Build a hybrid retrieval request: semantic query + metadata filters
const query = await rewriteQuery(userMessage, buffer, sessionContext)
const filters: Record<string, unknown> = {}
// If we've resolved to specific products, add metadata filter to anchor retrieval
if (resolvedProducts.length > 0) {
filters.productIds = resolvedProducts.map((p) => p.id)
}
// Apply any active category/attribute filters
if (sessionContext.activeFilters.category) {
filters.category = sessionContext.activeFilters.category
}
return { query, filters }
}

Combining semantic query rewriting with metadata filters (covered in our article on hybrid search strategies) gives you precise, context-aware retrieval that narrows to relevant products without losing the semantic flexibility of vector search.
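To make the combination concrete, here is a minimal in-memory sketch of filter-then-rank retrieval. The `Chunk` shape and the injected `score` function are stand-ins for a real vector store and its similarity scoring:

```typescript
interface Chunk {
  text: string
  productId?: string
  category?: string
}

// Metadata pre-filter first, then semantic ranking. In production the
// `score` callback would be cosine similarity against the query embedding.
function hybridSearch(
  chunks: Chunk[],
  score: (chunk: Chunk) => number,
  filters: { productIds?: string[]; category?: string },
  topK = 5
): Chunk[] {
  return chunks
    .filter((c) =>
      (!filters.productIds ||
        (c.productId !== undefined && filters.productIds.includes(c.productId))) &&
      (!filters.category || c.category === filters.category)
    )
    .map((c) => ({ c, s: score(c) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, topK)
    .map((x) => x.c)
}
```

The key design point: filters shrink the candidate set deterministically, so the semantic ranking never has the chance to surface a plausible-sounding chunk from the wrong product.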
Managing Intent Transitions
A buyer's intent changes throughout a session. Recognizing intent transitions lets you serve the right information at the right moment — and avoid answering exploration questions when the buyer is ready to order.
type BuyingIntent = 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
const INTENT_SIGNALS: Record<BuyingIntent, string[]> = {
exploring: [
'what do you have', 'show me', 'what options', 'I need', 'looking for',
],
comparing: [
'difference between', 'vs', 'compared to', 'which is better', 'pros and cons',
],
specifying: [
'load capacity', 'dimensions', 'material', 'certification', 'technical spec',
'how many', 'how much', 'weight', 'rating',
],
'ready-to-order': [
'add to cart', 'order', 'buy', 'price', 'delivery', 'stock', 'availability',
'lead time', 'minimum order',
],
}
function detectIntent(message: string): BuyingIntent | null {
const lower = message.toLowerCase()
for (const [intent, signals] of Object.entries(INTENT_SIGNALS)) {
// Word-boundary match so short signals like 'vs' or 'buy' don't fire mid-word
if (signals.some((s) => new RegExp(`\\b${s}\\b`).test(lower))) {
return intent as BuyingIntent
}
}
return null
}

With intent detection, you can tailor both retrieval scope and response tone:
- Exploring: Return broad category overviews, feature comparisons, product family summaries
- Comparing: Surface spec comparison tables, differentiation points, trade-offs
- Specifying: Retrieve detailed technical sheets, datasheets, compliance documents
- Ready-to-order: Pull pricing, stock levels, lead times, MOQ — connect to live commerce data
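One way to act on the detected intent is a small lookup that tunes retrieval per stage. The field names and numbers below are illustrative assumptions, not a fixed API:

```typescript
type BuyingIntent = 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'

interface RetrievalConfig {
  topK: number             // how many chunks to retrieve
  docTypes: string[]       // which document types to search
  includeLiveData: boolean // whether to hit live ERP/commerce endpoints
}

// Broad and shallow while exploring, narrow and deep when specifying,
// live data only when the buyer is close to ordering.
const RETRIEVAL_BY_INTENT: Record<BuyingIntent, RetrievalConfig> = {
  exploring: { topK: 12, docTypes: ['overview', 'category'], includeLiveData: false },
  comparing: { topK: 8, docTypes: ['spec-sheet', 'comparison'], includeLiveData: false },
  specifying: { topK: 5, docTypes: ['spec-sheet', 'datasheet', 'compliance'], includeLiveData: false },
  'ready-to-order': { topK: 3, docTypes: ['pricing', 'stock'], includeLiveData: true },
}

function retrievalConfigFor(intent: BuyingIntent | null): RetrievalConfig {
  // Fall back to the broadest setting when intent is unknown
  return RETRIEVAL_BY_INTENT[intent ?? 'exploring']
}
```

The exact values matter less than the shape: retrieval scope should narrow as the buyer's intent sharpens.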
This is what agentic product discovery looks like in practice: not just answering questions, but understanding where the buyer is in their journey and serving appropriately.
Building the Shortlist: The Persistent Cart
For B2B buyers, a conversation often ends with a shortlist — a set of products to quote or order. Tracking this across the session creates genuine value that single-turn systems can't provide.
class BuyerShortlist {
private items: Map<string, { product: ProductRef; quantity?: number; note?: string }> = new Map()
add(product: ProductRef, quantity?: number) {
this.items.set(product.id, { product, quantity })
}
remove(productId: string) {
this.items.delete(productId)
}
updateQuantity(productId: string, quantity: number) {
const item = this.items.get(productId)
if (item) item.quantity = quantity
}
summarize(): string {
if (this.items.size === 0) return 'No items in shortlist.'
return Array.from(this.items.values())
.map((item) => {
const qty = item.quantity ? ` × ${item.quantity}` : ''
return `- ${item.product.name} (${item.product.id})${qty}`
})
.join('\n')
}
toOrderPayload() {
return Array.from(this.items.values()).map(({ product, quantity }) => ({
productId: product.id,
quantity: quantity ?? 1,
}))
}
}

The assistant can maintain this shortlist across the conversation and surface it when appropriate:
"You've got the HD-500 Stainless (×12) and the M8 Wall Anchor Kit (×2) in your shortlist. Want me to check availability and put together a quote?"
This transforms a product information conversation into a buying workflow — which is exactly where AI chat creates measurable ROI for distributors and wholesalers. For a deeper look at the business case, see our analysis of hidden costs of unanswered product questions.
Handling Ambiguity Gracefully
Multi-turn conversations surface more ambiguity than single-turn interactions. When the system can't confidently resolve a reference or intent, the right response is a targeted clarifying question — not a guess.
async function detectAmbiguity(
query: string,
sessionContext: SessionContext,
retrievedChunks: Chunk[]
): Promise<string | null> {
// Multiple highly-scored products → ambiguous reference
if (sessionContext.focusedProducts.length > 3) {
return `I want to make sure I'm answering about the right product. Are you asking about ${
sessionContext.focusedProducts.slice(0, 3).map((p) => p.name).join(', ')
}, or a different item?`
}
// No focused products + pronoun → unclear reference
const hasVagueTerm = ['it', 'those', 'them', 'that'].some((t) =>
new RegExp(`\\b${t}\\b`).test(query.toLowerCase())
)
if (hasVagueTerm && sessionContext.focusedProducts.length === 0) {
return `Could you clarify which product you're referring to? I want to make sure I pull the right specifications.`
}
// Retrieval returned low-confidence chunks
if (retrievedChunks.every((c) => c.score < 0.6)) {
return `I don't have detailed information on that specific aspect in my current catalog. Could you give me more detail, or would you like me to flag this for our technical team?`
}
return null // No ambiguity detected
}

Good clarifying questions are:
- Specific: Reference what you actually don't know, not generic "can you clarify?"
- Brief: One sentence, not a list of sub-questions
- Grounded: Offer candidate answers where possible ("Are you asking about the M8 or M12 version?")
Prompt Architecture for Multi-Turn
With all these pieces in place, here's how the final prompt is assembled:
async function buildPrompt(
userMessage: string,
buffer: ConversationBuffer,
sessionContext: SessionContext,
retrievedChunks: Chunk[]
): Promise<{ system: string; user: string }> {
const context = retrievedChunks
.map((c) => `[${c.source}]\n${c.text}`)
.join('\n\n---\n\n')
const system = `
You are a B2B product specialist for a wholesale distributor.
You help buyers find the right products, answer technical questions, and build orders.
Current session state:
- Products in focus: ${sessionContext.focusedProducts.map((p) => p.name).join(', ') || 'none'}
- Active requirements: ${JSON.stringify(sessionContext.activeFilters)}
- Buyer stage: ${sessionContext.intent}
- Shortlist: ${sessionContext.shortlist.length} items
Be specific, accurate, and use technical language appropriate for B2B buyers.
If you're uncertain, say so. Never invent specifications.
When the buyer seems ready to order, offer to summarize their shortlist and check availability.
`
const user = `
Recent conversation:
${buffer.format()}
Product information:
${context}
Buyer's latest message: ${userMessage}
`
return { system, user }
}

Notice how the session context is injected directly into the system prompt — not buried in the user turn. This keeps the model's behavior consistent regardless of where the conversation is in its history.
Testing Multi-Turn Systems
Single-turn RAG testing is relatively straightforward. Multi-turn testing requires evaluating sequences of interactions, which introduces new complexity.
Build test scenarios as conversation scripts:
const testScenario = {
name: 'Buyer narrows from category to specific SKU',
turns: [
{ user: 'I need heavy-duty shelf brackets for a warehouse racking system' },
{ user: 'Which of those are rated for over 150kg per bracket?' },
{ user: 'Do the HD-500 come in stainless?' },
{ user: 'How many would I need for a 4-meter shelf?' },
{ user: 'What\'s the minimum order quantity?' },
],
assertions: [
// Turn 2: "those" resolves to shelf brackets from turn 1
{ turnIndex: 1, check: 'response mentions HD-series or similar heavy-duty brackets' },
// Turn 3: "HD-500" is now in focus
{ turnIndex: 2, check: 'response addresses stainless steel availability for HD-500' },
// Turn 4: calculation uses shelf bracket context, not generic
{ turnIndex: 3, check: 'response gives bracket count recommendation for 4 meters' },
// Turn 5: MOQ is for HD-500, not a generic answer
{ turnIndex: 4, check: 'response gives MOQ specific to HD-500 stainless' },
],
}

Run these scenario tests against your system and measure how often the context resolution succeeds end-to-end. A 5-turn scenario where step 3 fails means steps 4 and 5 are also wrong — waterfall failure tracking gives you a more honest picture of quality than per-turn accuracy.
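A scenario runner can make waterfall failure tracking explicit: once a turn fails its assertion, later turns are not credited. In this sketch, `runTurn` (your full rewrite, retrieve, generate pipeline) and `checkAssertion` (an LLM-as-judge or keyword heuristic) are placeholders you would wire to your own stack:

```typescript
interface ScenarioTurn { user: string }
interface Assertion { turnIndex: number; check: string }
interface Scenario { name: string; turns: ScenarioTurn[]; assertions: Assertion[] }

interface ScenarioResult {
  passedTurns: number      // turns that passed before the first failure
  failedAt: number | null  // index of the first failing turn, if any
}

async function runScenario(
  scenario: Scenario,
  runTurn: (user: string, history: string[]) => Promise<string>,
  checkAssertion: (check: string, response: string) => Promise<boolean>
): Promise<ScenarioResult> {
  const history: string[] = []
  for (let i = 0; i < scenario.turns.length; i++) {
    const response = await runTurn(scenario.turns[i].user, history)
    history.push(scenario.turns[i].user, response)
    const assertion = scenario.assertions.find((a) => a.turnIndex === i)
    if (assertion && !(await checkAssertion(assertion.check, response))) {
      // Waterfall: a failure here invalidates all later turns
      return { passedTurns: i, failedAt: i }
    }
  }
  return { passedTurns: scenario.turns.length, failedAt: null }
}
```

Reporting `passedTurns / turns.length` per scenario gives you a single honest number per buying journey, rather than inflated per-turn accuracy.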
The Business Case: Why Multi-Turn Matters for B2B
The ROI case for AI product assistants usually starts with support deflection. But multi-turn conversation quality is the difference between a novelty and a real sales tool.
Consider two scenarios:
Single-turn system: Buyer asks a question, gets an answer. Asks another question, gets another answer. After 5 questions, they're no closer to an order because every answer is context-free. They call a sales rep.
Multi-turn system: Buyer asks a question. The system maintains context, narrows the product set across turns, builds a shortlist, and surfaces a quote workflow when the buyer is ready. The AI becomes a 24/7 inside sales assistant.
For wholesalers and distributors with thousands of SKUs and buyers who work outside business hours, this is meaningful. The conversational commerce shift in B2B is precisely this — moving from reactive Q&A to a guided buying experience that doesn't require a human on the other end.
Summary: The Multi-Turn Stack
| Component | Purpose | Key technique |
|---|---|---|
| Conversation buffer | Handle immediate follow-ups | Last 4–8 turns verbatim |
| Session context | Track products, filters, intent | Structured extraction after each turn |
| Query rewriter | Fix retrieval for vague messages | LLM rewrites raw message to standalone query |
| Product ref resolver | Handle pronoun references | Match to focused products in session context |
| Intent detector | Adapt response to buying stage | Signal-based classification |
| Shortlist tracker | Convert discovery to order | Persist across session |
| Ambiguity handler | Prevent confident wrong answers | Targeted clarifying questions |
Each layer addresses a distinct failure mode. The conversation buffer alone gets you roughly 60% of the way there. Adding query rewriting and session context gets you closer to 90%. The shortlist and intent-aware responses are what separate product AI that closes deals from product AI that just answers questions.
Ready to Build a Product AI That Thinks in Sessions?
Axoverna handles multi-turn conversation architecture out of the box — query rewriting, session context, product focus tracking, and shortlist management are all part of the platform. Your team ships a better buying experience without building the conversational infrastructure from scratch.
Book a demo to see multi-turn product discovery in action with your own catalog, or start a free trial and experience it firsthand.
Related articles
Why Session Memory Matters for Repeat B2B Buyers, and How to Design It Without Breaking Trust
The strongest B2B product AI systems do not treat every conversation like a cold start. They use session memory to preserve buyer context, speed up repeat interactions, and improve recommendation quality, while staying grounded in live product data and clear trust boundaries.
Unit Normalization in B2B Product AI: Why 1/2 Inch, DN15, and 15 mm Should Mean the Same Thing
B2B product AI breaks fast when dimensions, thread sizes, pack quantities, and engineering units are stored in inconsistent formats. Here is how to design unit normalization that improves retrieval, filtering, substitutions, and answer accuracy.
Source-Aware RAG: How to Combine PIM, PDFs, ERP, and Policy Content Without Conflicting Answers
Most product AI failures are not caused by weak models, but by mixing sources with different authority levels. Here is how B2B teams design source-aware RAG that keeps specs, availability, pricing rules, and policy answers aligned.