Multi-Turn Conversations: Building B2B Product AI That Remembers Context

Single-turn Q&A is the easy part. Learn how to architect stateful, multi-turn conversations for product AI — handling follow-ups, pronoun resolution, cart-building, and ambiguity across a complete buying session.

Axoverna Team
15 min read

Most product AI demos look the same: a single question, a single crisp answer. "What is the load capacity of the HD-500 shelf bracket?" "The HD-500 supports up to 200 kg per bracket." Clean. Impressive. Done.

But real buyers don't work that way.

They start broad: "What shelf brackets do you have for heavy loads?" Then narrow: "Which of those come in stainless?" Then get specific: "How many do I need for a 2-meter shelf?" Then ask a follow-up: "And can I get those with same-week delivery?" And if the AI doesn't remember that the "those" refers to the M12-grade stainless bracket they landed on two turns ago — the conversation falls apart.

Multi-turn conversation handling is the gap between a product AI that demos well and one that actually closes sales. This article covers how to architect it properly.


Why Multi-Turn Is Hard (It's Not Just Context Windows)

The naive solution — dump the entire conversation history into the LLM context — breaks down faster than you'd expect, and for reasons beyond token limits.

The Retrieval Problem

In a single-turn RAG system, retrieval is simple: embed the user's query, find the nearest chunks, stuff them into the prompt. In a multi-turn system, the retrieval query is no longer obvious.

When a buyer types "What about the stainless version?", there is no standalone question to embed. To retrieve relevant chunks, you need to understand that "stainless version" refers to the shelf bracket from two turns back. The retrieval query should probably be "stainless steel shelf bracket HD-500 specifications" — but the raw message gives you none of that.

This is the coreference resolution problem, and it affects both retrieval and response quality.

Conversation State Drift

Over a long session, the buyer's intent may shift. They start looking at shelf brackets, discover they also need mounting hardware, then pivot to a different product category entirely. If your system treats the entire history as equally relevant, old context pollutes new retrieval. If it ignores history, it loses continuity.

You need selective state management — not just "remember everything" but "remember the right things at the right time."

Compounding Errors

In single-turn RAG, a retrieval miss produces one bad answer. In multi-turn, a retrieval miss in turn 3 can corrupt the context that's referenced in turns 4, 5, and 6. Errors compound.
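To put a number on this: if each turn's context resolution succeeds independently with probability p, a session only succeeds end-to-end if every turn does. A quick sketch (the figures are illustrative, not benchmarks):

```typescript
// Probability that an n-turn session survives end-to-end, assuming each
// turn resolves context correctly with probability p and failures are
// independent (a simplification; real failures often cascade, so the
// true number is usually worse).
function endToEndSuccess(perTurnAccuracy: number, turns: number): number {
  return Math.pow(perTurnAccuracy, turns)
}

// A system that is "90% accurate" per turn completes a
// six-turn buying session barely half the time.
console.log(endToEndSuccess(0.9, 1).toFixed(2)) // 0.90
console.log(endToEndSuccess(0.9, 6).toFixed(2)) // 0.53
```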


The Architecture: Three Layers of State

A well-designed multi-turn product AI maintains state at three distinct levels:

┌─────────────────────────────────────────────┐
│  Layer 1: Conversation Buffer               │
│  (recent N turns, verbatim)                 │
├─────────────────────────────────────────────┤
│  Layer 2: Session Context                   │
│  (structured state: entities, intent, cart) │
├─────────────────────────────────────────────┤
│  Layer 3: Long-Term Knowledge               │
│  (your product catalog, RAG retrieval)      │
└─────────────────────────────────────────────┘

Let's walk through each layer.


Layer 1: Conversation Buffer

The simplest layer. Keep the last N turns of conversation verbatim and include them in every prompt. This handles the easy cases — follow-ups that reference the immediately preceding exchange.

interface Turn {
  role: 'user' | 'assistant'
  content: string
  timestamp: number
}
 
class ConversationBuffer {
  private turns: Turn[] = []
  private maxTurns: number
 
  constructor(maxTurns = 6) {
    this.maxTurns = maxTurns
  }
 
  add(role: Turn['role'], content: string) {
    this.turns.push({ role, content, timestamp: Date.now() })
    if (this.turns.length > this.maxTurns * 2) {
      // maxTurns pairs (user + assistant)
      this.turns = this.turns.slice(-this.maxTurns * 2)
    }
  }
 
  format(): string {
    return this.turns
      .map((t) => `${t.role === 'user' ? 'Buyer' : 'Assistant'}: ${t.content}`)
      .join('\n')
  }
}

What to keep in the buffer: 4–8 turns is usually enough for pronoun resolution and immediate follow-ups. Beyond that, you're adding tokens without proportional benefit — and older turns may actively mislead retrieval.

What NOT to do: Don't feed the entire session history to every retrieval step. The buyer's first question about shelf brackets shouldn't pollute retrieval for their later questions about cable management.


Layer 2: Session Context (The Crucial Layer)

This is where most implementations fall short. The conversation buffer handles what was said; the session context tracks what matters about the session in structured form.

interface SessionContext {
  // Products currently in focus
  focusedProducts: ProductRef[]
 
  // Attributes the buyer has established as requirements
  activeFilters: {
    material?: string[]
    category?: string
    priceRange?: { min?: number; max?: number }
    deliveryRequirement?: string
    quantity?: number
  }
 
  // Buyer's stated intent / stage
  intent: 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
 
  // Items added to a notional cart or shortlist
  shortlist: ProductRef[]
 
  // Open questions / ambiguities we need to resolve
  pendingClarifications: string[]
}

You update this context after every turn using a lightweight extraction step — either a small LLM call or structured parsing of the response.

Extracting Entities from Each Turn

After generating a response, run a quick extraction pass to update session context:

async function extractSessionUpdates(
  userMessage: string,
  assistantResponse: string,
  currentContext: SessionContext
): Promise<Partial<SessionContext>> {
  const prompt = `
You are extracting structured information from a B2B product conversation turn.
 
Current session context: ${JSON.stringify(currentContext)}
 
Latest exchange:
Buyer: ${userMessage}
Assistant: ${assistantResponse}
 
Extract any updates to the session context. Return a JSON object with only the fields that changed:
- focusedProducts: products now in focus (with id and name)
- activeFilters: requirements the buyer stated or confirmed
- intent: buyer's current stage (exploring/comparing/specifying/ready-to-order)
- shortlist: products the buyer expressed interest in keeping
- pendingClarifications: unresolved ambiguities we should address
 
Return only changed fields. Return null for no changes.
`
 
  const result = await llm.complete({ user: prompt, format: 'json' })
  return result ?? {}
}

This structured context is what enables intent-aware retrieval — instead of using just the raw user message to query the vector store, you augment it with what you know about the session.
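The Partial<SessionContext> the extractor returns still has to be merged into the running state. A minimal merge sketch, using a trimmed-down version of the SessionContext above (the shallow merge of activeFilters is one reasonable policy, not the only one):

```typescript
// Trimmed-down session state for illustration
interface ProductRef { id: string; name: string }

interface SessionContext {
  focusedProducts: ProductRef[]
  activeFilters: { material?: string[]; category?: string }
  intent: 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
}

// Merge extracted updates into the running context.
// Top-level fields are replaced wholesale; activeFilters is
// shallow-merged so a new material filter doesn't wipe the category.
function applySessionUpdates(
  context: SessionContext,
  updates: Partial<SessionContext>
): SessionContext {
  return {
    ...context,
    ...updates,
    activeFilters: { ...context.activeFilters, ...updates.activeFilters },
  }
}

const ctx: SessionContext = {
  focusedProducts: [{ id: 'hd-500', name: 'HD-500 Bracket' }],
  activeFilters: { category: 'shelf-brackets' },
  intent: 'exploring',
}

const next = applySessionUpdates(ctx, {
  activeFilters: { material: ['stainless steel'] },
  intent: 'specifying',
})
// next.activeFilters keeps the category AND gains the material filter
```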


Solving the Retrieval Problem: Query Rewriting

The core technique for multi-turn RAG retrieval is query rewriting — transforming the raw user message into a standalone, context-aware query before hitting the vector store.

async function rewriteQuery(
  userMessage: string,
  buffer: ConversationBuffer,
  sessionContext: SessionContext
): Promise<string> {
  const prompt = `
You are preparing a search query for a B2B product catalog.
 
Recent conversation:
${buffer.format()}
 
Session context:
- Currently discussing: ${sessionContext.focusedProducts.map((p) => p.name).join(', ') || 'nothing specific'}
- Active requirements: ${JSON.stringify(sessionContext.activeFilters)}
 
Latest user message: "${userMessage}"
 
Rewrite this message as a complete, standalone search query that includes all relevant context.
The query will be used to search a product catalog — be specific and include technical terms.
Return only the rewritten query, nothing else.
`
 
  return llm.complete({ user: prompt })
}

Example rewrites:

  • "What about the stainless version?" → "stainless steel shelf bracket HD-500 specifications load capacity"
  • "How many do I need for 3 meters?" → "shelf bracket spacing requirements 3 meter shelf load calculation"
  • "Is there a cheaper option?" → "shelf brackets heavy duty lower cost alternative HD-500 substitute"
  • "Can those be powder coated?" → "shelf bracket powder coating finish options custom colors"

This single technique — rewriting the query before retrieval — is responsible for the majority of the quality improvement in multi-turn systems. It's also lightweight: a small, fast model can do this well without needing your most capable (and expensive) LLM.


Handling Product Focus: The "Those" Problem

In B2B product conversations, buyers frequently refer to products by pronoun or partial description: "those", "the ones you mentioned", "the cheaper variant", "that bracket".

Your system needs to resolve these references to actual product identifiers before retrieval. The session context's focusedProducts field handles this:

async function resolveProductRefs(
  userMessage: string,
  sessionContext: SessionContext
): Promise<ProductRef[]> {
  // If no pronouns/vague references, no resolution needed.
  // Match whole words/phrases: a plain substring check would
  // false-positive on words like "item" or "quantity".
  const vagueTerms = ['it', 'those', 'them', 'that', 'these', 'the ones', 'the same']
  const hasVagueTerm = vagueTerms.some((t) =>
    new RegExp(`\\b${t}\\b`, 'i').test(userMessage)
  )
 
  if (!hasVagueTerm || sessionContext.focusedProducts.length === 0) {
    return []
  }
 
  // Return currently focused products as the resolution
  return sessionContext.focusedProducts
}
 
async function buildRetrievalQuery(
  userMessage: string,
  sessionContext: SessionContext,
  buffer: ConversationBuffer
): Promise<{ query: string; filters?: Record<string, unknown> }> {
  const resolvedProducts = await resolveProductRefs(userMessage, sessionContext)
 
  // Build a hybrid retrieval request: semantic query + metadata filters
  const query = await rewriteQuery(userMessage, buffer, sessionContext)
 
  const filters: Record<string, unknown> = {}
 
  // If we've resolved to specific products, add metadata filter to anchor retrieval
  if (resolvedProducts.length > 0) {
    filters.productIds = resolvedProducts.map((p) => p.id)
  }
 
  // Apply any active category/attribute filters
  if (sessionContext.activeFilters.category) {
    filters.category = sessionContext.activeFilters.category
  }
 
  return { query, filters }
}

Combining semantic query rewriting with metadata filters (covered in our article on hybrid search strategies) gives you precise, context-aware retrieval that narrows to relevant products without losing the semantic flexibility of vector search.
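Applied to the retrieval call itself, the metadata filters act as a hard pre-cut on the candidate set before semantic ranking. A toy in-memory sketch of that two-stage flow (the CatalogChunk shape and the keyword-overlap scoring are stand-ins; in production the filter would be a metadata clause on your vector store query and the ranking would be vector similarity):

```typescript
interface CatalogChunk {
  productId: string
  category: string
  text: string
  score?: number
}

// Stage 1: hard metadata filter — only chunks matching the resolved
// products / active category survive.
// Stage 2: semantic ranking (stubbed here as keyword-overlap counting).
function hybridSearch(
  chunks: CatalogChunk[],
  query: string,
  filters: { productIds?: string[]; category?: string }
): CatalogChunk[] {
  const candidates = chunks.filter((c) => {
    if (filters.productIds && !filters.productIds.includes(c.productId)) return false
    if (filters.category && c.category !== filters.category) return false
    return true
  })

  const terms = query.toLowerCase().split(/\s+/)
  return candidates
    .map((c) => ({
      ...c,
      score: terms.filter((t) => c.text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
}

const chunks: CatalogChunk[] = [
  { productId: 'hd-500', category: 'shelf-brackets', text: 'HD-500 stainless steel bracket, 200 kg' },
  { productId: 'cm-100', category: 'cable-management', text: 'stainless steel cable tray' },
]

const results = hybridSearch(chunks, 'stainless steel bracket', { category: 'shelf-brackets' })
// Only the HD-500 chunk survives the category filter
```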


Managing Intent Transitions

A buyer's intent changes throughout a session. Recognizing intent transitions lets you serve the right information at the right moment — and avoid answering exploration questions when the buyer is ready to order.

type BuyingIntent = 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
 
const INTENT_SIGNALS: Record<BuyingIntent, string[]> = {
  exploring: [
    'what do you have', 'show me', 'what options', 'I need', 'looking for',
  ],
  comparing: [
    'difference between', 'vs', 'compared to', 'which is better', 'pros and cons',
  ],
  specifying: [
    'load capacity', 'dimensions', 'material', 'certification', 'technical spec',
    'how many', 'how much', 'weight', 'rating',
  ],
  'ready-to-order': [
    'add to cart', 'order', 'buy', 'price', 'delivery', 'stock', 'availability',
    'lead time', 'minimum order',
  ],
}
 
function detectIntent(message: string): BuyingIntent | null {
  const lower = message.toLowerCase()
  for (const [intent, signals] of Object.entries(INTENT_SIGNALS)) {
    if (signals.some((s) => lower.includes(s))) {
      return intent as BuyingIntent
    }
  }
  return null
}

With intent detection, you can tailor both retrieval scope and response tone:

  • Exploring: Return broad category overviews, feature comparisons, product family summaries
  • Comparing: Surface spec comparison tables, differentiation points, trade-offs
  • Specifying: Retrieve detailed technical sheets, datasheets, compliance documents
  • Ready-to-order: Pull pricing, stock levels, lead times, MOQ — connect to live commerce data

This is what agentic product discovery looks like in practice: not just answering questions, but understanding where the buyer is in their journey and serving appropriately.
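One concrete way to act on a detected intent is a small lookup that tunes retrieval before the query is issued. A sketch (the docTypes and topK fields are illustrative names, not a fixed API; BuyingIntent is re-declared from above to keep the snippet self-contained):

```typescript
type BuyingIntent = 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'

interface RetrievalConfig {
  docTypes: string[]  // which document types to search
  topK: number        // how many chunks to retrieve
}

// Broad intents cast a wide net; transactional intents
// narrow to fewer, more operational documents.
const RETRIEVAL_BY_INTENT: Record<BuyingIntent, RetrievalConfig> = {
  exploring: { docTypes: ['category-overview', 'product-summary'], topK: 12 },
  comparing: { docTypes: ['spec-sheet', 'comparison'], topK: 8 },
  specifying: { docTypes: ['spec-sheet', 'datasheet', 'compliance'], topK: 6 },
  'ready-to-order': { docTypes: ['pricing', 'stock', 'logistics'], topK: 4 },
}

function retrievalConfigFor(intent: BuyingIntent): RetrievalConfig {
  return RETRIEVAL_BY_INTENT[intent]
}
```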


Building the Shortlist: The Persistent Cart

For B2B buyers, a conversation often ends with a shortlist — a set of products to quote or order. Tracking this across the session creates genuine value that single-turn systems can't provide.

class BuyerShortlist {
  private items: Map<string, { product: ProductRef; quantity?: number; note?: string }> = new Map()
 
  add(product: ProductRef, quantity?: number) {
    this.items.set(product.id, { product, quantity })
  }
 
  remove(productId: string) {
    this.items.delete(productId)
  }
 
  updateQuantity(productId: string, quantity: number) {
    const item = this.items.get(productId)
    if (item) item.quantity = quantity
  }
 
  summarize(): string {
    if (this.items.size === 0) return 'No items in shortlist.'
    return Array.from(this.items.values())
      .map((item) => {
        const qty = item.quantity ? ` × ${item.quantity}` : ''
        return `- ${item.product.name} (${item.product.id})${qty}`
      })
      .join('\n')
  }
 
  toOrderPayload() {
    return Array.from(this.items.values()).map(({ product, quantity }) => ({
      productId: product.id,
      quantity: quantity ?? 1,
    }))
  }
}

The assistant can maintain this shortlist across the conversation and surface it when appropriate:

"You've got the HD-500 Stainless (×12) and the M8 Wall Anchor Kit (×2) in your shortlist. Want me to check availability and put together a quote?"

This transforms a product information conversation into a buying workflow — which is exactly where AI chat creates measurable ROI for distributors and wholesalers. For a deeper look at the business case, see our analysis of hidden costs of unanswered product questions.


Handling Ambiguity Gracefully

Multi-turn conversations surface more ambiguity than single-turn interactions. When the system can't confidently resolve a reference or intent, the right response is a targeted clarifying question — not a guess.

async function detectAmbiguity(
  query: string,
  sessionContext: SessionContext,
  retrievedChunks: Chunk[]
): Promise<string | null> {
  // More than three products in focus → the reference is ambiguous
  if (sessionContext.focusedProducts.length > 3) {
    return `I want to make sure I'm answering about the right product. Are you asking about ${
      sessionContext.focusedProducts.slice(0, 3).map((p) => p.name).join(', ')
    }, or a different item?`
  }
 
  // No focused products + pronoun → unclear reference
  // (word-boundary match to avoid substring false positives)
  const hasVagueTerm = ['it', 'those', 'them', 'that'].some((t) =>
    new RegExp(`\\b${t}\\b`, 'i').test(query)
  )
  if (hasVagueTerm && sessionContext.focusedProducts.length === 0) {
    return `Could you clarify which product you're referring to? I want to make sure I pull the right specifications.`
  }
 
  // Retrieval returned low-confidence chunks
  if (retrievedChunks.every((c) => c.score < 0.6)) {
    return `I don't have detailed information on that specific aspect in my current catalog. Could you give me more detail, or would you like me to flag this for our technical team?`
  }
 
  return null  // No ambiguity detected
}

Good clarifying questions are:

  • Specific: Reference what you actually don't know, not generic "can you clarify?"
  • Brief: One sentence, not a list of sub-questions
  • Grounded: Offer candidate answers where possible ("Are you asking about the M8 or M12 version?")

Prompt Architecture for Multi-Turn

With all these pieces in place, here's how the final prompt is assembled:

async function buildPrompt(
  userMessage: string,
  buffer: ConversationBuffer,
  sessionContext: SessionContext,
  retrievedChunks: Chunk[]
): Promise<{ system: string; user: string }> {
  const context = retrievedChunks
    .map((c) => `[${c.source}]\n${c.text}`)
    .join('\n\n---\n\n')
 
  const system = `
You are a B2B product specialist for a wholesale distributor. 
You help buyers find the right products, answer technical questions, and build orders.
 
Current session state:
- Products in focus: ${sessionContext.focusedProducts.map((p) => p.name).join(', ') || 'none'}
- Active requirements: ${JSON.stringify(sessionContext.activeFilters)}
- Buyer stage: ${sessionContext.intent}
- Shortlist: ${sessionContext.shortlist.length} items
 
Be specific, accurate, and use technical language appropriate for B2B buyers.
If you're uncertain, say so. Never invent specifications.
When the buyer seems ready to order, offer to summarize their shortlist and check availability.
`
 
  const user = `
Recent conversation:
${buffer.format()}
 
Product information:
${context}
 
Buyer's latest message: ${userMessage}
`
 
  return { system, user }
}

Notice how the session context is injected directly into the system prompt — not buried in the user turn. This keeps the model's behavior consistent regardless of where the conversation is in its history.


Testing Multi-Turn Systems

Single-turn RAG testing is relatively straightforward. Multi-turn testing requires evaluating sequences of interactions, which introduces new complexity.

Build test scenarios as conversation scripts:

const testScenario = {
  name: 'Buyer narrows from category to specific SKU',
  turns: [
    { user: 'I need heavy-duty shelf brackets for a warehouse racking system' },
    { user: 'Which of those are rated for over 150kg per bracket?' },
    { user: 'Do the HD-500 come in stainless?' },
    { user: 'How many would I need for a 4-meter shelf?' },
    { user: 'What\'s the minimum order quantity?' },
  ],
  assertions: [
    // Turn 2: "those" resolves to shelf brackets from turn 1
    { turnIndex: 1, check: 'response mentions HD-series or similar heavy-duty brackets' },
    // Turn 3: "HD-500" is now in focus
    { turnIndex: 2, check: 'response addresses stainless steel availability for HD-500' },
    // Turn 4: calculation uses shelf bracket context, not generic
    { turnIndex: 3, check: 'response gives bracket count recommendation for 4 meters' },
    // Turn 5: MOQ is for HD-500, not a generic answer
    { turnIndex: 4, check: 'response gives MOQ specific to HD-500 stainless' },
  ],
}

Run these scenario tests against your system and measure how often the context resolution succeeds end-to-end. A 5-turn scenario where step 3 fails means steps 4 and 5 are also wrong — waterfall failure tracking gives you a more honest picture of quality than per-turn accuracy.
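A scoring helper for this kind of waterfall tracking can be sketched as follows: once a turn fails, every later turn in the scenario is counted as failed too, whatever its individual result (the boolean inputs stand in for your per-turn assertion checks):

```typescript
interface TurnResult {
  turnIndex: number
  passed: boolean
}

// Waterfall scoring: a turn only counts as passed if it AND every
// turn before it passed. A context failure at turn 3 sinks turns 4+.
function waterfallScore(perTurnResults: boolean[]): TurnResult[] {
  let upstreamOk = true
  return perTurnResults.map((passed, turnIndex) => {
    upstreamOk = upstreamOk && passed
    return { turnIndex, passed: upstreamOk }
  })
}

// Per-turn accuracy says 4/5; waterfall scoring says the session broke
// at turn 3 and only 2 of 5 turns actually delivered value.
const scenario = waterfallScore([true, true, false, true, true])
const delivered = scenario.filter((r) => r.passed).length // 2
```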


The Business Case: Why Multi-Turn Matters for B2B

The ROI case for AI product assistants usually starts with support deflection. But multi-turn conversation quality is the difference between a novelty and a real sales tool.

Consider two scenarios:

Single-turn system: Buyer asks a question, gets an answer. Asks another question, gets another answer. After 5 questions, they're no closer to an order because every answer is context-free. They call a sales rep.

Multi-turn system: Buyer asks a question. The system maintains context, narrows the product set across turns, builds a shortlist, and surfaces a quote workflow when the buyer is ready. The AI becomes a 24/7 inside sales assistant.

For wholesalers and distributors with thousands of SKUs and buyers who work outside business hours, this is meaningful. The conversational commerce shift in B2B is precisely this — moving from reactive Q&A to a guided buying experience that doesn't require a human on the other end.


Summary: The Multi-Turn Stack

  • Conversation buffer: handles immediate follow-ups (last 4–8 turns verbatim)
  • Session context: tracks products, filters, and intent (structured extraction after each turn)
  • Query rewriter: fixes retrieval for vague messages (an LLM rewrites the raw message into a standalone query)
  • Product ref resolver: handles pronoun references (matches against focused products in session context)
  • Intent detector: adapts responses to the buying stage (signal-based classification)
  • Shortlist tracker: converts discovery into an order (persists across the session)
  • Ambiguity handler: prevents confident wrong answers (targeted clarifying questions)

Each layer addresses a distinct failure mode. The conversation buffer alone gets you 60% of the way there. Adding query rewriting and session context gets you to 90%. The shortlist and intent-aware responses are what separate product AI that closes deals from product AI that just answers questions.


Ready to Build a Product AI That Thinks in Sessions?

Axoverna handles multi-turn conversation architecture out of the box — query rewriting, session context, product focus tracking, and shortlist management are all part of the platform. Your team ships a better buying experience without building the conversational infrastructure from scratch.

Book a demo to see multi-turn product discovery in action with your own catalog, or start a free trial and experience it firsthand.
