Multi-Turn Conversations: Building B2B Product AI That Remembers Context

Single-turn Q&A is the easy part. Learn how to architect stateful, multi-turn conversations for product AI — handling follow-ups, pronoun resolution, cart-building, and ambiguity across a complete buying session.

Axoverna Team
15 min read

Most product AI demos look the same: a single question, a single crisp answer. "What is the load capacity of the HD-500 shelf bracket?" "The HD-500 supports up to 200 kg per bracket." Clean. Impressive. Done.

But real buyers don't work that way.

They start broad: "What shelf brackets do you have for heavy loads?" Then narrow: "Which of those come in stainless?" Then get specific: "How many do I need for a 2-meter shelf?" Then ask a follow-up: "And can I get those with same-week delivery?" And if the AI doesn't remember that the "those" refers to the M12-grade stainless bracket they landed on two turns ago — the conversation falls apart.

Multi-turn conversation handling is the gap between a product AI that demos well and one that actually closes sales. This article covers how to architect it properly.


Why Multi-Turn Is Hard (It's Not Just Context Windows)

The naive solution — dump the entire conversation history into the LLM context — breaks down faster than you'd expect, and for reasons beyond token limits.

The Retrieval Problem

In a single-turn RAG system, retrieval is simple: embed the user's query, find the nearest chunks, stuff them into the prompt. In a multi-turn system, the retrieval query is no longer obvious.

When a buyer types "What about the stainless version?", there is no standalone question to embed. To retrieve relevant chunks, you need to understand that "stainless version" refers to the shelf bracket from two turns back. The retrieval query should probably be "stainless steel shelf bracket HD-500 specifications" — but the raw message gives you none of that.

This is the coreference resolution problem, and it affects both retrieval and response quality.

Conversation State Drift

Over a long session, the buyer's intent may shift. They start looking at shelf brackets, discover they also need mounting hardware, then pivot to a different product category entirely. If your system treats the entire history as equally relevant, old context pollutes new retrieval. If it ignores history, it loses continuity.

You need selective state management — not just "remember everything" but "remember the right things at the right time."

Compounding Errors

In single-turn RAG, a retrieval miss produces one bad answer. In multi-turn, a retrieval miss in turn 3 can corrupt the context that's referenced in turns 4, 5, and 6. Errors compound.
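To put a number on this: if each turn's context resolution succeeds independently with probability p, a session only succeeds end-to-end if every turn does. A quick sketch (the figures are illustrative, not benchmarks):

```typescript
// Probability that an n-turn session survives end-to-end, assuming each
// turn resolves context correctly with probability p and failures are
// independent (a simplification; real failures often cascade, so the
// true number is usually worse).
function endToEndSuccess(perTurnAccuracy: number, turns: number): number {
  return Math.pow(perTurnAccuracy, turns)
}

// A system that is "90% accurate" per turn completes a
// six-turn buying session barely half the time.
console.log(endToEndSuccess(0.9, 1).toFixed(2)) // 0.90
console.log(endToEndSuccess(0.9, 6).toFixed(2)) // 0.53
```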


The Architecture: Three Layers of State

A well-designed multi-turn product AI maintains state at three distinct levels:

┌─────────────────────────────────────────────┐
│  Layer 1: Conversation Buffer               │
│  (recent N turns, verbatim)                 │
├─────────────────────────────────────────────┤
│  Layer 2: Session Context                   │
│  (structured state: entities, intent, cart) │
├─────────────────────────────────────────────┤
│  Layer 3: Long-Term Knowledge               │
│  (your product catalog, RAG retrieval)      │
└─────────────────────────────────────────────┘

Let's walk through each layer.


Layer 1: Conversation Buffer

The simplest layer. Keep the last N turns of conversation verbatim and include them in every prompt. This handles the easy cases — follow-ups that reference the immediately preceding exchange.

interface Turn {
  role: 'user' | 'assistant'
  content: string
  timestamp: number
}
 
class ConversationBuffer {
  private turns: Turn[] = []
  private maxTurns: number
 
  constructor(maxTurns = 6) {
    this.maxTurns = maxTurns
  }
 
  add(role: Turn['role'], content: string) {
    this.turns.push({ role, content, timestamp: Date.now() })
    if (this.turns.length > this.maxTurns * 2) {
      // maxTurns pairs (user + assistant)
      this.turns = this.turns.slice(-this.maxTurns * 2)
    }
  }
 
  format(): string {
    return this.turns
      .map((t) => `${t.role === 'user' ? 'Buyer' : 'Assistant'}: ${t.content}`)
      .join('\n')
  }
}

What to keep in the buffer: 4–8 turns is usually enough for pronoun resolution and immediate follow-ups. Beyond that, you're adding tokens without proportional benefit — and older turns may actively mislead retrieval.

What NOT to do: Don't feed the entire session history to every retrieval step. The buyer's first question about shelf brackets shouldn't pollute retrieval for their later questions about cable management.


Layer 2: Session Context (The Crucial Layer)

This is where most implementations fall short. The conversation buffer handles what was said; the session context tracks what matters about the session in structured form.

interface SessionContext {
  // Products currently in focus
  focusedProducts: ProductRef[]
 
  // Attributes the buyer has established as requirements
  activeFilters: {
    material?: string[]
    category?: string
    priceRange?: { min?: number; max?: number }
    deliveryRequirement?: string
    quantity?: number
  }
 
  // Buyer's stated intent / stage
  intent: 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
 
  // Items added to a notional cart or shortlist
  shortlist: ProductRef[]
 
  // Open questions / ambiguities we need to resolve
  pendingClarifications: string[]
}

You update this context after every turn using a lightweight extraction step — either a small LLM call or structured parsing of the response.

Extracting Entities from Each Turn

After generating a response, run a quick extraction pass to update session context:

async function extractSessionUpdates(
  userMessage: string,
  assistantResponse: string,
  currentContext: SessionContext
): Promise<Partial<SessionContext>> {
  const prompt = `
You are extracting structured information from a B2B product conversation turn.
 
Current session context: ${JSON.stringify(currentContext)}
 
Latest exchange:
Buyer: ${userMessage}
Assistant: ${assistantResponse}
 
Extract any updates to the session context. Return a JSON object with only the fields that changed:
- focusedProducts: products now in focus (with id and name)
- activeFilters: requirements the buyer stated or confirmed
- intent: buyer's current stage (exploring/comparing/specifying/ready-to-order)
- shortlist: products the buyer expressed interest in keeping
- pendingClarifications: unresolved ambiguities we should address
 
Return only changed fields. Return null for no changes.
`
 
  const result = await llm.complete({ user: prompt, format: 'json' })
  return result ?? {}
}

This structured context is what enables intent-aware retrieval — instead of using just the raw user message to query the vector store, you augment it with what you know about the session.
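The Partial<SessionContext> the extractor returns still has to be merged into the running state. A minimal merge sketch, using a trimmed-down version of the SessionContext above (the shallow merge of activeFilters is one reasonable policy, not the only one):

```typescript
// Trimmed-down session state for illustration
interface ProductRef { id: string; name: string }

interface SessionContext {
  focusedProducts: ProductRef[]
  activeFilters: { material?: string[]; category?: string }
  intent: 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
}

// Merge extracted updates into the running context.
// Top-level fields are replaced wholesale; activeFilters is
// shallow-merged so a new material filter doesn't wipe the category.
function applySessionUpdates(
  context: SessionContext,
  updates: Partial<SessionContext>
): SessionContext {
  return {
    ...context,
    ...updates,
    activeFilters: { ...context.activeFilters, ...updates.activeFilters },
  }
}

const ctx: SessionContext = {
  focusedProducts: [{ id: 'hd-500', name: 'HD-500 Bracket' }],
  activeFilters: { category: 'shelf-brackets' },
  intent: 'exploring',
}

const next = applySessionUpdates(ctx, {
  activeFilters: { material: ['stainless steel'] },
  intent: 'specifying',
})
// next.activeFilters keeps the category AND gains the material filter
```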


Solving the Retrieval Problem: Query Rewriting

The core technique for multi-turn RAG retrieval is query rewriting — transforming the raw user message into a standalone, context-aware query before hitting the vector store.

async function rewriteQuery(
  userMessage: string,
  buffer: ConversationBuffer,
  sessionContext: SessionContext
): Promise<string> {
  const prompt = `
You are preparing a search query for a B2B product catalog.
 
Recent conversation:
${buffer.format()}
 
Session context:
- Currently discussing: ${sessionContext.focusedProducts.map((p) => p.name).join(', ') || 'nothing specific'}
- Active requirements: ${JSON.stringify(sessionContext.activeFilters)}
 
Latest user message: "${userMessage}"
 
Rewrite this message as a complete, standalone search query that includes all relevant context.
The query will be used to search a product catalog — be specific and include technical terms.
Return only the rewritten query, nothing else.
`
 
  return llm.complete({ user: prompt })
}

Example rewrites:

  • "What about the stainless version?" → "stainless steel shelf bracket HD-500 specifications load capacity"
  • "How many do I need for 3 meters?" → "shelf bracket spacing requirements 3 meter shelf load calculation"
  • "Is there a cheaper option?" → "shelf brackets heavy duty lower cost alternative HD-500 substitute"
  • "Can those be powder coated?" → "shelf bracket powder coating finish options custom colors"

This single technique — rewriting the query before retrieval — is responsible for the majority of the quality improvement in multi-turn systems. It's also lightweight: a small, fast model can do this well without needing your most capable (and expensive) LLM.


Handling Product Focus: The "Those" Problem

In B2B product conversations, buyers frequently refer to products by pronoun or partial description: "those", "the ones you mentioned", "the cheaper variant", "that bracket".

Your system needs to resolve these references to actual product identifiers before retrieval. The session context's focusedProducts field handles this:

async function resolveProductRefs(
  userMessage: string,
  sessionContext: SessionContext
): Promise<ProductRef[]> {
  // If no pronouns/vague references, no resolution needed.
  // Match whole words/phrases: a plain substring check would
  // false-positive on words like "item" or "quantity".
  const vagueTerms = ['it', 'those', 'them', 'that', 'these', 'the ones', 'the same']
  const hasVagueTerm = vagueTerms.some((t) =>
    new RegExp(`\\b${t}\\b`, 'i').test(userMessage)
  )
 
  if (!hasVagueTerm || sessionContext.focusedProducts.length === 0) {
    return []
  }
 
  // Return currently focused products as the resolution
  return sessionContext.focusedProducts
}
 
async function buildRetrievalQuery(
  userMessage: string,
  sessionContext: SessionContext,
  buffer: ConversationBuffer
): Promise<{ query: string; filters?: Record<string, unknown> }> {
  const resolvedProducts = await resolveProductRefs(userMessage, sessionContext)
 
  // Build a hybrid retrieval request: semantic query + metadata filters
  const query = await rewriteQuery(userMessage, buffer, sessionContext)
 
  const filters: Record<string, unknown> = {}
 
  // If we've resolved to specific products, add metadata filter to anchor retrieval
  if (resolvedProducts.length > 0) {
    filters.productIds = resolvedProducts.map((p) => p.id)
  }
 
  // Apply any active category/attribute filters
  if (sessionContext.activeFilters.category) {
    filters.category = sessionContext.activeFilters.category
  }
 
  return { query, filters }
}

Combining semantic query rewriting with metadata filters (covered in our article on hybrid search strategies) gives you precise, context-aware retrieval that narrows to relevant products without losing the semantic flexibility of vector search.
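Applied to the retrieval call itself, the metadata filters act as a hard pre-cut on the candidate set before semantic ranking. A toy in-memory sketch of that two-stage flow (the CatalogChunk shape and the keyword-overlap scoring are stand-ins; in production the filter would be a metadata clause on your vector store query and the ranking would be vector similarity):

```typescript
interface CatalogChunk {
  productId: string
  category: string
  text: string
  score?: number
}

// Stage 1: hard metadata filter — only chunks matching the resolved
// products / active category survive.
// Stage 2: semantic ranking (stubbed here as keyword-overlap counting).
function hybridSearch(
  chunks: CatalogChunk[],
  query: string,
  filters: { productIds?: string[]; category?: string }
): CatalogChunk[] {
  const candidates = chunks.filter((c) => {
    if (filters.productIds && !filters.productIds.includes(c.productId)) return false
    if (filters.category && c.category !== filters.category) return false
    return true
  })

  const terms = query.toLowerCase().split(/\s+/)
  return candidates
    .map((c) => ({
      ...c,
      score: terms.filter((t) => c.text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
}

const chunks: CatalogChunk[] = [
  { productId: 'hd-500', category: 'shelf-brackets', text: 'HD-500 stainless steel bracket, 200 kg' },
  { productId: 'cm-100', category: 'cable-management', text: 'stainless steel cable tray' },
]

const results = hybridSearch(chunks, 'stainless steel bracket', { category: 'shelf-brackets' })
// Only the HD-500 chunk survives the category filter
```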


Managing Intent Transitions

A buyer's intent changes throughout a session. Recognizing intent transitions lets you serve the right information at the right moment — and avoid answering exploration questions when the buyer is ready to order.

type BuyingIntent = 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'
 
const INTENT_SIGNALS: Record<BuyingIntent, string[]> = {
  exploring: [
    'what do you have', 'show me', 'what options', 'I need', 'looking for',
  ],
  comparing: [
    'difference between', 'vs', 'compared to', 'which is better', 'pros and cons',
  ],
  specifying: [
    'load capacity', 'dimensions', 'material', 'certification', 'technical spec',
    'how many', 'how much', 'weight', 'rating',
  ],
  'ready-to-order': [
    'add to cart', 'order', 'buy', 'price', 'delivery', 'stock', 'availability',
    'lead time', 'minimum order',
  ],
}
 
function detectIntent(message: string): BuyingIntent | null {
  const lower = message.toLowerCase()
  for (const [intent, signals] of Object.entries(INTENT_SIGNALS)) {
    if (signals.some((s) => lower.includes(s))) {
      return intent as BuyingIntent
    }
  }
  return null
}

With intent detection, you can tailor both retrieval scope and response tone:

  • Exploring: Return broad category overviews, feature comparisons, product family summaries
  • Comparing: Surface spec comparison tables, differentiation points, trade-offs
  • Specifying: Retrieve detailed technical sheets, datasheets, compliance documents
  • Ready-to-order: Pull pricing, stock levels, lead times, MOQ — connect to live commerce data

This is what agentic product discovery looks like in practice: not just answering questions, but understanding where the buyer is in their journey and serving appropriately.
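One concrete way to act on a detected intent is a small lookup that tunes retrieval before the query is issued. A sketch (the docTypes and topK fields are illustrative names, not a fixed API; BuyingIntent is re-declared from above to keep the snippet self-contained):

```typescript
type BuyingIntent = 'exploring' | 'comparing' | 'specifying' | 'ready-to-order'

interface RetrievalConfig {
  docTypes: string[]  // which document types to search
  topK: number        // how many chunks to retrieve
}

// Broad intents cast a wide net; transactional intents
// narrow to fewer, more operational documents.
const RETRIEVAL_BY_INTENT: Record<BuyingIntent, RetrievalConfig> = {
  exploring: { docTypes: ['category-overview', 'product-summary'], topK: 12 },
  comparing: { docTypes: ['spec-sheet', 'comparison'], topK: 8 },
  specifying: { docTypes: ['spec-sheet', 'datasheet', 'compliance'], topK: 6 },
  'ready-to-order': { docTypes: ['pricing', 'stock', 'logistics'], topK: 4 },
}

function retrievalConfigFor(intent: BuyingIntent): RetrievalConfig {
  return RETRIEVAL_BY_INTENT[intent]
}
```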


Building the Shortlist: The Persistent Cart

For B2B buyers, a conversation often ends with a shortlist — a set of products to quote or order. Tracking this across the session creates genuine value that single-turn systems can't provide.

class BuyerShortlist {
  private items: Map<string, { product: ProductRef; quantity?: number; note?: string }> = new Map()
 
  add(product: ProductRef, quantity?: number) {
    this.items.set(product.id, { product, quantity })
  }
 
  remove(productId: string) {
    this.items.delete(productId)
  }
 
  updateQuantity(productId: string, quantity: number) {
    const item = this.items.get(productId)
    if (item) item.quantity = quantity
  }
 
  summarize(): string {
    if (this.items.size === 0) return 'No items in shortlist.'
    return Array.from(this.items.values())
      .map((item) => {
        const qty = item.quantity ? ` × ${item.quantity}` : ''
        return `- ${item.product.name} (${item.product.id})${qty}`
      })
      .join('\n')
  }
 
  toOrderPayload() {
    return Array.from(this.items.values()).map(({ product, quantity }) => ({
      productId: product.id,
      quantity: quantity ?? 1,
    }))
  }
}

The assistant can maintain this shortlist across the conversation and surface it when appropriate:

"You've got the HD-500 Stainless (×12) and the M8 Wall Anchor Kit (×2) in your shortlist. Want me to check availability and put together a quote?"

This transforms a product information conversation into a buying workflow — which is exactly where AI chat creates measurable ROI for distributors and wholesalers. For a deeper look at the business case, see our analysis of hidden costs of unanswered product questions.


Handling Ambiguity Gracefully

Multi-turn conversations surface more ambiguity than single-turn interactions. When the system can't confidently resolve a reference or intent, the right response is a targeted clarifying question — not a guess.

async function detectAmbiguity(
  query: string,
  sessionContext: SessionContext,
  retrievedChunks: Chunk[]
): Promise<string | null> {
  // More than three products in focus → the reference is ambiguous
  if (sessionContext.focusedProducts.length > 3) {
    return `I want to make sure I'm answering about the right product. Are you asking about ${
      sessionContext.focusedProducts.slice(0, 3).map((p) => p.name).join(', ')
    }, or a different item?`
  }
 
  // No focused products + pronoun → unclear reference
  // (word-boundary match to avoid substring false positives)
  const hasVagueTerm = ['it', 'those', 'them', 'that'].some((t) =>
    new RegExp(`\\b${t}\\b`, 'i').test(query)
  )
  if (hasVagueTerm && sessionContext.focusedProducts.length === 0) {
    return `Could you clarify which product you're referring to? I want to make sure I pull the right specifications.`
  }
 
  // Retrieval returned low-confidence chunks
  if (retrievedChunks.every((c) => c.score < 0.6)) {
    return `I don't have detailed information on that specific aspect in my current catalog. Could you give me more detail, or would you like me to flag this for our technical team?`
  }
 
  return null  // No ambiguity detected
}

Good clarifying questions are:

  • Specific: Reference what you actually don't know, not generic "can you clarify?"
  • Brief: One sentence, not a list of sub-questions
  • Grounded: Offer candidate answers where possible ("Are you asking about the M8 or M12 version?")

Prompt Architecture for Multi-Turn

With all these pieces in place, here's how the final prompt is assembled:

async function buildPrompt(
  userMessage: string,
  buffer: ConversationBuffer,
  sessionContext: SessionContext,
  retrievedChunks: Chunk[]
): Promise<{ system: string; user: string }> {
  const context = retrievedChunks
    .map((c) => `[${c.source}]\n${c.text}`)
    .join('\n\n---\n\n')
 
  const system = `
You are a B2B product specialist for a wholesale distributor. 
You help buyers find the right products, answer technical questions, and build orders.
 
Current session state:
- Products in focus: ${sessionContext.focusedProducts.map((p) => p.name).join(', ') || 'none'}
- Active requirements: ${JSON.stringify(sessionContext.activeFilters)}
- Buyer stage: ${sessionContext.intent}
- Shortlist: ${sessionContext.shortlist.length} items
 
Be specific, accurate, and use technical language appropriate for B2B buyers.
If you're uncertain, say so. Never invent specifications.
When the buyer seems ready to order, offer to summarize their shortlist and check availability.
`
 
  const user = `
Recent conversation:
${buffer.format()}
 
Product information:
${context}
 
Buyer's latest message: ${userMessage}
`
 
  return { system, user }
}

Notice how the session context is injected directly into the system prompt — not buried in the user turn. This keeps the model's behavior consistent regardless of where the conversation is in its history.


Testing Multi-Turn Systems

Single-turn RAG testing is relatively straightforward. Multi-turn testing requires evaluating sequences of interactions, which introduces new complexity.

Build test scenarios as conversation scripts:

const testScenario = {
  name: 'Buyer narrows from category to specific SKU',
  turns: [
    { user: 'I need heavy-duty shelf brackets for a warehouse racking system' },
    { user: 'Which of those are rated for over 150kg per bracket?' },
    { user: 'Do the HD-500 come in stainless?' },
    { user: 'How many would I need for a 4-meter shelf?' },
    { user: 'What\'s the minimum order quantity?' },
  ],
  assertions: [
    // Turn 2: "those" resolves to shelf brackets from turn 1
    { turnIndex: 1, check: 'response mentions HD-series or similar heavy-duty brackets' },
    // Turn 3: "HD-500" is now in focus
    { turnIndex: 2, check: 'response addresses stainless steel availability for HD-500' },
    // Turn 4: calculation uses shelf bracket context, not generic
    { turnIndex: 3, check: 'response gives bracket count recommendation for 4 meters' },
    // Turn 5: MOQ is for HD-500, not a generic answer
    { turnIndex: 4, check: 'response gives MOQ specific to HD-500 stainless' },
  ],
}

Run these scenario tests against your system and measure how often the context resolution succeeds end-to-end. A 5-turn scenario where step 3 fails means steps 4 and 5 are also wrong — waterfall failure tracking gives you a more honest picture of quality than per-turn accuracy.
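A scoring helper for this kind of waterfall tracking can be sketched as follows: once a turn fails, every later turn in the scenario is counted as failed too, whatever its individual result (the boolean inputs stand in for your per-turn assertion checks):

```typescript
interface TurnResult {
  turnIndex: number
  passed: boolean
}

// Waterfall scoring: a turn only counts as passed if it AND every
// turn before it passed. A context failure at turn 3 sinks turns 4+.
function waterfallScore(perTurnResults: boolean[]): TurnResult[] {
  let upstreamOk = true
  return perTurnResults.map((passed, turnIndex) => {
    upstreamOk = upstreamOk && passed
    return { turnIndex, passed: upstreamOk }
  })
}

// Per-turn accuracy says 4/5; waterfall scoring says the session broke
// at turn 3 and only 2 of 5 turns actually delivered value.
const scenario = waterfallScore([true, true, false, true, true])
const delivered = scenario.filter((r) => r.passed).length // 2
```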


The Business Case: Why Multi-Turn Matters for B2B

The ROI case for AI product assistants usually starts with support deflection. But multi-turn conversation quality is the difference between a novelty and a real sales tool.

Consider two scenarios:

Single-turn system: Buyer asks a question, gets an answer. Asks another question, gets another answer. After 5 questions, they're no closer to an order because every answer is context-free. They call a sales rep.

Multi-turn system: Buyer asks a question. The system maintains context, narrows the product set across turns, builds a shortlist, and surfaces a quote workflow when the buyer is ready. The AI becomes a 24/7 inside sales assistant.

For wholesalers and distributors with thousands of SKUs and buyers who work outside business hours, this is meaningful. The conversational commerce shift in B2B is precisely this — moving from reactive Q&A to a guided buying experience that doesn't require a human on the other end.


Summary: The Multi-Turn Stack

  • Conversation buffer: handles immediate follow-ups (last 4–8 turns verbatim)
  • Session context: tracks products, filters, and intent (structured extraction after each turn)
  • Query rewriter: fixes retrieval for vague messages (an LLM rewrites the raw message into a standalone query)
  • Product ref resolver: handles pronoun references (matches against focused products in session context)
  • Intent detector: adapts responses to the buying stage (signal-based classification)
  • Shortlist tracker: converts discovery into an order (persists across the session)
  • Ambiguity handler: prevents confident wrong answers (targeted clarifying questions)

Each layer addresses a distinct failure mode. The conversation buffer alone gets you 60% of the way there. Adding query rewriting and session context gets you to 90%. The shortlist and intent-aware responses are what separate product AI that closes deals from product AI that just answers questions.


Ready to Build a Product AI That Thinks in Sessions?

Axoverna handles multi-turn conversation architecture out of the box — query rewriting, session context, product focus tracking, and shortlist management are all part of the platform. Your team ships a better buying experience without building the conversational infrastructure from scratch.

Book a demo to see multi-turn product discovery in action with your own catalog, or start a free trial and experience it firsthand.
