Query Intent Classification: The Hidden Layer That Makes B2B Product AI Actually Work

Most product AI systems treat all queries the same. The ones that actually work don't. Here's how pre-retrieval intent classification — routing, entity extraction, and query decomposition — separates mediocre product AI systems from genuinely useful ones.

Axoverna Team
13 min read

There's a moment every product AI deployment eventually hits. The system launches, the early demos look impressive, the first few hundred queries get answered well. Then a customer types something like:

"I need something like the XR-40 but rated for higher temperatures — what are my options, and can any of them ship this week?"

And the AI stumbles. It retrieves the XR-40 product page, generates a vague answer about temperature ratings, and completely misses the "what are my options" framing and the "ship this week" urgency signal.

The query wasn't hard. A competent sales engineer would have parsed it in two seconds: the buyer wants an alternative product recommendation, filtered by a technical attribute (temperature rating), with a stock and lead time check as a secondary requirement. That's a different retrieval task than a simple specification lookup — and it deserves a different retrieval strategy.

The gap between your demo and production isn't the language model. It's the absence of a query understanding layer: the logic that lives before retrieval and determines what kind of question this actually is.

This article is a practical guide to building that layer.


Why a Single Retrieval Strategy Fails

Most RAG pipelines are built around a single retrieval path: embed the query, find the closest chunks, pass them to the LLM. This works well enough for a specific, narrow class of queries — ones where the answer is basically contained in a single document chunk.

B2B product queries are rarely like that.

Consider the range of intent behind questions a real buyer might ask a product AI:

Query → What they actually want

"What's the pressure rating of the V200 valve?" → Lookup: a specific attribute from a specific product
"What's the difference between the V200 and V300?" → Comparison: attributes from two products, side-by-side
"I need a valve for 100 bar, 200°C, and DN50 pipe" → Configuration: filter by spec requirements to find candidates
"My V200 is leaking at the packing gland — why?" → Troubleshooting: diagnostic reasoning over technical documents
"Do you have anything like the Swagelok SS-43GS4?" → Cross-reference: competitor part matching to your catalog
"What accessories do I need to install the V200?" → Expansion: related products, accessories, kits
"What's the lead time on V200 right now?" → Availability: inventory and fulfillment data, not product specs

Each of these requires a meaningfully different retrieval approach. A lookup wants a single accurate chunk. A comparison needs chunks from multiple products, structured for side-by-side synthesis. A configuration query should trigger metadata filtering against a spec database before any semantic search. A troubleshooting query should prioritize installation guides and service bulletins over product pages.

If you push all seven through the same embedding-and-retrieve pipeline, you'll get mediocre results for most of them. Classification — identifying the intent before you retrieve — lets you route each query to the strategy that actually handles it well.


The Four Components of Query Understanding

A complete query understanding layer does four things before a single vector search is executed:

1. Intent Classification

The coarsest and most important step. Map the query into one of your defined intent categories. The taxonomy depends on your domain, but a practical starting set for B2B product AI typically includes:

  • Lookup — retrieve a specific attribute or specification
  • Compare — evaluate two or more products across shared attributes
  • Configure / Filter — find candidates matching a set of technical requirements
  • Troubleshoot — diagnose a problem using technical or service documentation
  • Cross-reference — match a competitor or legacy part number to your catalog
  • Availability — check stock, lead times, pricing
  • Expand — find related products, accessories, or substitutes
  • General / Ambiguous — unclear intent, needs clarification or fallback

You can implement classification with a lightweight prompt sent to the LLM itself before retrieval:

You are classifying a user query for a B2B product knowledge system.
Classify the following query into exactly one of these intent categories:
lookup | compare | configure | troubleshoot | cross_reference | availability | expand | general

Query: "I need something like the XR-40 but rated for higher temperatures"

Return JSON: {"intent": "<category>", "confidence": <0-1>}

This costs a tiny fraction of your total token budget — typically under 100 tokens — and pays dividends immediately. At high volumes, you can replace or augment this with a fine-tuned classifier to reduce latency, but an LLM-based classifier is perfectly adequate to start.
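Whatever model produces the reply, the JSON needs defensive parsing before it feeds routing logic: malformed output should degrade to the general intent rather than crash the pipeline. A minimal sketch in Python, with the LLM call itself elided:

```python
import json

# The eight intent categories from the taxonomy above.
INTENTS = {"lookup", "compare", "configure", "troubleshoot",
           "cross_reference", "availability", "expand", "general"}

def parse_intent(raw_reply: str) -> tuple[str, float]:
    """Validate the classifier's JSON reply; fall back to 'general'
    with zero confidence on anything malformed or out-of-taxonomy."""
    try:
        data = json.loads(raw_reply)
        intent = data["intent"]
        confidence = float(data["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return "general", 0.0
    if intent not in INTENTS:
        return "general", 0.0
    return intent, max(0.0, min(1.0, confidence))
```

The clamp and the out-of-taxonomy check matter in practice: small models occasionally invent categories or report confidence above 1.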

2. Entity Extraction

Once you know the intent, extract the structured entities: the nouns and attributes the query contains that your retrieval system can use precisely.

In B2B product domains, the relevant entity types typically include:

  • Product references — model codes, part numbers, SKUs (V200, XR-40, 304-SS-HEX-M10)
  • Technical attributes — specifications with units and values (200°C, 100 bar, DN50)
  • Product categories — classes of items (valve, bearing, connector, fitting)
  • Comparison targets — pairs or groups mentioned together
  • Competitor references — brand names or competitor model codes
  • Constraint modifiers — higher than, at least, compatible with, rated for

Entity extraction enables two things downstream. First, it lets you run exact-match filters alongside semantic search — critical for part numbers and SKUs, where semantic similarity is meaningless (see our piece on hybrid search for B2B catalogs). Second, it grounds the LLM's synthesis in specific, extracted facts rather than leaving it to infer them from free text.

A simple extraction prompt:

Extract structured entities from this product query.

Query: "I need a stainless valve for 100 bar, 200°C, and DN50 pipe"

Return JSON:
{
  "product_references": [],
  "attributes": [
    {"name": "material", "value": "stainless"},
    {"name": "pressure", "value": "100", "unit": "bar"},
    {"name": "temperature", "value": "200", "unit": "°C"},
    {"name": "nominal_diameter", "value": "50", "unit": "DN"}
  ],
  "categories": ["valve"],
  "constraints": []
}

With these entities extracted, your retrieval layer can run a metadata filter on pressure >= 100 AND temperature >= 200 AND nominal_diameter = DN50 before doing semantic search — dramatically improving precision for specification-based queries. This is the kind of integration described in detail in our metadata filtering for RAG article.
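As a sketch of that pre-filter step (field names such as `pressure` and `nominal_diameter` are hypothetical; a real catalog schema will differ, and here rating attributes are treated as minimums the product must meet or exceed):

```python
def metadata_filter(products: list[dict], attributes: list[dict]) -> list[dict]:
    """Narrow the candidate set with extracted spec constraints
    before any semantic search runs over the survivors."""
    def matches(product: dict) -> bool:
        for attr in attributes:
            name, value = attr["name"], attr["value"]
            if name in ("pressure", "temperature"):
                # Rating requirements: the product must meet or exceed them.
                if product.get(name, 0) < float(value):
                    return False
            elif name == "nominal_diameter":
                # Dimensional requirements must match exactly.
                if product.get(name) != f"DN{value}":
                    return False
        return True
    return [p for p in products if matches(p)]
```

Running this before semantic search is what turns a configure query from fuzzy matching into precise candidate selection.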

3. Query Decomposition

Some queries contain multiple distinct sub-questions that are better answered separately and then assembled.

"I need something like the XR-40 but rated for higher temperatures — what are my options, and can any of them ship this week?"

There are three sub-tasks here:

  1. Find the XR-40's current temperature rating (lookup)
  2. Find alternatives with higher temperature ratings (configure)
  3. Check stock/lead time for the candidates found in step 2 (availability)

Running this as a single retrieval query almost always produces a poor answer. The semantic blend of "XR-40", "temperature rating", "options", and "ship this week" creates an incoherent embedding that doesn't match any chunk well.

Decomposition splits this into sequential retrieval steps — each with a clear scope and a clean query — and then assembles the results at synthesis time. The LLM orchestrates this naturally with a simple system prompt:

You are a product AI assistant. Before answering complex queries,
break them into sub-questions that can be answered independently.
Answer each sub-question using the provided retrieval tool.
Synthesize a final answer from the individual results.

This is the foundation of what we've previously called agentic RAG — though you don't need a full multi-tool agent architecture to implement basic decomposition. A structured prompt with two or three sequential retrieval calls handles most real-world cases.
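One way to sketch that orchestration, with retrieval stubbed out (`retrieve` is a placeholder for whatever search or API call each intent routes to). Each step sees the accumulated results of earlier steps, which is what lets the availability check reference the candidates found by the configure step:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SubQuery:
    intent: str  # one of the intent categories defined earlier
    query: str

def run_plan(plan: list[SubQuery],
             retrieve: Callable[[SubQuery, dict], str]) -> dict:
    """Execute decomposed sub-queries in order, passing accumulated
    results forward so later steps can build on earlier answers."""
    results: dict[str, str] = {}
    for step in plan:
        results[step.intent] = retrieve(step, results)
    return results
```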

4. Query Rewriting

Even after classification and entity extraction, the raw query itself is often not the best input for embedding-based retrieval. Buyers use colloquial language, abbreviations, pronouns referring to context from earlier in the conversation, and implicit assumptions.

Query rewriting takes the original query and transforms it into a form better suited to retrieval:

  • Expansion: add domain-specific synonyms (valve → valve, solenoid valve, control valve, shut-off valve)
  • Disambiguation: resolve pronouns from conversation history (that one → the V200 stainless valve mentioned above)
  • Normalization: standardize unit formats and abbreviations (3/4" → 19mm, DN20)
  • Hypothetical document embedding (HyDE): generate a hypothetical ideal answer and embed that for retrieval — often dramatically improving recall for vague or underspecified queries

HyDE is particularly powerful for B2B product AI. Instead of embedding the vague query "something corrosion-resistant for marine environments," you generate a short hypothetical product description matching that requirement — then retrieve based on the embedding of that. Products that match cluster much more tightly in embedding space around a hypothetical answer than around the user's original imprecise phrasing.
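A sketch of the HyDE flow, with the embedding model replaced by a bag-of-words stand-in so the shape of the technique stays visible. Here `generate_hypothetical` stands in for an LLM call, and a production system would use a real sentence-embedding model:

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in embedding: token counts. Swap in a real model in production.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(query: str, generate_hypothetical, docs: list[str]) -> str:
    """Embed a generated hypothetical answer instead of the raw
    query, then rank documents against that embedding."""
    hypothetical = generate_hypothetical(query)
    qvec = embed(hypothetical)
    return max(docs, key=lambda d: cosine(qvec, embed(d)))
```

The hypothetical answer shares far more vocabulary with matching product descriptions than the user's original phrasing does, which is the entire trick.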


Routing: Connecting Intent to Retrieval Strategy

With intent classified and entities extracted, the routing logic determines which retrieval path each query follows.

A simplified routing table:

lookup        → semantic search over product chunks; return top-3
compare       → semantic search over 2+ product namespaces; structured diff output
configure     → metadata filter first, then semantic search on filtered subset
troubleshoot  → semantic search prioritizing service/installation document chunks
cross_ref     → BM25 exact match on part numbers; semantic fallback on description
availability  → skip RAG; call inventory/ERP API directly
expand        → graph traversal on product relationships; semantic search for accessories
general       → standard semantic search with clarification prompt if confidence < 0.6

The availability routing is worth highlighting. Inventory and pricing data should almost never go through your static RAG pipeline — it's dynamic data that goes stale within hours. Intent classification lets you intercept availability queries early and route them to a live API call instead, keeping your AI's answers accurate on the dimensions that matter most to purchase decisions.


Implementation: Keeping Latency Acceptable

The obvious concern with a query understanding layer is added latency. If each user message requires one or two LLM calls before retrieval even begins, the perceived response time climbs.

A few practical strategies:

Use a smaller model for classification. Intent classification and entity extraction are structured output tasks. A model like GPT-4o-mini or Claude Haiku handles them accurately at a fraction of the cost and latency of your main synthesis model. Running classification + extraction in 50-200ms total is achievable.

Parallelize when possible. Classification and a preliminary broad retrieval can run in parallel. You start a semantic search with the raw query while the classifier runs; once classification resolves, you either use those results (for lookup), refine them with filters (for configure), or discard and re-query (for cross-reference). Net additional latency: near zero for most intents.
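A sketch of that overlap using asyncio, with the classifier and search calls stubbed out as short sleeps. The broad search starts immediately and is either reused or discarded once the intent resolves:

```python
import asyncio

async def classify(query: str) -> tuple[str, float]:
    # Stand-in for the LLM classifier call.
    await asyncio.sleep(0.01)
    intent = "lookup" if "rating of" in query else "configure"
    return intent, 0.9

async def broad_search(query: str) -> str:
    # Stand-in for a semantic search started with the raw query.
    await asyncio.sleep(0.01)
    return f"broad:{query}"

async def answer(query: str) -> str:
    # Kick off retrieval and classification concurrently.
    search = asyncio.create_task(broad_search(query))
    intent, _conf = await classify(query)
    if intent == "lookup":
        return await search        # broad results already fit this intent
    search.cancel()                # discard and re-query for other intents
    return f"specialised:{intent}"
```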

Cache classification for repeated queries. Product AI in B2B environments sees heavy repetition — the same questions asked by many different users. A classification cache keyed on normalized query text eliminates the classifier overhead for common queries entirely.
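A sketch of such a cache, keyed on normalized query text (case and whitespace folded; a production version might also strip punctuation and expire entries):

```python
import re

_cache: dict[str, tuple[str, float]] = {}

def normalize(query: str) -> str:
    # Collapse case and whitespace so trivially different phrasings
    # of the same question share one cache entry.
    return re.sub(r"\s+", " ", query.strip().lower())

def classify_cached(query: str, classify) -> tuple[str, float]:
    """Memoize classification results across users asking the same
    question in slightly different surface forms."""
    key = normalize(query)
    if key not in _cache:
        _cache[key] = classify(query)
    return _cache[key]
```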

Set a hard fallback. If classification takes longer than your timeout, fall back to standard semantic search. Never let the query understanding layer become a failure mode that breaks the whole system. Graceful degradation to a simpler pipeline is always the right call.
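That fallback is a few lines with `asyncio.wait_for`: any timeout or error in the classifier collapses to the general intent, which routes to plain semantic search.

```python
import asyncio

async def classify_with_fallback(query: str, classify,
                                 timeout: float = 0.3) -> tuple[str, float]:
    """Bound classification latency: on timeout or any classifier
    error, degrade to the 'general' intent and let the standard
    semantic-search path handle the query."""
    try:
        return await asyncio.wait_for(classify(query), timeout)
    except Exception:
        return "general", 0.0
```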


Signals to Monitor in Production

Once deployed, the query understanding layer generates its own rich signal about your product catalog's weaknesses:

  • Queries classified as general with low confidence — these are gaps in your intent taxonomy or ambiguous query patterns that need better handling
  • Extraction failures (no entities found in a configure query) — often indicates missing structured metadata in your catalog that buyers assume you have
  • Availability queries hitting RAG instead of a live API — indicates your routing hasn't been connected to your inventory system yet
  • Cross-reference queries with no matches — competitor part numbers your catalog doesn't cover, representing real lost sales opportunities

Review these weekly when you first launch. The query log is the most honest feedback your product catalog will ever get from the market. What people actually ask reveals what they actually need — and where your product data has gaps.

We covered the broader topic of using production signals to improve your RAG pipeline in our article on measuring ROI from B2B product AI. The short version: query logs are underutilized gold.


A Realistic Rollout Path

You don't need to build all four components at once. A sensible progression:

Week 1–2: Add intent classification only. Log the distribution of intents in your query traffic. This alone reveals which query types your current pipeline handles poorly — and gives you a prioritized roadmap.

Week 3–4: Add entity extraction and wire it to metadata filtering for configure-intent queries. This is usually the highest-ROI improvement for B2B catalogs because specification-based queries are both extremely common and extremely poorly handled by pure semantic search.

Week 5–6: Add availability routing — intercept availability-intent queries and route to your ERP or inventory API. Your AI stops confidently stating incorrect stock information.

Week 7–8: Add decomposition for multi-intent queries. Tackle the expand intent if cross-sell is a priority.

Query rewriting and HyDE can be layered in at any point — they tend to have the smallest incremental impact when the other components are already working, but still deliver meaningful gains on ambiguous or underspecified queries.


What This Looks Like in Practice

Here's a concrete example of the full pipeline processing a real B2B query:

Raw query: "We're replacing our old Fisher 667 actuators — what's compatible with our existing valve bodies, and do you have anything in stock?"

Classification result: {intent: "cross_reference", confidence: 0.82} — but with a secondary availability intent flagged

Extracted entities:

  • Product reference: Fisher 667 (competitor)
  • Action: replacement / compatibility check
  • Secondary intent: inventory check

Routing decision: Run BM25 exact match on Fisher 667 in your cross-reference database; fallback to semantic search on "actuator compatible with Fisher 667"; simultaneously call inventory API for any matched candidates

Query rewrite for semantic fallback: "spring-and-diaphragm actuator compatible with globe valve bodies, equivalent to Fisher 667 mounting standard"

Result: Two matching actuators found, both with stock availability pulled from ERP, synthesized into a response that directly addresses both the compatibility and availability questions.

A naive pipeline would have retrieved generic actuator documentation and generated an answer about what a Fisher 667 is. The query-understanding pipeline answers the actual question.


The Bottom Line

Query intent classification, entity extraction, decomposition, and rewriting are not advanced optimizations. For B2B product AI deployed in serious commercial contexts — where buyers expect accurate, actionable answers — they are the baseline. Without them, you're sending every query through a single pipeline that was designed for the easiest class of query you'll see.

The good news is that none of this is particularly hard to implement. An LLM you already have, a routing table you write once, and a metadata filter layer (as covered in our metadata filtering guide) get you 80% of the way there in a few weeks.

The product AI systems that earn buyer trust — the ones that feel like talking to a knowledgeable engineer rather than a search bar — are the ones that understand what a question is before they try to answer it.


See It In Action

Axoverna's product knowledge platform includes intent classification and entity extraction out of the box, pre-tuned for B2B catalog and specification queries. You bring the product data; we handle the query understanding layer.

Start a free trial and see how Axoverna handles the full range of queries your buyers actually ask — not just the easy ones.

Ready to get started?

Turn your product catalog into an AI knowledge base

Axoverna ingests your product data, builds a semantic search index, and gives you an embeddable chat widget — in minutes, not months.