GraphRAG for B2B Product Catalogs: Unlocking Relationship Queries with Knowledge Graphs

Flat vector search answers 'what is this product?' brilliantly. It struggles with 'what works with this product?'. GraphRAG — combining knowledge graphs with semantic retrieval — is how leading B2B teams are solving product relationship queries at scale.

Axoverna Team

There's a class of question that trips up almost every B2B product AI in the wild. Not "what are the specs of SKU 4822?" — flat vector search handles that fine. The hard questions look like this:

  • "What mounting brackets are compatible with the Series 7 linear actuator?"
  • "We discontinued part A2-70 last month — what's the approved substitute?"
  • "I need everything required to install this pump — what's the complete bill of materials?"
  • "Which of your valves are certified for both ATEX Zone 1 and operating temperatures below -20°C?"

These are relationship queries. They require traversing connections between products, not just retrieving descriptions of individual ones. And they're exactly the queries that matter most in B2B sales — the ones that prevent specification errors, reduce returns, and replace a 30-minute call with your sales engineer.

GraphRAG — the combination of knowledge graphs with retrieval-augmented generation — is the architectural pattern that solves this. Here's how it works and how to implement it for a real product catalog.


Why Flat Vector Search Fails for Relationship Queries

Before explaining GraphRAG, it's worth being precise about what breaks.

Standard RAG works by embedding your documents into a vector space, then at query time finding the chunks whose embeddings are closest to the query embedding. The chunks retrieved become context for the LLM.

This works beautifully for semantic similarity queries. "What is the tensile strength of the M12 hex bolt?" maps naturally to a product chunk that mentions tensile strength. The retrieval is correct.

The problem is that relationship information is often not co-located with the products it describes. Consider compatibility data:

  • Product A is compatible with Product B
  • Product B is compatible with Product C and Product D
  • Product D has been superseded by Product E

This relationship graph exists in your data somewhere — in compatibility matrices, spare parts tables, supersession records, cross-reference sheets. But it's fundamentally graph-shaped, not document-shaped. When you flatten it into text chunks and embed it, you lose the traversal capability that makes it useful.

If someone asks "what's compatible with product A?", your vector search may return the product A description chunk (not helpful) or a compatibility matrix that mentions product A somewhere (maybe helpful, but slow to parse and limited in scope). What you actually want is to walk the graph: find the node for product A, follow its compatibility edges, and retrieve the endpoint nodes.

Embedding models also separate "Product A is compatible with Product B" from "Product A is incompatible with Product B" less reliably than you'd like. The two sentences share almost all of their wording, so their embeddings tend to land close together, and embedding similarity captures negation and directionality poorly.
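To make the "walk the graph" idea concrete, here is a minimal in-memory sketch of the A/B/C/D/E example above. The Edge shape and the helper names (neighbors, supersededBy) are illustrative assumptions, not part of any particular graph database.

```typescript
// Hypothetical in-memory edge list mirroring the A/B/C/D/E example.
type RelationType = "COMPATIBLE_WITH" | "SUPERSEDES";

interface Edge {
  from: string;
  type: RelationType;
  to: string;
}

const edges: Edge[] = [
  { from: "A", type: "COMPATIBLE_WITH", to: "B" },
  { from: "B", type: "COMPATIBLE_WITH", to: "C" },
  { from: "B", type: "COMPATIBLE_WITH", to: "D" },
  { from: "E", type: "SUPERSEDES", to: "D" }, // D has been superseded by E
];

// Follow outgoing edges of one type from a starting node.
function neighbors(start: string, type: RelationType): string[] {
  return edges
    .filter(e => e.from === start && e.type === type)
    .map(e => e.to);
}

// Find the product that supersedes `sku`, if one is recorded.
function supersededBy(sku: string): string | undefined {
  const edge = edges.find(e => e.type === "SUPERSEDES" && e.to === sku);
  return edge ? edge.from : undefined;
}

const compatibleWithA = neighbors("A", "COMPATIBLE_WITH");
const compatibleWithB = neighbors("B", "COMPATIBLE_WITH");
const replacementForD = supersededBy("D");
```

Each answer comes from following typed edges rather than from text similarity, which is why negation and direction stop being a problem.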


What GraphRAG Actually Means

GraphRAG is not a single algorithm — it's a pattern with several concrete implementations. What they share is a structure where:

  1. A knowledge graph captures entities (products, categories, certifications, specs) and the typed relationships between them
  2. A vector index enables semantic search over node or chunk content
  3. A query layer combines graph traversal and semantic retrieval to answer questions that neither approach handles well alone

Microsoft's published GraphRAG framework focuses on text-derived knowledge graphs built from unstructured documents. For product AI, you have an advantage: your relationships are often already structured in your PIM, ERP, or data warehouse. You're not extracting a graph from text — you're exposing a graph that already exists in your data.

This makes the B2B product catalog use case one of the cleanest GraphRAG applications in enterprise software.


Mapping Your Product Catalog to a Knowledge Graph

The first step is deciding what the nodes and edges in your graph represent. For a typical B2B product catalog, the relevant entities and relationships fall into predictable patterns.

Nodes (Entities)

Node Type | Examples
Product | Individual SKUs, product variants
Product Family | Series, ranges, model lines
Category | Fasteners → Bolts → Hex Bolts
Certification | ISO 4014, ATEX Zone 1, UL Listed, RoHS
Material | A2 Stainless Steel, POM, PTFE
Attribute Value | Thread pitch M12, Voltage rating 24V DC
Document | Datasheets, installation guides, SDS sheets

Edges (Relationships)

Relationship | Source → Target | Example
COMPATIBLE_WITH | Product → Product | Bolt → Nut (same thread standard)
SUPERSEDES | Product → Product | New SKU → Discontinued SKU
BELONGS_TO | Product → Category | SKU → "Hex Bolts"
PART_OF_BOM | Product → Assembly | Mounting bracket → Full kit
CERTIFIED_FOR | Product → Certification | Valve → ATEX Zone 1
MADE_FROM | Product → Material | Bolt → A2 Stainless Steel
DOCUMENTED_BY | Product → Document | SKU → Datasheet PDF
CROSS_REFERENCES | Product → Product | Your SKU → Competitor SKU
TYPICALLY_BOUGHT_WITH | Product → Product | Learned from order history

This schema gives you a graph where you can answer questions like "find all products compatible with X that are also certified for Y" as a pure graph query — no LLM needed for the traversal, just for generating the natural-language answer.
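One way to keep ingestion honest is to encode the schema as types and check every edge against it before it enters the graph. The sketch below is an assumption about how you might do that. The names EDGE_RULES and isValidEdge are hypothetical, and the "Assembly" target of PART_OF_BOM is modelled as a Product because the node list above has no separate Assembly type.

```typescript
type NodeType =
  | "Product" | "ProductFamily" | "Category" | "Certification"
  | "Material" | "AttributeValue" | "Document";

type RelationshipType =
  | "COMPATIBLE_WITH" | "SUPERSEDES" | "BELONGS_TO" | "PART_OF_BOM"
  | "CERTIFIED_FOR" | "MADE_FROM" | "DOCUMENTED_BY"
  | "CROSS_REFERENCES" | "TYPICALLY_BOUGHT_WITH";

interface EdgeRule {
  source: NodeType;
  target: NodeType;
}

// Allowed source and target node types for each relationship in the schema.
const EDGE_RULES: Record<RelationshipType, EdgeRule> = {
  COMPATIBLE_WITH: { source: "Product", target: "Product" },
  SUPERSEDES: { source: "Product", target: "Product" },
  BELONGS_TO: { source: "Product", target: "Category" },
  PART_OF_BOM: { source: "Product", target: "Product" }, // assembly modelled as a Product (assumption)
  CERTIFIED_FOR: { source: "Product", target: "Certification" },
  MADE_FROM: { source: "Product", target: "Material" },
  DOCUMENTED_BY: { source: "Product", target: "Document" },
  CROSS_REFERENCES: { source: "Product", target: "Product" },
  TYPICALLY_BOUGHT_WITH: { source: "Product", target: "Product" },
};

// Reject edges whose endpoints do not match the schema.
function isValidEdge(
  type: RelationshipType,
  source: NodeType,
  target: NodeType
): boolean {
  const rule = EDGE_RULES[type];
  return rule.source === source && rule.target === target;
}
```

Validating at load time catches mistakes such as a CERTIFIED_FOR edge pointing at a category before they turn into wrong answers.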


Traversal Patterns for Common Query Types

Once you have a populated product knowledge graph, different query types map to different traversal patterns.

Compatibility Chain Traversal

"What accessories work with the Series 7 linear actuator?"

// Neo4j Cypher
MATCH (p:Product {name: "Series 7 Linear Actuator"})
      -[:COMPATIBLE_WITH]->(accessory:Product)
RETURN accessory.sku, accessory.name, accessory.description
ORDER BY accessory.category

This retrieves first-order compatible products. For deeper traversal (accessories that work with accessories), add a variable-length path:

MATCH (p:Product {name: "Series 7 Linear Actuator"})
      -[:COMPATIBLE_WITH*1..2]->(related:Product)
WHERE related <> p  // exclude the starting product if edges are stored in both directions
RETURN DISTINCT related.sku, related.name

The LLM then receives these retrieved nodes as context to compose a natural-language answer — it's not doing the relationship traversal, it's narrating the results of it.
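As a rough illustration of that hand-off, the sketch below turns rows shaped like the first query's RETURN columns into a plain-text context block for the prompt. AccessoryRow and buildCompatibilityContext are hypothetical names, and the wording of the context is an assumption you would tune for your own prompts.

```typescript
// One row per accessory, matching the columns returned by the first query.
interface AccessoryRow {
  sku: string;
  name: string;
  description: string;
}

// Build the context the LLM narrates from. The model never sees the graph,
// only this list of already-resolved results.
function buildCompatibilityContext(
  productName: string,
  rows: AccessoryRow[]
): string {
  if (rows.length === 0) {
    // Say so explicitly, so the model does not invent accessories.
    return `No compatible accessories are recorded for ${productName}.`;
  }
  const lines = rows.map(r => `- ${r.sku}: ${r.name}. ${r.description}`);
  return [
    `Accessories recorded as compatible with ${productName}:`,
    ...lines,
  ].join("\n");
}

const context = buildCompatibilityContext("Series 7 Linear Actuator", [
  { sku: "HA-200", name: "Mounting bracket", description: "Hypothetical sample row" },
]);
```

Handling the empty case explicitly matters: an empty context invites the model to guess, while a stated "none recorded" gives it something truthful to narrate.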

Substitution Resolution

"Part A2-70-SEAL is discontinued. What should I use instead?"

MATCH (replacement:Product)-[:SUPERSEDES]->
      (discontinued:Product {sku: "A2-70-SEAL", status: "discontinued"})
RETURN replacement.sku, replacement.name, replacement.compatibility_notes

This is critical for distributors and wholesalers managing large, evolving catalogs. Supersession data is often maintained in ERPs as a structured table — it maps directly to SUPERSEDES edges in your graph.
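Replacements can themselves be discontinued later, so in practice you often want to follow the chain until you reach a product that is still active. The sketch below shows one way to do that in memory. The SKUs ending in -V2 and -V3 are made up for illustration, and resolveReplacement is a hypothetical helper.

```typescript
interface Product {
  sku: string;
  status: "active" | "discontinued";
}

// Hypothetical sample data.
const products = new Map<string, Product>([
  ["A2-70-SEAL", { sku: "A2-70-SEAL", status: "discontinued" }],
  ["A2-70-SEAL-V2", { sku: "A2-70-SEAL-V2", status: "discontinued" }],
  ["A2-70-SEAL-V3", { sku: "A2-70-SEAL-V3", status: "active" }],
]);

// SUPERSEDES edges as [newSku, oldSku] pairs (new product points at old one).
const supersedes: Array<[string, string]> = [
  ["A2-70-SEAL-V2", "A2-70-SEAL"],
  ["A2-70-SEAL-V3", "A2-70-SEAL-V2"],
];

// Walk the supersession chain until an active product is found.
function resolveReplacement(sku: string): string | undefined {
  const seen = new Set<string>([sku]);
  let current = sku;
  for (;;) {
    const edge = supersedes.find(([, oldSku]) => oldSku === current);
    if (edge === undefined) return undefined; // chain ends without an active product
    const next = edge[0];
    if (seen.has(next)) return undefined; // guard against cycles in the data
    const product = products.get(next);
    if (product !== undefined && product.status === "active") return next;
    seen.add(next);
    current = next;
  }
}

const replacement = resolveReplacement("A2-70-SEAL");
```

The cycle guard is cheap insurance: supersession tables maintained by hand occasionally contain loops.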

Bill of Materials (BOM) Expansion

"What do I need to order to install the Type-B pump assembly?"

MATCH (component:Product)-[:PART_OF_BOM*1..3]->
      (assembly:Product {sku: "PUMP-TYPE-B"})
RETURN component.sku, component.name, component.quantity_per_assembly
ORDER BY component.assembly_level

BOM traversal with depth limits prevents runaway queries on deeply nested assemblies while still capturing the relevant components.
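The same depth-limited expansion can be sketched outside the database as a breadth-first walk. The component SKUs, the BomEdge shape, and expandBom below are illustrative assumptions. Note that this sketch lists each component once, at the shallowest level where it appears.

```typescript
// PART_OF_BOM edges: component is part of assembly. Sample data is hypothetical.
interface BomEdge {
  component: string;
  assembly: string;
  quantity: number;
}

const bom: BomEdge[] = [
  { component: "MOTOR-1", assembly: "PUMP-TYPE-B", quantity: 1 },
  { component: "SEAL-KIT-1", assembly: "PUMP-TYPE-B", quantity: 2 },
  { component: "O-RING-1", assembly: "SEAL-KIT-1", quantity: 4 },
];

interface BomLine {
  sku: string;
  level: number; // 1 = direct component of the root assembly
  quantityPerParent: number;
}

// Breadth-first expansion that stops after maxDepth levels.
function expandBom(rootSku: string, maxDepth: number): BomLine[] {
  const lines: BomLine[] = [];
  const visited = new Set<string>([rootSku]);
  let frontier: string[] = [rootSku];
  for (let level = 1; level <= maxDepth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const parent of frontier) {
      for (const edge of bom) {
        if (edge.assembly !== parent || visited.has(edge.component)) continue;
        visited.add(edge.component);
        lines.push({ sku: edge.component, level, quantityPerParent: edge.quantity });
        next.push(edge.component);
      }
    }
    frontier = next;
  }
  return lines;
}

const directOnly = expandBom("PUMP-TYPE-B", 1);
const fullBom = expandBom("PUMP-TYPE-B", 3);
```

With a depth of 1 only the motor and seal kit come back. Raising the limit to 3 also picks up the O-ring inside the seal kit.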

Multi-Constraint Certification Filter

"Show me all valves rated for ATEX Zone 1 and below -20°C"

MATCH (p:Product)-[:BELONGS_TO]->(:Category {name: "Valves"})
MATCH (p)-[:CERTIFIED_FOR]->(:Certification {name: "ATEX Zone 1"})
WHERE p.operating_temp_min <= -20
RETURN p.sku, p.name, p.operating_temp_min, p.operating_temp_max

This combines graph traversal (the certification relationship) with property filtering (the temperature constraint) in a single query — something that would require multiple RAG retrievals with fragile merging logic if done with flat vector search alone.
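For comparison, the same two-part filter over already-loaded product records might look like the sketch below. CatalogProduct, findCertified, and the sample SKUs are hypothetical. In production the graph query does this work, but the logic is the same: a relationship check plus a property check.

```typescript
interface CatalogProduct {
  sku: string;
  category: string;
  certifications: string[];
  operatingTempMin: number; // degrees Celsius
}

// Hypothetical sample records.
const catalog: CatalogProduct[] = [
  { sku: "VLV-100", category: "Valves", certifications: ["ATEX Zone 1"], operatingTempMin: -40 },
  { sku: "VLV-200", category: "Valves", certifications: ["ATEX Zone 1"], operatingTempMin: -10 },
  { sku: "VLV-300", category: "Valves", certifications: ["UL Listed"], operatingTempMin: -40 },
  { sku: "PMP-100", category: "Pumps", certifications: ["ATEX Zone 1"], operatingTempMin: -40 },
];

// Keep products in the category that hold the certification and can
// operate at or below the requested minimum temperature.
function findCertified(
  products: CatalogProduct[],
  category: string,
  certification: string,
  requiredTempMin: number
): CatalogProduct[] {
  return products.filter(p =>
    p.category === category &&
    p.certifications.some(c => c === certification) &&
    p.operatingTempMin <= requiredTempMin
  );
}

const matches = findCertified(catalog, "Valves", "ATEX Zone 1", -20);
```

Only VLV-100 satisfies all three conditions. Every constraint is an exact check, so there is no similarity threshold to tune.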


Combining Graph Traversal with Semantic Search

Graph traversal answers relationship queries precisely. Semantic search answers open-ended, natural-language queries flexibly. GraphRAG uses both together, routing to each based on query type.

A practical implementation has three paths:

Path 1: Pure Graph Traversal

For queries where the entities are identified and the relationship is explicit. The query layer extracts the entity and relationship type, runs the Cypher/graph query, and returns structured results to the LLM for narration.

Triggers: "compatible with X", "substitute for Y", "what's in the BOM for Z", "what certifications does X have"

Path 2: Graph-Seeded Semantic Search

For queries that start with a known entity but want semantically related context. First traverse the graph to find related nodes, then run semantic search scoped to those nodes.

Example: "What should I know when installing the Series 7 actuator with the HA-200 bracket?"

  1. Graph query: find Series 7 and HA-200, confirm they're compatible, retrieve related document nodes
  2. Vector search scoped to those documents: find chunks about installation, clearances, torque specs
  3. LLM synthesizes the answer from retrieved chunks

This is much more precise than a cold semantic search over the whole catalog, which might return generic installation guidance for unrelated products.
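The scoping step is the part that differs from ordinary RAG, so here is a minimal sketch of it. It assumes the graph query has already returned a set of document IDs, and it uses toy two-dimensional vectors in place of real embeddings. Chunk, cosine, and scopedSearch are hypothetical names.

```typescript
interface Chunk {
  docId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank only the chunks whose document was reached through the graph.
function scopedSearch(
  queryEmbedding: number[],
  chunks: Chunk[],
  allowedDocIds: Set<string>,
  topK: number
): Chunk[] {
  return chunks
    .filter(c => allowedDocIds.has(c.docId))
    .map(c => ({ chunk: c, score: cosine(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map(s => s.chunk);
}

// Toy data: the unrelated document matches the query perfectly,
// but it is excluded because the graph did not link to it.
const chunks: Chunk[] = [
  { docId: "series7-install", text: "Torque specs", embedding: [1, 0] },
  { docId: "ha200-install", text: "Bracket clearances", embedding: [0.8, 0.6] },
  { docId: "unrelated-guide", text: "Generic installation", embedding: [1, 0] },
];
const results = scopedSearch(
  [1, 0],
  chunks,
  new Set(["series7-install", "ha200-install"]),
  2
);
```

Filtering before ranking is the design choice that matters: the generic guide would tie for first place in a cold search, yet it never reaches the LLM here.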

Path 3: Pure Semantic Search

For exploratory, undirected queries where no specific entity is identified. "What valves do you have for cryogenic applications?" runs as a standard semantic search, with no graph traversal needed.

The query classifier determines the path. A simple classifier trained on a few hundred labeled queries works well for this. You can also route based on structural signals: queries containing known SKUs or product names get graph-assisted paths; open-ended queries get semantic search.

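type QueryPath = 'graph' | 'graph_seeded' | 'semantic'

// Assumed minimal entity shape; extend it with whatever your entity extractor returns
interface Entity {
  sku: string
}

// Phrases that signal a relationship query (lowercase, because the query is lowercased before matching)
const RELATIONSHIP_KEYWORDS = [
  'compatible with', 'works with', 'substitute for', 'alternative to',
  "what's in", 'what do i need', 'discontinued', 'replaced by',
]

function classifyQueryPath(query: string, entities: Entity[]): QueryPath {
  const normalized = query.toLowerCase()
  const hasKnownEntity = entities.length > 0
  const hasRelationshipSignal = RELATIONSHIP_KEYWORDS.some(k => normalized.includes(k))

  // Known entity plus an explicit relationship: answer from the graph alone
  if (hasKnownEntity && hasRelationshipSignal) return 'graph'
  // Known entity without a relationship phrase: seed semantic search from the graph
  if (hasKnownEntity) return 'graph_seeded'
  return 'semantic'
}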
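The classifier depends on an entities array that has to come from somewhere. A simple way to get started is plain substring matching against known SKUs and product names, sketched below. The Entity shape with a name field, the sample catalog, and extractEntities are assumptions for illustration. Real catalogs usually need fuzzier matching and a guard against very short SKUs that match by accident.

```typescript
// Hypothetical entity shape for this sketch: a SKU plus a display name.
interface Entity {
  sku: string;
  name: string;
}

// Hypothetical sample catalog.
const knownProducts: Entity[] = [
  { sku: "HA-200", name: "HA-200 Mounting Bracket" },
  { sku: "S7-LA", name: "Series 7 Linear Actuator" },
  { sku: "PUMP-TYPE-B", name: "Type-B Pump Assembly" },
];

// Return every catalog entry whose SKU or full name appears in the query.
function extractEntities(query: string, catalog: Entity[]): Entity[] {
  const q = query.toLowerCase();
  return catalog.filter(
    p =>
      q.indexOf(p.sku.toLowerCase()) !== -1 ||
      q.indexOf(p.name.toLowerCase()) !== -1
  );
}

const found = extractEntities(
  "Is the HA-200 bracket compatible with the Series 7 linear actuator?",
  knownProducts
);
const none = extractEntities(
  "What valves do you have for cryogenic applications?",
  knownProducts
);
```

The first query yields two entities and would take a graph-assisted path. The second yields none and falls through to semantic search.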

Building the Graph: Practical Approaches

Starting with Your Existing Data

Most B2B companies have relationship data scattered across systems — it's just not in a graph-optimized store. Common sources:

PIM systems: Cross-sell tables, accessory relationships, category hierarchies. Your PIM integration (see our guide on integrating your PIM with a RAG pipeline) likely already extracts some of this. Restructuring it as edges rather than prose chunks unlocks graph traversal.

ERP systems: Supersession records, BOM data, parts substitution tables. These are often the most authoritative source for SUPERSEDES and PART_OF_BOM edges.

Order history: Co-purchase data is a rich source of implicit compatibility signals. Products frequently ordered together are often, though not always, compatible. A simple Apriori or co-occurrence analysis over your order history generates TYPICALLY_BOUGHT_WITH edges automatically.
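A co-occurrence pass over order history can be sketched in a few lines. The version below counts how many orders contain each pair of SKUs and keeps pairs above a threshold. The order data, the minCount threshold, and the coPurchaseEdges name are illustrative assumptions. A production version would likely also normalise by how popular each product is.

```typescript
interface CoPurchaseEdge {
  a: string;
  b: string;
  count: number; // number of orders containing both SKUs
}

// Count unordered SKU pairs per order and keep pairs seen at least minCount times.
function coPurchaseEdges(orders: string[][], minCount: number): CoPurchaseEdge[] {
  const counts = new Map<string, CoPurchaseEdge>();
  for (const order of orders) {
    // De-duplicate and sort so each pair has one canonical orientation.
    const skus = Array.from(new Set(order)).sort();
    for (let i = 0; i < skus.length; i++) {
      for (let j = i + 1; j < skus.length; j++) {
        const a = skus[i];
        const b = skus[j];
        if (a === undefined || b === undefined) continue;
        const key = JSON.stringify([a, b]);
        const existing = counts.get(key);
        if (existing !== undefined) {
          existing.count += 1;
        } else {
          counts.set(key, { a, b, count: 1 });
        }
      }
    }
  }
  return Array.from(counts.values()).filter(e => e.count >= minCount);
}

// Hypothetical order history: each inner array is the SKUs on one order.
const sampleOrders: string[][] = [
  ["PUMP", "SEAL"],
  ["PUMP", "SEAL", "BRACKET"],
  ["PUMP", "BRACKET"],
  ["SEAL"],
];
const boughtTogether = coPurchaseEdges(sampleOrders, 2);
```

Each surviving pair becomes a TYPICALLY_BOUGHT_WITH edge. Because co-purchase is symmetric, the pair is stored once and should be queried in both directions.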

Manufacturer documentation: Compatibility matrices in datasheets and spec sheets. These require extraction (sometimes manual, sometimes automated with an LLM), but the yield is high for complex product families.

Technology Options

For a production graph store, you have several choices:

Neo4j is the most mature purpose-built graph database, with native Cypher support, built-in full-text indexes, vector indexes (added during the 5.x releases), and good Python/Node.js drivers. It's the natural choice if the graph is a first-class part of your architecture.

PostgreSQL with Apache AGE or recursive CTEs works if you're already running Postgres and want to add graph capabilities incrementally. Recursive CTEs handle most traversal patterns; AGE provides native graph syntax. Avoids adding a new database tier.

Weaviate natively supports cross-references between objects, which maps well to product relationships. If you're using Weaviate as your vector store, its cross-reference API lets you add graph-like traversal without a separate graph database.

Amazon Neptune is a managed option for AWS-native architectures. It supports Gremlin, openCypher, and SPARQL. Good if your team prefers not to manage graph infrastructure.

For most B2B product AI teams starting out, a PostgreSQL-based approach is the lowest-friction entry point: add a product_relationships table with (from_sku, relationship_type, to_sku, metadata), write traversal queries as recursive CTEs, and evolve to a dedicated graph database when query complexity warrants it.
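To give a feel for the recursive-CTE route, the sketch below wraps a depth-limited traversal over the product_relationships table in a small TypeScript helper. The column names follow the table described above. The SQL itself, the buildTraversalQuery name, and the { text, values } return shape (chosen to suit parameterised-query clients) are assumptions, not a tested migration.

```typescript
// Depth-limited traversal over product_relationships using a recursive CTE.
// $1 = starting SKU, $2 = relationship type, $3 = maximum depth.
const TRAVERSAL_SQL = `
WITH RECURSIVE walk AS (
  SELECT to_sku, 1 AS depth
  FROM product_relationships
  WHERE from_sku = $1 AND relationship_type = $2
  UNION
  SELECT r.to_sku, w.depth + 1
  FROM product_relationships r
  JOIN walk w ON r.from_sku = w.to_sku
  WHERE r.relationship_type = $2 AND w.depth < $3
)
SELECT to_sku, MIN(depth) AS depth
FROM walk
GROUP BY to_sku
ORDER BY depth, to_sku;
`;

interface ParameterisedQuery {
  text: string;
  values: Array<string | number>;
}

// Pair the SQL with its parameters so values are never concatenated into the text.
function buildTraversalQuery(
  startSku: string,
  relationshipType: string,
  maxDepth: number
): ParameterisedQuery {
  return { text: TRAVERSAL_SQL, values: [startSku, relationshipType, maxDepth] };
}

const compatibilityQuery = buildTraversalQuery("HA-200", "COMPATIBLE_WITH", 2);
```

The depth bound plays the same role as *1..2 in the Cypher examples, and UNION rather than UNION ALL drops duplicate rows so that cyclic data cannot inflate the result.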


Keeping the Graph In Sync

The same freshness challenge that applies to your vector index applies to your knowledge graph — arguably more acutely, because relationship errors (pointing to a discontinued product, showing stale compatibility data) produce confidently wrong answers that damage trust immediately.

See our article on product catalog sync and RAG freshness for the general principles. For the graph specifically:

Dangling edges are dangerous: When a product is discontinued or removed, handle its edges as well as its node. Remove or flag every compatibility and BOM edge that points to it. A compatibility edge to a deleted node is worse than no edge, because it actively misleads.

Version supersession chains: Supersession edges are the exception. Don't delete them when a product is discontinued, because the SUPERSEDES edge is itself the useful information. Mark the superseded node (the target of the edge) as status: discontinued and preserve the edge.

Bidirectionality: Decide upfront whether your edges are directed or undirected. COMPATIBLE_WITH is typically symmetric (if A works with B, B works with A), so store both directions or query both. SUPERSEDES is directed. Inconsistency here produces subtle bugs that are hard to debug.
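These three rules can be combined in a small sync routine. The sketch below flags edges rather than deleting them, keeps SUPERSEDES edges intact, and reads COMPATIBLE_WITH in both directions. The Graph shape, the stale flag, and the SKUs ACT-7 and HA-210 are hypothetical choices for illustration.

```typescript
type RelType = "COMPATIBLE_WITH" | "PART_OF_BOM" | "SUPERSEDES";

interface GraphNode {
  sku: string;
  status: "active" | "discontinued";
}

interface GraphEdge {
  from: string;
  type: RelType;
  to: string;
  stale: boolean; // flagged edges are ignored by queries
}

interface Graph {
  nodes: Map<string, GraphNode>;
  edges: GraphEdge[];
}

// Mark a product discontinued and flag its edges, preserving SUPERSEDES.
function discontinue(graph: Graph, sku: string): void {
  const node = graph.nodes.get(sku);
  if (node === undefined) return;
  node.status = "discontinued";
  for (const edge of graph.edges) {
    const touches = edge.from === sku || edge.to === sku;
    if (touches && edge.type !== "SUPERSEDES") edge.stale = true;
  }
}

// COMPATIBLE_WITH is treated as symmetric, so query both directions.
function compatibleWith(graph: Graph, sku: string): string[] {
  const result = new Set<string>();
  for (const edge of graph.edges) {
    if (edge.type !== "COMPATIBLE_WITH" || edge.stale) continue;
    if (edge.from === sku) result.add(edge.to);
    if (edge.to === sku) result.add(edge.from);
  }
  return Array.from(result).sort();
}

// Hypothetical sample graph.
const graph: Graph = {
  nodes: new Map<string, GraphNode>([
    ["ACT-7", { sku: "ACT-7", status: "active" }],
    ["HA-200", { sku: "HA-200", status: "active" }],
    ["HA-210", { sku: "HA-210", status: "active" }],
  ]),
  edges: [
    { from: "ACT-7", type: "COMPATIBLE_WITH", to: "HA-200", stale: false },
    { from: "HA-210", type: "COMPATIBLE_WITH", to: "ACT-7", stale: false },
    { from: "HA-210", type: "SUPERSEDES", to: "HA-200", stale: false },
  ],
};

const before = compatibleWith(graph, "ACT-7");
discontinue(graph, "HA-200");
const after = compatibleWith(graph, "ACT-7");
```

Before the call, ACT-7 is compatible with both brackets even though one edge is stored in the opposite direction. Afterwards only HA-210 remains, while the SUPERSEDES edge from HA-210 to HA-200 is still available for substitute lookups.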


When Do You Actually Need GraphRAG?

GraphRAG adds meaningful complexity to your architecture. Before adding it, evaluate whether your query distribution actually warrants it.

You need GraphRAG if:

  • A significant fraction of your support queries are relationship-type ("works with", "compatible", "substitute for", "what do I need")
  • Your catalog has dense relationship data: complex assemblies, accessories, variants, supersessions
  • Specification errors are costly — wrong compatibility answers lead to returned orders or equipment failures
  • Your catalog evolves frequently (discontinued SKUs, new product generations)

Flat vector search is probably enough if:

  • Your queries are primarily "tell me about product X" or "find me products for application Y"
  • Your catalog is relatively flat (individual products, few accessories or assemblies)
  • You're an early-stage deployment and want to prove value before adding infrastructure
  • Your relationship data is sparse or unmaintained

Many Axoverna customers start with flat RAG, which handles 70-80% of queries correctly, then add the graph layer once they've identified the specific relationship query patterns that flat search misses. The incremental approach lets you build confidence in the base system before layering in complexity.


What GraphRAG Enables End-to-End

The user experience payoff is significant. With a graph-backed product AI, your buyers can have conversations like:

Buyer: "We're specifying a hydraulic system for a mobile application. We need the Series-H pump — what else do we need?"

AI: "For a complete Series-H hydraulic installation, you'll typically need the following components: [BOM from graph traversal]. For mobile applications specifically, you'll also want to consider our Series-H-M manifold block (part HMB-200), which is rated for high-vibration environments — I see it's frequently ordered alongside the Series-H pump."

Buyer: "What if the HMB-200 is out of stock?"

AI: "The HMB-200 has an approved substitute: the HMB-210, which shares the same port configuration and pressure ratings. You can also use the third-party Bosch Rexroth M8-X manifold as a cross-reference — we have compatibility data confirming it fits the Series-H mounting pattern."

This multi-hop, relationship-rich conversation is impossible with flat vector search alone. It requires the graph to make it reliable and fast.


The Bigger Picture

Product knowledge AI in B2B isn't just about answering simple lookup queries faster than a search bar. The real ROI comes from replacing the sales engineer call — the 20-minute conversation where someone with deep product knowledge walks a customer through a specification, flags potential compatibility issues, suggests the right accessory, and confirms the right substitute when the first choice is unavailable.

That kind of expertise is encoded in relationships, not just descriptions. GraphRAG is how you put that relational knowledge into an AI system that's available at 2 AM, handles 50 simultaneous conversations, and never forgets which parts were superseded in last quarter's catalog refresh.

For related reading, see our guides on metadata filtering for precise retrieval, multi-turn conversations in B2B product AI, and hybrid search combining BM25 and dense vectors — all of which complement the GraphRAG architecture described here.


Axoverna's platform supports relationship-aware retrieval for B2B product catalogs — including compatibility traversal, BOM expansion, and supersession resolution, built on top of your existing product data.

Book a demo to see GraphRAG in action against a catalog like yours, or start a free trial and bring your first relationship dataset in minutes.
