Multilingual Product AI: How to Serve International B2B Buyers in Any Language

B2B buyers increasingly expect to query product catalogs in their own language — even when your catalog lives in English. Here's how to architect multilingual RAG that actually works for product knowledge.

Axoverna Team
12 min read

A French procurement manager searches for "raccord hydraulique haute pression" on your distributor portal. Your catalog is entirely in English. The product exists — it's listed as "high-pressure hydraulic fitting" — but the search returns nothing. The buyer calls your support line, waits on hold, and eventually places the order with a competitor who happened to have a French-language interface.

This happens thousands of times a day across international B2B commerce. It's not a language problem, strictly speaking. It's a retrieval problem. And it's solvable with a multilingual RAG architecture that handles cross-lingual queries gracefully — without requiring you to manually translate your entire product catalog.

This article walks through how multilingual retrieval works, the specific design decisions that matter for B2B product data, and the tradeoffs between the main architectural approaches.


Why Multilingual Product Search Is Harder Than It Looks

The naive solution to multilingual product search is translation: translate every query to English before retrieval, then translate the response back. This works adequately for general text, but B2B product data breaks it in several ways.

Part numbers and model codes don't translate. A query like "Pumpe Typ KSB Etanorm 32-200" contains a manufacturer name, a product series, and a size specification. If a translation layer touches it, the part number gets mangled, and retrieval fails.

Technical terminology varies unpredictably. "Kugelhahn" (German for ball valve) might translate to "ball cock" in one translation model and "spherical tap" in another. If your catalog uses "ball valve" consistently, either translation misses it.

Units and standards are regional. European B2B buyers think in metric, use EN/DIN standards, and write specifications in formats that differ from their North American counterparts. These aren't just translation differences — they're conceptual gaps.

Brand and product names are language-agnostic but context-dependent. Searching "Siemens Sentron" in any language should retrieve the same products. But surrounding query context ("Siemens Sentron für Schaltanlagen 63A" vs "Siemens Sentron for 63A switchgear") carries semantic weight that a translation pipeline may distort.

A production multilingual retrieval system needs to handle all of these gracefully. There are three main architectural approaches, each with different tradeoffs.


Architecture 1: Query Translation Before Retrieval

The simplest approach: detect the query language, translate it to the language of your index (typically English), then run standard retrieval.

User query (French)
     │
     ▼
Language detection
     │
     ▼
Translation to English (LLM or MT)
     │
     ▼
Standard RAG retrieval (English index)
     │
     ▼
Retrieval results (English chunks)
     │
     ▼
LLM generates response in original language

When it works well: For catalogs that are entirely in one language and serving buyers who ask natural language questions (application descriptions, troubleshooting queries, compatibility checks). The LLM at the end of the pipeline handles the response language naturally — just instruct it to reply in the detected query language.

Where it breaks down: Part numbers, model codes, and technical identifiers get mangled by translation. You need a pre-translation extraction step that identifies and protects these tokens from translation. Something like:

interface ParsedQuery {
  translatable: string         // "I need a high pressure valve for my system"
  protectedTokens: string[]    // ["KSB-DN50-PN40", "ISO-228"]
}

function parseQueryForTranslation(query: string): ParsedQuery {
  // Matches part-number-like tokens: uppercase/digit runs joined by hyphens
  const partNumberPattern = /\b[A-Z0-9]{3,}-[A-Z0-9\-\.]{2,}\b/g
  // "protected" is a reserved word in TypeScript, hence protectedTokens
  const protectedTokens = query.match(partNumberPattern) ?? []
  const translatable = query.replace(partNumberPattern, '__TERM__')
  return { translatable, protectedTokens }
}

After translation, reinsert the protected tokens at the positions marked by __TERM__. Then run retrieval using both the translated natural language and a direct lookup on the extracted identifiers.
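
That reinsertion step can be sketched as a placeholder swap. This is a minimal version, and it assumes the translation engine passes the __TERM__ markers through untouched, which is worth verifying for your MT provider; the function name is illustrative:

```typescript
// Swap the extracted tokens back into the translated query, replacing the
// __TERM__ placeholders left to right in their original order.
function reinsertProtectedTokens(translated: string, tokens: string[]): string {
  let i = 0
  // Any surplus placeholder (translation artifact) is left as-is
  return translated.replace(/__TERM__/g, () => tokens[i++] ?? '__TERM__')
}
```

Run retrieval on the reassembled string, and in parallel run a direct identifier lookup on the token list itself.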

The hidden cost: Translation adds latency (100–400ms for an LLM call, 50–150ms for a dedicated MT API), and it adds another failure mode. If the translation is poor, retrieval fails, and the system silently returns wrong results. You need evaluation on the translation step, not just end-to-end.


Architecture 2: Multilingual Embeddings

The more elegant solution — and the one that's become practically viable in the last 18 months — is to use an embedding model that naturally represents multiple languages in a shared vector space.

Models like multilingual-e5-large, paraphrase-multilingual-mpnet-base-v2, and the more recent mxbai-embed-multilingual-v1 are trained on parallel multilingual corpora. They learn that "ball valve" and "Kugelhahn" and "robinet à boisseau sphérique" should sit close together in embedding space — because they describe the same thing.

With a multilingual embedding model:

User query (French): "raccord hydraulique haute pression"
     │
     ▼
Multilingual embedding → vector near "high-pressure hydraulic fitting"
     │
     ▼
ANN search against index (English catalog, embedded with same model)
     │
     ▼
Retrieval: "high-pressure hydraulic fitting, PN40, 316SS..."
     │
     ▼
LLM responds in French

No translation step. No protected token extraction. The embedding model handles the cross-lingual alignment natively.
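
The "shared vector space" idea can be made concrete with a toy example. The vectors below are fabricated for illustration (real embeddings have hundreds of dimensions), but the ranking math is exactly what the ANN index performs: score catalog chunks by cosine similarity to the query vector.

```typescript
// Cosine similarity: the core ranking operation in dense retrieval.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Fabricated 4-d vectors standing in for real embeddings: a multilingual
// model places "Kugelhahn" near "ball valve" and far from "pressure gauge".
const queryDe  = [0.9, 0.1, 0.3, 0.0]     // "Kugelhahn"
const chunkEn1 = [0.85, 0.15, 0.35, 0.05] // "ball valve, DN50, brass"
const chunkEn2 = [0.1, 0.9, 0.0, 0.4]     // "pressure gauge, 0-16 bar"

cosine(queryDe, chunkEn1) // high similarity: retrieved
cosine(queryDe, chunkEn2) // low similarity: not retrieved
```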

The practical tradeoffs:

Multilingual embedding models are larger and slower than English-only models. multilingual-e5-large is 560M parameters versus e5-base-en at 110M — roughly 5× more compute per embedding. For a catalog with 500,000 chunks, that's meaningful at ingest time. At query time, you're only embedding one string, so the latency difference is small (10–20ms in practice).

More importantly, multilingual models have lower peak performance on English-to-English retrieval than the best English-specialized models. If 80% of your users query in English, you're leaving some retrieval quality on the table with a multilingual model versus text-embedding-3-large or similar.

This tradeoff resolves differently depending on your query distribution:

  • Predominantly English queries, occasional multilingual → use English-specialized model + query translation fallback
  • Mixed multilingual queries across 3+ languages → multilingual embedding is the right default
  • High-precision requirement with English-first catalog → English model + dedicated multilingual search layer

Choosing a Multilingual Embedding Model

As of early 2026, these are the options worth evaluating:

Model                                   Languages  Dimensions  Relative Quality  Notes
multilingual-e5-large                   100+       1024        Strong            Good general baseline
mxbai-embed-multilingual-v1             100+       1024        Strong            Recent; strong on European languages
paraphrase-multilingual-mpnet-base-v2   50+        768         Moderate          Lighter weight, lower quality
text-embedding-3-large (OpenAI)         100+       3072        Strong (ML)       Excellent but expensive per embedding
Cohere embed-multilingual-v3            100+       1024        Strong            Good B2B retrieval benchmark results

For B2B product data specifically, evaluate on queries that include:

  • Technical abbreviations and standards (EN, DIN, ASTM, JIS)
  • Numeric specifications in different languages ("32mm" vs "32-Millimeter")
  • Product categories in domain-specific terminology across languages

Don't rely solely on MTEB benchmark scores — they measure general NLP tasks, not product retrieval. Build a small evaluation set from your own catalog and test each candidate model on it.


Architecture 3: Language-Partitioned Indexes

The third approach: maintain separate indexes per language, each containing translated or localized content, and route queries to the appropriate index based on detected language.

User query (German)
     │
     ▼
Language detection → "de"
     │
     ├──► German index (translated catalog) ──► German chunks
     │
     └──► English index (fallback) ──► English chunks
           │
           ▼
     Fuse results → LLM response in German

This requires translating your catalog into each supported language at ingest time — a significant operational investment. But it enables the highest retrieval quality for each language, because each index is searched with language-specialized embeddings or BM25 models tuned to that language.
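
The "fuse results" step in the diagram above is commonly implemented with reciprocal rank fusion (RRF), which merges ranked lists using only rank positions, so scores from differently-tuned indexes never need to be comparable. A minimal sketch; the k = 60 constant is the conventional default, not something specific to this setup:

```typescript
// Reciprocal rank fusion: merge ranked result lists from multiple indexes
// (e.g. a German index and the English fallback) using ranks, not raw scores.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // Each list contributes 1/(k + rank); documents in several lists accumulate
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1))
    })
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId)
}
```

A product ranked moderately in both indexes will beat one ranked at the top of only a single index, which is usually the behavior you want from a fallback pair.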

When this makes sense:

  • You have a small, stable catalog (thousands of products, not millions)
  • You serve well-defined, high-priority markets (e.g., German, French, and Spanish — not 20+ languages)
  • Retrieval precision is critical and you can absorb the translation cost at ingest time
  • You want to serve localized content (not just localized retrieval) — regional pricing, country-specific certifications, local technical standards

For most B2B distributors, this is over-engineered for the retrieval layer. Where language-partitioned indexes pay off is in the display layer: you translate product descriptions, specifications, and datasheets for your key markets, and serve localized content when it's available. The retrieval layer can still use multilingual embeddings, while the display layer serves translated content where it exists and falls back to English where it doesn't.


The Hybrid Multilingual Architecture in Practice

For most B2B product knowledge applications, the right architecture combines elements from all three approaches:

1. Multilingual embeddings as the primary retrieval layer

Use a high-quality multilingual embedding model for both document and query embeddings. This handles the vast majority of cross-lingual semantic queries with no added latency or complexity.

Pair it with hybrid BM25 + dense retrieval, but configure BM25 for the catalog's primary language. BM25 handles exact part numbers and model codes regardless of the query language — part number tokens are language-neutral.
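
One way to combine the two signals is a weighted sum over min-max-normalized scores. This is a sketch, not the only fusion scheme, and the 0.6/0.4 weights are illustrative starting points to tune against your own evaluation set:

```typescript
interface Scored { id: string; score: number }

// Min-max normalize a result list to [0, 1] so dense and BM25 scores
// become comparable before blending.
function normalize(results: Scored[]): Map<string, number> {
  const scores = results.map(r => r.score)
  const min = Math.min(...scores), max = Math.max(...scores)
  const range = max - min || 1 // avoid division by zero on uniform scores
  return new Map(results.map(r => [r.id, (r.score - min) / range]))
}

// Dense handles cross-lingual semantics; BM25 handles exact part-number
// tokens, which are language-neutral. Weights are assumptions to tune.
function blend(dense: Scored[], bm25: Scored[], wDense = 0.6): Scored[] {
  const d = normalize(dense), b = normalize(bm25)
  const ids = new Set([...d.keys(), ...b.keys()])
  return [...ids]
    .map(id => ({ id, score: wDense * (d.get(id) ?? 0) + (1 - wDense) * (b.get(id) ?? 0) }))
    .sort((x, y) => y.score - x.score)
}
```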

2. Language detection and protected token extraction

Before embedding, run lightweight language detection (fastText's language ID model is ~1MB and handles this in under 5ms). Extract part numbers, model codes, and technical standards using a regex/pattern layer — these get passed directly to the exact-match lookup stage, bypassing the multilingual embedding entirely.

3. Translation fallback for weak retrieval results

If confidence scores on the top retrieved chunks fall below a threshold, fall through to a translation-then-retrieval path. This handles edge cases where the multilingual model struggles with highly domain-specific terminology in a particular language.
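
The fallback itself is a small piece of control flow. In this sketch, direct and viaTranslation are hypothetical stand-ins for the two retrieval paths described above (kept synchronous for brevity; production paths are typically async), and the 0.55 threshold is an assumption to calibrate on your own score distribution:

```typescript
interface Chunk { id: string; score: number }

// Fall through to translation-then-retrieval when the direct multilingual
// path returns only weak matches. Threshold is illustrative, not tuned.
function retrieveWithFallback(
  query: string,
  direct: (q: string) => Chunk[],
  viaTranslation: (q: string) => Chunk[],
  threshold = 0.55,
): Chunk[] {
  const results = direct(query)
  const topScore = results[0]?.score ?? 0 // empty results also trigger fallback
  return topScore >= threshold ? results : viaTranslation(query)
}
```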

4. Response generation in the query language

Instruct your LLM to respond in the detected query language. Modern frontier models (GPT-4o, Claude, Gemini) handle this naturally — provide the retrieved context in English and ask for a response in French, and the model generates fluent French grounded in the English product data. The response quality is high because the model sees accurate source material, even if that source is in a different language.

// companyName and detectedLanguage are supplied by the application at request time
const systemPrompt = `You are a product knowledge assistant for ${companyName}.
Answer the user's question based on the provided product information.
Always respond in ${detectedLanguage} — even though the product data may be in English.
Be precise with technical specifications. Do not invent specifications not found in the context.`

Content Enrichment for Multilingual Catalogs

The retrieval architecture gets you most of the way there. But for high-traffic markets, it's worth enriching your catalog data with language-specific content at ingest time.

Synonyms and terminology mappings. Build a domain-specific synonym dictionary for your key markets. "Ball valve" → "Kugelhahn", "Kugelventil" (German), "robinet à boisseau sphérique", "vanne à boisseau" (French). These synonyms can be stored as metadata and injected into the BM25 index, improving keyword retrieval for German and French queries against an English catalog without full translation.
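
One lightweight way to apply such a dictionary is to append matched synonyms to the text that feeds the BM25 index at ingest, leaving the dense embedding input untouched. A sketch with a tiny illustrative dictionary (a real one is built per market with domain experts):

```typescript
// Illustrative synonym dictionary, keyed by the English catalog term.
const synonyms: Record<string, string[]> = {
  'ball valve': ['Kugelhahn', 'Kugelventil', 'robinet à boisseau sphérique'],
  'fitting': ['Verschraubung', 'raccord'],
}

// Append synonyms so German/French keyword queries hit the English chunk
// in the BM25 index. The dense embedding still sees the original text.
function expandForKeywordIndex(chunkText: string): string {
  const lower = chunkText.toLowerCase()
  const extra = Object.entries(synonyms)
    .filter(([term]) => lower.includes(term))
    .flatMap(([, syns]) => syns)
  return extra.length ? `${chunkText}\n[synonyms] ${extra.join(' ')}` : chunkText
}
```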

Units and standards normalization. As we noted in our hybrid search article, adding both metric and imperial representations at ingest improves retrieval for international users. Extend this to standards: if your product is UL-listed, note that it meets the European equivalent; if it carries the CE mark, note US compliance equivalents where applicable. This cross-standard enrichment improves retrieval for buyers who search by their local standard names.
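
The unit enrichment step can be sketched as a rewrite pass at ingest. This toy version handles only the millimeter-to-inch case; a real catalog needs more unit pairs and format variants:

```typescript
// Append an imperial rendering next to each metric dimension so queries
// in either system match the same chunk. Covers only "mm" for illustration.
function enrichUnits(text: string): string {
  return text.replace(/(\d+(?:\.\d+)?)\s*mm\b/g, (match, mm) => {
    const inches = (parseFloat(mm) / 25.4).toFixed(3)
    return `${match} (${inches} in)`
  })
}
```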

Regional specification formats. German buyers frequently include DIN numbers in queries. French buyers may include NF standards. Japanese buyers use JIS. Ingest these standard cross-references as additional indexed content, and your retrieval layer becomes significantly more robust across markets without a full multilingual content strategy.


Evaluating Multilingual Retrieval Quality

Standard retrieval metrics apply — Recall@K, MRR, NDCG — but you need evaluation sets per language, not just overall.

Build language-specific evaluation sets with at least 50 query-result pairs per language you intend to support. Include:

  • Category A (semantic cross-lingual): Natural language queries in language X, relevant products described in language Y. Tests the multilingual embedding alignment.
  • Category B (identifier queries): Part numbers, model codes, and standard references in any language. Tests the exact-match lookup layer.
  • Category C (technical specification queries): Queries with numeric specifications and units in language-specific formats. Tests the enrichment and normalization layer.

Track these separately in your evaluation harness. If Category A drops below your threshold in German while English is fine, the issue is embedding model quality for German — investigate a language-specific model or add German synonym enrichment. If Category B fails, the part number extraction regex is incomplete.

This breakdown gives you actionable signal, not just an aggregate number that hides where the system is actually failing.
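
A per-bucket harness can be quite small. In the sketch below, Recall@K and MRR are averaged per (language, category) pair; the EvalCase shape is an assumption for illustration, not a fixed schema:

```typescript
interface EvalCase {
  language: string
  category: 'A' | 'B' | 'C'
  relevantIds: string[]   // ground-truth product IDs
  retrievedIds: string[]  // system output, best first
}

// Fraction of relevant products found in the top K results.
function recallAtK(c: EvalCase, k: number): number {
  const top = new Set(c.retrievedIds.slice(0, k))
  return c.relevantIds.filter(id => top.has(id)).length / c.relevantIds.length
}

// Reciprocal rank of the first relevant result (0 if none retrieved).
function reciprocalRank(c: EvalCase): number {
  const i = c.retrievedIds.findIndex(id => c.relevantIds.includes(id))
  return i === -1 ? 0 : 1 / (i + 1)
}

// Average metrics per (language, category) bucket so a regression in, say,
// German Category A is visible instead of hidden in an aggregate number.
function bucketMetrics(cases: EvalCase[], k = 5) {
  const buckets = new Map<string, EvalCase[]>()
  for (const c of cases) {
    const key = `${c.language}/${c.category}`
    buckets.set(key, [...(buckets.get(key) ?? []), c])
  }
  return [...buckets.entries()].map(([bucket, cs]) => ({
    bucket,
    recall: cs.reduce((s, c) => s + recallAtK(c, k), 0) / cs.length,
    mrr: cs.reduce((s, c) => s + reciprocalRank(c), 0) / cs.length,
  }))
}
```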


Operational Considerations

Catalog update propagation. When a product changes in your English catalog, any language-specific enrichment for that product needs to update too. If you're maintaining translated descriptions or localized synonym mappings, establish a workflow that flags multilingual content for review when the source changes. This is often the piece teams underinvest in — retrieval quality degrades silently over time as the catalog evolves and localized content drifts out of sync.

Language coverage vs. quality tradeoff. It's better to serve three languages excellently than twelve languages poorly. Define your tier-1 markets (where you'll invest in synonym enrichment and potentially translated content) and tier-2 markets (multilingual embeddings only, no custom enrichment). Set user expectations accordingly.

Embedding model updates. When you upgrade your embedding model, you need to re-embed your entire catalog. For large catalogs, this is a planned maintenance event. Keep your embedding pipeline idempotent — if you can re-embed incrementally (by chunk hash), a model upgrade becomes a background job rather than a downtime window. We covered the mechanics in detail in our product catalog sync and freshness article.
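
Idempotent re-embedding usually hinges on a content hash per chunk: re-embed only when the hash changes or the stored embedding came from an older model version. A sketch using Node's crypto module; the stored record shape is an assumption:

```typescript
import { createHash } from 'node:crypto'

interface StoredChunk { hash: string; modelVersion: string }

const contentHash = (text: string): string =>
  createHash('sha256').update(text).digest('hex')

// A chunk needs (re-)embedding if it's new, its text changed, or it was
// embedded with an older model — making a model upgrade a background job.
function needsEmbedding(
  text: string,
  currentModel: string,
  stored?: StoredChunk,
): boolean {
  if (!stored) return true
  return stored.hash !== contentHash(text) || stored.modelVersion !== currentModel
}
```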


The Business Case for Multilingual Product AI

The revenue impact of multilingual retrieval is measurable. For B2B distributors operating in multiple countries, the gap between "search works" and "search fails" translates directly to quote requests, order placement, and customer retention.

The customers who don't find what they're looking for don't usually complain — they just place the order elsewhere. The failed searches are invisible in your analytics until you look for them. Adding multilingual retrieval turns invisible failure into recovered revenue.

For companies already operating pan-European or international distribution, this is not a luxury feature for the roadmap. It's infrastructure for markets you're nominally serving but functionally failing.


Start Speaking Your Buyers' Language

Axoverna's retrieval engine is built on multilingual embeddings by default — the same product knowledge base serves buyers in English, German, French, Spanish, Dutch, and other major European languages without separate catalog translations or additional configuration.

Book a demo to see how your existing catalog performs across languages, or start a free trial and watch your international buyers find what they're looking for the first time.

Ready to get started?

Turn your product catalog into an AI knowledge base

Axoverna ingests your product data, builds a semantic search index, and gives you an embeddable chat widget — in minutes, not months.