2026-03-28

Improving embedding models’ semantic search using a Pyxon reasoning model in Arabic

Hamza Salem, Ahmed Algareeb & Manuel Mazzara

embeddings · semantic search · Arabic · RAG · reasoning models

A two-stage pipeline pairs multilingual embeddings with an Arabic-capable LLM that reorders retrieval results and explains each ranking—improving both order and explainability for Arabic queries.

Hamza Salem^* — PYXON AI Department (hamzas@pyxon.ai)
Ahmed Algareeb — PYXON AI Department (ahamdg@pyxon.ai)
Manuel Mazzara — Innopolis University (m.mazzara@innopolis.ru)

^* Corresponding author.

Paper

Full paper (PDF) — we will publish the link here as soon as it is available.

Abstract

Embedding-based semantic search ranks candidates by cosine similarity, but in Arabic, lexical and morphological overlap often diverges from true semantic relevance: for example, qaʼud (“young camel”) is relevant to a camel query, while jumla (“sentence”) is not. We propose a two-stage pipeline:

An embedding model retrieves and ranks candidates with respect to a query.
An Arabic-capable reasoning model (LLM) reorders the list by comparing adjacent pairs and moving more relevant items up, then attaches a short Arabic explanation per item.

We evaluate on a query about camels (ibal) over 15 Arabic sentences mixing camel-related, grammar-related (jumla), and unrelated (amud) items. Using Cohere embed-multilingual-v3.0 and command-r7b-arabic-02-2025, the reasoning stage corrects several ranking errors—notably moving “qaʼud masafir saghir” (young traveling camel) from embedding rank 12 to reasoned rank 6—and yields interpretable explanations. Combining embeddings with a reasoning model appears to improve both order and explainability of semantic search in Arabic.

Introduction

Semantic search relies on embedding models that map sentences or documents to dense vectors and rank candidates by similarity (for example cosine) between query and candidate embeddings. This pattern scales well and underpins search, question answering, and retrieval-augmented generation (RAG). For high-resource languages, multilingual encoders often behave reasonably; for Arabic, diglossia, root-based morphology, and ambiguous surface forms add difficulty.
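As a minimal sketch of the ranking step described above, the following ranks candidates by cosine similarity to a query vector. The function names (`cosine`, `rank_by_cosine`) and the three-dimensional toy vectors are assumptions made for illustration; real embeddings would come from an embedding model and have far more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_by_cosine(
    query_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs sorted by descending similarity."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration; not output of any real model.
query = [1.0, 0.0, 0.0]
toy_candidates = [
    ("candidate A", [0.0, 1.0, 0.0]),
    ("candidate B", [2.0, 0.0, 0.0]),
    ("candidate C", [1.0, 1.0, 0.0]),
]
ranking = rank_by_cosine(query, toy_candidates)
```

Because cosine similarity ignores vector length, candidate B (same direction as the query, twice the magnitude) ranks first, followed by C and then the orthogonal A.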

Lexical overlap with the query is not always a reliable proxy for relevance. Jamāl may evoke “camel” or appear inside jumla (“sentence”) in grammar contexts; qaʼud names a young camel and is conceptually related to ibal (“herd of camels”), yet embeddings may score that sense too low. Dialectal or low-frequency but relevant phrases can sink in the ranking. A post-processing step that reasons explicitly in Arabic is a natural mitigation.

Our approach keeps retrieval embedding-driven for scale, then runs a comparatively small reordering-and-explanation pass with an Arabic reasoning LLM: the model compares adjacent pairs in the ranked list and promotes the more query-relevant sentence when appropriate, assigning a reasoned score and a short Arabic justification per sentence.
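The adjacent-pair reordering can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `prefers` is a hypothetical callable standing in for the LLM judgment, the bottom-up traversal and the `max_passes` parameter are choices made here, and the toy relevance labels are invented.

```python
from typing import Callable, Dict, List


def reorder_adjacent(
    ranked: List[str],
    prefers: Callable[[str, str], bool],
    max_passes: int = 1,
) -> List[str]:
    """Promote items by comparing adjacent pairs, walking bottom-up.

    prefers(lower, upper) should return True when the lower-ranked
    sentence is more relevant to the query than the one above it.
    In the pipeline this decision would come from the reasoning model;
    here it is an injected function (an assumption of this sketch).
    """
    items = list(ranked)  # work on a copy, keep the input order intact
    for _ in range(max_passes):
        swapped = False
        # Bottom-up, so one item can climb several positions per pass.
        for i in range(len(items) - 1, 0, -1):
            if prefers(items[i], items[i - 1]):
                items[i - 1], items[i] = items[i], items[i - 1]
                swapped = True
        if not swapped:
            break
    return items


# Invented relevance labels that stand in for the LLM's judgment.
toy_relevance: Dict[str, int] = {
    "camel herd": 3,
    "young camel": 2,
    "grammar note": 1,
    "pillar": 0,
}


def toy_prefers(lower: str, upper: str) -> bool:
    return toy_relevance[lower] > toy_relevance[upper]


embedding_order = ["camel herd", "grammar note", "pillar", "young camel"]
reasoned_order = reorder_adjacent(embedding_order, toy_prefers)
```

In this toy run the under-ranked "young camel" item climbs from last place to second in a single pass, which mirrors the kind of correction described above. Each pass costs one comparison per adjacent pair, so the number of judge calls grows linearly with the list length per pass.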

Contributions

We summarize the work as follows:

Pipeline — A two-stage design that combines embedding retrieval with an Arabic LLM for reordering and per-candidate explanations.
Experiments — Results on one query and fifteen Arabic sentences spanning three settings: embedding-only (Cohere), multi-model comparisons (including E5-large), and embedding plus reasoning.
Qualitative outcomes — The reasoning stage materially fixes under-ranking of “qaʼud masafir saghir”, and produces Arabic explanations without relying on explicit in-prompt demonstrations.