How RAGFlow Resolves index_name and knowledgebase_id During Chat
How RAGFlow Resolves index_name and knowledgebase_id During Chat
When users select one or more datasets in RAGFlow chat, retrieval only works if the system can map each dataset to the correct vector storage layout. In practice, that means answering two key questions:
- Which index namespace should be used?
- Which vector dimension should be used for query and document matching?
This article walks through the internal flow and explains why index_name, knowledgebase_id, and embedding model consistency are tightly coupled.
The Problem Behind the Scenes
In a multi-tenant system, each tenant must be isolated at storage level, and each knowledge base may carry its own metadata. At retrieval time, RAGFlow must guarantee:
- tenant-level isolation,
- schema compatibility for vector fields,
- and consistent embedding behavior across selected datasets.
If any of these assumptions break, retrieval can fail or produce invalid similarity scores.
Step 1: Build index_name from tenant_id
RAGFlow derives the index namespace from tenant_id via search.index_name(tenant_id), then initializes the storage index with the current kb_id and vector size.
def init_kb(row, vector_size: int):
idxnm = search.index_name(row["tenant_id"])
parser_id = row.get("parser_id", None)
return settings.docStoreConn.create_idx(idxnm, row.get("kb_id", ""), vector_size, parser_id)
Why this matters:
tenant_idcontrols namespace boundaries.kb_ididentifies the target knowledge base inside that namespace.- The resulting storage object is prepared with the expected vector schema.
Step 2: Infer Vector Dimension from Embeddings
Vector dimension is not hard-coded. It is inferred from embedding output and therefore depends on the selected embedding model (embd_id) configured on the knowledge base.
vector_size = 0
for i, d in enumerate(docs):
v = vects[i].tolist()
vector_size = len(v)
d["q_%d_vec" % len(v)] = v
This has an important consequence: schema and query vectors are model-dependent. If datasets use different embedding models, their vector spaces are not directly compatible.
Step 3: Retrieval Uses the Same Embedding Context
During chat retrieval, RAGFlow passes the embedding model context, tenant scope, and selected knowledge base IDs into the retriever.
kbinfos = await retriever.retrieval(
" ".join(questions),
embd_mdl,
tenant_ids,
dialog.kb_ids,
1,
dialog.top_n,
dialog.similarity_threshold,
dialog.vector_similarity_weight,
doc_ids=attachments,
top=dialog.top_k,
aggs=True,
rerank_mdl=rerank_mdl,
rank_feature=label_question(" ".join(questions), kbs),
)
At this point, the system expects vectors in the target stores to match the embedding model used for the query.
How Table Names Are Derived
The physical table naming rule is:
{index_name}_{knowledgebase_id}
Example:
tenant_id = tenant123→index_name = ragflow_tenant123kb_id = kb456→ table nameragflow_tenant123_kb456
This naming strategy combines tenant isolation with knowledge-base-level granularity.
Why Mixed Embedding Models Are Rejected
RAGFlow validates selected datasets before chat. If the datasets do not share the same embedding model family, the request is rejected.
That safeguard prevents:
- cross-space similarity comparisons,
- vector dimension conflicts,
- and retrieval quality degradation caused by incompatible embedding spaces.
Key Takeaways
index_nameis derived fromtenant_idto enforce tenant isolation.- Vector dimension comes from the embedding model output, not manual configuration.
- Storage tables follow
{index_name}_{knowledgebase_id}. - Chat retrieval is valid only when selected datasets use compatible embedding models.
References
rag/svr/task_executor.py:564-567(init_kb, vector size inference)api/db/db_models.py:833(embd_idin knowledge base model)api/apps/chunk_app.py:408(embedding model bundle creation)api/db/services/dialog_service.py:560-574(chat retrieval call)rag/utils/infinity_conn.py:318-319(table naming)api/apps/dialog_app.py:103-106(embedding consistency validation)- https://deepwiki.com/search/datasetragflowknowledgebase-kn_06736f77-7afb-4b2c-8afa-7885dffcf5f3