Building Q&A with ragflow_sdk: Chat, Session, and ask()
If you are using ragflow_sdk and wondering how to send real Q&A requests, the key detail is this: chat objects manage sessions, but the actual question answering happens through Session.ask().
This article walks through the full flow, including non-streaming and streaming responses, plus a quick look at what happens under the hood.
Why Sessions Matter
In RAGFlow, you typically:
- Get a chat assistant (Chat object)
- Create a session from that assistant
- Send questions through session.ask()
Without creating a session first, you cannot perform Q&A in this pattern.
End-to-End Example
from ragflow_sdk import RAGFlow
# 1) Initialize client
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
# 2) Get a chat assistant
assistant = rag.list_chats(name="Miss R")[0]
# 3) Create a session
session = assistant.create_session()
# 4) Ask a question (non-streaming)
response = session.ask("What is retrieval-augmented generation?", stream=False)
print(response.content)
Session.ask() Parameters
The ask() method supports:
- question: The user query (required)
- stream: Whether to return a stream (True) or a single final message (False)
- **kwargs: Additional variables passed through to the backend request
This makes it easy to inject runtime options (for example, custom style or business context fields).
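As a minimal sketch of passing runtime options, the helper below merges per-call values over a set of defaults and forwards them through **kwargs. The helper name ask_with_variables and the variable names style and tenant are hypothetical; which variables the backend actually honors depends on how the assistant is configured.

```python
def ask_with_variables(session, question, defaults=None, **overrides):
    """Forward merged runtime variables to session.ask() as keyword arguments.

    `session` is any object exposing ask(question, stream=..., **kwargs),
    such as the Session created earlier. Variable names are illustrative.
    """
    variables = {**(defaults or {}), **overrides}  # per-call values win
    return session.ask(question, stream=False, **variables)


# Example call (hypothetical variable names):
# ask_with_variables(session, "Summarize our refund policy.",
#                    defaults={"style": "concise"}, tenant="acme")
```

Keeping the defaults in one dictionary means every call site sends the same baseline context, while individual calls can still override a single field.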
Streaming Response Example
Use streaming when you want token-by-token or chunk-by-chunk output in real time:
for part in session.ask("Explain vector databases in simple terms.", stream=True):
    print(part.content, end="", flush=True)
This pattern is useful for CLI tools, chat UIs, and any interface where perceived latency matters.
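When the full answer is also needed after streaming (for logging or storage), it helps to separate "what is new in this part" from "what has been seen so far". The text above does not say whether each streamed part carries only the newest chunk or the whole answer so far, and this may depend on the SDK version, so the sketch below takes it as an explicit flag. The helper name iter_new_text and the cumulative parameter are assumptions made for illustration.

```python
def iter_new_text(parts, cumulative=False):
    """Yield only the text that is new in each streamed part.

    parts: iterable of objects with a .content string, for example the
           result of session.ask(question, stream=True).
    cumulative: assumption flag. True means each part repeats the answer
           so far; False means each part holds only the newest chunk.
    """
    seen = ""
    for part in parts:
        text = part.content or ""
        if cumulative:
            new = text[len(seen):]  # drop the prefix already emitted
            seen = text
        else:
            new = text
            seen += text
        if new:
            yield new


# Example: print as text arrives, then keep the complete answer.
# pieces = []
# for piece in iter_new_text(session.ask("Explain embeddings.", stream=True)):
#     print(piece, end="", flush=True)
#     pieces.append(piece)
# full_answer = "".join(pieces)
```

If the output shows repeated text, the stream is likely cumulative and the flag should be set to True.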
Under the Hood: Which API Endpoint Is Called?
Session.ask() dispatches by session type:
- Chat session → POST /chats/{chat_id}/completions
- Agent session → POST /agents/{agent_id}/completions
Internally, the payload includes:
- question
- stream
- session_id
- any extra kwargs
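The dispatch and payload described above can be sketched as a small pure function. This is an illustration of the idea, not the SDK's internal code: the function name build_completion_request and the owner_id parameter (the chat or agent ID) are hypothetical, and no HTTP call is made.

```python
def build_completion_request(session_type, owner_id, session_id,
                             question, stream=False, **kwargs):
    """Return (method, path, payload) for a completion call.

    Illustrative only: mirrors the routing and payload fields described
    in this article. owner_id stands for the chat or agent ID.
    """
    routes = {
        "chat": "/chats/{id}/completions",
        "agent": "/agents/{id}/completions",
    }
    if session_type not in routes:
        raise ValueError(f"unknown session type: {session_type!r}")
    payload = {"question": question, "stream": stream, "session_id": session_id}
    payload.update(kwargs)  # extra variables travel in the same request body
    return "POST", routes[session_type].format(id=owner_id), payload
```

Building the request as plain data like this makes it easy to log or inspect exactly what would be sent before involving the network.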
The returned Message object typically includes:
- content: model response text
- reference: retrieved document chunks (when available)
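Because reference is only present when retrieval produced chunks, code that reads it benefits from a defensive accessor. The sketch below assumes reference is list-like when set; the helper name describe_message and the keys of the returned dictionary are hypothetical.

```python
def describe_message(message):
    """Return the answer text and how many reference chunks came with it.

    Uses getattr with defaults so a message without a reference attribute
    (or with reference set to None) is treated as having no sources.
    """
    text = getattr(message, "content", "") or ""
    chunks = getattr(message, "reference", None) or []
    return {
        "text": text,
        "source_count": len(chunks),
        "has_sources": bool(chunks),
    }
```

A caller can then branch on has_sources, for example to show a "no sources found" notice instead of an empty citation list.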
Practical Tips
- Create one session per conversation thread for cleaner context boundaries.
- Use stream=False for simple backend workflows and batch jobs.
- Use stream=True for interactive UX.
- Check reference when you need provenance or source transparency.
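The "one session per conversation thread" tip can be sketched as a small registry that creates a session the first time a thread appears and reuses it afterwards. The class name SessionRegistry and the thread_id key are hypothetical; the only things assumed about the assistant and session objects are the create_session() and ask() methods used earlier in this article.

```python
class SessionRegistry:
    """Lazily create and reuse one session per conversation thread."""

    def __init__(self, assistant):
        self._assistant = assistant  # object exposing create_session()
        self._sessions = {}          # thread_id -> session

    def get(self, thread_id):
        """Return the session for thread_id, creating it on first use."""
        if thread_id not in self._sessions:
            self._sessions[thread_id] = self._assistant.create_session()
        return self._sessions[thread_id]

    def ask(self, thread_id, question, stream=False, **kwargs):
        """Route a question to the session that belongs to thread_id."""
        return self.get(thread_id).ask(question, stream=stream, **kwargs)
```

This keeps each thread's context isolated in memory; a long-running service would likely also need eviction or persistence, which is out of scope here.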
Summary
chat.py gives you the session lifecycle, but Session.ask() is the true Q&A entry point. Once you follow the assistant → session → ask flow, you can switch between standard and streaming responses with one flag and pass custom runtime parameters as needed.