
Solved the problem of mixed responses to multiple inquiries in Amazon Bedrock RAG chatbot using the working buffer + archive pattern
This page has been translated by machine translation. View original
While developing a RAG chatbot for a service desk, I received a report that "after receiving a response to the first inquiry, if a second inquiry is sent without pressing the clear button, the content of the first response and email draft gets mixed into the second response."
In this article, I'll write about analyzing the root cause of the problem and solving it using the working buffer + archive pattern.
System Overview
What I'm developing is an AI assistant for internal service desk staff. When a staff member enters an inquiry such as "Cannot log in to Salesforce," it searches past tickets from Amazon Bedrock Knowledge Bases, and the AI generates and presents similar cases, response procedures, and email drafts.
The tech stack looks like this:
- Frontend: Next.js + Server-Sent Events (SSE)
- Backend: Python / FastAPI
- LLM: Amazon Bedrock (Claude Sonnet)
- Knowledge Base: Amazon Bedrock Knowledge Bases + OpenSearch Serverless
Conversations support multi-turn dialogue (clarification flow), where the AI returns a confirmation question such as "What are the symptoms?" and when the staff member selects an option, the final response is generated.
Reproducing the Problem
After receiving the report, I actually tried it myself.
- Q1: "I want to request a printer installation" → AI returns printer-related procedures and email draft
- Q2: (without clearing) "Cannot log in to Salesforce" → Send
The response to Q2 contained phrases like "Regarding the printer installation..." and the email draft contained mixed content about both printers and Salesforce.
Root Cause
Checking the frontend code, the problem was simple.
const [messages, setMessages] = useState<ChatMessage[]>([]);
// Send Q1
setMessages([{ role: "user", content: "I want to request a printer installation" }]);
// After receiving Q1 response, add to messages
setMessages(prev => [...prev, { role: "assistant", content: q1Answer }]);
// Send Q2 — messages still contains all of Q1's exchanges
setMessages(prev => [...prev, { role: "user", content: "Cannot log in to Salesforce" }]);
Since the messages array is sent as-is to the backend, the LLM generates the Q2 response while seeing the entire Q1 conversation history. The message sequence received by Bedrock looks like this:
user: I want to request a printer installation
assistant: [Detailed printer-related response + email draft]
user: Cannot log in to Salesforce ← Q2
From the LLM's perspective, this is "a continuation of a conversation with a printer staff member." Even if the KB chunks for Q2 contain Salesforce-related information, since the immediately preceding assistant message contains printer information, it generates a response that carries that over.
Examining Solutions
I considered several approaches.
Option A: Always press the clear button before each inquiry
This solves it technically, but does not meet the client requirement that "independent inquiries should be treated as such even without clearing."
Option B: Automatically reset messages after the final response
// When final response is received
setMessages([]);
If we reset messages = [] after Q1 completes, the history will be empty when Q2 is sent. However, this makes it impossible to explicitly reference "the ticket #123 from earlier." Since the client also had a requirement that "related past exchanges should be referenceable," this option does not meet the requirements.
Option C: Working Buffer + Archive Pattern (adopted)
This is an industry-standard pattern similar to LangChain's ConversationSummaryBufferMemory.
messages[](working buffer): Holds only the currently ongoing inquiry. Reset after the final response is confirmed.archivedMessages[](archive): Accumulates completed inquiries as text. Passed as reference for the next inquiry.
This balances both requirements: "focus on the current inquiry while being able to reference the past when needed."
Implementation
Frontend Side
const [messages, setMessages] = useState<ChatMessage[]>([]);
const [archivedMessages, setArchivedMessages] = useState<ChatMessage[]>([]);
In the SSE event handler, behavior is branched by response_type.
if (chatResponse.response_type === "results") {
// Final response: move current messages to archive and reset working buffer
setMessages((currentMessages) => {
const finalAssistantMsg: ChatMessage = {
role: "assistant",
content: JSON.stringify({
response_type: chatResponse.response_type,
general_advice: chatResponse.general_advice,
}),
};
setArchivedMessages((arch) => [...arch, ...currentMessages, finalAssistantMsg]);
return []; // Reset working buffer
});
} else {
// Clarification question: continue adding to working buffer (maintain context)
setMessages((prev) => [
...prev,
{
role: "assistant",
content: JSON.stringify({
response_type: chatResponse.response_type,
question: chatResponse.question,
options: chatResponse.options,
}),
},
]);
}
The key point is calling setArchivedMessages inside the functional update of setMessages. This allows us to accurately archive the currentMessages accumulated during the clarification flow (the exchanges of Q + clarification questions + options).
We add archived_messages to the backend request.
body: JSON.stringify({
messages: nextMessages,
previous_ticket_ids: previousTicketIds,
archived_messages: archivedMessages, // Add archive
}),
hasHistory (the display condition for the clear button) was changed from messages.length > 0 to bubbles.length > 0. This ensures the clear button remains visible while bubbles are still displayed on screen, even after messages has been reset.
Backend Side: Model
We add a field to ChatRequest. Since it defaults to an empty list, backward compatibility with existing clients is maintained.
# backend/app/models/chat.py
class ChatRequest(BaseModel):
messages: list[ChatMessage] = Field(...)
archived_messages: list[ChatMessage] = Field(
default_factory=list,
description="Past completed inquiries (for reference). Does not affect the current inquiry.",
)
Backend Side: Context Injection
We inject the archive as text rather than as conversation messages.
# backend/app/api/routes/chat.py
def _format_archived_context(archived: list) -> str:
"""Format completed inquiries into reference text."""
lines: list[str] = []
exchange_num = 0
i = 0
while i < len(archived):
if archived[i].role == "user":
exchange_num += 1
lines.append(f"[Inquiry {exchange_num}] {archived[i].content}")
i += 1
if i < len(archived) and archived[i].role == "assistant":
try:
resp = json.loads(archived[i].content)
advice = resp.get("general_advice", "")
if advice:
lines.append(f"[Response {exchange_num}] {advice[:400]}")
except (ValueError, KeyError):
lines.append(f"[Response {exchange_num}] {archived[i].content[:400]}")
i += 1
else:
i += 1
return "\n".join(lines)
Then it is injected into the user message in the same way as KB chunks and domain context.
archived_section = ""
if request.archived_messages:
archived_text = _format_archived_context(request.archived_messages)
archived_section = f"\n\n## Past Inquiry Responses (For Reference)\n{archived_text}"
augmented = (
f"## Past Ticket Data (Knowledge Base Search Results)\n{context}"
f"{domain_ctx}"
f"{archived_section}" # ← Inject archive here
f"\n\n## Inquiry\n{request.messages[0].content}"
)
Why Not Pass It as Conversation Messages
Bedrock requires strict alternating user/assistant turns. Attempting to pass the archive as conversation messages makes order management complex, and the first archived user message is already bloated with injected KB chunks. Injecting as text is simpler and consistent with the existing KB chunk injection pattern.
Adding Instructions to the System Prompt
We explicitly instruct the LLM not to mix the archive into the current response.
## About Past Inquiries (For Reference)
If the message contains a "Past Inquiry Responses (For Reference)" section:
- Use it only when the current inquiry explicitly references a past topic
(e.g., "Tell me more about ticket #123 from earlier" → reference allowed)
- If the current inquiry is about a new topic, do not include past content in the response
- mail_draft and todo_actions must be generated based solely on the current inquiry's ticket
- Do not mix the content of past responses into the current response
Verification
After implementation, I verified with the following scenarios.
Mixing Test
- Q1 "I want to request a printer installation" → Response displayed
- Q2 "Cannot log in to Salesforce" → Send (without clearing)
I confirmed that Q2's general_advice contained only Salesforce-related content, and the email draft contained only Salesforce content as well.
Clarification Flow Test
- Q1 (an inquiry that returns clarification) → "What are the symptoms?" is returned
- Select option "Still cannot log in after password reset" → Final response
- Send Q2
Checking the network tab, Q2's request archived_messages contained 4 messages — [Q1_user, Q1_clarification, Q1_option, Q1_final_assistant] — confirming that the entire conversation was archived as a single unit.
Reference Test
After completing Q1, I sent "Tell me more about ticket #1234 from earlier," and the AI appropriately picked up Q1's context from the archive and responded.
Summary
To prevent context contamination across multiple inquiries in an Amazon Bedrock RAG chatbot, I implemented the working buffer + archive pattern.
| Problem | Solution |
|---|---|
| Q1's response gets mixed into Q2 | Reset messages (working buffer) after the final response |
| Past context becomes unreferenceable | Accumulate completed exchanges in archivedMessages and inject as reference text |
| Bedrock turn constraints | Inject archive as text rather than conversation messages |
The reliability of this pattern lies in "structural separation." Since Q2 is sent with messages empty, the LLM cannot reference Q1's conversation turns without relying on system prompt instructions. System prompt instructions are still needed for handling the archive, but this part is sufficiently covered by the relatively simple instruction "use it only when the user explicitly requests a reference."
I hope this serves as a reference for anyone who encounters similar issues with RAG chatbots.
