Solved the problem of mixed responses to multiple inquiries in an Amazon Bedrock RAG chatbot using the working buffer plus archive pattern

I solved the problem of the first inquiry's response bleeding into the second inquiry in a RAG chatbot using a working buffer + archive pattern. I will introduce an implementation that resets messages after the final response while keeping past context accessible via archivedMessages.

Amazon Bedrock

lin-yuchen

2026.05.29

This page has been translated by machine translation. View original

While developing a RAG chatbot for a service desk, I received a report that "after getting a response to the first inquiry, if you send a second inquiry without pressing the clear button, the content of the first response and email draft gets mixed into the second response."
In this article, I'll write about analyzing the root cause of the problem and resolving it using the working buffer + archive pattern.
 System OverviewWhat I'm developing is an AI assistant for in-house service desk staff. When a staff member enters an inquiry such as "I can't log in to Salesforce," the AI searches past tickets from Amazon Bedrock Knowledge Bases and presents similar cases, response procedures, and email drafts.
The technology stack looks like this:
Frontend: Next.js + Server-Sent Events (SSE)
Backend: Python / FastAPI
LLM: Amazon Bedrock (Claude Sonnet)
Knowledge Base: Amazon Bedrock Knowledge Bases + OpenSearch Serverless
Conversations support multi-turn dialogue (clarification flow), where the AI returns a clarifying question such as "What symptoms are you experiencing?" and once the staff member selects an option, the final response is generated.
 Reproducing the ProblemAfter receiving the report, I actually tried it myself.
Q1: "I'd like to request a printer installation" → AI returns printer-related procedures and email draft
Q2: (without clearing) "I can't log in to Salesforce" → Send
The response to Q2 then contained phrases like "Regarding the printer installation..." and the email draft was a mix of printer and Salesforce content.
 Root CauseChecking the frontend code, the problem was simple.
const [messages, setMessages] = useState<ChatMessage[]>([]);

// Send Q1
setMessages([{ role: "user", content: "I'd like to request a printer installation" }]);

// After receiving Q1 response, add to messages
setMessages(prev => [...prev, { role: "assistant", content: q1Answer }]);

// Send Q2 — messages still contains all of Q1's exchange
setMessages(prev => [...prev, { role: "user", content: "I can't log in to Salesforce" }]);
Since the backend request sends the messages array as-is, the LLM generates Q2's response while seeing the entire Q1 conversation history. The message sequence Bedrock receives looks like this:
user:      I'd like to request a printer installation
assistant: [Detailed printer-related response + email draft]
user:      I can't log in to Salesforce  ← Q2
From the LLM's perspective, this is "a continuation of the conversation with the printer staff member." Even if the KB chunks for Q2 contain Salesforce-related information, the preceding assistant message contains printer information, causing a response that carries that over.
 Examining Potential SolutionsI considered several approaches.
 Option A: Operationally require pressing the clear button each timeThis technically solves the problem, but doesn't meet the client requirement of "we want independent inquiries to be handled independently even without clearing."
 Option B: Automatically reset messages after the final response// When final response is received
setMessages([]);
By resetting messages = [] after Q1 is complete, the history would be empty when Q2 is sent. However, this would make it impossible to explicitly reference something like "tell me more about ticket #123 from earlier." The client also had a requirement that "we want to be able to reference related past exchanges," so this option doesn't meet the requirements.
 Option C: Working Buffer + Archive Pattern (adopted)This is an industry-standard pattern similar to LangChain's ConversationSummaryBufferMemory.
messages[] (working buffer): Holds only the currently ongoing inquiry. Reset after the final response is confirmed.
archivedMessages[] (archive): Accumulates completed inquiries as text. Passed as reference for subsequent inquiries.
This satisfies both requirements: "focus on the current inquiry while being able to reference the past when needed."
 Implementation Frontend Sideconst [messages, setMessages] = useState<ChatMessage[]>([]);
const [archivedMessages, setArchivedMessages] = useState<ChatMessage[]>([]);
Inside the SSE event handler, behavior is branched based on response_type.
if (chatResponse.response_type === "results") {
  // Final response: move current messages to archive and reset working buffer
  setMessages((currentMessages) => {
    const finalAssistantMsg: ChatMessage = {
      role: "assistant",
      content: JSON.stringify({
        response_type: chatResponse.response_type,
        general_advice: chatResponse.general_advice,
      }),
    };
    setArchivedMessages((arch) => [...arch, ...currentMessages, finalAssistantMsg]);
    return []; // Reset working buffer
  });
} else {
  // Clarification question: continue adding to working buffer (maintain context)
  setMessages((prev) => [
    ...prev,
    {
      role: "assistant",
      content: JSON.stringify({
        response_type: chatResponse.response_type,
        question: chatResponse.question,
        options: chatResponse.options,
      }),
    },
  ]);
}
The key point is calling setArchivedMessages inside the functional update of setMessages. This allows accurately archiving the currentMessages (the exchange of Q + clarifying questions + options) accumulated during the clarification flow.
The archived_messages is added to the backend request.
body: JSON.stringify({
  messages: nextMessages,
  previous_ticket_ids: previousTicketIds,
  archived_messages: archivedMessages, // Add archive
}),
hasHistory (the display condition for the clear button) was changed from messages.length > 0 to bubbles.length > 0. This ensures the clear button remains visible as long as bubbles are displayed on screen, even when messages is reset.
 Backend Side: ModelAdd a field to ChatRequest. Since it defaults to an empty list, backward compatibility with existing clients is maintained.
# backend/app/models/chat.py
class ChatRequest(BaseModel):
    messages: list[ChatMessage] = Field(...)
    archived_messages: list[ChatMessage] = Field(
        default_factory=list,
        description="Completed past inquiries (for reference). Does not affect the current inquiry.",
    )
 Backend Side: Context InjectionInject the archive as text rather than as conversation messages.
# backend/app/api/routes/chat.py
def _format_archived_context(archived: list) -> str:
    """Format completed inquiries into reference text."""
    lines: list[str] = []
    exchange_num = 0
    i = 0
    while i < len(archived):
        if archived[i].role == "user":
            exchange_num += 1
            lines.append(f"[Inquiry {exchange_num}] {archived[i].content}")
            i += 1
            if i < len(archived) and archived[i].role == "assistant":
                try:
                    resp = json.loads(archived[i].content)
                    advice = resp.get("general_advice", "")
                    if advice:
                        lines.append(f"[Response {exchange_num}] {advice[:400]}")
                except (ValueError, KeyError):
                    lines.append(f"[Response {exchange_num}] {archived[i].content[:400]}")
                i += 1
        else:
            i += 1
    return "\n".join(lines)
Then inject it into the user message in the same way as KB chunks and domain context.
archived_section = ""
if request.archived_messages:
    archived_text = _format_archived_context(request.archived_messages)
    archived_section = f"\n\n## Past Inquiry Responses (For Reference)\n{archived_text}"

augmented = (
    f"## Past Ticket Data (Knowledge Base Search Results)\n{context}"
    f"{domain_ctx}"
    f"{archived_section}"    # ← Inject archive here
    f"\n\n## Inquiry\n{request.messages[0].content}"
)
 Why Not Pass It as Conversation MessagesBedrock requires a strict alternating user/assistant turn structure. Trying to pass the archive as conversation messages makes order management complex, and the first archived user message is already bloated with injected KB chunks. Injecting as text is simpler and consistent with the existing KB chunk injection pattern.
 Adding Instructions to the System PromptAdd explicit instructions to prevent the LLM from mixing archive content into the current response.
## Regarding Past Inquiries (For Reference)

If the message contains a "Past Inquiry Responses (For Reference)" section:
- Only use it when the current inquiry explicitly references a past topic
  (e.g., "Tell me more about ticket #123 from earlier" → reference allowed)
- If the current inquiry is a new topic, do not include past content in the response
- mail_draft and todo_actions must always be generated based only on the current inquiry's ticket
- Do not mix the content of past responses into the current response
 VerificationAfter implementation, I verified with the following scenarios.
Mixing Test
Q1 "I'd like to request a printer installation" → Response displayed
Q2 "I can't log in to Salesforce" → Send (without clearing)
I confirmed that Q2's general_advice contained only Salesforce-related content and the email draft contained only Salesforce content.
Clarification Flow Test
Q1 (an inquiry that triggers clarification) → "What symptoms are you experiencing?" is returned
Select option "Still can't log in after password reset" → Final response
Send Q2
Checking the network tab, I confirmed that the archived_messages in Q2's request contained 4 messages [Q1_user, Q1_clarification, Q1_option, Q1_final_assistant], verifying that the entire conversation was archived as a single unit.
Reference Test
After completing Q1, I sent "Tell me more details about ticket #1234 from earlier," and it appropriately retrieved Q1's context from the archive and responded accordingly.
 SummaryTo prevent context contamination across multiple inquiries in an Amazon Bedrock RAG chatbot, I implemented the working buffer + archive pattern.


Problem
Solution


Q1 response mixing into Q2
Reset messages (working buffer) after final response

Unable to reference past content
Accumulate completed exchanges in archivedMessages and inject as reference text

Bedrock turn constraints
Inject archive as text rather than conversation messages

The reliability of this pattern comes down to "structural separation." Because Q2 is sent with messages in an empty state, the LLM cannot reference Q1's conversation turns without relying on system prompt instructions. The handling of the archive does require system prompt instructions, but a relatively simple instruction of "only use when the user explicitly requests a reference" is sufficient for this part.
I hope this serves as a reference for anyone who encounters a similar problem with RAG chatbots.

Solved the problem of mixed responses to multiple inquiries in an Amazon Bedrock RAG chatbot using the working buffer plus archive pattern

System Overview

Reproducing the Problem

Root Cause

Examining Potential Solutions

Option A: Operationally require pressing the clear button each time

Option B: Automatically reset messages after the final response

Option C: Working Buffer + Archive Pattern (adopted)

Implementation

Frontend Side

Backend Side: Model

Backend Side: Context Injection

Why Not Pass It as Conversation Messages

Adding Instructions to the System Prompt

Verification

Summary

AWS Topics

Trending Topics

Products & Services

Features and Series

Problem	Solution
Q1 response mixing into Q2	Reset `messages` (working buffer) after final response
Unable to reference past content	Accumulate completed exchanges in `archivedMessages` and inject as reference text
Bedrock turn constraints	Inject archive as text rather than conversation messages