I observed ChatGPT Web's DOM, communications, and API responses using the CDP capability of the Codex App

2026.06.18

Hey! I'm Nishimura Yuji from the Ops team!

In Codex's Browser Developer mode, in addition to a high-level API for browser operations, you can use the cdp capability to directly call the Chrome DevTools Protocol (CDP).

I confirmed this Developer mode / full CDP access locally with the Codex App's openai-bundled browser / chrome 26.609.71450.

You need to enable CDP in advance.

Screen to enable full CDP access in Browser settings

This time, I opened a logged-in ChatGPT from the Codex App and actually posted a short message, while testing how much could be observed with CDP.

This article covers the following 7 items.

  • Observing DOM changes
  • Monitoring network communication
  • Retrieving API response bodies
  • Taking screenshots
  • Generating PDFs
  • Retrieving console logs
  • Retrieving performance metrics

To get straight to the conclusion: click operations, console logs, and network communication could all be driven and observed from the Agent side. In practice, the most manageable split was to leave clicks and input to high-level APIs such as tab.playwright, and to use CDP for observing communication, responses, PDF output, and Performance metrics.

What I Wanted to Test

CDP is the protocol used internally by Chrome DevTools. By sending commands to domains such as Network, Runtime, Page, and Performance, you can retrieve what's happening in the browser from the outside.

What I wanted to see was whether I could "operate an authenticated web app from the Codex App and retrieve information similar to what you'd see in DevTools' Network tab or Performance tab, all in the same flow."

The target, ChatGPT Web, returns a streaming response after sending a message and dynamically updates the screen. It seemed like a good subject for looking at DOM, Network, API responses, and Performance together.

Environment

  • Codex CLI 0.140.0
  • Codex App Browser / Chrome Plugin (openai-bundled browser / chrome 26.609.71450)
  • Browser Developer mode / full CDP access enabled
  • Logged-in ChatGPT session in Google Chrome
  • macOS 15.7.4
  • Confirmed on: 2026-06-16

First, Request the Observation from Codex

This time, I asked Codex App's chat to perform CDP observation targeting ChatGPT Web.

The request was roughly as follows.

prompt
Target the ChatGPT website and use CDP (Chrome DevTools Protocol) to observe browser behavior.

Actually post a message to the chat and check the following that occurs during that time.

- DOM manipulation
- Network communication monitoring
- Direct retrieval of API responses
- Screenshot capture
- PDF generation
- Console log retrieval
- Performance measurement

Since CDP can retrieve a lot of information, it's easier to organize the logs by deciding in advance what scope to look at.

Overall Flow

The actual verification proceeded in the following steps. Codex worked through them without getting stuck anywhere.

  1. Open ChatGPT Web via the Chrome Plugin
  2. Enable CDP's Page, DOM, Runtime, Network, Performance, etc.
  3. Retrieve the CDP event cursor before sending
  4. Send a short test post to ChatGPT
  5. Read Network events and Runtime events after the cursor
  6. Retrieve the API response body from the necessary requestId
  7. Save screenshots, PDF, and Performance metrics

At minimum, the CDP setup looks like this.

cdp-setup.js
const cdp = await tab.capabilities.get("cdp");

for (const [method, params] of [
  ["Page.enable", {}],
  ["DOM.enable", {}],
  ["Runtime.enable", {}],
  ["Log.enable", {}],
  ["Network.enable", {}],
  ["Performance.enable", {}],
]) {
  await cdp.send(method, params);
}

Entering text in ChatGPT's input field and submitting it, on the other hand, was done with tab.playwright rather than CDP.

send-message.js
const prompt = "This is a CDP observation test. Please respond briefly with just 'Observation OK'.";

const input = tab.playwright.getByRole("textbox", {
  name: "Chat with ChatGPT",
});
await input.fill(prompt);

const sendButton = tab.playwright.getByRole("button", {
  name: "Send prompt",
});
await sendButton.click();

Rather than pushing every page operation through CDP, separating input and clicks (high-level APIs) from observation (CDP) makes the script more readable and failures easier to isolate.

Results on Screen

The following test post was sent to ChatGPT.

chatgpt-prompt.txt
This is a CDP observation test. Please respond briefly with just 'Observation OK'.

The on-screen response was Observation OK. The screenshot below was also captured by the Agent.

State with ChatGPT's response displayed

From here, the Agent retrieved what was happening behind the scenes during this single post and summarized the content. Let's take a look at that.

What Was Visible in the DOM

DOM changes were observed by placing a MutationObserver inside the page with Runtime.evaluate. The target was elements under main.

The results were as follows.

Item             Result
Total mutations  37
childList        33
characterData    4
Added nodes      19
Removed nodes    14

The transition from the Thinking indicator to the rendered response body could be tracked as childList and characterData changes.
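As a rough sketch of how such counts can be collected: the expression below is meant to be passed to Runtime.evaluate, and the helper tallies the serialized records. The global name window.__cdpMutations and the record shape { type, added, removed } are my own assumptions for illustration, not what Codex actually generated.

```javascript
// Expression to run inside the page via Runtime.evaluate.
// window.__cdpMutations is a hypothetical global used only in this sketch.
// Note: observe() throws if the page has no <main> element.
const installObserverExpression = `(() => {
  const records = [];
  const observer = new MutationObserver((mutations) => {
    for (const m of mutations) {
      records.push({
        type: m.type,
        added: m.addedNodes.length,
        removed: m.removedNodes.length,
      });
    }
  });
  observer.observe(document.querySelector("main"), {
    childList: true,
    characterData: true,
    subtree: true,
  });
  window.__cdpMutations = { records, observer };
  return true;
})()`;

// Tallies serialized mutation records into per-type and per-node counts.
function summarizeMutations(records) {
  const summary = {
    total: 0,
    childList: 0,
    characterData: 0,
    addedNodes: 0,
    removedNodes: 0,
  };
  for (const r of records) {
    summary.total += 1;
    if (r.type === "childList") summary.childList += 1;
    if (r.type === "characterData") summary.characterData += 1;
    summary.addedNodes += r.added;
    summary.removedNodes += r.removed;
  }
  return summary;
}

// Usage (assumes cdp.send returns the raw CDP result object):
//   await cdp.send("Runtime.evaluate", { expression: installObserverExpression });
//   ...send the message...
//   const { result } = await cdp.send("Runtime.evaluate", {
//     expression: "JSON.stringify(window.__cdpMutations.records)",
//   });
//   summarizeMutations(JSON.parse(result.value));
```

Serializing the records to JSON inside the page keeps the data crossing the CDP boundary to a plain string, which avoids dealing with remote object handles.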

Additionally, the DOM structure at that point in time was also retrieved with DOMSnapshot.captureSnapshot.

Item          Result
document      3
node          1,563
layout node   1,085
string table  1,381

MutationObserver seems suited for tracking "what changed," while DOMSnapshot seems suited for recording "what the structure looked like at that point in time."
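For reference, counts like these can be derived from the DOMSnapshot.captureSnapshot result roughly as follows. This is a sketch under the assumption that the result has the documented documents / strings layout, with per-document nodes and layout index arrays; the function name is hypothetical.

```javascript
// Summarizes a DOMSnapshot.captureSnapshot result.
// Assumes each document carries nodes.nodeType and layout.nodeIndex arrays.
function summarizeSnapshot(snapshot) {
  let nodes = 0;
  let layoutNodes = 0;
  for (const doc of snapshot.documents) {
    nodes += (doc.nodes.nodeType ?? []).length;
    layoutNodes += (doc.layout.nodeIndex ?? []).length;
  }
  return {
    documents: snapshot.documents.length,
    nodes,
    layoutNodes,
    strings: snapshot.strings.length,
  };
}

// Usage (computedStyles is a required parameter; an empty list skips styles):
//   const snapshot = await cdp.send("DOMSnapshot.captureSnapshot", {
//     computedStyles: [],
//   });
//   summarizeSnapshot(snapshot);
```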

What Was Visible in Network

For Network events, a cursor was taken before the message was sent, and only events after that cursor were read after sending.

network-events.js
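// Read a single event first, only to obtain the current cursor position.
const before = await cdp.readEvents({
  limit: 1,
  timeoutMs: 1000,
});

// Send the message here, then read only the events after the cursor.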

const observed = await cdp.readEvents({
  afterSequence: before.cursor,
  limit: 1000,
  methods: [
    "Network.requestWillBeSent",
    "Network.responseReceived",
    "Network.loadingFinished",
    "Network.loadingFailed",
  ],
  timeoutMs: 5000,
});

After the pre-send cursor, 381 CDP events and 163 Network requests were observed.

The main endpoints visible were as follows.

Method      Path                                          Observed content
POST        /backend-api/f/conversation                   Message sending and response stream
POST        /backend-api/f/conversation/prepare           Preparation before sending conversation
GET         /backend-api/conversation/{id}/stream_status  Stream status
GET         /backend-api/conversation/{id}/textdocs       Retrieving textdocs
GET / POST  /backend-api/sentinel/*                       Sentinel-related
POST        /ces/v1/*                                     Telemetry / event-related

The communication of interest is /backend-api/f/conversation, but in a real web app, related preparation requests, status checks, and telemetry all flow at the same time. When reading events with CDP, plan from the start to filter by URL, method, and mimeType.
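A minimal sketch of that kind of filtering is shown below. It joins Network.requestWillBeSent and Network.responseReceived by requestId and then narrows the list down. The { method, params } event shape is an assumption about what readEvents returns, and the helper names are hypothetical; the params fields follow the CDP Network domain.

```javascript
// Joins request and response events by requestId.
// Assumption: each event looks like { method, params }.
function collectRequests(events) {
  const byId = new Map();
  for (const { method, params } of events) {
    if (method === "Network.requestWillBeSent") {
      byId.set(params.requestId, {
        requestId: params.requestId,
        url: params.request.url,
        httpMethod: params.request.method,
      });
    } else if (method === "Network.responseReceived") {
      const entry = byId.get(params.requestId);
      if (entry) {
        entry.status = params.response.status;
        entry.mimeType = params.response.mimeType;
      }
    }
  }
  return [...byId.values()];
}

// Keeps only requests matching every criterion that was supplied.
function filterRequests(requests, { pathPrefix, httpMethod, mimeType } = {}) {
  return requests.filter((r) => {
    const path = new URL(r.url).pathname;
    return (
      (!pathPrefix || path.startsWith(pathPrefix)) &&
      (!httpMethod || r.httpMethod === httpMethod) &&
      (!mimeType || r.mimeType === mimeType)
    );
  });
}

// Usage: isolate the streaming conversation request.
//   filterRequests(collectRequests(events), {
//     pathPrefix: "/backend-api/f/conversation",
//     httpMethod: "POST",
//     mimeType: "text/event-stream",
//   });
```

Filtering on mimeType in addition to the path is what separates the stream itself from /backend-api/f/conversation/prepare, since both share the same path prefix.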

What Was Visible in the API Response Body

By recording the requestId from Network.responseReceived, you can retrieve the response body with Network.getResponseBody.

The main subject this time was the text/event-stream from /backend-api/f/conversation.

response-body.js
const responseBody = await cdp.send("Network.getResponseBody", {
  requestId: conversationRequestId,
});

const lines = responseBody.body
  .split("\n")
  .filter((line) => line.startsWith("data: "))
  .map((line) => line.slice("data: ".length));

The retrieved SSE body was 24,750 characters. There were 27 data: events, and [DONE] was confirmed at the end.
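Counting the data: lines and checking for the terminator can be wrapped in a small helper like the one below. This is only a sketch: the function name is hypothetical, and it assumes the payloads other than [DONE] are usually JSON, keeping anything that fails to parse as raw text.

```javascript
// Parses an SSE body into its data: payloads.
// count includes the [DONE] terminator line when present.
function parseSseBody(body) {
  const payloads = body
    .split(/\r?\n/)
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length));

  const done =
    payloads.length > 0 && payloads[payloads.length - 1] === "[DONE]";

  const events = [];
  for (const payload of payloads) {
    if (payload === "[DONE]") continue;
    try {
      events.push(JSON.parse(payload));
    } catch {
      // Keep non-JSON payloads as-is instead of dropping them.
      events.push({ raw: payload });
    }
  }
  return { count: payloads.length, done, events };
}

// Usage:
//   const { count, done, events } = parseSseBody(responseBody.body);
```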

On screen, Observation OK was displayed, but the same string was also confirmed as an assistant response delta in the API stream.

response-delta.txt
assistant delta: $.v[0].v = "Observation OK"

Supplementary APIs were also confirmed.

API                                           Result
/backend-api/conversation/{id}/stream_status  {"status":"IS_STREAMING"}
/backend-api/conversation/{id}/textdocs       []
/backend-api/f/conversation/prepare           {"status":"ok", ...}

The conversation/prepare response contained a short-lived token, so the value was not saved, and only the fact that it returned status: ok was recorded.
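If you do want to keep the response structure in a log, one option is to mask sensitive-looking fields before saving. The sketch below is an illustration only: the key pattern is a guess at typical names, not the actual field names in the prepare response.

```javascript
// Hypothetical pattern for keys that should never be written to a log.
const SENSITIVE_KEY_PATTERN = /token|secret|authorization|cookie/i;

// Returns a deep copy with values of sensitive-looking keys masked.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) =>
        SENSITIVE_KEY_PATTERN.test(key) ? [key, "[REDACTED]"] : [key, redact(v)]
      )
    );
  }
  return value;
}

// Usage:
//   const safe = redact(JSON.parse(prepareResponse.body));
```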

Saving Screenshots and PDFs

Screenshots were taken using tab.screenshot(), the high-level API on the Codex App side.

screenshot.js
const screenshot = await tab.screenshot({
  fullPage: false,
  clip: {
    x: 270,
    y: 0,
    width: 1260,
    height: 768,
  },
});

Initially the entire screen was saved, but it included the personal area in the sidebar. For images used in public articles, only the conversation body side was cropped.

The PDF was generated with CDP's Page.printToPDF.

print-to-pdf.js
const pdf = await cdp.send("Page.printToPDF", {
  printBackground: true,
  preferCSSPageSize: true,
});

The PDF generated this time was 2 pages, approximately 49 KB. Screenshots seem useful for visual confirmation, while PDFs seem useful as artifacts to preserve the page state at the time of verification.

State with the CDP-generated PDF opened
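Page.printToPDF returns the document as a base64 string in its data field, so it has to be decoded before saving. A minimal sketch follows; the helper names are hypothetical, and how the bytes get written to disk depends on the environment, so that part is left out.

```javascript
// Decodes a base64 string into raw bytes.
function decodeBase64(data) {
  const binary = atob(data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Cheap sanity check: PDF files begin with the ASCII signature "%PDF-".
function looksLikePdf(bytes) {
  const signature = "%PDF-";
  return [...signature].every((ch, i) => bytes[i] === ch.charCodeAt(0));
}

// Usage:
//   const bytes = decodeBase64(pdf.data);
//   if (!looksLikePdf(bytes)) throw new Error("unexpected printToPDF output");
//   console.log(`${(bytes.length / 1024).toFixed(1)} KB`);
```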

Console and Performance Can Also Be Retrieved

Console logs were retrieved by subscribing to Runtime.consoleAPICalled. This time, as a confirmation of log retrieval, one console.info was fired via CDP and the event was read.

As a result, one info event from Runtime.consoleAPICalled was retrieved. This was not an investigation of naturally occurring app errors, but a confirmation that console events can be captured.
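For illustration, a Runtime.consoleAPICalled event can be flattened into a log line as sketched below. The params fields (type, args with value or description) follow the CDP Runtime domain; the function name and the marker string in the usage comment are hypothetical.

```javascript
// Flattens Runtime.consoleAPICalled params into { level, text }.
// Primitive args carry a value; objects only carry a description.
function formatConsoleEvent(params) {
  const text = params.args
    .map((arg) =>
      "value" in arg ? String(arg.value) : (arg.description ?? arg.type)
    )
    .join(" ");
  return { level: params.type, text };
}

// Usage (assumes events are shaped like { method, params }):
//   await cdp.send("Runtime.evaluate", {
//     expression: 'console.info("cdp-console-test")',
//   });
//   const logs = events
//     .filter((e) => e.method === "Runtime.consoleAPICalled")
//     .map((e) => formatConsoleEvent(e.params));
```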

For Performance metrics, Performance.getMetrics was called before and after the message was sent, and the difference was examined.

Metric               Difference
LayoutCount          +234
RecalcStyleCount     +2,675
LayoutDuration       +0.064289
RecalcStyleDuration  +0.835087
ScriptDuration       +0.837414
TaskDuration         +4.095959
Nodes                +2,087

These numbers are from a single execution, so they are not rigorous benchmarks. They also mix together several things: the transition from a new ChatGPT chat to the conversation URL, frame cleanup after initialization, and response rendering.

However, they could be useful for running the same operation multiple times and comparing, or for looking at trends before and after changes.
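The before/after comparison itself is simple. Performance.getMetrics returns a metrics array of { name, value } pairs, so a diff can be sketched as follows; the function names are hypothetical.

```javascript
// Converts a Performance.getMetrics result into a name -> value map.
function toMetricMap(result) {
  return new Map(result.metrics.map(({ name, value }) => [name, value]));
}

// Returns after - before for each requested metric present in both results.
function diffMetrics(before, after, names) {
  const b = toMetricMap(before);
  const a = toMetricMap(after);
  const diff = {};
  for (const name of names) {
    if (a.has(name) && b.has(name)) {
      diff[name] = a.get(name) - b.get(name);
    }
  }
  return diff;
}

// Usage:
//   const before = await cdp.send("Performance.getMetrics", {});
//   ...send the message and wait for the response...
//   const after = await cdp.send("Performance.getMetrics", {});
//   diffMetrics(before, after, ["LayoutCount", "TaskDuration", "Nodes"]);
```

Running this across several identical posts and comparing the resulting diffs gives a better picture than any single run.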

Organizing the Division of Roles

Within the scope tested this time, it was more practical to divide roles than to push everything to CDP.

What you want to do                            More convenient approach
Clicks, input, on-screen element operations    tab.playwright
Observing DOM changes                          Runtime.evaluate + MutationObserver
Monitoring communication                       Network.* events
Retrieving API response bodies                 Network.getResponseBody
PDF generation                                 Page.printToPDF
Performance metrics                            Performance.getMetrics

CDP seems to fit best not as "the only means of operating the browser," but as a tool for incorporating DevTools-equivalent observation into the Codex App's work log.

Impressions from Trying It Out

Until now, when I needed to check the backend of a web app, I would manually open the browser's DevTools and visually inspect the Network tab or Console tab. Using the Codex App's CDP capability, the Agent can handle those checks, making investigations much easier. Also, since it can be written out as steps, it becomes easier to retrace the same observation later.

What was particularly good was being able to compare the results visible on screen with the results on the API stream in one continuous flow without switching tools. This time, Observation OK was displayed on screen, and the same Observation OK was confirmed in the SSE assistant delta. It was possible to confirm that the display and communication matched within a single flow.

On the other hand, when handling an authenticated session, sensitive values not shown on screen can be retrieved, so caution is needed.

Summary

I opened ChatGPT Web via the Codex App's Chrome Plugin and used the CDP capability to confirm the following.

  • DOM changes could be observed with MutationObserver and DOMSnapshot
  • The SSE communication for /backend-api/f/conversation could be tracked from Network events
  • The response delta Observation OK could be directly confirmed with Network.getResponseBody
  • The conversation page could be converted to PDF with Page.printToPDF
  • Runtime.consoleAPICalled and Performance.getMetrics could also be retrieved

When you want to check both the "results shown on screen" and "the communication happening in the background" of a web app in the same verification log, the Codex App's CDP capability seems practically usable.

I hope this is helpful to someone.

