The Agentic Evolution - What Anthropic Says About Claude and the Current Age of Agents - AWS Summit Hamburg 2026 Report

I will summarize what I heard from the Claude development team at the Anthropic session at AWS Summit Hamburg 2026 about "the reality of the first year of agents" and "why the tide turned in 2026." It became clear how our own task design should change in response to improvements in model capabilities.
2026.05.23

This is Keisuke from the Berlin office.

I recently attended AWS Summit Hamburg 2026, held in Hamburg. It was my first time attending an AWS Summit, and everything was a fresh experience — the scale of the venue, the buzz around the booths, and the energy of the attendees.

Among the sessions I attended, the one that left the biggest impression on me was "The Agentic Evolution (sponsored by Anthropic)." The speaker was from Anthropic's Applied AI team, the people who actually go to customers and help them integrate Claude into their workflows, and I liked how they shared what they're seeing on the ground.

Anthropic as a Company

Just to give a quick recap: Anthropic is an AI research lab founded only about five years ago, with over 2,000 employees. It's also well known that many of its co-founders came from OpenAI.

What was emphasized at the start was that Anthropic describes itself not as an "AI company" but as a "research lab." Their core mission is to help the world transition smoothly and safely to the powerful AI that's coming, and their commitment to building systems for deploying agents safely in real-world environments stems from that background.

What Is an Agent, Anyway?

After a preface that "every customer has a different definition of 'agent,'" Anthropic's own definition was shared:

An agent is a system that dynamically decides its own processing steps and tool usage, and holds control over how it achieves a task.

It's a somewhat formal definition, but they followed up by saying "in short, it's an autonomous worker — something you give a task to, and it works through a loop to solve it on its own," which made much more intuitive sense.

The key point is that they advocate for agents where the LLM decides its own approach — not "workflow-type" agents (which follow predefined steps via prompt chaining).

The reason is simple: every time a model gets smarter, workflow-type agents are constrained by their structure and hard to improve, whereas autonomous loop-type agents directly benefit from improvements in model capability. This has been said before, but hearing it directly from Anthropic still carried a lot of weight.
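To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `fake_model`, the `TOOLS` table, and the action format are hypothetical stand-ins, not Anthropic's API. The only point is the shape of the two designs. In the loop, the model picks the next tool on every turn and decides when it is done; in the workflow, the developer fixes the order in code.

```python
from typing import Callable

# Hypothetical tools; real ones would touch files, APIs, and so on.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"3 results for '{q}'",
    "summarize": lambda text: f"summary of [{text}]",
}

def fake_model(task: str, steps: list[tuple[str, str]]) -> dict:
    """Stand-in for an LLM call that returns the next action.

    A real model would reason about the task; this stub follows a fixed
    policy so the example runs offline.
    """
    done = [name for name, _ in steps]
    if "search" not in done:
        return {"tool": "search", "input": task}
    if "summarize" not in done:
        return {"tool": "summarize", "input": steps[-1][1]}
    return {"final": steps[-1][1]}

def run_agent(task: str, model=fake_model, max_steps: int = 10) -> str:
    """Autonomous loop: the model chooses each step until it declares a result."""
    steps: list[tuple[str, str]] = []
    for _ in range(max_steps):
        action = model(task, steps)
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](action["input"])
        steps.append((action["tool"], result))
    raise RuntimeError("step budget exhausted")

def run_workflow(task: str) -> str:
    """Workflow style: the order of steps is hard-coded by the developer."""
    found = TOOLS["search"](task)
    return TOOLS["summarize"](found)
```

With this stub both functions return the same string. The difference is where change happens: passing a more capable `model` into `run_agent` can alter the sequence of steps without touching the code, whereas `run_workflow` does the same two steps no matter how capable the model becomes.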

Was 2025 Really the "Year of the Agent"?

While 2025 was widely called the "Year of the Agent," the speaker reflected that "honestly, it didn't quite live up to that."

There was a moment during the session where the audience was asked "Who has actually deployed an agent in production?" — and only a handful of hands went up.

That said, the speaker's view was that this barrier started breaking down in 2026. Here's a rough timeline:

  • October 2024: Claude 3.5 Sonnet. The model that sparked a surge in Claude's popularity on Cursor.
  • February 2025: Claude Code released as a research preview. Anthropic's first full-fledged agent product.
  • November 2025: Opus 4.5. The period the industry broadly called "the inflection point."
  • February 2026: Opus 4.6
  • Most recently: Opus 4.7

The repeated emphasis that "Opus 4.5 was where the tide turned" stuck with me. For example, at Spotify, top engineers reportedly haven't written a single line of code themselves since December — they now work by sending instructions from their phones via Slack. More on that later.

Claude Code and Claude Cowork

Anthropic currently offers two main agent products.

The first is Claude Code. Aimed at developers, it's a harness that runs in the terminal. Using bash commands and MCP servers, Claude writes code, builds it, reviews errors, and fixes them on its own in a loop.

The Auto Mode feature introduced here was interesting. Normally, Claude Code asks the developer "Is it okay to run this command?" as a Yes/No prompt, but Auto Mode allows these to be automatically approved. Apparently, "designing so that humans don't become the bottleneck" is key to long-running tasks.
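The idea of taking the human out of the approval path can be sketched as follows. This is not how Claude Code implements Auto Mode; `make_gate`, `run_commands`, and the `ask` callback are hypothetical names invented for this illustration, and "running" a command here only records it.

```python
from typing import Callable

def make_gate(auto_approve: bool, ask: Callable[[str], bool]) -> Callable[[str], bool]:
    """Return a function that decides whether a command may run.

    With auto_approve=True the Yes/No prompt is skipped entirely;
    otherwise every command is passed to `ask` (a human, in practice).
    """
    def gate(command: str) -> bool:
        if auto_approve:
            return True
        return ask(command)
    return gate

def run_commands(commands: list[str], gate: Callable[[str], bool]) -> list[str]:
    """Return the commands that passed the gate, in order."""
    executed = []
    for cmd in commands:
        if gate(cmd):
            executed.append(cmd)
    return executed
```

In manual mode the number of calls to `ask` grows with the number of commands, which is exactly the bottleneck the speaker described for long-running tasks. In auto mode `ask` is never called, so the safety of the run depends entirely on what the agent is allowed to do in the first place.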

The second is Claude Cowork. Aimed at non-engineers, it brings the philosophy of Claude Code into a chat UI.

The announcement that stood out most here was that Claude Cowork is now available via Amazon Bedrock. European enterprises — especially German companies — often face strict data residency requirements that make using the official Anthropic API difficult. The fact that the same functionality is available through Bedrock lowers the barrier to adoption locally, which is quite significant.

According to the speaker, Claude Cowork is essentially a "general-purpose harness" provided by Anthropic, and industry-specific needs are something each company builds on top of it. For example, Legora for legal and Harvey for healthcare aren't using Claude Cowork as-is; they've built their own harnesses incorporating domain knowledge and MCP servers.

Real-World Case Studies

Spotify: Building Custom Solutions on Top of Claude Code

Spotify's codebase is a multi-repo setup with thousands of intertwined repositories, and running plain Claude Code at the repo root wasn't enough.

So they used the Agent SDK (the foundational layer of Claude Code) to build an in-house agent tailored to their specific tech stack. It was introduced with the note that "the best engineers haven't written a single line of code since December — they now work by sending instructions from their phones via Slack."

Honestly, this was a bit of a shock. We're not talking about "code completion" — this is "fully delegating the role of writing code."

Novo Nordisk: Clinical Trial Reports from 10 Weeks to 10 Minutes

This was an automation case study using Claude as a model — without harnesses like Claude Code.

The preparation of 300-page clinical trial reports, which previously took 10 weeks, was reduced to 10 minutes. Of course, the final numbers and conclusions are still reviewed by humans, but the manual preparation work was dramatically compressed.

"The quality of the prompts and harness you build determines the quality of the value you get" — the speaker's words felt perfectly illustrated by this case.

Lovable: A Custom AI Service Built on Claude

Lovable is an example of building a completely custom agent loop and calling Claude (the model) from within it, without using Claude Code at all.

What was interesting was that Anthropic itself said outright: "We don't see companies like Cursor, Lovable, or Vercel building their own harnesses as a threat." In fact, they see the growth of these companies as expanding the overall market.

Anthropic Economic Index

Something that highlighted Anthropic's identity as a research lab was the introduction of the Anthropic Economic Index — a report that periodically tracks how AI is spreading through global economic activity. I was hearing about it for the first time.

Among the points emphasized, the one that hit hardest was: "People who are extracting value from AI tend to have more refined prompts and more sophisticated ways of using models."

That resonated with me. Using Claude just as a "convenient question tool" caps its value, whereas crafting prompts and harnesses the way Novo Nordisk did is what enables transformations like 10 weeks → 10 minutes.

The report also included data on Germany:

  • Germany's usage is higher than predicted relative to its population (Claude is being used more than expected)
  • However, it doesn't stand out dramatically in country rankings — adoption is gradual

This was introduced as "very German caution," which drew laughs from the German attendees in the audience. Coding, email writing, research, and business strategy formulation were cited as typical use cases in Germany.

The Future of Agents

The final part of the session was its climax: three themes presented as Anthropic's vision for the future of agents.

1. Multi-Agent Systems

A trend toward specialization and division of labor among agents, rather than one agent doing everything. Legora's legal agent was cited as an example.

The explanation — "it's the same concept as humans working in teams with divided responsibilities, and agents will form teams too" — was easy to understand. The message was that when building internal tools, you should think not about "one agent doing everything" but about "multiple agents with defined roles."
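As a rough illustration of "multiple agents with defined roles," here is a small Python sketch. The `Team` class, the role names, and the lambda "agents" are all hypothetical; a real specialist would wrap its own prompt, tools, and model call, and a coordinator model would normally produce the plan instead of it being written by hand.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Team:
    """Registry of specialist agents keyed by role name."""
    members: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def add(self, role: str, agent: Callable[[str], str]) -> None:
        self.members[role] = agent

    def delegate(self, role: str, subtask: str) -> str:
        """Hand a subtask to the agent registered for the given role."""
        if role not in self.members:
            raise KeyError(f"no agent for role '{role}'")
        return self.members[role](subtask)

def run_plan(team: Team, plan: list[tuple[str, str]]) -> list[str]:
    """Execute (role, subtask) pairs in order and collect the outputs."""
    return [team.delegate(role, subtask) for role, subtask in plan]

# Stub specialists so the example runs offline.
team = Team()
team.add("research", lambda t: f"research: found precedents for {t}")
team.add("drafting", lambda t: f"drafting: wrote memo on {t}")
```

Calling `run_plan(team, [("research", "NDA"), ("drafting", "NDA")])` routes each subtask to its specialist. Keeping roles in a registry means a new specialist can be added without changing the coordinator logic.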

2. Long-running Execution

This was the part the speaker delivered with particular passion, saying "this is still underestimated by many." Personally, it was the most impactful part of the session for me.

According to METR's research, the time horizon over which Claude can operate autonomously has grown dramatically over the past year:

  • Claude 3.5 Sonnet (October 2024): Autonomous execution maxed out at a few minutes
  • Claude Opus 4.7 (most recently): Can execute autonomously for up to 12 hours

In other words: "If it can work for 12 hours, you need to give it tasks large enough to match that — otherwise you're not getting the full value of the model."

There was a memorable question:

If a colleague could work 20 or 50 hours on your behalf, wouldn't you want to give them more ambitious and challenging tasks?

If you keep using it like a ChatGPT-style quick Q&A tool, the experience doesn't change much even as model generations advance. "Redesigning your task structure and infrastructure with the premise of giving larger tasks" is said to be the key.

Applying this to my own work, I realized "I'm clearly giving Claude tasks that undersell its capabilities" — and that thought stuck with me for a while.

3. Genuine Collaboration

Just because agents can run for a long time doesn't mean you should hand everything off to them entirely.

The little quiz presented here was excellent:

Agent A: "always going to be right and never ask for help"
Agent B: "knows when to ask for help"
Which would you choose?

The entire audience chose B. Of course.

To make this a reality, you need to "give Claude an escape route." The selection-based confirmation UI that occasionally appears in Cowork demos is apparently an implementation of exactly this. The point made a lot of sense: unless you design in the ability to ask a human when the agent is unsure about a judgment, long-running autonomous execution is only one step away from going off the rails.
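One way to picture such an escape route is a decision helper that falls back to a human below some confidence level. This is a sketch under loud assumptions: `resolve`, the numeric `confidence`, and the `threshold` are hypothetical, and a real agent would not necessarily expose a confidence score at all. In Cowork the equivalent is presumably the selection UI, not a function like this.

```python
from typing import Callable

def resolve(
    question: str,
    preferred: str,
    options: list[str],
    confidence: float,
    ask_human: Callable[[str, list[str]], str],
    threshold: float = 0.8,
) -> str:
    """Return the agent's preferred option, or escalate to a human.

    If confidence is at or above the threshold the agent proceeds alone;
    otherwise the question and the options go to `ask_human`.
    """
    if confidence >= threshold:
        return preferred
    return ask_human(question, options)
```

The design choice worth noting is that the human receives a short list of options instead of an open question, which keeps the interruption cheap enough that the agent can afford to ask.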

How to Get Started in Practice

In the closing summary, Anthropic offered some practical advice.

First, find a recurring task that, once solved, would cut your workload by around 80%, and start there. An example would be consolidating data that you currently gather by hand from five different tools into one place.

Next, rather than breaking that task down into detailed steps like "first get this data, then analyze it over here...," hand Claude the task and the necessary tools, and observe how far it can proceed autonomously. Once you can see where Claude gets stuck, reinforce the context there or add a verification loop — that's the right approach.
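The "add a verification loop" advice might look like the following in Python. The names `run_with_verification`, `attempt`, and `verify` are hypothetical; `attempt` stands in for handing the task to an agent, and `verify` for whatever check you trust (a test suite, a schema validation, a totals reconciliation).

```python
from typing import Callable

def run_with_verification(
    attempt: Callable[[str], str],
    verify: Callable[[str], bool],
    task: str,
    max_attempts: int = 3,
) -> tuple[str, list[str]]:
    """Retry attempt -> verify until a result passes or attempts run out.

    Returns the accepted result plus the rejected ones, so you can see
    where the agent got stuck.
    """
    rejected: list[str] = []
    context = task
    for _ in range(max_attempts):
        result = attempt(context)
        if verify(result):
            return result, rejected
        rejected.append(result)
        # Feed the failure back as extra context for the next attempt.
        context = f"{task}\nPrevious attempt failed verification: {result}"
    raise RuntimeError(f"no verified result after {max_attempts} attempts")
```

The list of rejected results is the useful by-product: it shows which kinds of output fail the check, which is where the session suggested reinforcing context.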

And finally, the session closed with the message that the feature gap between Bedrock and the official Anthropic API is narrowing, so people in regulated industries should take another look at Bedrock.

Putting the safest AI on the most secure hardware — that's the value proposition of the Anthropic and AWS combination.

That phrase was a fitting close to the entire session.

Impressions

It was my first AWS Summit, and I learned far more in one day than I had imagined. This Anthropic session in particular was a rare chance to hear directly from the people behind the Claude I use every day and to learn what they are thinking. I'm really glad I attended.

What stuck with me long after the session ended was the message: "Give Claude bigger tasks." In my day-to-day work, there are definitely many moments where I shrink back thinking "should I really have Claude do this much?" — and I felt the need to update my approach to task design, keeping in mind that the model's capabilities are continuing to grow.

The energy in the room, the grounded and practical perspective from Anthropic — all of it was stimulating. I'd love to attend again next year.
