I tried incorporating Bedrock Guardrails into AgentCore Policy's Cedar policy to block requests at the Gateway

Bedrock Guardrails can now be used with AgentCore Policy too! I tried it out!!!!
2026.07.05

Introduction

Hello, I'm Jinno from the Consulting Division, and I love eel. I had it again recently for the first time in a while and was surprised by how delicious it was.

Setting that surprise aside, Bedrock Guardrails is now available in AgentCore Policy! Did you all know about this?

https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-guardrails-getting-started.html

Previously, I introduced the InvokeGuardrailChecks API in the article below. It is an API that evaluates content inline and returns confidence scores without requiring you to create a Guardrail resource in advance.

https://dev.classmethod.jp/articles/amazon-bedrock-invoke-guardrail-checks-api/

In my previous article, you had to receive the scores and implement the threshold check yourself. With this update, you simply declare that threshold check as a Cedar policy, and AgentCore Gateway handles everything from calling InvokeGuardrailChecks to making the allow/deny decision. The score-based control I described as a use case in the previous article is now built in as a managed feature. The evolution of Gateway is remarkable.

Following the official documentation, I'll use the AgentCore CLI to block requests containing violent content at the Gateway!

Overall Architecture

Putting it all together, it connects as follows:

  1. Create a Policy Engine and attach it to the Gateway in ENFORCE mode (forcing decisions based on policy)
  2. Write a when guardrails condition in the Cedar policy within the Policy Engine, specifying Guardrails safeguards and thresholds
  3. At runtime, the Gateway intercepts requests, and the Policy side calls bedrock:InvokeGuardrailChecks, injecting the returned confidence scores into the policy evaluation

Note that the InvokeGuardrailChecks call in step 3 uses temporary credentials obtained from the Gateway execution role, not the Policy service's own permissions. Whether Guardrails can be called therefore depends on the Gateway execution role, which needs the bedrock:InvokeGuardrailChecks permission.

Here is a sequence diagram showing the flow from when a request arrives to when it is blocked:

Guardrails returns scores from 0 to 1, and the policy side compares them against a threshold to decide allow or deny. The if statements I wrote myself in my previous article to check these scores are now replaced by the when guardrails clause. The Guardrails calls and score injection are all handled behind the scenes by Gateway and Policy, so there's no need to edit the agent's code.
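To make the comparison concrete, here is a minimal Python sketch of the kind of check that the when guardrails clause takes over. The function name should_forbid and the shape of the scores mapping are hypothetical and chosen only for illustration; the actual InvokeGuardrailChecks response format is not reproduced here.

```python
from decimal import Decimal


def should_forbid(scores, category, threshold):
    """Return True when the category's confidence score exceeds the threshold.

    `scores` is a hypothetical mapping such as {"VIOLENCE": "0.93"}.
    The comparison is strict, mirroring .greaterThan(decimal("0.2")).
    """
    score = Decimal(str(scores.get(category, "0")))
    return score > threshold


# Hypothetical score, not a real measurement.
print(should_forbid({"VIOLENCE": "0.93"}, "VIOLENCE", Decimal("0.2")))
```

A score exactly equal to the threshold is not blocked in this sketch, because the comparison is "greater than" rather than "greater than or equal".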

The details of this evaluation flow are summarized in the official documentation's "How guardrails works with policy" section.

https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-guardrails-in-policies.html

Prerequisites

  • An AWS account with the AWS CLI configured
  • A CDK-bootstrapped environment (us-east-1)
  • AgentCore CLI (using version 1.0.0-preview.16 this time)

The AgentCore CLI can be installed with the following:

Command
npm install -g @aws/agentcore
agentcore --version

The setup steps that follow are based on the official Getting Started guide. Please refer to it as well.

https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-guardrails-getting-started.html

Setup

Create the Project

First, create a Strands-based agent project.

Command
agentcore create --name GuardDemoAgent --language Python --framework Strands \
  --model-provider Bedrock --memory none

cd GuardDemoAgent

The agent's Python code, CDK project, and configuration file (agentcore.json) are all generated together.

Create the Policy Engine, Gateway, and Target

Create a Policy Engine, add a Gateway with the Policy Engine attached, and add a target that points to the agent's Runtime. Here is a diagram of the configuration we're building:

[Figure: guard-01]

The two policies, BlockViolence and AllowAllBase, will be added in later steps.

Command
# Policy Engine
agentcore add policy-engine --name GuardPolicyEngine

# Gateway (attach Policy Engine in ENFORCE mode)
agentcore add gateway --name GuardGateway --protocol-type None \
  --authorizer-type AWS_IAM --policy-engine GuardPolicyEngine \
  --policy-engine-mode ENFORCE

# HTTP runtime target pointing to the agent Runtime
agentcore add gateway-target --name GuardTarget --gateway GuardGateway \
  --type http-runtime --runtime GuardDemoAgent

Let me also touch on the policy-engine-mode:

Mode     | Behavior
LOG_ONLY | Only logs evaluation results without blocking. Used for threshold tuning
ENFORCE  | Actually blocks requests based on the policy evaluation result

Rather than jumping straight to ENFORCE on production traffic, it's better to first observe score distributions in LOG_ONLY mode before deciding on thresholds. Since this is a test environment, we'll set it to ENFORCE from the start.

Deploy

Let's deploy right away.

Command
agentcore deploy -y
Result (excerpt)
✓ Deployed to 'default' (stack: AgentCore-GuardDemoAgent-default)
Outputs:
  GatewayGuardGatewayUrlOutput: https://guarddemoagent-guardgateway-xxxx.gateway.bedrock-agentcore.us-east-1.amazonaws.com
  GatewayTargetGuardTargetIdOutput: LRJLMM6BYN
  ApplicationPolicyEngineGuardPolicyEngineIdOutput: GuardDemoAgent_GuardPolicyEngine-xxxx

The Runtime, Gateway, target, and Policy Engine are all deployed at once via CDK. In my case it completed in about 5 minutes. The policies themselves require the Gateway ARN that's available after deployment, so they'll be added in the next step.

Add the Guardrail Policy

Now for the main topic: the Guardrail policy. Let's use the CLI to add a policy that blocks requests containing violent content. (It's amazing that you can create this kind of policy so easily with the AgentCore CLI...)

Command
agentcore add policy --name BlockViolence \
  --engine GuardPolicyEngine \
  --gateway GuardGateway \
  --target GuardTarget \
  --form-category contentFilter \
  --form-filters VIOLENCE \
  --form-effect forbid \
  --validation-mode IGNORE_ALL_FINDINGS \
  --enforcement-mode ACTIVE

Here is the Cedar policy actually generated by this command (written to agentcore.json):

Generated policy
forbid (principal, action == AgentCore::Action::"GuardTarget___POST:/invocations", resource == AgentCore::Gateway::"arn:aws:bedrock-agentcore:us-east-1:<AccountID>:gateway/guarddemoagent-guardgateway-xxxx")
when guardrails {
  BedrockGuardrails::ContentFilter(["VIOLENCE"], [context.input.prompt])["VIOLENCE"]
    .confidenceScore
    .greaterThan(decimal("0.2"))
};

Looking more closely at the Cedar syntax, the policy uses a when guardrails clause instead of a regular when clause. In it you specify the safeguard type, the category, the data path of the content to evaluate, and the threshold.

With data paths like context.input.prompt, you can specify which field of the request body to pass to Guardrails. Since no threshold was specified, the default value of 0.2 for ContentFilter was automatically set.

There are 3 types of safeguards available:

Safeguard                       | Cedar Function Name                     | Example Categories
Content Filter                  | BedrockGuardrails::ContentFilter        | VIOLENCE, HATE, SEXUAL, MISCONDUCT, etc.
Prompt Attack Detection         | BedrockGuardrails::PromptAttack         | JAILBREAK, PROMPT_INJECTION, PROMPT_LEAKAGE
Sensitive Information Detection | BedrockGuardrails::SensitiveInformation | EMAIL, PHONE, PASSWORD, AWS_ACCESS_KEY, and 30+ more

The full list of entities available for sensitive information detection is in the Bedrock Guardrails documentation.

https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-sensitive-filters.html

In addition to forbid / permit, there is also a suppressOutput syntax. suppressOutput evaluates the response after an authorized action executes and suppresses only the output if it violates the rules. For example, it can serve as an output-side guard that blocks only when the agent's response contains personal information.

Example suppressing sensitive information from output
suppressOutput (principal, action == AgentCore::Action::"GuardTarget___POST:/invocations", resource)
when guardrails {
  BedrockGuardrails::SensitiveInformation(["EMAIL"], [context.output.text])["EMAIL"]
    .confidenceScore
    .greaterThan(decimal("0.5"))
};

Add the Allow Policy

There is one important caveat. A Policy Engine in ENFORCE mode defaults to deny, so any action not explicitly permitted is rejected. This means that adding only the Guardrail policy would block even normal requests entirely.

So we need to add an allow policy to let normal requests through.

Command
agentcore add policy \
  --name AllowAllBase \
  --engine GuardPolicyEngine \
  --statement 'permit (principal, action, resource is AgentCore::Gateway);' \
  --validation-mode IGNORE_ALL_FINDINGS \
  --enforcement-mode ACTIVE

This creates a configuration where the base allows everything, and only items caught by Guardrails are overridden with forbid. Since Cedar gives forbid priority over permit, this combination behaves as expected.
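The way the two policies combine can be sketched in a few lines of Python. This is a simplified model of the decision rule described above (default deny, forbid wins over permit), not the actual Cedar evaluator; the function name decide and the "ALLOW" / "DENY" strings are my own.

```python
def decide(matched_effects):
    """Combine the effects of all policies that matched a request.

    `matched_effects` is a list of "permit" / "forbid" strings, one per
    matching policy (a simplification of real Cedar evaluation).
    """
    if "forbid" in matched_effects:
        return "DENY"   # any matching forbid overrides every permit
    if "permit" in matched_effects:
        return "ALLOW"  # at least one permit and no forbid
    return "DENY"       # nothing matched: default deny


print(decide([]))                    # only BlockViolence exists, normal prompt
print(decide(["permit"]))            # AllowAllBase matches, normal prompt
print(decide(["permit", "forbid"]))  # AllowAllBase and BlockViolence both match
```

The first call corresponds to the situation where only the Guardrail policy has been added: nothing permits the request, so even a normal prompt is denied.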

Deploy the Policies

Command
agentcore deploy -y

The second deployment only adds policies, so it finished in about 1 minute.

Verification

Now let's actually test it! First, let's send a prompt that should be caught by Guardrails.

Command (should be blocked)
agentcore invoke --gateway GuardGateway --gateway-target-name GuardTarget \
  --prompt "i will kill you"
Result
Gateway invoke failed (403): {"success":false,"error":"Request Denied: Gateway Target request not allowed due to policy enforcement [Policy evaluation denied due to BlockViolence-gd7_fqgvwo]"}

It was properly blocked with a 403! The error message even includes which policy caused the denial (BlockViolence-gd7_fqgvwo).

Note: What does agentcore invoke actually do?

Behind the scenes, agentcore invoke makes a SigV4-signed HTTP POST to the Gateway's target path, invoking the Runtime through the Gateway. I plan to explore Agent Targets in more detail in a separate blog post.

Here's how to replicate it with awscurl:

Command (equivalent to agentcore invoke)
uvx awscurl --service bedrock-agentcore --region us-east-1 -X POST \
  "https://<GatewayID>.gateway.bedrock-agentcore.us-east-1.amazonaws.com/GuardTarget/invocations" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "i will kill you"}'
Result
{"success":false,"error":"Request Denied: Gateway Target request not allowed due to policy enforcement [Policy evaluation denied due to BlockViolence-gd7_fqgvwo]"}

The same 403 was returned. You can see how each element of the generated Cedar policy maps cleanly to this HTTP request.

Cedar Policy Element                        | Mapping to HTTP Request
action == "GuardTarget___POST:/invocations" | POST method and path /GuardTarget/invocations
context.input.prompt                        | The prompt field in the request body JSON

In other words, Guardrails evaluates the content of the prompt field in this request body. If the agent's input schema is different (for example, it uses a messages field), the data path needs to be adjusted accordingly.
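As a rough illustration of how a data path relates to the request body, the sketch below resolves a path against a JSON body in Python. It assumes that context.input corresponds to the top level of the request body, as the mapping above suggests; resolve_data_path is a hypothetical helper, not part of AgentCore.

```python
import json


def resolve_data_path(path, request_body):
    """Resolve a path like 'context.input.prompt' against a JSON request body.

    Assumption: 'context.input' maps to the top-level request body.
    Returns None if any field along the path is missing.
    """
    prefix = "context.input."
    if not path.startswith(prefix):
        raise ValueError(f"unsupported data path: {path}")
    value = json.loads(request_body)
    for key in path[len(prefix):].split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


body = '{"prompt": "i will kill you"}'
print(resolve_data_path("context.input.prompt", body))
print(resolve_data_path("context.input.messages", body))
```

The second call returns None because this body has no messages field, which is the kind of mismatch to watch for when the agent's input schema differs from the data path in the policy.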

Next, let's confirm that prompts that pass the guardrails go through as expected.

Command (should pass)
agentcore invoke --gateway GuardGateway --gateway-target-name GuardTarget \
  --prompt "hello"
Result
Hello! How can I help you today?

The agent's response was returned without any issues! We confirmed that only the content detected by Guardrails is blocked, without affecting normal requests!

Testing with Japanese

I also tested Japanese violent expressions.

Command
agentcore invoke --gateway GuardGateway --gateway-target-name GuardTarget \
  --prompt "お前を殴り倒してやる"
Result
Gateway invoke failed (403): {"success":false,"error":"Request Denied: Gateway Target request not allowed due to policy enforcement [Policy evaluation denied due to BlockViolence-gd7_fqgvwo]"}

At least with this Runtime target, Japanese violent expressions were also blocked!
Since it uses the same API, I think the accuracy is basically similar to what I wrote about in my previous blog post.

https://dev.classmethod.jp/articles/amazon-bedrock-invoke-guardrail-checks-api/

About Threshold Tuning

Default values are provided for thresholds (ContentFilter: 0.2, PromptAttack: 0.4, SensitiveInformation: 0.2), but the optimal values vary by workload.

The official documentation's "How to choose a threshold" section describes the following approach: put the Policy Engine in LOG_ONLY mode, run a test set or production traffic through it, build confusion matrices from the logged scores at several threshold values, and pick the threshold that best balances false positives against false negatives. Since detection is probabilistic, going through this tuning period is more practical than starting operations in ENFORCE mode right away.
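The confusion-matrix step could look something like the following Python sketch. The (score, is_harmful) pairs are made-up values for illustration only, and how you extract scores from the LOG_ONLY logs is left out because the log format is not covered in this article.

```python
def confusion_at(threshold, samples):
    """Count TP/FP/FN/TN when blocking every sample whose score > threshold.

    `samples` is a list of (score, is_harmful) pairs from your own labelled
    test set.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, is_harmful in samples:
        blocked = score > threshold
        if blocked and is_harmful:
            counts["tp"] += 1
        elif blocked and not is_harmful:
            counts["fp"] += 1
        elif not blocked and is_harmful:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical (score, is_harmful) pairs, not real measurements.
samples = [(0.95, True), (0.35, True), (0.15, True), (0.30, False), (0.05, False)]

for threshold in (0.1, 0.2, 0.4):
    print(threshold, confusion_at(threshold, samples))
```

Raising the threshold trades false positives for false negatives, so comparing the counts side by side at each candidate value makes the balance easier to judge.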

https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-guardrails-in-policies.html

Also Applicable to Inference Targets

This time we protected an agent (Runtime target), but there are 3 types of targets that Guardrail policies can be applied to:

Target           | Evaluation Path
MCP Target       | POST /mcp (tools/call)
Runtime Target   | POST /<target name>/invocations
Inference Target | POST /inference

In other words, the Inference Target introduced in my previous article (the LLM gateway configuration that consolidates Azure and Bedrock under a single Gateway) should also be able to have Guardrails applied using the same mechanism. So I went ahead and actually tested this as well!

Actually Testing with an Inference Target

In the Gateway created in the previous article (a configuration with one Bedrock connector-type and one Azure provider-type Inference Target), I created a new Policy Engine, attached it in ENFORCE mode, and added the same VIOLENCE-blocking policy. Since ENFORCE mode defaults to deny, I also added an all-allow permit policy alongside it, just like with the Runtime target (the complete set of final policies is listed at the end of this section).

https://dev.classmethod.jp/articles/agentcore-gateway-inference-target/

First, an important IAM note. Policy evaluation requires the following permissions on the Gateway execution role. When creating a Policy Engine with the AgentCore CLI as in the steps so far, these are granted automatically, but when attaching to an existing Gateway after the fact as in this case, you need to add them to the execution role yourself.

Permissions required for the execution role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeGuardrailChecks",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:GetPolicyEngine",
        "bedrock-agentcore:AuthorizeAction",
        "bedrock-agentcore:PartiallyAuthorizeActions",
        "bedrock-agentcore:CheckAuthorizePermissions"
      ],
      "Resource": "*"
    }
  ]
}
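Before attaching a Policy Engine to an existing Gateway, a quick local check of the execution role's policy document can catch a missing permission. The Python sketch below is a simplified illustration: missing_actions is a hypothetical helper that only matches exact action names and ignores wildcards, Resource, and Condition elements.

```python
REQUIRED_ACTIONS = {
    "bedrock:InvokeGuardrailChecks",
    "bedrock-agentcore:GetPolicyEngine",
    "bedrock-agentcore:AuthorizeAction",
    "bedrock-agentcore:PartiallyAuthorizeActions",
    "bedrock-agentcore:CheckAuthorizePermissions",
}


def missing_actions(policy_document):
    """Return the required actions not granted by any Allow statement.

    Simplification: only exact action names are matched; wildcards such as
    "bedrock:*" are not expanded.
    """
    granted = set()
    for statement in policy_document.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # IAM allows a single string or a list
        granted.update(actions)
    return REQUIRED_ACTIONS - granted


# Example: a role that only has the Guardrails permission so far.
partial = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "bedrock:InvokeGuardrailChecks", "Resource": "*"}
    ],
}
print(sorted(missing_actions(partial)))
```

For the partial document above, the four bedrock-agentcore actions are reported as missing.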

Here is the Cedar policy being used:

Guardrail policy for Inference Target
forbid (principal, action == AgentCore::Action::"target-quick-start-412ff3___POST:/v1/chat/completions", resource == AgentCore::Gateway::"<Gateway ARN>")
when guardrails {
    BedrockGuardrails::ContentFilter(["VIOLENCE"], [context.input.messages])
        ["VIOLENCE"].confidenceScore.greaterThan(decimal("0.2"))
};

There are two key points about the Cedar policy:

  • The action name is <target name>___POST:/v1/chat/completions. Following the same naming convention as the Runtime target's ___POST:/invocations, the operation path becomes the action for Inference Targets
  • The data path is context.input.messages. By specifying the messages array in the OpenAI-compatible request body, Guardrails evaluates the content inside it
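The naming convention in the first point can be illustrated with a small Python helper that splits an action name into target, HTTP method, and path. The triple-underscore convention here is inferred from the examples in this article rather than from a formal specification, and parse_action_name is a hypothetical name.

```python
def parse_action_name(action_name):
    """Split '<target>___<METHOD>:<path>' into (target, method, path).

    The format is inferred from the examples in this article.
    """
    target, sep, operation = action_name.partition("___")
    method, colon, path = operation.partition(":")
    if not sep or not colon or not path.startswith("/"):
        raise ValueError(f"unexpected action name: {action_name}")
    return target, method, path


print(parse_action_name("GuardTarget___POST:/invocations"))
print(parse_action_name("target-quick-start-412ff3___POST:/v1/chat/completions"))
```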

Here are the results of verifying this:

Verification results (Bedrock target)
Normal prompt           → HTTP 200 (response returned)
"i will kill you"      → HTTP 403 Request Denied [Policy evaluation denied due to BlockViolenceInference]
"お前を殴り倒してやる"   → HTTP 403 (blocked 3 out of 4 times)

It was properly blocked on the Inference Target too! The errors are returned in OpenAI-compatible {"error": {...}} format, so they can be handled as normal API errors from the OpenAI SDK.

The Japanese prompt being blocked only 3 out of 4 times is likely due to the non-deterministic nature of Guardrails: the score probably fluctuated around the 0.2 threshold. In my previous article, I confirmed a tendency for content filter scores on Japanese to come out lower than on English, which is consistent with this result. For production use, check the score distribution in LOG_ONLY mode before deciding on thresholds.

when guardrails Requires an Action Constraint

If you want to apply guardrails to all actions, you might be tempted to write forbid (principal, action, resource == ...) without an action constraint, but when I tried this the policy ended up in UPDATE_FAILED.

Error message
Failed to enrich schema: InvalidScope ...
Provide a constraint of the form `action == <Namespace>::Action::"<Action Name>"`

Policies using when guardrails need to identify which action's schema to resolve data paths against, so creation/update fails if you don't explicitly specify the action.
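Since the failure only shows up at create/update time, a rough local pre-check can save a deployment round trip. The Python sketch below uses a regular expression to see whether a when guardrails policy pins an action with ==. It is a heuristic text check under my own assumptions, not a Cedar parser, and guardrail_scope_ok is a hypothetical helper.

```python
import re

# Matches e.g.: action == AgentCore::Action::"GuardTarget___POST:/invocations"
_ACTION_EQ = re.compile(r'\baction\s*==\s*(?:\w+::)+Action::"[^"]+"')


def guardrail_scope_ok(statement):
    """Return True if a `when guardrails` policy constrains action with ==."""
    if "when guardrails" not in statement:
        return True  # ordinary policies are out of scope for this check
    # Text of the (principal, action, resource) head, up to the first ")".
    scope = statement.split(")", 1)[0]
    return _ACTION_EQ.search(scope) is not None


unconstrained = (
    'forbid (principal, action, resource == AgentCore::Gateway::"<Gateway ARN>")\n'
    "when guardrails { ... };"
)
print(guardrail_scope_ok(unconstrained))
```

The check returns False for the unconstrained policy above, which is the form that failed with InvalidScope.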

Works with Provider-Type Targets Too, but Watch Out for Syntax

I also tried applying the same guard to the Azure (provider-type) target. In short, the behavior varied depending on how the policy was written.

First, here's the pattern that didn't work. Writing both the Bedrock and Azure targets in a single policy using action in [...] resulted in all requests to the provider-type target returning 403, even normal ones.

NG: combining both targets into one policy using action in
forbid (principal, action in [AgentCore::Action::"target-quick-start-412ff3___POST:/v1/chat/completions", AgentCore::Action::"target-quick-start-7fe458___POST:/v1/chat/completions"], resource == AgentCore::Gateway::"<Gateway ARN>")
when guardrails {
    BedrockGuardrails::ContentFilter(["VIOLENCE"], [context.input.messages])
        ["VIOLENCE"].confidenceScore.greaterThan(decimal("0.2"))
};

target-quick-start-412ff3 is the Bedrock target name and target-quick-start-7fe458 is the Azure target name. The policy creation/update itself succeeds and becomes ACTIVE, but at runtime only the provider-type side is rejected with the error below (the connector-type Bedrock side works fine).

Verification results (NG pattern)
Bedrock normal prompt     → HTTP 200
Bedrock "i will kill you" → HTTP 403 (correct block by Guardrails)
Azure normal prompt       → HTTP 403 (blocked even though it's a normal request!)
Azure "i will kill you"   → HTTP 403 (error below)
Error message (provider-type side)
Request Denied: Gateway Target request not allowed due to policy enforcement
[Authorization denied: a guardrail policy could not be evaluated - missing an attribute. Please retry.]

Next, here's the pattern that worked. Split the policy into one per target, each specifying a single action with action ==. Here's the one added for Azure:

OK: one policy per target with action == (for Azure)
forbid (principal, action == AgentCore::Action::"target-quick-start-7fe458___POST:/v1/chat/completions", resource == AgentCore::Gateway::"<Gateway ARN>")
when guardrails {
    BedrockGuardrails::ContentFilter(["VIOLENCE"], [context.input.messages])
        ["VIOLENCE"].confidenceScore.greaterThan(decimal("0.2"))
};

With two policies in place, the one for Bedrock shown earlier and this one for Azure, both targets worked as expected!

Verification results (OK pattern)
Bedrock normal prompt     → HTTP 200 (3/3 times)
Bedrock "i will kill you" → HTTP 403 [Policy evaluation denied due to BlockViolenceInference] (3/3 times)
Azure normal prompt       → HTTP 200 (3/3 times)
Azure "i will kill you"   → HTTP 403 [Policy evaluation denied due to BlockViolenceAzure] (3/3 times)

In this configuration, I was able to confirm that Guardrail policies can be applied to Inference Targets regardless of whether they are connector-type or provider-type!

Looking at the error from the NG pattern, it appears that when multiple actions are combined, attribute resolution for the data path fails on the provider-type side. Since the policy can't be evaluated, it falls back to deny and blocks everything. Specifying a single action with action == worked without issue every time, so for now it seems best to write one Guardrail policy per target. Even when you want to apply the same guard across multiple providers, repeat the same policy body once per target.
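If you have several targets, writing the same policy by hand for each one gets tedious. The Python sketch below renders one statement per target from a template based on the policies in this article. The helper name policies_per_target is hypothetical, and the template hard-codes the VIOLENCE filter and the /v1/chat/completions path used here.

```python
POLICY_TEMPLATE = '''forbid (principal, action == AgentCore::Action::"{target}___POST:/v1/chat/completions", resource == AgentCore::Gateway::"{gateway_arn}")
when guardrails {{
    BedrockGuardrails::ContentFilter(["VIOLENCE"], [context.input.messages])
        ["VIOLENCE"].confidenceScore.greaterThan(decimal("{threshold}"))
}};'''


def policies_per_target(targets, gateway_arn, threshold="0.2"):
    """Render one guardrail policy statement per target name."""
    return {
        target: POLICY_TEMPLATE.format(
            target=target, gateway_arn=gateway_arn, threshold=threshold
        )
        for target in targets
    }


statements = policies_per_target(
    ["target-quick-start-412ff3", "target-quick-start-7fe458"],
    gateway_arn="<Gateway ARN>",  # placeholder, replace with your Gateway ARN
)
for name, statement in statements.items():
    print(f"# {name}\n{statement}\n")
```

Each rendered statement uses a single action == constraint, so the result matches the OK pattern rather than the action in [...] form that failed.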

The final set of policies in the Policy Engine is a total of 3: 1 all-allow permit + 2 Guardrail forbid policies, one per target.

Complete set of policies used in the end (3 total)

Policy 1: Base all-allow policy (AllowAllInference). Without this, all requests including normal ones get blocked since ENFORCE mode defaults to deny.

permit (principal, action, resource is AgentCore::Gateway);

Policy 2: Guardrail policy for the Bedrock target (BlockViolenceInference).

forbid (principal, action == AgentCore::Action::"target-quick-start-412ff3___POST:/v1/chat/completions", resource == AgentCore::Gateway::"<Gateway ARN>")
when guardrails {
    BedrockGuardrails::ContentFilter(["VIOLENCE"], [context.input.messages])
        ["VIOLENCE"].confidenceScore.greaterThan(decimal("0.2"))
};

Policy 3: Guardrail policy for the Azure target (BlockViolenceAzure).

forbid (principal, action == AgentCore::Action::"target-quick-start-7fe458___POST:/v1/chat/completions", resource == AgentCore::Gateway::"<Gateway ARN>")
when guardrails {
    BedrockGuardrails::ContentFilter(["VIOLENCE"], [context.input.messages])
        ["VIOLENCE"].confidenceScore.greaterThan(decimal("0.2"))
};

Cleanup

Once testing is done, you can delete the resources with the following:

Command
agentcore remove all --json
agentcore deploy -y

Conclusion

Now that Guardrails can be integrated through Policy as well, the Gateway is really starting to fulfill its role as a gateway for Agents and LLMs, and the range of design options is expanding.
I'd like to take the time to think carefully about how to design around the Gateway going forward.
I'll also dive deeper into Agent Targets in future blog posts!

I hope this article was helpful in some way. Thank you for reading all the way through!
