[Update] Detect mode is now available for detection actions in Amazon Bedrock Guardrails

2025.04.09
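Hello! I'm Takakuni (@takakuni_) from the Consulting Department, Cloud Business Division.
Detect mode is now available for detection actions in Amazon Bedrock Guardrails. (This post covers one part of the following announcement.)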
https://aws.amazon.com/about-aws/whats-new/2025/04/amazon-bedrock-guardrails-safely-build-generative-ai-applications/
An AWS Blog post was also published alongside the announcement.
https://aws.amazon.com/blogs/aws/amazon-bedrock-guardrails-enhances-generative-ai-application-safety-with-new-capabilities/
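
Detect mode
Detect mode plays a role similar to Count mode in AWS WAF.
Previously, Amazon Bedrock Guardrails offered Block (BLOCK) or Mask (ANONYMIZE) as actions.
There was no way to check in advance whether content would be detected, which is what this update now allows.
The new Detect (NONE) mode neither blocks nor masks; it only records in the trace whether the content was detected. Very convenient.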
https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-harmful-content-handling-options.html
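As a rough sketch of what Detect mode looks like outside the console, the content filter configuration below sets both the input and output actions to NONE. The field names `inputAction` and `outputAction` reflect my understanding of the `contentPolicyConfig` accepted by `create_guardrail` on the boto3 `bedrock` client, so please check them against the current API reference. The helper name `build_detect_content_policy` and the chosen strengths are illustrative assumptions and do not come from the announcement.

```python
def build_detect_content_policy(filter_types, strength="HIGH"):
    """Build a contentPolicyConfig-style dict whose filters only detect (action NONE)."""
    filters = []
    for filter_type in filter_types:
        filters.append({
            "type": filter_type,
            "inputStrength": strength,
            # Assumption: prompt attacks are evaluated on input only,
            # so their output strength is set to NONE.
            "outputStrength": "NONE" if filter_type == "PROMPT_ATTACK" else strength,
            # NONE = Detect mode: record the hit in the trace, take no action.
            "inputAction": "NONE",
            "outputAction": "NONE",
        })
    return {"filtersConfig": filters}


content_policy = build_detect_content_policy(["INSULTS", "PROMPT_ATTACK"])
print(content_policy)
```

A dict like this could be passed as `contentPolicyConfig` when creating or updating a guardrail, together with the other required arguments such as the name and the blocked messages.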
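
Scope
Detect mode is available as an action for all filters. In the management console, it appears as one of the selectable actions for each filter.

Trying it out
Let's set the detection action to Detect and see how the trace is displayed. This time, I configure the harmful category filters and the prompt attack filter in Detect mode.
I tested it in the management console. The content filter flags the prompt, but nothing is blocked: the prompt is passed to the model and a response is returned.
Let's also check what the response looks like through the SDK. There was sample code available, so I reuse it and add the pieces I need.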
app.py
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to use a guardrail with the Converse API.
"""

import logging
import json
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_conversation(bedrock_client,
                          model_id,
                          messages,
                          guardrail_config):
    """
    Sends a message to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        messages (JSON): The message to send to the model.
        guardrail_config : Configuration for the guardrail.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        guardrailConfig=guardrail_config
    )

    return response


def main():
    """
    Entrypoint for example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    # The model to use.
    model_id = "apac.anthropic.claude-3-5-sonnet-20241022-v2:0"

    # The ID and version of the guardrail.
    guardrail_id = "mcfckqgoekhi"
    guardrail_version = "DRAFT"

    # Configuration for the guardrail.
    guardrail_config = {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        "trace": "enabled"
    }

    text = "XXXXXXX"  # Put the content to be assessed here
    context_text = "Only answer with a list of songs."

    # The message for the model and the content that you want the guardrail to assess.
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "text": context_text,
                },
                {
                    "guardContent": {
                        "text": {
                            "text": text
                        }
                    }
                }
            ]
        }
    ]

    try:

        print(json.dumps(messages, indent=4))

        bedrock_client = boto3.client(service_name='bedrock-runtime')

        response = generate_conversation(
            bedrock_client, model_id, messages, guardrail_config)
        print("#########################################")
        print("Response:")
        print(json.dumps(response, indent=4))
        print("#########################################")

        output_message = response['output']['message']

        if response['stopReason'] == "guardrail_intervened":
            trace = response['trace']
            print("Guardrail trace:")
            print(json.dumps(trace['guardrail'], indent=4))

        for content in output_message['content']:
            print(f"Text: {content['text']}")

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with model {model_id}.")


if __name__ == "__main__":
    main()
https://docs.aws.amazon.com/ja_jp/bedrock/latest/userguide/guardrails-use-converse-api.html
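The result is as follows. The guardrail information is recorded in trace.
inputAssessment shows that the input matched a filter (INSULTS, with action NONE), even though the model still returned a response.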
takakuni % python app.py
[
    {
        "role": "user",
        "content": [
            {
                "text": "Only answer with a list of songs."
            },
            {
                "guardContent": {
                    "text": {
                        "text": "XXXX"
                    }
                }
            }
        ]
    }
]
INFO:__main__:Generating message with model apac.anthropic.claude-3-5-sonnet-20241022-v2:0
#########################################
Response:
{
    "ResponseMetadata": {
        "RequestId": "42c24bbc-d744-49c6-acd7-d7c2e175bda8",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "date": "Tue, 08 Apr 2025 23:49:54 GMT",
            "content-type": "application/json",
            "content-length": "1395",
            "connection": "keep-alive",
            "x-amzn-requestid": "42c24bbc-d744-49c6-acd7-d7c2e175bda8"
        },
        "RetryAttempts": 0
    },
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "Here are some songs:\n- \"You Oughta Know\" - Alanis Morissette \n- \"Break Stuff\" - Limp Bizkit\n- \"I Don't Give A...\" - Madonna\n- \"Killing in the Name\" - Rage Against The Machine\n- \"Don't Care\" - Fall Out Boy\n- \"So What\" - Pink\n- \"You're Breaking My Heart\" - Harry Nilsson"
                }
            ]
        }
    },
    "stopReason": "end_turn",
    "usage": {
        "inputTokens": 16,
        "outputTokens": 101,
        "totalTokens": 117
    },
    "metrics": {
        "latencyMs": 3273
    },
    "trace": {
        "guardrail": {
            "inputAssessment": {
                "mcfckqgoekhi": {
                    "contentPolicy": {
                        "filters": [
                            {
                                "type": "INSULTS",
                                "confidence": "HIGH",
                                "filterStrength": "HIGH",
                                "action": "NONE"
                            }
                        ]
                    },
                    "invocationMetrics": {
                        "guardrailProcessingLatency": 184,
                        "usage": {
                            "topicPolicyUnits": 0,
                            "contentPolicyUnits": 1,
                            "wordPolicyUnits": 0,
                            "sensitiveInformationPolicyUnits": 0,
                            "sensitiveInformationPolicyFreeUnits": 0,
                            "contextualGroundingPolicyUnits": 0,
                            "contentPolicyImageUnits": 0
                        },
                        "guardrailCoverage": {
                            "textCharacters": {
                                "guarded": 4,
                                "total": 37
                            }
                        }
                    }
                }
            },
            "outputAssessments": {
                "mcfckqgoekhi": [
                    {
                        "invocationMetrics": {
                            "guardrailProcessingLatency": 200,
                            "usage": {
                                "topicPolicyUnits": 0,
                                "contentPolicyUnits": 1,
                                "wordPolicyUnits": 0,
                                "sensitiveInformationPolicyUnits": 0,
                                "sensitiveInformationPolicyFreeUnits": 0,
                                "contextualGroundingPolicyUnits": 0,
                                "contentPolicyImageUnits": 0
                            },
                            "guardrailCoverage": {
                                "textCharacters": {
                                    "guarded": 268,
                                    "total": 268
                                }
                            }
                        }
                    }
                ]
            }
        }
    }
}
#########################################
Text: Here are some songs:
- "You Oughta Know" - Alanis Morissette 
- "Break Stuff" - Limp Bizkit
- "I Don't Give A..." - Madonna
- "Killing in the Name" - Rage Against The Machine
- "Don't Care" - Fall Out Boy
- "So What" - Pink
- "You're Breaking My Heart" - Harry Nilsson
Finished generating text with model apac.anthropic.claude-3-5-sonnet-20241022-v2:0.
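
Because Detect mode leaves stopReason as end_turn, the guardrail_intervened branch in the sample never fires, so the detections have to be read from trace directly. Below is a minimal sketch of doing that, assuming the trace shape shown in the output above. The helper name `collect_detections` is hypothetical, and I am assuming that output assessments carry the same `contentPolicy` structure as the input assessment.

```python
def collect_detections(response):
    """Return content-filter hits recorded in a Converse response trace."""
    guardrail_trace = response.get("trace", {}).get("guardrail", {})
    detections = []

    # inputAssessment maps a guardrail ID to a single assessment dict.
    for guardrail_id, assessment in guardrail_trace.get("inputAssessment", {}).items():
        for hit in assessment.get("contentPolicy", {}).get("filters", []):
            detections.append(
                ("input", guardrail_id, hit["type"], hit["confidence"], hit["action"]))

    # outputAssessments maps a guardrail ID to a list of assessment dicts.
    for guardrail_id, assessments in guardrail_trace.get("outputAssessments", {}).items():
        for assessment in assessments:
            for hit in assessment.get("contentPolicy", {}).get("filters", []):
                detections.append(
                    ("output", guardrail_id, hit["type"], hit["confidence"], hit["action"]))

    return detections


# Trimmed copy of the response shown above.
sample_response = {
    "stopReason": "end_turn",
    "trace": {"guardrail": {
        "inputAssessment": {"mcfckqgoekhi": {"contentPolicy": {"filters": [
            {"type": "INSULTS", "confidence": "HIGH",
             "filterStrength": "HIGH", "action": "NONE"}]}}},
        "outputAssessments": {"mcfckqgoekhi": [{"invocationMetrics": {}}]},
    }},
}

for stage, guardrail_id, filter_type, confidence, action in collect_detections(sample_response):
    print(f"{stage}: {filter_type} (confidence={confidence}, action={action})")
# prints: input: INSULTS (confidence=HIGH, action=NONE)
```

Entries whose action is NONE are the ones that would have been blocked or masked had the filter not been in Detect mode, so logging them is one way to gauge a filter before enforcing it.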
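
Summary
That was "Detect mode is now available for detection actions in Amazon Bedrock Guardrails."
Being able to check in advance is very welcome. It would be interesting to see a visualized UI like the AWS WAF console in the future.
I hope this post is helpful to someone. This was Takakuni (@takakuni_) from the Consulting Department, Cloud Business Division!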