[Update] Amazon Bedrock Data Automation now supports custom output (custom blueprints) for video files

2025.05.18

Hello! I'm Takakuni (@takakuni_) from the Consulting Department of the Cloud Business Division.
It has not shown up in What's New yet, but Amazon Bedrock Data Automation now lets you create custom output (custom blueprints) for video files.
Changes: Add support for VIDEO modality to BlueprintType enum.
https://awsapichanges.com/archive/changes/81c9cc-bedrock-data-automation.html
Custom output for audio files was only just added the other day, and with this update the set is finally complete.
https://dev.classmethod.jp/articles/amazon-bedrock-data-automation-extraction-custom-insights-audio/
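What's in the update

In Amazon Bedrock Data Automation (hereafter BDA), custom output (custom blueprints) is a feature for extracting or generating fields that standard output does not return.
For video files, BDA supports the following extraction and generation in standard output.

Extraction

Full audio transcript
Extracts a transcript of the entire audio track of the video

Text in video
Detects text that appears in the video

Content moderation
Detects explicit and harmful content in the video and audio

Logos
Extracts recognizable logos detected in the video

Generation

Video summary
Generates a summary of the entire video

Video chapter summaries
Generates a summary for each chapter of the video

IAB taxonomy
Generates Interactive Advertising Bureau (IAB) categories
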
https://docs.aws.amazon.com/bedrock/latest/userguide/bda-ouput-video.html
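With standard output alone, you get the full audio transcript, the video summary, and the text in the video, and that is where it ends. To list next actions, gauge the size and facial expressions of the participants, or understand diagrams shown during screen sharing, you have to review the video again yourself or bring in a separate LLM.
With a custom blueprint, you pass LLM instructions for a modality (video, in this case), such as "List the action items", "Describe the size and facial expressions of the participants", or "If a slide appears during screen sharing, transcribe it as Markdown", and BDA extracts or generates fields that standard output does not return.
Before this update, custom blueprints supported only documents, images, and audio files. With this update, video files are supported as well.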
https://docs.aws.amazon.com/bedrock/latest/userguide/creating-blueprint-video.html
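As a rough illustration of how instructions like these attach to fields, the sketch below builds a blueprint schema as a plain Python dict and serializes it to JSON. The layout (`class`, `properties`, `inferenceType`, `instruction`) is modeled on the JSON-schema style BDA blueprints use for other modalities, and I am assuming video blueprints accept the same keys; check the documentation linked above before relying on it. The class name `meeting-recording` and the three field names are hypothetical.

```python
import json


def build_blueprint_schema(description, blueprint_class, fields):
    """Assemble a blueprint-style schema dict.

    `fields` maps a field name to (json_type, inference_type, instruction).
    The key names used here are assumptions modeled on BDA blueprint schemas.
    """
    properties = {
        name: {
            "type": json_type,
            "inferenceType": inference_type,
            "instruction": instruction,
        }
        for name, (json_type, inference_type, instruction) in fields.items()
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "description": description,
        "class": blueprint_class,
        "type": "object",
        "definitions": {},
        "properties": properties,
    }


# Hypothetical fields mirroring the example instructions in the text.
MEETING_FIELDS = {
    "action-items": ("string", "inferred", "List the action items."),
    "audience-mood": (
        "string",
        "inferred",
        "Describe the size and facial expressions of the participants.",
    ),
    "slide-markdown": (
        "string",
        "explicit",
        "If a slide appears during screen sharing, transcribe it as Markdown.",
    ),
}

schema_json = json.dumps(
    build_blueprint_schema(
        "Insights from a recorded meeting", "meeting-recording", MEETING_FIELDS
    ),
    indent=2,
)
print(schema_json)
```

The resulting JSON string is the kind of payload you would hand to the blueprint creation API or paste into the console's JSON editor, with the blueprint type set to VIDEO.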
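Trying it out

Let's try custom output for a video file using a blueprint.
This time I use a sample blueprint to see how well the extraction works. From the custom output settings, select a sample blueprint.
As VIDEO modality samples, a re:Invent keynote sample and Media Search have been added. This time I click Keynote-Highlight.
Extraction and generation from the video file start, using the blueprint.
For a 5-minute video, the extraction finished in just under 2 minutes.
Dr. Swami is on stage. Do you know which year's Dr. Swami this is?
It looks like re:Invent 2024, but it is actually a video from re:Invent 2023.
Digression aside, the blueprint comes with predefined fields.
The fields are defined as follows.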


Field name	Description	Data type	Source	Extraction type
broadcast-audience-engagement	The level of engagement or interaction between the speakers and the audience	String (Enum: 3)	Video	Inferred
broadcast-audience-size	The size of the audience present at the event	String (Enum: 4)	Video	Inferred
broadcast-event-details	The event or occasion where the broadcast or training session is taking place, such as the name, theme, date, and time	Custom (eventdetails)	Video	-
broadcast-event-name	The official name or title of the event where the broadcast or training session is taking place	String	-	Explicit
broadcast-event-theme	The overarching theme, topic, or subject matter of the event	String	-	Explicit
broadcast-number-of-speakers	The total number of speakers or presenters featured in the video	Number	Video	Inferred
broadcast-presentation-topics	A list of key topics, subjects, or themes covered in the presentation or training session	Array of String	Video	Inferred
broadcast-setting	The physical setting or environment where the broadcast or training session is taking place	String (Enum: 5)	Video	Inferred
broadcast-video-chapter-details	Detailed information about individual chapters within a video.	Custom (chapterdetails)	Chapter	-
chapter-key-message	Key message extracted from the video.	String	-	Explicit
chapter-title	The title of the video chapter.	String	-	Explicit
broadcast-video-speakers	The primary speaker or presenter featured in the video, including name, title, and other relevant information	Custom (speaker)	Video	-
broadcast-speakers-expertise	The speaker's area of expertise or specialization relevant to the presentation topic if evident, could be empty otherwise	String	-	Inferred
broadcast-speakers-name	The name of the speaker or presenter if evident, could be empty otherwise	String	-	Inferred
broadcast-speakers-organization	The company, institution, or organization that the speaker is affiliated with or representing if evident, could be empty otherwise	String	-	Inferred
broadcast-speakers-title	The professional title or role of the speaker, such as 'CEO', 'Professor', or 'Consultant' if evident, could be empty otherwise	String	-	Inferred
broadcast-visual-aids	A list of notable visual aids or materials used during the presentation, such as slides, diagrams, or demonstrations	Array of String	Video	Inferred

In plain terms, the fields carry the following instructions.
As broadcast-visual-aids shows, recognizing things like the diagrams on slides is where custom output really shines.

Field name	Description	Data type	Source	Extraction type
broadcast-audience-engagement	Level of engagement or interaction between the speakers and the audience	String (Enum: 3)	Video	Inferred
broadcast-audience-size	Size of the audience attending the event	String (Enum: 4)	Video	Inferred
broadcast-event-details	Event details (name, theme, date, time, and so on)	Custom (eventdetails)	Video	-
broadcast-event-name	Official name or title of the event	String	-	Explicit
broadcast-event-theme	Theme or subject of the event	String	-	Explicit
broadcast-number-of-speakers	Total number of speakers or presenters	Number	Video	Inferred
broadcast-presentation-topics	List of the main topics covered in the presentation or training	Array of String	Video	Inferred
broadcast-setting	Physical venue or environment of the event or training	String (Enum: 5)	Video	Inferred
broadcast-video-chapter-details	Detailed information about each chapter in the video	Custom (chapterdetails)	Chapter	-
chapter-key-message	Key message extracted from the chapter	String	-	Explicit
chapter-title	Title of the chapter	String	-	Explicit
broadcast-video-speakers	Information about the main speaker in the video (name, title, and so on)	Custom (speaker)	Video	-
broadcast-speakers-expertise	Speaker's area of expertise (only if evident)	String	-	Inferred
broadcast-speakers-name	Speaker's name (only if evident)	String	-	Inferred
broadcast-speakers-organization	Speaker's affiliated organization (only if evident)	String	-	Inferred
broadcast-speakers-title	Speaker's job title or role (only if evident)	String	-	Inferred
broadcast-visual-aids	Main visual materials used in the presentation (slides, diagrams, demonstrations, and so on)	Array of String	Video	Inferred

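The extraction (inference) types differ in the instruction they give BDA, as follows.
Explicit: BDA should extract the value directly from the input.
Inferred: BDA should infer the value based on information present in the input.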
https://docs.aws.amazon.com/bedrock/latest/userguide/idp-cases-extraction.html
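After waiting a few dozen seconds, the result came back as follows.
Fields that standard output cannot provide, such as broadcast-event-name and broadcast-audience-engagement, have been retrieved.
I only realized the video was from re:Invent 2023 after rewatching it on YouTube, but the phrase presumably appears somewhere in the video.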
{
  "matched_blueprint": {
    "arn": "arn:aws:bedrock:us-east-1:123456789012:blueprint/6e29eb0a86fc",
    "version": "dev",
    "name": "default",
    "confidence": 1
  },
  "split_video": {
    "chapter_indices": [
      0,
      1,
      2,
      3,
      4
    ]
  },
  "inference_result": {
    "broadcast-number-of-speakers": 1,
    "broadcast-audience-size": "large crowd",
    "broadcast-presentation-topics": [
      "Generative AI",
      "Human-AI collaboration",
      "History of computing",
      "Ada Lovelace's contributions",
      "Technological innovation"
    ],
    "broadcast-video-speakers": {
      "broadcast-speakers-name": "",
      "broadcast-speakers-title": "",
      "broadcast-speakers-organization": "Amazon Web Services",
      "broadcast-speakers-expertise": "Technology and AI"
    },
    "broadcast-event-details": {
      "broadcast-event-name": "AWS re:Invent 2023",
      "broadcast-event-theme": "The symbiotic relationship between humans and AI"
    },
    "broadcast-visual-aids": [
      "Stylized human head with circuit board patterns",
      "Brain illustration",
      "Multiple screens",
      "Diagrams and quotes"
    ],
    "broadcast-audience-engagement": "passive",
    "broadcast-setting": "conference hall"
  },
  "chapters": [
    {
      "inference_result": {
        "broadcast-video-chapter-details": {
          "chapter-title": "Ada Lovelace's Vision of AI and Human-Computer Collaboration",
          "chapter-key-message": "While machines can process and analyze existing information, true creativity and intelligence originate from humans, and computers should assist humans rather than originate new ideas."
        }
      },
      "frames": [],
      "chapter_index": 2,
      "start_timecode_smpte": "00:03:26:08",
      "end_timecode_smpte": "00:04:16:20",
      "start_timestamp_millis": 206272,
      "end_timestamp_millis": 256656,
      "start_frame_index": 6182,
      "end_frame_index": 7692,
      "duration_smpte": "00:00:50:10",
      "duration_millis": 50383,
      "duration_frames": 1511
    },
    {
      "inference_result": {
        "broadcast-video-chapter-details": {
          "chapter-title": "200 Years of Technological Innovation and Ada Lovelace's Vision",
          "chapter-key-message": "Technology has evolved over 200 years through the contributions of mathematicians, computer scientists, and visionaries like Ada Lovelace, who recognized computers' potential beyond simple number crunching."
        }
      },
      "frames": [],
      "chapter_index": 1,
      "start_timecode_smpte": "00:01:43:07",
      "end_timecode_smpte": "00:03:26:07",
      "start_timestamp_millis": 103236,
      "end_timestamp_millis": 206239,
      "start_frame_index": 3094,
      "end_frame_index": 6181,
      "duration_smpte": "00:01:43:00",
      "duration_millis": 103003,
      "duration_frames": 3088
    },
    {
      "inference_result": {
        "broadcast-video-chapter-details": {
          "chapter-title": "Ada Lovelace and the Evolution of Artificial Intelligence",
          "chapter-key-message": "Ada Lovelace's contributions to computing and artificial intelligence are being recognized more than ever, with her work standing the test of time and inspiring future generations."
        }
      },
      "frames": [],
      "chapter_index": 3,
      "start_timecode_smpte": "00:04:16:21",
      "end_timecode_smpte": "00:04:48:12",
      "start_timestamp_millis": 256689,
      "end_timestamp_millis": 288388,
      "start_frame_index": 7693,
      "end_frame_index": 8643,
      "duration_smpte": "00:00:31:20",
      "duration_millis": 31698,
      "duration_frames": 951
    },
    {
      "inference_result": {
        "broadcast-video-chapter-details": {
          "chapter-title": "AWS Reinvent 2023: Generative AI and Human-Technology Partnership",
          "chapter-key-message": "The presentation explores the emerging symbiotic relationship between humans and technology, particularly focusing on how generative AI is revolutionizing human productivity and creativity, drawing parallels to natural symbiotic relationships."
        }
      },
      "frames": [],
      "chapter_index": 0,
      "start_timecode_smpte": "00:00:06:21",
      "end_timecode_smpte": "00:01:43:06",
      "start_timestamp_millis": 6706,
      "end_timestamp_millis": 103203,
      "start_frame_index": 201,
      "end_frame_index": 3093,
      "duration_smpte": "00:01:36:15",
      "duration_millis": 96496,
      "duration_frames": 2893
    },
    {
      "inference_result": {
        "broadcast-video-chapter-details": {
          "chapter-title": "Generative AI and Artificial Intelligence Evolution",
          "chapter-key-message": "The presentation focuses on the rapid innovation in generative AI and machine learning over the past couple of decades, emphasizing the current state and future developments in artificial intelligence technology."
        }
      },
      "frames": [],
      "chapter_index": 4,
      "start_timecode_smpte": "00:04:48:13",
      "end_timecode_smpte": "00:05:00:00",
      "start_timestamp_millis": 288421,
      "end_timestamp_millis": 299999,
      "start_frame_index": 8644,
      "end_frame_index": 8991,
      "duration_smpte": "00:00:31:15",
      "duration_millis": 31532,
      "duration_frames": 946
    }
  ]
}
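Note that the `chapters` array in the result above is not ordered by `chapter_index`. A minimal sketch of reading the output in playback order follows; the `sample` dict is a trimmed copy of three chapters from the result above, and the helper name `chapters_in_order` is my own.

```python
def chapters_in_order(result):
    """Return (chapter_index, start timecode, title) tuples sorted by chapter_index."""
    rows = []
    for chapter in sorted(result["chapters"], key=lambda c: c["chapter_index"]):
        details = chapter["inference_result"]["broadcast-video-chapter-details"]
        rows.append(
            (
                chapter["chapter_index"],
                chapter["start_timecode_smpte"],
                details["chapter-title"],
            )
        )
    return rows


# Trimmed copy of the custom output shown above (three of the five chapters).
sample = {
    "chapters": [
        {
            "inference_result": {
                "broadcast-video-chapter-details": {
                    "chapter-title": "Ada Lovelace's Vision of AI and Human-Computer Collaboration"
                }
            },
            "chapter_index": 2,
            "start_timecode_smpte": "00:03:26:08",
        },
        {
            "inference_result": {
                "broadcast-video-chapter-details": {
                    "chapter-title": "200 Years of Technological Innovation and Ada Lovelace's Vision"
                }
            },
            "chapter_index": 1,
            "start_timecode_smpte": "00:01:43:07",
        },
        {
            "inference_result": {
                "broadcast-video-chapter-details": {
                    "chapter-title": "AWS Reinvent 2023: Generative AI and Human-Technology Partnership"
                }
            },
            "chapter_index": 0,
            "start_timecode_smpte": "00:00:06:21",
        },
    ]
}

ordered = chapters_in_order(sample)
for index, start, title in ordered:
    print(f"{index} {start} {title}")
```

In practice you would load the full result JSON from the output location with `json.load` and pass it to the same helper.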
Pricing

Finally, pricing. With up to 30 fields, the price was $0.084 per minute. As with audio, note that from the 31st field onward an additional $0.0005 is charged per field.
https://aws.amazon.com/bedrock/pricing/?nc1=h_ls
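To get a feel for the numbers, here is a small estimate based only on the figures quoted above. I am assuming the $0.0005 surcharge applies per extra field per minute of video, which the text does not state explicitly; confirm against the pricing page before budgeting. The function name `estimate_cost` is hypothetical.

```python
BASE_PER_MINUTE = 0.084    # USD per minute, covers up to 30 fields
EXTRA_PER_FIELD = 0.0005   # USD per field beyond the 30th (assumed: per minute)
INCLUDED_FIELDS = 30


def estimate_cost(minutes, field_count):
    """Estimate custom-output cost in USD for one video."""
    extra_fields = max(0, field_count - INCLUDED_FIELDS)
    per_minute = BASE_PER_MINUTE + extra_fields * EXTRA_PER_FIELD
    return minutes * per_minute


# A 5-minute video with a blueprint under the 30-field threshold,
# then the same video with 40 fields (10 over the threshold).
print(round(estimate_cost(5, 17), 4))
print(round(estimate_cost(5, 40), 4))
```

Under these assumptions, the 5-minute video in this post costs roughly $0.42 to process, and ten extra fields add about two and a half cents.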
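Summary

That was "Amazon Bedrock Data Automation now supports custom output (custom blueprints) for video files."
Being able to do in a single pass the familiar two-step job of transcribing with one model and then extracting from the transcript with an LLM is very convenient. It would be even nicer if you could choose the model used for transcription.
This was Takakuni (@takakuni_) from the Consulting Department of the Cloud Business Division!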
