[Update] Amazon Q Business now supports ingesting audio and video data

2025.03.06
Hello! This is Takakuni (@takakuni_) from the Consulting Department, Cloud Business Division.
Amazon Q Business now supports ingesting audio and video data.
https://aws.amazon.com/jp/about-aws/whats-new/2025/03/amazon-q-business-insights-audio-video-data/
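In addition to the image and text data it already handled, you can now ingest audio and video data into the Amazon Q Index and build RAG on top of it.
This further expands the options for building multimodal RAG applications. The announcement lists searching meeting recordings and training videos as example use cases.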
Customers can unlock valuable insights from their audio and video resources. Users can now easily search for specific information within recorded meetings, training videos, podcasts, or any other audio or video content ingested into Amazon Q Business.
This feature is also built on Amazon Bedrock Data Automation, which recently became generally available (GA).
The audio and video ingestion feature uses the Bedrock Data Automation feature to process customer’s multimodal assets.
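Limitations
Let's check the limitations. According to the documentation, the following limits apply.
Video
Maximum file size: 10 GB
Maximum duration: 4 hours
English content only
Supported formats: MP4, MOV, AVI
Recommended maximum resolution: 1080p
Audio
Maximum file size: 2 GB
Maximum duration: 4 hours
English content only
Supported formats: MP3, MP4, FLAC, WAV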
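To catch files that exceed these limits before uploading, a small pre-flight check can help. This is a minimal sketch, not part of Amazon Q Business: `LIMITS` and `check_media` are hypothetical names, it assumes 1 GB = 1024**3 bytes, and it assumes you already know each file's size and duration (for example from your own media tooling). The English-only and 1080p guidance cannot be checked from this metadata, so it is left out.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical helper encoding the documented limits; assumes 1 GB = 1024**3 bytes.
GB = 1024 ** 3
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4 hours for both audio and video


@dataclass(frozen=True)
class MediaLimits:
    max_bytes: int
    extensions: frozenset


LIMITS = {
    "video": MediaLimits(10 * GB, frozenset({".mp4", ".mov", ".avi"})),
    "audio": MediaLimits(2 * GB, frozenset({".mp3", ".mp4", ".flac", ".wav"})),
}


def check_media(kind: str, filename: str, size_bytes: int, duration_seconds: float) -> list:
    """Return a list of problems; an empty list means the file fits the limits."""
    limits = LIMITS[kind]
    problems = []
    suffix = Path(filename).suffix.lower()  # case-insensitive extension check
    if suffix not in limits.extensions:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > limits.max_bytes:
        problems.append(f"file too large: {size_bytes} bytes")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"too long: {duration_seconds} seconds")
    return problems


print(check_media("audio", "claude-news.mp3", 12_000_000, 472))
print(check_media("video", "meeting.mkv", 11 * GB, 5 * 3600))
```

Values exactly at a limit are treated as allowed here; whether the service treats the boundary the same way is an assumption.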
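The documentation also lists the following processing considerations:
Documents containing images take longer to process than text-only files
Incremental sync is used for content extraction from connectors
Files are reprocessed through a full sync or an individual import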
https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/Audio-video-extraction.html
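Pricing
Let's check the pricing. The charges for this kind of ingestion fall under index pricing.
Both the Starter Index and the Enterprise Index incur the following charges:
Audio: $0.006/minute
Video: $0.050/minute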
https://aws.amazon.com/q/business/pricing/?nc1=h_ls
Since it is built on Amazon Bedrock Data Automation, it is offered at the same price as Bedrock Data Automation.
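To get a feel for what these per-minute rates add up to, the sketch below estimates the ingestion charge from media duration. It is a hypothetical helper (`estimate_ingestion_cost` is not an AWS API) and it assumes billing on exact fractional minutes, with no per-file minimum and no rounding up to whole minutes; check the pricing page for the actual billing granularity.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute ingestion rates quoted above (same for Starter and Enterprise index).
RATES_PER_MINUTE = {
    "audio": Decimal("0.006"),
    "video": Decimal("0.050"),
}


def estimate_ingestion_cost(kind: str, duration_seconds: int) -> Decimal:
    """Estimate the USD charge, assuming billing on exact fractional minutes."""
    minutes = Decimal(duration_seconds) / Decimal(60)
    cost = RATES_PER_MINUTE[kind] * minutes
    # Round to a tenth of a cent for display; Decimal avoids float drift.
    return cost.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


print(estimate_ingestion_cost("audio", 60 * 60))      # one-hour meeting recording
print(estimate_ingestion_cost("video", 4 * 60 * 60))  # a video at the 4-hour limit
```

Under these assumptions, one hour of audio comes to about $0.36 and one hour of video to about $3.00, so video is roughly eight times more expensive per minute.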
https://aws.amazon.com/bedrock/pricing/
Trying it out
Now let's actually ingest some audio data into Amazon Q Business.
I'll download the audio for the AWS News Blog post about Anthropic’s Claude 3.7 Sonnet and ingest it.
https://aws.amazon.com/blogs/aws/anthropics-claude-3-7-sonnet-the-first-hybrid-reasoning-model-is-now-available-in-amazon-bedrock/
Amazon Q Business
On the Amazon Q Business data source screen, select Upload files.
There is now a Multi-media content configuration section. Since this is an audio file, select Audio files and upload the file.
Once the upload is complete, click Done.
The data source now contains only the audio file.
I asked about the features of Anthropic Claude 3.7 Sonnet.
The response is accurate and is grounded in the uploaded audio file.
Answer
# Features of Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's most intelligent model to date, now available in Amazon Bedrock with several notable capabilities. 

It is the first Claude model to offer hybrid reasoning, integrating reasoning as a core capability within a single model rather than using separate models for quick answers and complex problems. This approach more closely resembles human cognition. 

The model operates in two distinct modes: standard mode (an improved version of Claude 3.5 Sonnet) and extended thinking mode, which can be toggled in Amazon Bedrock. In extended thinking mode, Claude 3.7 takes additional time to analyze problems in detail, plan solutions, and consider multiple perspectives before responding. 

Claude 3.7 Sonnet features an adjustable reasoning budget, allowing users to control the balance between speed, cost, and performance by allocating more tokens to reasoning for complex problems or limiting tokens for faster responses. 

The model offers significantly expanded output capacity - over 15 times longer than its predecessor - supporting outputs up to 128K tokens long (64K generally available, 128K in beta). This enhanced capacity is particularly useful when requesting more detail, multiple examples, or additional context. 

Claude 3.7 Sonnet excels at coding, achieving an industry-leading 70.3% on SWE bench verified in standard mode, and outperforms Claude 3.5 Sonnet across most benchmarks. 

The model can be used for various industry applications including creating advanced AI assistants, medical imaging analysis, research summarization, complex financial modeling, and serving as a coding companion that reviews code and suggests improvements.
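The source documents show the extracted text.
Where appropriate, Amazon Q also summarizes it (the "Q summary" entries).
Proper nouns such as Claude are not quite there yet, but otherwise the transcription struck me as quite accurate.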
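If you reuse the extracted text outside Amazon Q Business, a glossary-based cleanup pass is one simple way to patch up the proper nouns. This is a hypothetical post-processing sketch, not a feature of Amazon Q Business or Bedrock Data Automation; the `CORRECTIONS` table only covers misrecognitions visible in the excerpts that follow. "Cloud" and similar words are replaced only when followed by a version number, so ordinary uses of "cloud" are left alone.

```python
import re

# Hypothetical glossary of misrecognitions taken from the transcription excerpts.
CORRECTIONS = [
    # "Claw", "Clawed", "Cloud", "Claud" -> "Claude", but only before "3.5"/"3.7".
    (re.compile(r"\b(?:Claw|Clawed|Cloud|Claud)(?=\s+3\.[57]\b)", re.IGNORECASE), "Claude"),
    # "Sonic", "Soid" -> "Sonnet".
    (re.compile(r"\b(?:Sonic|Soid)\b", re.IGNORECASE), "Sonnet"),
]


def normalize_terms(text: str) -> str:
    """Apply each glossary correction in order and return the cleaned text."""
    for pattern, replacement in CORRECTIONS:
        text = pattern.sub(replacement, text)
    return text


print(normalize_terms("In standard mode, Cloud 3.7 Sonic is an improved version of Cloud 3.5 Sonic."))
```

A fixed glossary is crude and will not catch new misspellings; it is shown only to illustrate where a correction step could sit.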
Direct transcription
"case.[spk_0]: Claw 3 and 7 sonnet in action. As for any new model, I have to request access in the Amazon bedrock console. In the navigation pane, I choose model access under bedrock configurations. Then I choose modify Model Access to request access for Claud 3.7 Sonic. Base models showing 37"
Time stamp: 00:00-03:06

Direct transcription
"Instead of using separate models, one for quick answers and another for solving complex problems, Claw 3.7 Sonic integrates reasoning as a core capability within a single model. This combination is more similar to how the human brain works"
Time stamp: 00:00-06:08

Direct transcription
"Instead of using separate models, one for quick answers and another for solving complex problems, Claw 3.7 Sonic integrates reasoning as a core capability within a single model. This combination is more similar to how the human brain works. After all, we use the same"
Time stamp: 00:00-06:08

Direct transcription
"two modes, standard and extended thinking mode, which can be toggled in Amazon bedrock. In standard mode, Cloud 3.7 Sonic is an improved version of Cloud 3.5 Sonic. In extended thinking mode, claw 3.7 sonic takes additional time to analyze problems in detail, plan solutions, and consider multiple"
Time stamp: 00:00-06:08

Q summary
be toggled in Amazon bedrock. In standard mode, Cloud 3.7 Sonic is an improved version of Cloud 3.5 Sonic. In extended thinking mode, claw 3.7 sonic takes additional time to analyze problems in detail, plan solutions, and consider multiple perspectives before providing a response, allowing it
Time stamp: 03:58-04:22

Direct transcription
"In extended thinking mode, claw 3.7 sonic takes additional time to analyze problems in detail, plan solutions, and consider multiple perspectives before providing a response, allowing it to make further gains in performance. You can control speed and cost by choosing when to use reasoning"
Time stamp: 03:06-06:08

Q summary
its predecessor compared to claw 3.5 Soid, this model offers significantly expanded output length. This enhanced capacity is particularly useful when you explicitly request more detail, ask for multiple examples, or request additional context or background information. To achieve long outputs
Time stamp: 03:06-03:58

Direct transcription
"its predecessor compared to claw 3.5 Soid, this model offers significantly expanded output length. This enhanced capacity is particularly useful when you explicitly request more detail, ask for multiple examples, or request additional context or background information. To achieve long outputs"
Time stamp: 00:00-06:08

Direct transcription
"an industry leading 70.3% for standard mode on SWE bench verified. Claw 3.7 sonnet also performs better than Claw 3.5 sonnet across the majority of benchmarks. These enhanced capabilities make Cloud 3.7 sonnet ideal for powering AI agents and complex workflows.[spk_0]: Clawed 3.7 signed"
Time stamp: 06:08-07:52
Summary
That wraps up "Amazon Q Business now supports ingesting audio and video data."
I hope this post is useful to someone.
This was Takakuni (@takakuni_) from the Consulting Department, Cloud Business Division!