Trying out Bedrock Model Distillation with Amazon Nova

AWS re:Invent 2024

#Amazon Bedrock

#Generative AI

#Amazon Nova

森田力

2024.12.13

Hello, this is Morita.
In this article, I try out Amazon Bedrock Model Distillation, which was announced at re:Invent 2024.
https://dev.classmethod.jp/articles/amazon-bedrock-model-distillation-aws-reinvent
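
What is Amazon Bedrock Model Distillation?
Amazon Bedrock Model Distillation is a feature for creating distilled models on Amazon Bedrock.
A distilled model is said to offer advantages in response speed and cost.
For details, see also the following article.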
https://dev.classmethod.jp/articles/bedrock-model-distillation-re-growth-2024-awsreinvent-regrowth-fukuoka
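
How it works, roughly
Model Distillation needs data for training the student model.
That data is created by feeding a dataset into the teacher model.

Trying it out

Preparing the data
Each input record is written in the following format.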
{
	"schemaVersion": "bedrock-conversation-2024",
	"system": [
		{
			"text": "A chat between a curious User and an artificial intelligence Bot. The Bot gives helpful, detailed, and polite answers to the User's questions."
		}
	],
	"messages": [
		{
			"role": "user",
			"content": [
				{
					"text": "What is the weather like today?"
				}
			]
		},
		{
			"role": "assistant",
			"content": [
				{
					"text": "Today's weather is sunny."
				}
			]
		}
	]
}
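The key point is to specify bedrock-conversation-2024 as the schemaVersion.
Also, only Amazon Nova supports multi-turn conversations.
Prepare multiple records like this one to build the dataset.
The dataset is stored in JSONL format, one record per line.
sample.jsonl (generated with an LLM)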
{"schemaVersion":"bedrock-conversation-2024","system":[{"text":"A chat between a curious User and an artificial intelligence Bot. The Bot gives helpful, detailed, and polite answers to the User's questions."}],"messages":[{"role":"user","content":[{"text":"What is the weather like today?"}]},{"role":"assistant","content":[{"text":"Today's weather is sunny."}]}]}
{"schemaVersion":"bedrock-conversation-2024","system":[{"text":"A chat between a curious User and an artificial intelligence Bot. The Bot gives helpful, detailed, and polite answers to the User's questions about AWS services."}],"messages":[{"role":"user","content":[{"text":"What is AWS?"}]},{"role":"assistant","content":[{"text":"AWS (Amazon Web Services) is Amazon's cloud computing platform that provides a wide range of IT services through the internet. These services include computing, storage, databases, and more. Users only pay for what they use, making it a cost-effective solution for businesses of all sizes to build and run their IT systems."}]}]}
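As a minimal sketch of producing records like the two above programmatically, the Python below builds one record per prompt/response pair and serializes them as JSONL. The helper names `make_record` and `to_jsonl` are hypothetical and introduced only for illustration; the field layout is copied from the format shown in this article.

```python
import json

SCHEMA_VERSION = "bedrock-conversation-2024"


def make_record(system_prompt, user_text, assistant_text):
    """Build one single-turn record in the conversation format shown above."""
    return {
        "schemaVersion": SCHEMA_VERSION,
        "system": [{"text": system_prompt}],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
            {"role": "assistant", "content": [{"text": assistant_text}]},
        ],
    }


def to_jsonl(records):
    """Serialize records as JSONL: one compact JSON object per line."""
    return "".join(
        json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n"
        for record in records
    )


if __name__ == "__main__":
    records = [
        make_record(
            "A chat between a curious User and an artificial intelligence Bot.",
            "What is the weather like today?",
            "Today's weather is sunny.",
        )
    ]
    # Print the JSONL text; writing it to sample.jsonl is left to the caller.
    print(to_jsonl(records), end="")
```

Keeping serialization separate from file I/O makes it easy to inspect the output before saving it as sample.jsonl.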
Note that the dataset needs at least 100 samples.
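Since the format and the sample count are easy to get wrong, a quick local check before uploading can save a failed job. The following is a sketch only: `validate_jsonl` is a hypothetical helper, and it checks just the points mentioned in this article (valid JSON per line, the schemaVersion value, a non-empty messages list, and at least 100 samples), not the service's full validation rules.

```python
import json

SCHEMA_VERSION = "bedrock-conversation-2024"
MIN_SAMPLES = 100  # minimum sample count stated in this article


def validate_jsonl(text, min_samples=MIN_SAMPLES):
    """Return a list of problems found in JSONL text; an empty list means OK."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {number}: invalid JSON ({error.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: record must be a JSON object")
            continue
        if record.get("schemaVersion") != SCHEMA_VERSION:
            problems.append(f"line {number}: schemaVersion must be {SCHEMA_VERSION!r}")
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: 'messages' must be a non-empty list")
    if len(lines) < min_samples:
        problems.append(f"only {len(lines)} samples; at least {min_samples} required")
    return problems
```

For example, `validate_jsonl(open("sample.jsonl", encoding="utf-8").read())` returns an empty list when none of these checks fail.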
Store the JSONL file you created in an S3 bucket.
Note: This time I prepared the dataset manually, but model invocation logs can also be used as the dataset.
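
Creating the job
Next, create the distillation job.
This time I did it from the management console in the N. Virginia region (us-east-1).
When creating the job, you specify the teacher model and the student model.
With Nova Pro as the teacher model, Nova Micro and Nova Lite appeared to be selectable as student models.
Select "Directly upload to S3 location" and specify the S3 path of the JSONL file.
Then configure the role and the remaining settings, and click the button to create the job.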
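The article uses the console, but the same job can presumably be created through the API as well. The sketch below assembles a request for boto3's `create_model_customization_job` with `customizationType` set to `DISTILLATION`. This is not what was run for this article: the function names are hypothetical, every model ID, ARN, and S3 URI is a placeholder to be supplied by the caller, and the `max_response_length` default of 1000 is an arbitrary assumption.

```python
def build_distillation_job_request(job_name, custom_model_name, role_arn,
                                   teacher_model_id, student_model_id,
                                   training_s3_uri, output_s3_uri,
                                   max_response_length=1000):
    """Assemble keyword arguments for create_model_customization_job."""
    return {
        "jobName": job_name,
        "customModelName": custom_model_name,
        "roleArn": role_arn,
        # The student model is the base model being customized.
        "baseModelIdentifier": student_model_id,
        "customizationType": "DISTILLATION",
        "trainingDataConfig": {"s3Uri": training_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        "customizationConfig": {
            "distillationConfig": {
                "teacherModelConfig": {
                    "teacherModelIdentifier": teacher_model_id,
                    "maxResponseLengthForInference": max_response_length,
                }
            }
        },
    }


def create_distillation_job(bedrock_client, **job_settings):
    """Submit the job with a boto3 'bedrock' client and return the job ARN."""
    request = build_distillation_job_request(**job_settings)
    response = bedrock_client.create_model_customization_job(**request)
    return response["jobArn"]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   client = boto3.client("bedrock", region_name="us-east-1")
#   job_arn = create_distillation_job(client, job_name="...", ...)
```

Building the request as a plain dictionary first lets you review it before anything is submitted.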
Wait for the job to complete. (Training on 100 samples took about 1 to 2 hours.)
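Because the job runs for an hour or more, polling its status from a script can be more convenient than refreshing the console. This is a minimal sketch, assuming a boto3 `bedrock` client and its `get_model_customization_job` call; `wait_for_job`, the 5-minute interval, and the 48-poll limit are hypothetical choices, not values from the article.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(bedrock_client, job_identifier, poll_seconds=300,
                 max_polls=48, sleep=time.sleep):
    """Poll the customization job until it reaches a terminal status.

    Returns the final status string, or raises TimeoutError if the job is
    still running after max_polls checks.
    """
    for _ in range(max_polls):
        job = bedrock_client.get_model_customization_job(
            jobIdentifier=job_identifier
        )
        status = job["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_identifier} did not finish in time")


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   client = boto3.client("bedrock", region_name="us-east-1")
#   print(wait_for_job(client, "<job ARN or name>"))
```

The `sleep` parameter exists only so the wait can be replaced in tests.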
Once the job completes successfully, the result is listed as a custom model.
Click the model name to check the ARN of the distilled model.
Provisioned Throughput is required to invoke the model.
Note that the minimum billing unit is 1 hour, so keep that in mind before using it.
https://dev.classmethod.jp/articles/bedrock-pt-min-unit
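The article stops at model creation, but for reference, the step of purchasing Provisioned Throughput and calling the model might look like the sketch below. It assumes boto3's `create_provisioned_model_throughput` (on the `bedrock` client) and `converse` (on the `bedrock-runtime` client); the function names and the default of one model unit are hypothetical, and none of this was run for this article. Creating Provisioned Throughput incurs charges, and the provisioned model must be in service before it can be invoked.

```python
def create_provisioned_throughput(bedrock_client, custom_model_arn, name,
                                  model_units=1):
    """Purchase Provisioned Throughput for a custom model; return its ARN."""
    response = bedrock_client.create_provisioned_model_throughput(
        modelUnits=model_units,
        provisionedModelName=name,
        modelId=custom_model_arn,
    )
    return response["provisionedModelArn"]


def ask(runtime_client, provisioned_model_arn, prompt):
    """Send one user message to the provisioned model and return the reply text."""
    response = runtime_client.converse(
        modelId=provisioned_model_arn,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   bedrock = boto3.client("bedrock", region_name="us-east-1")
#   runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
#   pt_arn = create_provisioned_throughput(bedrock, "<custom model ARN>", "my-pt")
#   print(ask(runtime, pt_arn, "What is the weather like today?"))
```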
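
Closing thoughts
This time I actually created a dataset and ran the training.
I got fairly stuck on the dataset format, but managed to get as far as creating the model.
I would also like to try model distillation using model invocation logs at some point.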
