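Earlier this morning I posted two quick reports (the announcement, and a hands-on trial) on Amazon Machine Learning, which was unveiled at AWS Summits 2015 in San Francisco. The service is usable right now, and it looks like the AWS CLI already supports it as well. So in this entry I'd like to take a rough look, going by the command listings, at what operations the AWS CLI offers for Amazon Machine Learning. This is a fairly minor tidbit.
- [New service] Amazon Machine Learning (a machine learning service) was announced at AWS Summits 2015 San Francisco. #AWSSummit | Developers.IO
- [New service] Trying out Amazon Machine Learning #AWSSummit #AmazonML | Developers.IO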
Preparation: Upgrading the AWS CLI
First, update the AWS CLI itself. The following commands take care of it quickly.
$ sudo pip install --upgrade awscli
$ aws --version
aws-cli/1.7.21 Python/2.7.6 Darwin/13.4.0
Amazon Machine Learning: Supported Commands
The service name to put after aws is machinelearning, which brings up the related commands. There were 25 commands in total.
$ aws machinelearning (press Tab twice to show completions)
create-batch-prediction describe-data-sources
create-data-source-from-rds describe-evaluations
create-data-source-from-redshift describe-ml-models
create-data-source-from-s3 get-batch-prediction
create-evaluation get-data-source
create-ml-model get-evaluation
create-realtime-endpoint get-ml-model
delete-batch-prediction predict
delete-data-source update-batch-prediction
delete-evaluation update-data-source
delete-ml-model update-evaluation
delete-realtime-endpoint update-ml-model
describe-batch-predictions
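The sections below group these 25 commands by the resource they act on. As a rough sketch of that grouping (the function name and the group labels are my own, not part of the CLI), a small shell function can classify a command name:

```shell
# Classify an `aws machinelearning` subcommand by the resource it acts on.
# The labels mirror this article's sections; they are not AWS terminology.
ml_command_group() {
  case "$1" in
    *data-source*)              echo "DataSource" ;;
    *ml-model*)                 echo "Model" ;;
    *evaluation*)               echo "Evaluation" ;;
    predict|*batch-prediction*) echo "Prediction" ;;
    *realtime-endpoint*)        echo "Other" ;;
    *)                          echo "Unknown" ;;
  esac
}

# Usage:
#   ml_command_group create-data-source-from-s3
#   ml_command_group predict
```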
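DataSource commands
These are the commands for data sources (DataSource), the data that a model learns from. I had assumed only S3 and Redshift were supported, but RDS is supported too! The management console had no interface for selecting RDS, so for now, using RDS as a data source presumably has to go through the AWS CLI.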
create-data-source-from-rds
create-data-source-from-redshift
create-data-source-from-s3
describe-data-sources
get-data-source
update-data-source
delete-data-source
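Here is a minimal sketch of what creating a DataSource from S3 might look like. The flag names (`--data-source-id`, `--data-spec`, `--compute-statistics`) are not shown anywhere in this entry, so treat them as assumptions and confirm them with `aws machinelearning create-data-source-from-s3 help`. The IDs and S3 paths in the comments are hypothetical.

```shell
# Create a DataSource from a CSV file in S3 and ask for statistics,
# since a model can only be trained on a DataSource with computed statistics.
# Assumption: the flag names below are not taken from this article.
create_s3_data_source() {
  local ds_id=$1       # e.g. ds-example-train (hypothetical ID)
  local data_uri=$2    # e.g. s3://example-bucket/train.csv (hypothetical)
  local schema_uri=$3  # e.g. s3://example-bucket/train.csv.schema (hypothetical)
  aws machinelearning create-data-source-from-s3 \
    --data-source-id "$ds_id" \
    --data-spec "DataLocationS3=$data_uri,DataSchemaLocationS3=$schema_uri" \
    --compute-statistics
}
```

Wrapping the call in a function keeps the IDs and paths out of the command itself, which should make it easier to reuse in later automation.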
Model commands
These are the commands for the models (Model) that training is based on.
create-ml-model
delete-ml-model
get-ml-model
update-ml-model
describe-ml-models
Incidentally, here is the command help for create-ml-model.
NAME
create-ml-model -
DESCRIPTION
Creates a new MLModel using the data files and the recipe as informa-
tion sources.
An MLModel is nearly immutable. Users can only update the MLModelName
and the ScoreThreshold in an MLModel without creating a new MLModel .
create-ml-model is an asynchronous operation. In response to cre-
ate-ml-model , Amazon Machine Learning (Amazon ML) immediately returns
and sets the MLModel status to PENDING . After the MLModel is created
and ready for use, Amazon ML sets the status to COMPLETED .
You can use the get-ml-model operation to check progress of the
MLModel during the creation operation.
create-ml-model requires a DataSource with computed statistics,
which can be created by setting ComputeStatistics to true in cre-
ate-data-source-from-rds , create-data-source-from-s3 , or cre-
ate-data-source-from-redshift operations.
SYNOPSIS
create-ml-model
--ml-model-id <value>
[--ml-model-name <value>]
--ml-model-type <value>
[--parameters <value>]
--training-data-source-id <value>
[--recipe <value>]
[--recipe-uri <value>]
[--cli-input-json <value>]
[--generate-cli-skeleton]
OPTIONS
--ml-model-id (string)
A user-supplied ID that uniquely identifies the MLModel .
--ml-model-name (string)
A user-supplied name or description of the MLModel .
--ml-model-type (string)
The category of supervised learning that this MLModel will address.
Choose from the following types:
o Choose REGRESSION if the MLModel will be used to predict a numeric
value.
o Choose BINARY if the MLModel result has two possible values.
o Choose MULTICLASS if the MLModel result has a limited number of
values.
For more information, see the Amazon Machine Learning Developer
Guide .
--parameters (map)
A list of the training parameters in the MLModel . The list is
implemented as a map of key/value pairs.
The following is the current set of training parameters:
o sgd.l1RegularizationAmount - Coefficient regularization L1 norm.
It controls overfitting the data by penalizing large coefficients.
This tends to drive coefficients to zero, resulting in sparse fea-
ture set. If you use this parameter, start by specifying a small
value such as 1.0E-08. The value is a double that ranges from 0 to
MAX_DOUBLE. The default is not to use L1 normalization. The param-
eter cannot be used when L2 is specified. Use this parameter spar-
ingly.
o sgd.l2RegularizationAmount - Coefficient regularization L2 norm.
It controls overfitting the data by penalizing large coefficients.
This tends to drive coefficients to small, nonzero values. If you
use this parameter, start by specifying a small value such as
1.0E-08. The value is a double that ranges from 0 to MAX_DOUBLE.
The default is not to use L2 normalization. This cannot be used
when L1 is specified. Use this parameter sparingly.
o sgd.maxPasses - Number of times that the training process tra-
verses the observations to build the MLModel . The value is an
integer that ranges from 1 to 10000. The default value is 10.
o sgd.maxMLModelSizeInBytes - Maximum allowed size of the model.
Depending on the input data, the size of the model might affect
its performance. The value is an integer that ranges from 100000
to 2147483648. The default value is 33554432.
Shorthand Syntax:
--parameters key_name=string,key_name2=string
JSON Syntax:
{"string": "string"
...}
--training-data-source-id (string)
The DataSource that points to the training data.
--recipe (string)
The data recipe for creating MLModel . You must specify either the
recipe or its URI. If you don't specify a recipe or its URI, Amazon
ML creates a default.
--recipe-uri (string)
The Amazon Simple Storage Service (Amazon S3) location and file name
that contains the MLModel recipe. You must specify either the recipe
or its URI. If you don't specify a recipe or its URI, Amazon ML
creates a default.
--cli-input-json (string) Performs service operation based on the JSON
string provided. The JSON string follows the format provided by --gen-
erate-cli-skeleton. If other arguments are provided on the command
line, the CLI values will override the JSON-provided values.
--generate-cli-skeleton (boolean) Prints a sample input JSON to stan-
dard output. Note the specified operation is not run if this argument
is specified. The sample input can be used as an argument for
--cli-input-json.
OUTPUT
MLModelId -> (string)
A user-supplied ID that uniquely identifies the MLModel . This value
should be identical to the value of the MLModelId in the request.
CREATE-ML-MODEL()
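Since create-ml-model is asynchronous, a script has to start the job and then poll get-ml-model. The following is only a sketch under a few assumptions: the IDs are hypothetical, the parameter values are just the starting points mentioned in the help, and FAILED is assumed to be the other terminal status (the help above only names PENDING and COMPLETED). `--query` and `--output` are the AWS CLI's global options.

```shell
# Start training a binary classification model (IDs are hypothetical).
create_binary_model() {
  local model_id=$1 ds_id=$2
  aws machinelearning create-ml-model \
    --ml-model-id "$model_id" \
    --ml-model-type BINARY \
    --training-data-source-id "$ds_id" \
    --parameters '{"sgd.maxPasses":"10","sgd.l2RegularizationAmount":"1.0E-08"}'
}

# Poll get-ml-model until the model reaches a terminal status.
# Assumption: FAILED is a possible status; only PENDING and COMPLETED
# are named in the help text.
wait_for_ml_model() {
  local model_id=$1 interval=${2:-30} state
  while :; do
    state=$(aws machinelearning get-ml-model --ml-model-id "$model_id" \
              --query Status --output text) || return 1
    echo "status: $state" >&2
    case "$state" in
      COMPLETED) return 0 ;;
      FAILED)    return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage:
#   create_binary_model ml-example ds-example-train && wait_for_ml_model ml-example
```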
Evaluation commands
These are the commands for evaluations (Evaluation) in machine learning.
create-evaluation
get-evaluation
delete-evaluation
update-evaluation
describe-evaluations
The command help is shown below, using create-evaluation as the example. It looks fairly simple and easy to use.
NAME
create-evaluation -
DESCRIPTION
Creates a new Evaluation of an MLModel . An MLModel is evaluated on a
set of observations associated to a DataSource . Like a DataSource for
an MLModel , the DataSource for an Evaluation contains values for the
Target Variable. The Evaluation compares the predicted result for each
observation to the actual outcome and provides a summary so that you
know how effective the MLModel functions on the test data. Evaluation
generates a relevant performance metric such as BinaryAUC, Regression-
RMSE or MulticlassAvgFScore based on the corresponding MLModelType :
BINARY , REGRESSION or MULTICLASS .
create-evaluation is an asynchronous operation. In response to cre-
ate-evaluation , Amazon Machine Learning (Amazon ML) immediately
returns and sets the evaluation status to PENDING . After the Evalua-
tion is created and ready for use, Amazon ML sets the status to COM-
PLETED .
You can use the get-evaluation operation to check progress of the
evaluation during the creation operation.
SYNOPSIS
create-evaluation
--evaluation-id <value>
[--evaluation-name <value>]
--ml-model-id <value>
--evaluation-data-source-id <value>
[--cli-input-json <value>]
[--generate-cli-skeleton]
OPTIONS
--evaluation-id (string)
A user-supplied ID that uniquely identifies the Evaluation .
--evaluation-name (string)
A user-supplied name or description of the Evaluation .
--ml-model-id (string)
The ID of the MLModel to evaluate.
The schema used in creating the MLModel must match the schema of the
DataSource used in the Evaluation .
--evaluation-data-source-id (string)
The ID of the DataSource for the evaluation. The schema of the Data-
Source must match the schema used to create the MLModel .
--cli-input-json (string) Performs service operation based on the JSON
string provided. The JSON string follows the format provided by --gen-
erate-cli-skeleton. If other arguments are provided on the command
line, the CLI values will override the JSON-provided values.
--generate-cli-skeleton (boolean) Prints a sample input JSON to stan-
dard output. Note the specified operation is not run if this argument
is specified. The sample input can be used as an argument for
--cli-input-json.
OUTPUT
EvaluationId -> (string)
The user-supplied ID that uniquely identifies the Evaluation . This
value should be identical to the value of the EvaluationId in the
request.
CREATE-EVALUATION()
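As a sketch of how these options fit together, the functions below start an evaluation and later read back a metric. The create-evaluation flags come from the help above. The location of the metric in the get-evaluation output (`PerformanceMetrics.Properties`) is an assumption on my part and worth checking with `aws machinelearning get-evaluation help`. All IDs are hypothetical.

```shell
# Start an evaluation of a model against a held-out DataSource.
evaluate_model() {
  local eval_id=$1 model_id=$2 ds_id=$3
  aws machinelearning create-evaluation \
    --evaluation-id "$eval_id" \
    --ml-model-id "$model_id" \
    --evaluation-data-source-id "$ds_id"
}

# Read one metric from a finished evaluation (defaults to BinaryAUC).
# Assumption: metrics are returned under PerformanceMetrics.Properties.
evaluation_metric() {
  local eval_id=$1 metric=${2:-BinaryAUC}
  aws machinelearning get-evaluation \
    --evaluation-id "$eval_id" \
    --query "PerformanceMetrics.Properties.$metric" \
    --output text
}

# Usage:
#   evaluate_model ev-example ml-example ds-example-test
#   evaluation_metric ev-example
```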
Prediction commands
These are the commands for predictions (Prediction) in machine learning.
predict
create-batch-prediction
get-batch-prediction
delete-batch-prediction
describe-batch-predictions
update-batch-prediction
Here, let's look at predict as a sample command.
NAME
predict -
DESCRIPTION
Generates a prediction for the observation using the specified MLModel.
NOTE:
Note
Not all response parameters will be populated because this is depen-
dent on the type of requested model.
SYNOPSIS
predict
--ml-model-id <value>
--record <value>
--predict-endpoint <value>
[--cli-input-json <value>]
[--generate-cli-skeleton]
OPTIONS
--ml-model-id (string)
A unique identifier of the MLModel .
--record (map)
A map of variable name-value pairs that represent an observation.
Shorthand Syntax:
--record key_name=string,key_name2=string
JSON Syntax:
{"string": "string"
...}
--predict-endpoint (string)
--cli-input-json (string) Performs service operation based on the JSON
string provided. The JSON string follows the format provided by --gen-
erate-cli-skeleton. If other arguments are provided on the command
line, the CLI values will override the JSON-provided values.
--generate-cli-skeleton (boolean) Prints a sample input JSON to stan-
dard output. Note the specified operation is not run if this argument
is specified. The sample input can be used as an argument for
--cli-input-json.
OUTPUT
Prediction -> (structure)
The output from a predict operation:
o Details - Contains the following attributes: DetailsAt-
tributes.PREDICTIVE_MODEL_TYPE - REGRESSION | BINARY | MULTICLASS
DetailsAttributes.ALGORITHM - SGD
o PredictedLabel - Present for either a BINARY or MULTICLASS MLModel
request.
o PredictedScores - Contains the raw classification score corre-
sponding to each label.
o PredictedValue - Present for a REGRESSION MLModel request.
predictedLabel -> (string)
The prediction label for either a BINARY or MULTICLASS MLModel .
predictedValue -> (float)
The prediction value for REGRESSION MLModel .
predictedScores -> (map)
Provides the raw classification score corresponding to each
label.
key -> (string)
value -> (float)
details -> (map)
Provides any additional details regarding the prediction.
key -> (string)
Contains the key values of DetailsMap : PredictiveModelType -
Indicates the type of the MLModel . Algorithm - Indicates the
algorithm was used for the MLModel .
value -> (string)
PREDICT()
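Going by the synopsis and output description above, a single prediction request might be wrapped like this. It is a sketch only: the model ID, endpoint URL, and record variable names in the usage comment are hypothetical, and the record has to use the variable names from your own schema.

```shell
# Request one real-time prediction and print only the predicted label.
# predictedLabel is present for BINARY or MULTICLASS models; for a
# REGRESSION model, query Prediction.predictedValue instead.
predict_label() {
  local model_id=$1 endpoint=$2 record=$3
  aws machinelearning predict \
    --ml-model-id "$model_id" \
    --predict-endpoint "$endpoint" \
    --record "$record" \
    --query Prediction.predictedLabel \
    --output text
}

# Usage (shorthand syntax for the record map; all values hypothetical):
#   predict_label ml-example https://realtime.example.invalid age=35,job=technician
```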
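create-batch-prediction has the command summary shown below. Comparing the two:
- predict: generates a prediction for an observation using the specified model
- create-batch-prediction: generates predictions for a group of observations
That is how they are described (translation courtesy of Google), so both produce predictions, just under different conditions. At this point the difference between the two and when to use each hasn't quite clicked for me, so I'll work on understanding this better over time.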
NAME
create-batch-prediction -
DESCRIPTION
Generates predictions for a group of observations. The observations to
process exist in one or more data files referenced by a DataSource .
This operation creates a new BatchPrediction , and uses an MLModel and
the data files referenced by the DataSource as information sources.
create-batch-prediction is an asynchronous operation. In response to
create-batch-prediction , Amazon Machine Learning (Amazon ML) immedi-
ately returns and sets the BatchPrediction status to PENDING . After
the BatchPrediction completes, Amazon ML sets the status to COMPLETED .
You can poll for status updates by using the get-batch-prediction
operation and checking the Status parameter of the result. After the
COMPLETED status appears, the results are available in the location
specified by the OutputUri parameter.
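For comparison with predict, here is a sketch of the batch flow described in the help: submit the job, then check its Status until the results land at the output location. The flag names (`--batch-prediction-id`, `--batch-prediction-data-source-id`, `--output-uri`) do not appear in the excerpt above, so treat them as assumptions to verify with `aws machinelearning create-batch-prediction help`. IDs and the S3 path are hypothetical.

```shell
# Submit a batch prediction over every observation in a DataSource.
# Assumption: the flag names below are not taken from this article.
create_batch() {
  local bp_id=$1 model_id=$2 ds_id=$3 output_uri=$4
  aws machinelearning create-batch-prediction \
    --batch-prediction-id "$bp_id" \
    --ml-model-id "$model_id" \
    --batch-prediction-data-source-id "$ds_id" \
    --output-uri "$output_uri"
}

# Print the current Status of a batch prediction (PENDING, COMPLETED, ...).
batch_status() {
  aws machinelearning get-batch-prediction \
    --batch-prediction-id "$1" \
    --query Status \
    --output text
}

# Usage:
#   create_batch bp-example ml-example ds-example-new s3://example-bucket/out/
#   batch_status bp-example
```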
Other commands
The following two commands appear to be the only ones not covered above.
create-realtime-endpoint
delete-realtime-endpoint
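Below is the content of create-realtime-endpoint. It appears to be the command that creates the endpoint an MLModel uses when handling requests. Was one created behind the scenes in the earlier entry as well? (I have already deleted those resources as of writing, so I could not confirm.) I'd like to check this too the next time I create the resources.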
NAME
create-realtime-endpoint -
DESCRIPTION
Creates a real-time endpoint for the MLModel . The endpoint contains
the URI of the MLModel ; that is, the location to send real-time pre-
diction requests for the specified MLModel .
SYNOPSIS
create-realtime-endpoint
--ml-model-id <value>
[--cli-input-json <value>]
[--generate-cli-skeleton]
OPTIONS
--ml-model-id (string)
The ID assigned to the MLModel during creation.
--cli-input-json (string) Performs service operation based on the JSON
string provided. The JSON string follows the format provided by --gen-
erate-cli-skeleton. If other arguments are provided on the command
line, the CLI values will override the JSON-provided values.
--generate-cli-skeleton (boolean) Prints a sample input JSON to stan-
dard output. Note the specified operation is not run if this argument
is specified. The sample input can be used as an argument for
--cli-input-json.
OUTPUT
MLModelId -> (string)
A user-supplied ID that uniquely identifies the MLModel . This value
should be identical to the value of the MLModelId in the request.
RealtimeEndpointInfo -> (structure)
The endpoint information of the MLModel
PeakRequestsPerSecond -> (integer)
The maximum processing rate for the real-time endpoint for
MLModel , measured in incoming requests per second.
CreatedAt -> (timestamp)
The time that the request to create the real-time endpoint for
the MLModel was received. The time is expressed in epoch time.
EndpointUrl -> (string)
The URI that specifies where to send real-time prediction
requests for the MLModel .
NOTE:
Note
The application must wait until the real-time endpoint is
ready before using this URI.
EndpointStatus -> (string)
The current status of the real-time endpoint for the MLModel .
This element can have one of the following values:
o NONE - Endpoint does not exist or was previously deleted.
o READY - Endpoint is ready to be used for real-time predic-
tions.
o UPDATING - Updating/creating the endpoint.
CREATE-REALTIME-ENDPOINT()
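Based on the output description above, creating the endpoint and picking out its URL might look like the sketch below. The `RealtimeEndpointInfo` fields come from the help. Reading the status later through get-ml-model under `EndpointInfo` is an assumption I have not verified here. The model ID is hypothetical.

```shell
# Create a real-time endpoint and print the URL to send predictions to.
# The URL should not be used until the endpoint status is READY.
create_endpoint_url() {
  aws machinelearning create-realtime-endpoint \
    --ml-model-id "$1" \
    --query RealtimeEndpointInfo.EndpointUrl \
    --output text
}

# Print the endpoint status (NONE, READY, or UPDATING).
# Assumption: get-ml-model reports it under EndpointInfo.EndpointStatus.
endpoint_status() {
  aws machinelearning get-ml-model \
    --ml-model-id "$1" \
    --query EndpointInfo.EndpointStatus \
    --output text
}

# Usage:
#   url=$(create_endpoint_url ml-example)
#   endpoint_status ml-example
```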
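Summary
That wraps up this tour of the AWS CLI commands for Machine Learning. The entry ended up fairly long, but most of it is transcribed command help, so there isn't much substance to it (lol). Looking over the commands like this is a nice way to see which elements make up the service. Eventually I'd like to build automation around these AWS CLI commands. That's all from me.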