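[New Service] Checking Out the AWS CLI Commands for Amazon Machine Learning #AWSSummit #AmazonML
Early this morning I posted two quick-report articles (the announcement, and a hands-on trial) about Amazon Machine Learning, which was announced at AWS Summits 2015 in San Francisco. The service is usable right now, and it looks like the AWS CLI already supports it as well. So in this entry I'd like to take a rough look, going by the commands themselves, at what operations the AWS CLI offers for Amazon Machine Learning. It's a fairly small topic.
- [New Service] Amazon Machine Learning (a machine learning service) was announced at AWS Summits 2015 San Francisco. #AWSSummit | Developers.IO
- [New Service] I tried out Amazon Machine Learning #AWSSummit #AmazonML | Developers.IO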
Preparation: Upgrading the AWS CLI
First, update the AWS CLI itself. Just run the following commands.
    $ sudo pip install --upgrade awscli
    $ aws --version
    aws-cli/1.7.21 Python/2.7.6 Darwin/13.4.0
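If you want a script to confirm the upgrade took effect, something like the following might do. This is only a sketch: `awscli_version` and `version_ge` are helper names I made up, and 1.7.21 is simply the version I had after upgrading, not a documented minimum for the machinelearning commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the AWS CLI).

# Print the bare version number from "aws --version".
# 2>&1 is there because some CLI builds print the version on stderr.
awscli_version() {
    aws --version 2>&1 | sed -n 's|^aws-cli/\([0-9.]*\).*|\1|p'
}

# version_ge HAVE WANT: succeed when HAVE >= WANT, comparing the first
# three dot-separated numeric fields (missing fields count as 0).
version_ge() {
    have="$1.0.0"
    want="$2.0.0"
    i=1
    while [ "$i" -le 3 ]; do
        h=$(printf '%s' "$have" | cut -d. -f"$i")
        w=$(printf '%s' "$want" | cut -d. -f"$i")
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
        i=$((i + 1))
    done
    return 0
}

# Usage (assumption: 1.7.21 is just the version used in this entry):
#   version_ge "$(awscli_version)" 1.7.21 || echo "please upgrade awscli" >&2
```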
Amazon Machine Learning: List of Supported Commands
The service name to put after aws is machinelearning, which brings up the related commands. There are 25 commands in total.
    $ aws machinelearning (press Tab twice to show completions)
    create-batch-prediction             describe-data-sources
    create-data-source-from-rds         describe-evaluations
    create-data-source-from-redshift    describe-ml-models
    create-data-source-from-s3          get-batch-prediction
    create-evaluation                   get-data-source
    create-ml-model                     get-evaluation
    create-realtime-endpoint            get-ml-model
    delete-batch-prediction             predict
    delete-data-source                  update-batch-prediction
    delete-evaluation                   update-data-source
    delete-ml-model                     update-evaluation
    delete-realtime-endpoint            update-ml-model
    describe-batch-predictions
DataSource Commands
The commands related to the data source (DataSource), the origin of the data that machine learning trains on, are as follows. I had assumed only S3 and Redshift were supported, but RDS is supported too! The management console had no interface for selecting RDS, so does this mean that, for now, using RDS as a data source has to go through the AWS CLI?
    create-data-source-from-rds
    create-data-source-from-redshift
    create-data-source-from-s3
    describe-data-sources
    get-data-source
    update-data-source
    delete-data-source
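To give a feel for the simplest of the three create commands, here is a rough sketch of wrapping create-data-source-from-s3. The option names and the --data-spec keys come from my reading of the AWS CLI reference, not from anything shown in this entry, so please check them against `aws machinelearning create-data-source-from-s3 help`. The function name, ID, and S3 paths are made-up placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper: create_s3_datasource ID DATA_S3_URI SCHEMA_S3_URI
# Assumption: --data-spec accepts the shorthand keys DataLocationS3 and
# DataSchemaLocationS3, and --compute-statistics asks for the statistics
# that create-ml-model needs later.
create_s3_datasource() {
    ds_id=$1
    data_uri=$2
    schema_uri=$3
    aws machinelearning create-data-source-from-s3 \
        --data-source-id "$ds_id" \
        --data-spec "DataLocationS3=$data_uri,DataSchemaLocationS3=$schema_uri" \
        --compute-statistics
}

# Usage (placeholder bucket and IDs):
#   create_s3_datasource ds-example \
#       s3://example-bucket/banking.csv s3://example-bucket/banking.csv.schema
```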
Model Commands
The commands related to the model (Model) that learning is based on are as follows.
    create-ml-model
    delete-ml-model
    get-ml-model
    update-ml-model
    describe-ml-models
Incidentally, below is the command help output for create-ml-model.
NAME
       create-ml-model -

DESCRIPTION
       Creates a new MLModel using the data files and the recipe as
       information sources.

       An MLModel is nearly immutable. Users can only update the MLModelName
       and the ScoreThreshold in an MLModel without creating a new MLModel.

       create-ml-model is an asynchronous operation. In response to
       create-ml-model, Amazon Machine Learning (Amazon ML) immediately
       returns and sets the MLModel status to PENDING. After the MLModel is
       created and ready for use, Amazon ML sets the status to COMPLETED.

       You can use the get-ml-model operation to check progress of the
       MLModel during the creation operation.

       create-ml-model requires a DataSource with computed statistics, which
       can be created by setting ComputeStatistics to true in
       create-data-source-from-rds, create-data-source-from-s3, or
       create-data-source-from-redshift operations.

SYNOPSIS
            create-ml-model
          --ml-model-id <value>
          [--ml-model-name <value>]
          --ml-model-type <value>
          [--parameters <value>]
          --training-data-source-id <value>
          [--recipe <value>]
          [--recipe-uri <value>]
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --ml-model-id (string)
          A user-supplied ID that uniquely identifies the MLModel.

       --ml-model-name (string)
          A user-supplied name or description of the MLModel.

       --ml-model-type (string)
          The category of supervised learning that this MLModel will address.
          Choose from the following types:

          o Choose REGRESSION if the MLModel will be used to predict a
            numeric value.

          o Choose BINARY if the MLModel result has two possible values.

          o Choose MULTICLASS if the MLModel result has a limited number of
            values.

          For more information, see the Amazon Machine Learning Developer
          Guide.

       --parameters (map)
          A list of the training parameters in the MLModel. The list is
          implemented as a map of key/value pairs.

          The following is the current set of training parameters:

          o sgd.l1RegularizationAmount - Coefficient regularization L1 norm.
            It controls overfitting the data by penalizing large
            coefficients. This tends to drive coefficients to zero,
            resulting in sparse feature set. If you use this parameter,
            start by specifying a small value such as 1.0E-08. The value is
            a double that ranges from 0 to MAX_DOUBLE. The default is not to
            use L1 normalization. The parameter cannot be used when L2 is
            specified. Use this parameter sparingly.

          o sgd.l2RegularizationAmount - Coefficient regularization L2 norm.
            It controls overfitting the data by penalizing large
            coefficients. This tends to drive coefficients to small, nonzero
            values. If you use this parameter, start by specifying a small
            value such as 1.0E-08. The value is a double that ranges from 0
            to MAX_DOUBLE. The default is not to use L2 normalization. This
            cannot be used when L1 is specified. Use this parameter
            sparingly.

          o sgd.maxPasses - Number of times that the training process
            traverses the observations to build the MLModel. The value is an
            integer that ranges from 1 to 10000. The default value is 10.

          o sgd.maxMLModelSizeInBytes - Maximum allowed size of the model.
            Depending on the input data, the size of the model might affect
            its performance. The value is an integer that ranges from 100000
            to 2147483648. The default value is 33554432.

       Shorthand Syntax:

          --parameters key_name=string,key_name2=string

       JSON Syntax:

          {"string": "string"
            ...}

       --training-data-source-id (string)
          The DataSource that points to the training data.

       --recipe (string)
          The data recipe for creating MLModel. You must specify either the
          recipe or its URI. If you don't specify a recipe or its URI, Amazon
          ML creates a default.

       --recipe-uri (string)
          The Amazon Simple Storage Service (Amazon S3) location and file
          name that contains the MLModel recipe. You must specify either the
          recipe or its URI. If you don't specify a recipe or its URI, Amazon
          ML creates a default.

       --cli-input-json (string)
          Performs service operation based on the JSON string provided. The
          JSON string follows the format provided by --generate-cli-skeleton.
          If other arguments are provided on the command line, the CLI values
          will override the JSON-provided values.

       --generate-cli-skeleton (boolean)
          Prints a sample input JSON to standard output. Note the specified
          operation is not run if this argument is specified. The sample
          input can be used as an argument for --cli-input-json.

OUTPUT
       MLModelId -> (string)
          A user-supplied ID that uniquely identifies the MLModel. This value
          should be identical to the value of the MLModelId in the request.

                                                              CREATE-ML-MODEL()
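Since the help says create-ml-model is asynchronous and that get-ml-model can be used to check progress, a polling loop along these lines seems like the natural companion. Treat it as a sketch under assumptions: the function names are mine, reading the status with `--query Status --output text` relies on get-ml-model returning a top-level Status field, and FAILED and DELETED are statuses I know from the API reference rather than from the help quoted here.

```shell
#!/bin/sh
# Hypothetical helper: print the current status of one MLModel.
# Assumption: the get-ml-model response has a top-level "Status" field.
ml_model_status() {
    aws machinelearning get-ml-model --ml-model-id "$1" \
        --query Status --output text
}

# wait_for_completed GETTER ID [INTERVAL_SECONDS] [MAX_TRIES]
# Calls GETTER ID until it prints COMPLETED.
# Returns 0 on COMPLETED, 1 on FAILED/DELETED or timeout, 2 if GETTER fails.
wait_for_completed() {
    getter=$1
    id=$2
    interval=${3:-30}
    tries=${4:-60}
    while [ "$tries" -gt 0 ]; do
        state=$("$getter" "$id") || return 2
        case "$state" in
            COMPLETED) return 0 ;;
            FAILED|DELETED)
                echo "$id stopped with status $state" >&2
                return 1 ;;
        esac
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo "timed out waiting for $id" >&2
    return 1
}

# Usage (placeholder ID):
#   wait_for_completed ml_model_status ml-example && echo "model is ready"
```

Passing the getter as an argument means the same loop could, in principle, watch an evaluation or a batch prediction by swapping in a different status function.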
Evaluation Commands
The commands related to evaluation (Evaluation) in machine learning are as follows.
    create-evaluation
    get-evaluation
    delete-evaluation
    update-evaluation
    describe-evaluations
The command help is as follows (taking create-evaluation as an example). It looks fairly simple and easy to use.
NAME
       create-evaluation -

DESCRIPTION
       Creates a new Evaluation of an MLModel. An MLModel is evaluated on a
       set of observations associated to a DataSource. Like a DataSource for
       an MLModel, the DataSource for an Evaluation contains values for the
       Target Variable. The Evaluation compares the predicted result for each
       observation to the actual outcome and provides a summary so that you
       know how effective the MLModel functions on the test data. Evaluation
       generates a relevant performance metric such as BinaryAUC,
       RegressionRMSE or MulticlassAvgFScore based on the corresponding
       MLModelType: BINARY, REGRESSION or MULTICLASS.

       create-evaluation is an asynchronous operation. In response to
       create-evaluation, Amazon Machine Learning (Amazon ML) immediately
       returns and sets the evaluation status to PENDING. After the
       Evaluation is created and ready for use, Amazon ML sets the status to
       COMPLETED.

       You can use the get-evaluation operation to check progress of the
       evaluation during the creation operation.

SYNOPSIS
            create-evaluation
          --evaluation-id <value>
          [--evaluation-name <value>]
          --ml-model-id <value>
          --evaluation-data-source-id <value>
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --evaluation-id (string)
          A user-supplied ID that uniquely identifies the Evaluation.

       --evaluation-name (string)
          A user-supplied name or description of the Evaluation.

       --ml-model-id (string)
          The ID of the MLModel to evaluate.

          The schema used in creating the MLModel must match the schema of
          the DataSource used in the Evaluation.

       --evaluation-data-source-id (string)
          The ID of the DataSource for the evaluation. The schema of the
          DataSource must match the schema used to create the MLModel.

       --cli-input-json (string)
          Performs service operation based on the JSON string provided. The
          JSON string follows the format provided by --generate-cli-skeleton.
          If other arguments are provided on the command line, the CLI values
          will override the JSON-provided values.

       --generate-cli-skeleton (boolean)
          Prints a sample input JSON to standard output. Note the specified
          operation is not run if this argument is specified. The sample
          input can be used as an argument for --cli-input-json.

OUTPUT
       EvaluationId -> (string)
          The user-supplied ID that uniquely identifies the Evaluation. This
          value should be identical to the value of the EvaluationId in the
          request.

                                                            CREATE-EVALUATION()
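The help mentions --cli-input-json, so here is a small sketch of building that JSON for create-evaluation instead of passing each option separately. The key names are what I would expect --generate-cli-skeleton to print for this command (it is worth running that to confirm), and the function name and IDs are placeholders I made up.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON document for create-evaluation.
# Args: EVALUATION_ID EVALUATION_NAME ML_MODEL_ID DATA_SOURCE_ID
# Assumption: the key names match the output of
#   aws machinelearning create-evaluation --generate-cli-skeleton
# Note: values are inserted as-is, so they must not contain double quotes
# or backslashes.
evaluation_input_json() {
    printf '{"EvaluationId": "%s", "EvaluationName": "%s", "MLModelId": "%s", "EvaluationDataSourceId": "%s"}\n' \
        "$1" "$2" "$3" "$4"
}

# Usage (placeholder IDs):
#   aws machinelearning create-evaluation \
#       --cli-input-json "$(evaluation_input_json ev-example 'first evaluation' ml-example ds-test-example)"
```

As the help notes, any option also given on the command line overrides the value from the JSON, so a shared JSON template with a per-run --evaluation-id is one way this could be used.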
Prediction Commands
The commands related to prediction (Prediction) in machine learning are as follows.
    predict
    create-batch-prediction
    get-batch-prediction
    delete-batch-prediction
    describe-batch-predictions
    update-batch-prediction
Here, let's look at predict as a sample command.
NAME
       predict -

DESCRIPTION
       Generates a prediction for the observation using the specified
       MLModel.

       NOTE: Not all response parameters will be populated because this is
       dependent on the type of requested model.

SYNOPSIS
            predict
          --ml-model-id <value>
          --record <value>
          --predict-endpoint <value>
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --ml-model-id (string)
          A unique identifier of the MLModel.

       --record (map)
          A map of variable name-value pairs that represent an observation.

       Shorthand Syntax:

          --record key_name=string,key_name2=string

       JSON Syntax:

          {"string": "string"
            ...}

       --predict-endpoint (string)

       --cli-input-json (string)
          Performs service operation based on the JSON string provided. The
          JSON string follows the format provided by --generate-cli-skeleton.
          If other arguments are provided on the command line, the CLI values
          will override the JSON-provided values.

       --generate-cli-skeleton (boolean)
          Prints a sample input JSON to standard output. Note the specified
          operation is not run if this argument is specified. The sample
          input can be used as an argument for --cli-input-json.

OUTPUT
       Prediction -> (structure)
          The output from a predict operation:

          o Details - Contains the following attributes:
            DetailsAttributes.PREDICTIVE_MODEL_TYPE - REGRESSION | BINARY |
            MULTICLASS DetailsAttributes.ALGORITHM - SGD

          o PredictedLabel - Present for either a BINARY or MULTICLASS
            MLModel request.

          o PredictedScores - Contains the raw classification score
            corresponding to each label.

          o PredictedValue - Present for a REGRESSION MLModel request.

          predictedLabel -> (string)
             The prediction label for either a BINARY or MULTICLASS MLModel.

          predictedValue -> (float)
             The prediction value for REGRESSION MLModel.

          predictedScores -> (map)
             Provides the raw classification score corresponding to each
             label.

             key -> (string)

             value -> (float)

          details -> (map)
             Provides any additional details regarding the prediction.

             key -> (string)
                Contains the key values of DetailsMap: PredictiveModelType -
                Indicates the type of the MLModel. Algorithm - Indicates the
                algorithm was used for the MLModel.

             value -> (string)

                                                                      PREDICT()
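Going by the SYNOPSIS above, a single real-time prediction could be wrapped roughly like this. The three options are straight from the help; `join_record` and `predict_one` are names I made up, and the endpoint value is assumed to be the EndpointUrl of a real-time endpoint created beforehand.

```shell
#!/bin/sh
# Hypothetical helper: join name=value pairs into the shorthand form the
# help shows for --record (key_name=string,key_name2=string).
# Note: values containing commas or spaces would need the JSON syntax instead.
join_record() {
    record=
    for pair in "$@"; do
        record="${record:+$record,}$pair"
    done
    printf '%s\n' "$record"
}

# predict_one MODEL_ID ENDPOINT_URL name=value [name=value ...]
predict_one() {
    model_id=$1
    endpoint=$2
    shift 2
    aws machinelearning predict \
        --ml-model-id "$model_id" \
        --predict-endpoint "$endpoint" \
        --record "$(join_record "$@")"
}

# Usage (placeholder ID and variables; ENDPOINT_URL must be set by you):
#   predict_one ml-example "$ENDPOINT_URL" age=32 job=technician
```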
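create-batch-prediction has the command summary shown below. Comparing the two, the help texts say:
- predict: generates a prediction for the observation using the specified MLModel
- create-batch-prediction: generates predictions for a group of observations
So both create predictions, but under different conditions: judging from the help, predict takes a single observation through --record and a --predict-endpoint, while create-batch-prediction asynchronously processes the data files referenced by a DataSource. Honestly, when to use which hasn't fully clicked for me yet, so I plan to deepen my understanding of this over time.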
NAME
       create-batch-prediction -

DESCRIPTION
       Generates predictions for a group of observations. The observations to
       process exist in one or more data files referenced by a DataSource.
       This operation creates a new BatchPrediction, and uses an MLModel and
       the data files referenced by the DataSource as information sources.

       create-batch-prediction is an asynchronous operation. In response to
       create-batch-prediction, Amazon Machine Learning (Amazon ML)
       immediately returns and sets the BatchPrediction status to PENDING.
       After the BatchPrediction completes, Amazon ML sets the status to
       COMPLETED.

       You can poll for status updates by using the get-batch-prediction
       operation and checking the Status parameter of the result. After the
       COMPLETED status appears, the results are available in the location
       specified by the OutputUri parameter.
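The excerpt above stops before the SYNOPSIS, so the following is a sketch based on my reading of the AWS CLI reference rather than on the help shown here: the option names are assumptions to verify with `aws machinelearning create-batch-prediction help`, and the function names, IDs, and S3 location are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper:
#   start_batch_prediction BATCH_ID MODEL_ID DATA_SOURCE_ID OUTPUT_S3_URI
# Assumption: these four options are the required ones for the command.
start_batch_prediction() {
    bp_id=$1
    model_id=$2
    ds_id=$3
    output_uri=$4
    aws machinelearning create-batch-prediction \
        --batch-prediction-id "$bp_id" \
        --ml-model-id "$model_id" \
        --batch-prediction-data-source-id "$ds_id" \
        --output-uri "$output_uri"
}

# Print the Status parameter that the help says to poll for.
batch_prediction_status() {
    aws machinelearning get-batch-prediction --batch-prediction-id "$1" \
        --query Status --output text
}

# Usage (placeholder IDs and bucket):
#   start_batch_prediction bp-example ml-example ds-batch-example \
#       s3://example-bucket/predictions/
#   batch_prediction_status bp-example
```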
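Other Commands
Apart from the above, the following two commands remain.
    create-realtime-endpoint
    delete-realtime-endpoint
Below is the content of create-realtime-endpoint. It appears to be the command that creates the endpoint an MLModel uses to serve real-time prediction requests. I wonder whether one was also created behind the scenes in the hands-on entry mentioned earlier. (I deleted those resources for now, so I can't check as of writing.) I'd like to verify this too the next time I create the resources.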
NAME
       create-realtime-endpoint -

DESCRIPTION
       Creates a real-time endpoint for the MLModel. The endpoint contains
       the URI of the MLModel; that is, the location to send real-time
       prediction requests for the specified MLModel.

SYNOPSIS
            create-realtime-endpoint
          --ml-model-id <value>
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --ml-model-id (string)
          The ID assigned to the MLModel during creation.

       --cli-input-json (string)
          Performs service operation based on the JSON string provided. The
          JSON string follows the format provided by --generate-cli-skeleton.
          If other arguments are provided on the command line, the CLI values
          will override the JSON-provided values.

       --generate-cli-skeleton (boolean)
          Prints a sample input JSON to standard output. Note the specified
          operation is not run if this argument is specified. The sample
          input can be used as an argument for --cli-input-json.

OUTPUT
       MLModelId -> (string)
          A user-supplied ID that uniquely identifies the MLModel. This value
          should be identical to the value of the MLModelId in the request.

       RealtimeEndpointInfo -> (structure)
          The endpoint information of the MLModel

          PeakRequestsPerSecond -> (integer)
             The maximum processing rate for the real-time endpoint for
             MLModel, measured in incoming requests per second.

          CreatedAt -> (timestamp)
             The time that the request to create the real-time endpoint for
             the MLModel was received. The time is expressed in epoch time.

          EndpointUrl -> (string)
             The URI that specifies where to send real-time prediction
             requests for the MLModel.

             NOTE: The application must wait until the real-time endpoint is
             ready before using this URI.

          EndpointStatus -> (string)
             The current status of the real-time endpoint for the MLModel.
             This element can have one of the following values:

             o NONE - Endpoint does not exist or was previously deleted.

             o READY - Endpoint is ready to be used for real-time
               predictions.

             o UPDATING - Updating/creating the endpoint.

                                                     CREATE-REALTIME-ENDPOINT()
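Because the note says the application must wait until the endpoint is ready, a script would probably create the endpoint, keep the EndpointUrl, and then wait for READY. Here is a sketch of that. The function names are mine, and reading the status back through get-ml-model's EndpointInfo is an assumption taken from the API reference, not something shown in the help above.

```shell
#!/bin/sh
# Create the real-time endpoint and print only its EndpointUrl
# (RealtimeEndpointInfo.EndpointUrl appears in the OUTPUT section above).
create_endpoint_url() {
    aws machinelearning create-realtime-endpoint --ml-model-id "$1" \
        --query 'RealtimeEndpointInfo.EndpointUrl' --output text
}

# Assumption: get-ml-model reports the endpoint under "EndpointInfo".
endpoint_status() {
    aws machinelearning get-ml-model --ml-model-id "$1" \
        --query 'EndpointInfo.EndpointStatus' --output text
}

# wait_for_endpoint MODEL_ID [MAX_TRIES] [INTERVAL_SECONDS]
# Returns 0 once the status is READY, 1 if it never gets there.
wait_for_endpoint() {
    tries=${2:-20}
    while [ "$tries" -gt 0 ]; do
        [ "$(endpoint_status "$1")" = "READY" ] && return 0
        tries=$((tries - 1))
        sleep "${3:-15}"
    done
    return 1
}

# Usage (placeholder ID):
#   ENDPOINT_URL=$(create_endpoint_url ml-example)
#   wait_for_endpoint ml-example && echo "send predictions to $ENDPOINT_URL"
```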
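Summary
That was a tour of the AWS CLI commands for Machine Learning. The character count ballooned, but most of it is transcribed command help, so there isn't much substance to it, lol. Still, looking over the commands like this is a nice way to see what elements the service is made of. Eventually I'd like to use these AWS CLI commands to build various kinds of automation. That's all from me.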