[New Service] Checking Out the AWS CLI Commands for Amazon Machine Learning #AWSSummit #AmazonML

2015.04.10


Earlier this morning I posted two breaking-news articles (the announcement and a hands-on) about "Amazon Machine Learning" from AWS Summits 2015 at San Francisco. This "available today" service already appears to be supported by the AWS CLI as well. So in this entry, I'd like to take a rough look, based on the command contents, at what operations the AWS CLI lets you perform on Amazon Machine Learning. This is a fairly small piece.


Preparation: Upgrading the AWS CLI

First, update the AWS CLI itself. Run the following commands to get it done quickly.

$ sudo pip install --upgrade awscli
$ aws --version
aws-cli/1.7.21 Python/2.7.6 Darwin/13.4.0

Amazon Machine Learning: Supported Commands

The service name to put after aws is machinelearning, which brings up the related commands. There were 25 commands available in total.

$ aws machinelearning (press Tab twice to trigger completion)
create-batch-prediction            describe-data-sources 
create-data-source-from-rds        describe-evaluations 
create-data-source-from-redshift   describe-ml-models 
create-data-source-from-s3         get-batch-prediction 
create-evaluation                  get-data-source 
create-ml-model                    get-evaluation 
create-realtime-endpoint           get-ml-model 
delete-batch-prediction            predict 
delete-data-source                 update-batch-prediction 
delete-evaluation                  update-data-source 
delete-ml-model                    update-evaluation 
delete-realtime-endpoint           update-ml-model
describe-batch-predictions

Commands Related to DataSources

The commands related to data sources (DataSource), the data that machine learning trains on, are as follows (a sample invocation follows the list). I had assumed only S3 and Redshift were supported, but RDS is covered too! The management console had no interface for selecting RDS, so does that mean that, for now, using RDS as a data source has to be done via the AWS CLI?

create-data-source-from-rds
create-data-source-from-redshift
create-data-source-from-s3
describe-data-sources
get-data-source
update-data-source
delete-data-source
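
For example, creating a DataSource from a CSV file on S3 might look like the sketch below. The IDs and S3 paths here are hypothetical; --compute-statistics is worth adding if you intend to train a model from it, since create-ml-model requires a DataSource with computed statistics (see the help excerpt later in this entry).

$ aws machinelearning create-data-source-from-s3 \
    --data-source-id my-s3-ds-001 \
    --data-source-name "Sample S3 DataSource" \
    --data-spec DataLocationS3=s3://my-bucket/banking.csv,DataSchemaLocationS3=s3://my-bucket/banking.csv.schema \
    --compute-statistics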

Commands Related to Models

The commands related to models (Model), the basis of the training, are as follows.

create-ml-model
delete-ml-model
get-ml-model
update-ml-model
describe-ml-models

For reference, below is the command help for create-ml-model, with a sample invocation after it.

NAME
       create-ml-model -

DESCRIPTION
       Creates  a  new MLModel using the data files and the recipe as informa-
       tion sources.

       An MLModel is nearly immutable. Users can only update  the  MLModelName
       and the ScoreThreshold in an MLModel without creating a new MLModel .

       create-ml-model  is  an  asynchronous  operation.  In  response to cre-
       ate-ml-model , Amazon Machine Learning (Amazon ML) immediately  returns
       and  sets  the MLModel status to PENDING . After the MLModel is created
       and ready for use, Amazon ML sets the status to COMPLETED .

       You can use the   get-ml-model  operation  to  check  progress  of  the
       MLModel during the creation operation.
          create-ml-model  requires  a  DataSource  with  computed statistics,
          which can be created by setting ComputeStatistics to true  in   cre-
          ate-data-source-from-rds  ,   create-data-source-from-s3  , or  cre-
          ate-data-source-from-redshift operations.

SYNOPSIS
            create-ml-model
          --ml-model-id <value>
          [--ml-model-name <value>]
          --ml-model-type <value>
          [--parameters <value>]
          --training-data-source-id <value>
          [--recipe <value>]
          [--recipe-uri <value>]
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --ml-model-id (string)
          A user-supplied ID that uniquely identifies the MLModel .

       --ml-model-name (string)
          A user-supplied name or description of the MLModel .

       --ml-model-type (string)
          The category of supervised learning that this MLModel will  address.
          Choose from the following types:

          o Choose REGRESSION if the MLModel will be used to predict a numeric
            value.

          o Choose BINARY if the MLModel result has two possible values.

          o Choose MULTICLASS if the MLModel result has a  limited  number  of
            values.

          For  more  information,  see  the  Amazon Machine Learning Developer
          Guide .

       --parameters (map)
          A list of the training parameters in  the  MLModel  .  The  list  is
          implemented as a map of key/value pairs.

          The following is the current set of training parameters:

          o sgd.l1RegularizationAmount  -  Coefficient regularization L1 norm.
            It controls overfitting the data by penalizing large coefficients.
            This tends to drive coefficients to zero, resulting in sparse fea-
            ture set. If you use this parameter, start by specifying  a  small
            value such as 1.0E-08. The value is a double that ranges from 0 to
            MAX_DOUBLE. The default is not to use L1 normalization. The param-
            eter cannot be used when L2 is specified. Use this parameter spar-
            ingly.

          o sgd.l2RegularizationAmount - Coefficient regularization  L2  norm.
            It controls overfitting the data by penalizing large coefficients.
            This tends to drive coefficients to small, nonzero values. If  you
            use  this  parameter,  start  by  specifying a small value such as
            1.0E-08. The value is a double that ranges from 0 to MAX_DOUBLE.
            The  default  is  not to use L2 normalization. This cannot be used
            when L1 is specified. Use this parameter sparingly.

          o sgd.maxPasses - Number of times that  the  training  process  tra-
            verses  the  observations  to  build the MLModel . The value is an
            integer that ranges from 1 to 10000. The default value is 10.

          o sgd.maxMLModelSizeInBytes - Maximum allowed  size  of  the  model.
            Depending  on  the  input data, the size of the model might affect
            its performance. The value is an integer that ranges  from  100000
            to 2147483648. The default value is 33554432.

       Shorthand Syntax:

          --parameters key_name=string,key_name2=string

       JSON Syntax:

          {"string": "string"
            ...}

       --training-data-source-id (string)
          The DataSource that points to the training data.

       --recipe (string)
          The  data  recipe for creating MLModel . You must specify either the
       recipe or its URI. If you don't specify a recipe or its URI, Amazon
          ML creates a default.

       --recipe-uri (string)
          The Amazon Simple Storage Service (Amazon S3) location and file name
          that contains the MLModel recipe. You must specify either the recipe
       or its URI. If you don't specify a recipe or its URI, Amazon ML cre-
          ates a default.

       --cli-input-json (string) Performs service operation based on the  JSON
       string  provided. The JSON string follows the format provided by --gen-
       erate-cli-skeleton. If other arguments  are  provided  on  the  command
       line, the CLI values will override the JSON-provided values.

       --generate-cli-skeleton  (boolean)  Prints a sample input JSON to stan-
       dard output. Note the specified operation is not run if  this  argument
       is  specified.  The  sample  input  can  be  used  as  an  argument for
       --cli-input-json.

OUTPUT
       MLModelId -> (string)
          A user-supplied ID that uniquely identifies the MLModel . This value
          should be identical to the value of the MLModelId in the request.



                                                             CREATE-ML-MODEL()
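
Going by the synopsis above, a minimal invocation might look like the following. The IDs here are hypothetical, and it assumes the training DataSource was created with ComputeStatistics set to true:

$ aws machinelearning create-ml-model \
    --ml-model-id my-model-001 \
    --ml-model-name "Sample binary model" \
    --ml-model-type BINARY \
    --training-data-source-id my-s3-ds-001

Since the operation is asynchronous, you can then poll with get-ml-model until the status becomes COMPLETED:

$ aws machinelearning get-ml-model --ml-model-id my-model-001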

Commands Related to Evaluations

The commands related to evaluation (Evaluation) in machine learning are as follows.

create-evaluation
get-evaluation 
delete-evaluation
update-evaluation
describe-evaluations

The command help is as follows (using create-evaluation as an example). It looks fairly simple and easy to use.

NAME
       create-evaluation -

DESCRIPTION
       Creates  a  new Evaluation of an MLModel . An MLModel is evaluated on a
       set of observations associated to a DataSource . Like a DataSource  for
       an  MLModel  , the DataSource for an Evaluation contains values for the
       Target Variable. The Evaluation compares the predicted result for  each
       observation  to  the  actual outcome and provides a summary so that you
       know how effective the MLModel functions on the test  data.  Evaluation
       generates  a relevant performance metric such as BinaryAUC, Regression-
       RMSE or MulticlassAvgFScore based on the  corresponding  MLModelType  :
       BINARY , REGRESSION or MULTICLASS .

       create-evaluation  is  an  asynchronous  operation. In response to cre-
       ate-evaluation  ,  Amazon  Machine  Learning  (Amazon  ML)  immediately
       returns  and  sets the evaluation status to PENDING . After the Evalua-
       tion is created and ready for use, Amazon ML sets the  status  to  COM-
       PLETED .

       You  can  use  the   get-evaluation  operation to check progress of the
       evaluation during the creation operation.

SYNOPSIS
            create-evaluation
          --evaluation-id <value>
          [--evaluation-name <value>]
          --ml-model-id <value>
          --evaluation-data-source-id <value>
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --evaluation-id (string)
          A user-supplied ID that uniquely identifies the Evaluation .

       --evaluation-name (string)
          A user-supplied name or description of the Evaluation .

       --ml-model-id (string)
          The ID of the MLModel to evaluate.

          The schema used in creating the MLModel must match the schema of the
          DataSource used in the Evaluation .

       --evaluation-data-source-id (string)
          The ID of the DataSource for the evaluation. The schema of the Data-
          Source must match the schema used to create the MLModel .

       --cli-input-json (string) Performs service operation based on the  JSON
       string  provided. The JSON string follows the format provided by --gen-
       erate-cli-skeleton. If other arguments  are  provided  on  the  command
       line, the CLI values will override the JSON-provided values.

       --generate-cli-skeleton  (boolean)  Prints a sample input JSON to stan-
       dard output. Note the specified operation is not run if  this  argument
       is  specified.  The  sample  input  can  be  used  as  an  argument for
       --cli-input-json.

OUTPUT
       EvaluationId -> (string)
          The user-supplied ID that uniquely identifies the Evaluation .  This
          value  should  be  identical to the value of the EvaluationId in the
          request.



                                                           CREATE-EVALUATION()
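
Following the synopsis, a sample invocation might look like this (the IDs are hypothetical; as the help notes, the evaluation DataSource must use the same schema as the one the MLModel was trained on):

$ aws machinelearning create-evaluation \
    --evaluation-id my-eval-001 \
    --ml-model-id my-model-001 \
    --evaluation-data-source-id my-eval-ds-001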

Commands Related to Predictions

The commands related to prediction (Prediction) in machine learning are as follows.

predict
create-batch-prediction
get-batch-prediction
delete-batch-prediction
describe-batch-predictions
update-batch-prediction

Here, let's take a look at predict as a sample command.

NAME
       predict -

DESCRIPTION
       Generates  a prediction for the observation using the specified MLModel
       .

       NOTE:

          Not all response parameters will be populated because this is depen-
          dent on the type of requested model.

SYNOPSIS
            predict
          --ml-model-id <value>
          --record <value>
          --predict-endpoint <value>
          [--cli-input-json <value>]
          [--generate-cli-skeleton]
          
OPTIONS
       --ml-model-id (string)
          A unique identifier of the MLModel .

       --record (map)
          A map of variable name-value pairs that represent an observation.

       Shorthand Syntax:

          --record key_name=string,key_name2=string

       JSON Syntax:

          {"string": "string"
            ...}

       --predict-endpoint (string)

       --cli-input-json  (string) Performs service operation based on the JSON
       string provided. The JSON string follows the format provided by  --gen-
       erate-cli-skeleton.  If  other  arguments  are  provided on the command
       line, the CLI values will override the JSON-provided values.

       --generate-cli-skeleton (boolean) Prints a sample input JSON  to  stan-
       dard  output.  Note the specified operation is not run if this argument
       is specified.  The  sample  input  can  be  used  as  an  argument  for
       --cli-input-json.

OUTPUT
       Prediction -> (structure)
          The output from a predict operation:

          o Details   -   Contains   the   following   attributes:  DetailsAt-
            tributes.PREDICTIVE_MODEL_TYPE - REGRESSION | BINARY |  MULTICLASS
            DetailsAttributes.ALGORITHM - SGD

          o PredictedLabel - Present for either a BINARY or MULTICLASS MLModel
            request.

          o PredictedScores - Contains the  raw  classification  score  corre-
            sponding to each label.

          o PredictedValue - Present for a REGRESSION MLModel request.

          predictedLabel -> (string)
              The prediction label for either a BINARY or MULTICLASS MLModel .

          predictedValue -> (float)
              The prediction value for REGRESSION MLModel .

          predictedScores -> (map)
              Provides the raw  classification  score  corresponding  to  each
              label.

              key -> (string)

              value -> (float)

          details -> (map)
              Provides any additional details regarding the prediction.

              key -> (string)
                 Contains the key values of DetailsMap : PredictiveModelType -
                 Indicates the type of the MLModel . Algorithm - Indicates the
                 algorithm was used for the MLModel .

              value -> (string)



                                                                     PREDICT()
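
A real-time prediction request might then look like the sketch below. The model ID, record keys, and endpoint URL are all hypothetical; the endpoint URL would be the EndpointUrl returned by create-realtime-endpoint (covered later), and --record takes the observation as key=value pairs:

$ aws machinelearning predict \
    --ml-model-id my-model-001 \
    --record age=42,job=technician,education=university \
    --predict-endpoint https://realtime.machinelearning.us-east-1.amazonaws.com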

create-batch-prediction has the command description below. Comparing the two:

  • predict: generates a prediction for a single observation using the specified model
  • create-batch-prediction: generates predictions for a group of observations

So (rough translation courtesy of Google) both create predictions, but under different conditions: predict returns a result for one record, while create-batch-prediction asynchronously processes the observations in data files referenced by a DataSource. I don't yet have a firm feel for when to use which, but I intend to deepen my understanding of this area over time.

NAME
       create-batch-prediction -

DESCRIPTION
       Generates  predictions for a group of observations. The observations to
       process exist in one or more data files referenced by  a  DataSource  .
       This  operation creates a new BatchPrediction , and uses an MLModel and
       the data files referenced by the DataSource as information sources.

       create-batch-prediction is an asynchronous operation.  In  response  to
       create-batch-prediction  ,  Amazon Machine Learning (Amazon ML) immedi-
       ately returns and sets the BatchPrediction status to  PENDING  .  After
       the BatchPrediction completes, Amazon ML sets the status to COMPLETED .

       You can poll for status  updates  by  using  the   get-batch-prediction
       operation  and  checking  the Status parameter of the result. After the
       COMPLETED status appears, the results are  available  in  the  location
       specified by the OutputUri parameter.
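
Based on the description above, a batch run might be kicked off as in the sketch below, with hypothetical IDs and an S3 output location; once get-batch-prediction reports COMPLETED, the results should be found under --output-uri:

$ aws machinelearning create-batch-prediction \
    --batch-prediction-id my-batch-001 \
    --ml-model-id my-model-001 \
    --batch-prediction-data-source-id my-unlabeled-ds-001 \
    --output-uri s3://my-bucket/batch-output/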

Other Commands

The remaining commands are the following two.

create-realtime-endpoint           
delete-realtime-endpoint

Below is the content of create-realtime-endpoint. It looks like the command that creates the endpoint an MLModel uses when serving requests. Perhaps one was also created behind the scenes in the entry mentioned earlier. (I deleted those resources before writing this entry, so I can't confirm at the moment.) I'd like to verify this area again the next time I create the resources.

NAME
       create-realtime-endpoint -

DESCRIPTION
       Creates  a  real-time  endpoint for the MLModel . The endpoint contains
       the URI of the MLModel ; that is, the location to send  real-time  pre-
       diction requests for the specified MLModel .

SYNOPSIS
            create-realtime-endpoint
          --ml-model-id <value>
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --ml-model-id (string)
          The ID assigned to the MLModel during creation.

       --cli-input-json  (string) Performs service operation based on the JSON
       string provided. The JSON string follows the format provided by  --gen-
       erate-cli-skeleton.  If  other  arguments  are  provided on the command
       line, the CLI values will override the JSON-provided values.

       --generate-cli-skeleton (boolean) Prints a sample input JSON  to  stan-
       dard  output.  Note the specified operation is not run if this argument
       is specified.  The  sample  input  can  be  used  as  an  argument  for
       --cli-input-json.
       
OUTPUT
       MLModelId -> (string)
          A user-supplied ID that uniquely identifies the MLModel . This value
          should be identical to the value of the MLModelId in the request.

       RealtimeEndpointInfo -> (structure)
          The endpoint information of the MLModel

          PeakRequestsPerSecond -> (integer)
              The maximum processing  rate  for  the  real-time  endpoint  for
              MLModel , measured in incoming requests per second.

          CreatedAt -> (timestamp)
              The  time  that the request to create the real-time endpoint for
              the MLModel was received. The time is expressed in epoch time.

          EndpointUrl -> (string)
              The URI  that  specifies  where  to  send  real-time  prediction
              requests for the MLModel .

              NOTE:

                 The  application  must  wait  until the real-time endpoint is
                 ready before using this URI.

          EndpointStatus -> (string)
              The current status of the real-time endpoint for the  MLModel  .
              This element can have one of the following values:

              o NONE - Endpoint does not exist or was previously deleted.

              o READY  -  Endpoint  is  ready to be used for real-time predic-
                tions.

              o UPDATING - Updating/creating the endpoint.



                                                    CREATE-REALTIME-ENDPOINT()
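
Creating (and later removing) the endpoint only takes the model ID; a sketch with a hypothetical ID:

$ aws machinelearning create-realtime-endpoint --ml-model-id my-model-001
$ aws machinelearning delete-realtime-endpoint --ml-model-id my-model-001

Per the note in the output above, an application must wait until EndpointStatus is READY before sending requests to the returned EndpointUrl.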

Summary

That wraps up this tour of the AWS CLI commands related to Machine Learning. The character count ended up growing quite a bit, but since most of it is transcribed command help, there isn't actually all that much to it. :) Going through the commands like this is also a nice way to confirm what elements the service is built from. Eventually I'd like to use these AWS CLI commands to build in all sorts of automated processing. That's all from here.