[New Service] Checking out the AWS CLI commands for Amazon Machine Learning #AWSSummit #AmazonML

More than a year has passed since this article was published. The information may be out of date, so please keep that in mind.

Preparation: upgrading the AWS CLI

First, update the AWS CLI itself. The following commands take care of it quickly.

$ sudo pip install --upgrade awscli
$ aws --version
aws-cli/1.7.21 Python/2.7.6 Darwin/13.4.0


Amazon Machine Learning: list of supported commands

The service name to put after aws appears to be machinelearning; typing it shows the related commands. There are 25 commands in total.

$ aws machinelearning   (press Tab twice to show the completions)
create-batch-prediction            describe-data-sources
create-data-source-from-rds        describe-evaluations
create-data-source-from-redshift   describe-ml-models
create-data-source-from-s3         get-batch-prediction
create-evaluation                  get-data-source
create-ml-model                    get-evaluation
create-realtime-endpoint           get-ml-model
delete-batch-prediction            predict
delete-data-source                 update-batch-prediction
delete-evaluation                  update-data-source
delete-ml-model                    update-evaluation
delete-realtime-endpoint           update-ml-model
describe-batch-predictions


Commands related to DataSources

create-data-source-from-rds
create-data-source-from-redshift
create-data-source-from-s3
describe-data-sources
get-data-source
update-data-source
delete-data-source
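Because create-ml-model only accepts a DataSource with computed statistics (see the create-ml-model help below), creating a DataSource is usually the first step. Here is a minimal sketch for S3. The helper name, IDs, and bucket paths are hypothetical, and the `--data-spec` keys (`DataLocationS3`, `DataSchemaLocationS3`) and the `--compute-statistics` flag are assumptions about create-data-source-from-s3, whose help isn't reproduced in this article, so check them with `aws machinelearning create-data-source-from-s3 help` before relying on this.

```shell
# Sketch: create a DataSource from S3 with statistics computed, since
# create-ml-model only accepts DataSources that have them.
# IDs and S3 URIs passed in are hypothetical placeholders.

# create_s3_datasource DATASOURCE_ID DATA_URI SCHEMA_URI
create_s3_datasource() {
    ds_id=$1
    data_uri=$2
    schema_uri=$3
    # Build the DataSpec as JSON so the S3 URIs need no shorthand escaping.
    # The key names (DataLocationS3, DataSchemaLocationS3) are assumptions.
    data_spec=$(printf '{"DataLocationS3":"%s","DataSchemaLocationS3":"%s"}' \
        "$data_uri" "$schema_uri")
    aws machinelearning create-data-source-from-s3 \
        --data-source-id "$ds_id" \
        --data-spec "$data_spec" \
        --compute-statistics
}

# Example:
# create_s3_datasource ds-example-train s3://example-bucket/train.csv \
#     s3://example-bucket/train.csv.schema
```

Passing the DataSpec as JSON rather than shorthand sidesteps any quoting questions around the `s3://` URIs.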


Commands related to Models

create-ml-model
delete-ml-model
get-ml-model
update-ml-model
describe-ml-models
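To see which models exist and what state they are in, describe-ml-models can be combined with the CLI's global `--query` (JMESPath) and `--output text` options. A small sketch; the helper names are my own, and the `Results`, `MLModelId`, `Status`, and `MLModelType` field names are assumptions about the describe-ml-models output, which this article doesn't show.

```shell
# list_models : print ID, status and type of every MLModel, one per line
list_models() {
    aws machinelearning describe-ml-models \
        --query 'Results[].[MLModelId,Status,MLModelType]' \
        --output text
}

# list_models_by_status STATUS : print only the IDs of models in STATUS
# (e.g. COMPLETED); the filtering happens client-side via JMESPath.
list_models_by_status() {
    aws machinelearning describe-ml-models \
        --query "Results[?Status=='$1'].MLModelId" \
        --output text
}
```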


For reference, here is the help output for create-ml-model.

NAME
create-ml-model -

DESCRIPTION
Creates a new MLModel using the data files and the recipe as information sources.

An MLModel is nearly immutable. Users can only update the MLModelName and the ScoreThreshold in an MLModel without creating a new MLModel.

create-ml-model is an asynchronous operation. In response to create-ml-model, Amazon Machine Learning (Amazon ML) immediately returns and sets the MLModel status to PENDING. After the MLModel is created and ready for use, Amazon ML sets the status to COMPLETED.

You can use the get-ml-model operation to check progress of the MLModel during the creation operation.

create-ml-model requires a DataSource with computed statistics, which can be created by setting ComputeStatistics to true in create-data-source-from-rds, create-data-source-from-s3, or create-data-source-from-redshift operations.

SYNOPSIS
create-ml-model
--ml-model-id <value>
[--ml-model-name <value>]
--ml-model-type <value>
[--parameters <value>]
--training-data-source-id <value>
[--recipe <value>]
[--recipe-uri <value>]
[--cli-input-json <value>]
[--generate-cli-skeleton]

OPTIONS
--ml-model-id (string)
A user-supplied ID that uniquely identifies the MLModel.

--ml-model-name (string)
A user-supplied name or description of the MLModel.

--ml-model-type (string)
The category of supervised learning that this MLModel will address. Choose from the following types:

o Choose REGRESSION if the MLModel will be used to predict a numeric value.

o Choose BINARY if the MLModel result has two possible values.

o Choose MULTICLASS if the MLModel result has a limited number of values.

For more information, see the Amazon Machine Learning Developer Guide.

--parameters (map)
A list of the training parameters in the MLModel. The list is implemented as a map of key/value pairs.

The following is the current set of training parameters:

o sgd.l1RegularizationAmount - Coefficient regularization L1 norm. It controls overfitting the data by penalizing large coefficients. This tends to drive coefficients to zero, resulting in a sparse feature set. If you use this parameter, start by specifying a small value such as 1.0E-08. The value is a double that ranges from 0 to MAX_DOUBLE. The default is not to use L1 normalization. The parameter cannot be used when L2 is specified. Use this parameter sparingly.

o sgd.l2RegularizationAmount - Coefficient regularization L2 norm. It controls overfitting the data by penalizing large coefficients. This tends to drive coefficients to small, nonzero values. If you use this parameter, start by specifying a small value such as 1.0E-08. The value is a double that ranges from 0 to MAX_DOUBLE. The default is not to use L2 normalization. This cannot be used when L1 is specified. Use this parameter sparingly.

o sgd.maxPasses - Number of times that the training process traverses the observations to build the MLModel. The value is an integer that ranges from 1 to 10000. The default value is 10.

o sgd.maxMLModelSizeInBytes - Maximum allowed size of the model. Depending on the input data, the size of the model might affect its performance. The value is an integer that ranges from 100000 to 2147483648. The default value is 33554432.

Shorthand Syntax:

--parameters key_name=string,key_name2=string

JSON Syntax:

{"string": "string"
...}

--training-data-source-id (string)
The DataSource that points to the training data.

--recipe (string)
The data recipe for creating the MLModel. You must specify either the recipe or its URI. If you don't specify a recipe or its URI, Amazon ML creates a default.

--recipe-uri (string)
The Amazon Simple Storage Service (Amazon S3) location and file name that contains the MLModel recipe. You must specify either the recipe or its URI. If you don't specify a recipe or its URI, Amazon ML creates a default.

--cli-input-json (string) Performs service operation based on the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, the CLI values will override the JSON-provided values.

--generate-cli-skeleton (boolean) Prints a sample input JSON to standard output. Note the specified operation is not run if this argument is specified. The sample input can be used as an argument for --cli-input-json.

OUTPUT
MLModelId -> (string)
A user-supplied ID that uniquely identifies the MLModel. This value should be identical to the value of the MLModelId in the request.

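Putting this help into practice, here is a sketch that creates a BINARY model using the `--parameters` shorthand syntax and then polls get-ml-model until the model is ready. The helper names and IDs are hypothetical; the parameter values simply follow the help's advice (only L2 regularization, starting at 1.0E-08). The `Status` field name and the `FAILED`/`DELETED` values checked in the loop are assumptions about the get-ml-model output beyond the PENDING and COMPLETED states the help mentions.

```shell
POLL_INTERVAL=${POLL_INTERVAL:-30}   # seconds between status checks

# create_binary_model MODEL_ID TRAINING_DATASOURCE_ID
# Uses only L2 regularization (L1 and L2 cannot be combined) at the starting
# value the help suggests, and sets maxPasses explicitly to its default of 10.
create_binary_model() {
    aws machinelearning create-ml-model \
        --ml-model-id "$1" \
        --ml-model-type BINARY \
        --training-data-source-id "$2" \
        --parameters sgd.maxPasses=10,sgd.l2RegularizationAmount=1.0E-08
}

# wait_for_model MODEL_ID
# Polls get-ml-model until Status is COMPLETED (returns 0) or an assumed
# terminal failure state (returns 1); any other status keeps polling.
wait_for_model() {
    while :; do
        status=$(aws machinelearning get-ml-model --ml-model-id "$1" \
            --query Status --output text) || return 1
        case $status in
            COMPLETED) return 0 ;;
            FAILED|DELETED)
                echo "model $1 ended in state $status" >&2
                return 1 ;;
        esac
        sleep "$POLL_INTERVAL"
    done
}

# Example:
# create_binary_model ml-example-001 ds-example-train && wait_for_model ml-example-001
```

Reading the status with `--query Status --output text` keeps the loop free of any JSON parser.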


Commands related to Evaluations

create-evaluation
get-evaluation
delete-evaluation
update-evaluation
describe-evaluations


The help output is shown below (using create-evaluation as the example). It looks fairly simple and easy to use.

NAME
create-evaluation -

DESCRIPTION
Creates a new Evaluation of an MLModel. An MLModel is evaluated on a set of observations associated to a DataSource. Like a DataSource for an MLModel, the DataSource for an Evaluation contains values for the Target Variable. The Evaluation compares the predicted result for each observation to the actual outcome and provides a summary so that you know how effective the MLModel functions on the test data. Evaluation generates a relevant performance metric such as BinaryAUC, RegressionRMSE or MulticlassAvgFScore based on the corresponding MLModelType: BINARY, REGRESSION or MULTICLASS.

create-evaluation is an asynchronous operation. In response to create-evaluation, Amazon Machine Learning (Amazon ML) immediately returns and sets the evaluation status to PENDING. After the Evaluation is created and ready for use, Amazon ML sets the status to COMPLETED.

You can use the get-evaluation operation to check progress of the evaluation during the creation operation.

SYNOPSIS
create-evaluation
--evaluation-id <value>
[--evaluation-name <value>]
--ml-model-id <value>
--evaluation-data-source-id <value>
[--cli-input-json <value>]
[--generate-cli-skeleton]

OPTIONS
--evaluation-id (string)
A user-supplied ID that uniquely identifies the Evaluation.

--evaluation-name (string)
A user-supplied name or description of the Evaluation.

--ml-model-id (string)
The ID of the MLModel to evaluate.

The schema used in creating the MLModel must match the schema of the DataSource used in the Evaluation.

--evaluation-data-source-id (string)
The ID of the DataSource for the evaluation. The schema of the DataSource must match the schema used to create the MLModel.

--cli-input-json (string) Performs service operation based on the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, the CLI values will override the JSON-provided values.

--generate-cli-skeleton (boolean) Prints a sample input JSON to standard output. Note the specified operation is not run if this argument is specified. The sample input can be used as an argument for --cli-input-json.

OUTPUT
EvaluationId -> (string)
The user-supplied ID that uniquely identifies the Evaluation. This value should be identical to the value of the EvaluationId in the request.

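The --generate-cli-skeleton / --cli-input-json pair in this help makes it easy to keep requests in files. A sketch that writes a create-evaluation request as JSON, submits it with `file://`, and reads the BinaryAUC metric afterwards. The helper names and IDs are hypothetical, the JSON keys (EvaluationId, MLModelId, EvaluationDataSourceId) are assumed to mirror the SYNOPSIS options, and the `PerformanceMetrics.Properties.BinaryAUC` path is an assumption about the get-evaluation output.

```shell
# write_evaluation_request EVALUATION_ID MODEL_ID EVAL_DATASOURCE_ID FILE
# Writes a create-evaluation request as JSON. The keys are assumed to mirror
# the SYNOPSIS options; confirm them with --generate-cli-skeleton.
write_evaluation_request() {
    printf '{"EvaluationId":"%s","MLModelId":"%s","EvaluationDataSourceId":"%s"}\n' \
        "$1" "$2" "$3" > "$4"
}

# create_evaluation_from_file FILE : submit the request via --cli-input-json
create_evaluation_from_file() {
    aws machinelearning create-evaluation --cli-input-json "file://$1"
}

# binary_auc EVALUATION_ID : print the BinaryAUC metric of a COMPLETED
# evaluation of a BINARY model (the output path is an assumption).
binary_auc() {
    aws machinelearning get-evaluation --evaluation-id "$1" \
        --query 'PerformanceMetrics.Properties.BinaryAUC' --output text
}

# Example:
# write_evaluation_request ev-example-001 ml-example-001 ds-example-eval eval.json
# create_evaluation_from_file eval.json
```

Running `aws machinelearning create-evaluation --generate-cli-skeleton` first is a quick way to confirm the exact keys your CLI version expects.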


Commands related to Predictions

predict
create-batch-prediction
get-batch-prediction
delete-batch-prediction
describe-batch-predictions
update-batch-prediction


Here, let's take a look at the predict command as a sample.

NAME
predict -

DESCRIPTION
Generates a prediction for the observation using the specified MLModel.

NOTE:
Not all response parameters will be populated because this is dependent on the type of requested model.

SYNOPSIS
predict
--ml-model-id <value>
--record <value>
--predict-endpoint <value>
[--cli-input-json <value>]
[--generate-cli-skeleton]

OPTIONS
--ml-model-id (string)
A unique identifier of the MLModel.

--record (map)
A map of variable name-value pairs that represent an observation.

Shorthand Syntax:

--record key_name=string,key_name2=string

JSON Syntax:

{"string": "string"
...}

--predict-endpoint (string)

--cli-input-json (string) Performs service operation based on the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, the CLI values will override the JSON-provided values.

--generate-cli-skeleton (boolean) Prints a sample input JSON to standard output. Note the specified operation is not run if this argument is specified. The sample input can be used as an argument for --cli-input-json.

OUTPUT
Prediction -> (structure)
The output from a predict operation:

o Details - Contains the following attributes: DetailsAttributes.PREDICTIVE_MODEL_TYPE - REGRESSION | BINARY | MULTICLASS; DetailsAttributes.ALGORITHM - SGD

o PredictedLabel - Present for either a BINARY or MULTICLASS MLModel request.

o PredictedScores - Contains the raw classification score corresponding to each label.

o PredictedValue - Present for a REGRESSION MLModel request.

predictedLabel -> (string)
The prediction label for either a BINARY or MULTICLASS MLModel.

predictedValue -> (float)
The prediction value for a REGRESSION MLModel.

predictedScores -> (map)
Provides the raw classification score corresponding to each label.

key -> (string)

value -> (float)

details -> (map)
Provides any additional details regarding the prediction.

key -> (string)
Contains the key values of DetailsMap: PredictiveModelType - Indicates the type of the MLModel. Algorithm - Indicates the algorithm that was used for the MLModel.

value -> (string)

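A sketch of calling predict and extracting only the result. The `--record` argument uses the shorthand syntax from the help; the helper names and the variables in the example record are hypothetical and must match your DataSource schema. The `--predict-endpoint` value is assumed to be the EndpointUrl of the model's real-time endpoint (see create-realtime-endpoint under Others), and the `Prediction.predictedLabel` / `Prediction.predictedValue` paths follow the OUTPUT section above.

```shell
# predict_label MODEL_ID ENDPOINT_URL RECORD
# RECORD uses the --record shorthand syntax, e.g. 'age=42,job=engineer'.
# Prints predictedLabel (BINARY / MULTICLASS models).
predict_label() {
    aws machinelearning predict \
        --ml-model-id "$1" \
        --predict-endpoint "$2" \
        --record "$3" \
        --query 'Prediction.predictedLabel' --output text
}

# predict_value MODEL_ID ENDPOINT_URL RECORD
# Prints predictedValue (REGRESSION models).
predict_value() {
    aws machinelearning predict \
        --ml-model-id "$1" \
        --predict-endpoint "$2" \
        --record "$3" \
        --query 'Prediction.predictedValue' --output text
}

# Example (record variables are hypothetical):
# predict_label ml-example-001 "$ENDPOINT_URL" 'age=42,job=engineer'
```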


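The help for create-batch-prediction is shown below. Comparing the two commands:

• predict: generates a prediction for an observation using the specified model
• create-batch-prediction: generates predictions for a group of observations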

NAME
create-batch-prediction -

DESCRIPTION
Generates predictions for a group of observations. The observations to process exist in one or more data files referenced by a DataSource. This operation creates a new BatchPrediction, and uses an MLModel and the data files referenced by the DataSource as information sources.

create-batch-prediction is an asynchronous operation. In response to create-batch-prediction, Amazon Machine Learning (Amazon ML) immediately returns and sets the BatchPrediction status to PENDING. After the BatchPrediction completes, Amazon ML sets the status to COMPLETED.

You can poll for status updates by using the get-batch-prediction operation and checking the Status parameter of the result. After the COMPLETED status appears, the results are available in the location specified by the OutputUri parameter.
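Following the description above (create the BatchPrediction, poll get-batch-prediction until COMPLETED, then look in OutputUri), here is a sketch of a batch run. The helper name, IDs, and S3 location are hypothetical; the option names `--batch-prediction-id`, `--batch-prediction-data-source-id`, and `--output-uri` are assumptions derived from the parameter names the help mentions, since the full SYNOPSIS isn't shown here, and the `FAILED`/`DELETED` states are likewise assumed.

```shell
POLL_INTERVAL=${POLL_INTERVAL:-60}   # seconds between status checks

# run_batch_prediction BATCH_ID MODEL_ID DATASOURCE_ID OUTPUT_URI
# Starts a batch prediction, polls get-batch-prediction until COMPLETED,
# and reports where the results should appear.
run_batch_prediction() {
    aws machinelearning create-batch-prediction \
        --batch-prediction-id "$1" \
        --ml-model-id "$2" \
        --batch-prediction-data-source-id "$3" \
        --output-uri "$4" > /dev/null || return 1
    while :; do
        status=$(aws machinelearning get-batch-prediction \
            --batch-prediction-id "$1" --query Status --output text) || return 1
        case $status in
            COMPLETED)
                echo "results available under $4"
                return 0 ;;
            FAILED|DELETED)
                echo "batch prediction $1 ended in state $status" >&2
                return 1 ;;
        esac
        sleep "$POLL_INTERVAL"
    done
}

# Example:
# run_batch_prediction bp-example-001 ml-example-001 ds-example-batch s3://example-bucket/predictions/
```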



Others

create-realtime-endpoint
delete-realtime-endpoint


NAME
create-realtime-endpoint -

DESCRIPTION
Creates a real-time endpoint for the MLModel. The endpoint contains the URI of the MLModel; that is, the location to send real-time prediction requests for the specified MLModel.

SYNOPSIS
create-realtime-endpoint
--ml-model-id <value>
[--cli-input-json <value>]
[--generate-cli-skeleton]

OPTIONS
--ml-model-id (string)
The ID assigned to the MLModel during creation.

--cli-input-json (string) Performs service operation based on the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, the CLI values will override the JSON-provided values.

--generate-cli-skeleton (boolean) Prints a sample input JSON to standard output. Note the specified operation is not run if this argument is specified. The sample input can be used as an argument for --cli-input-json.

OUTPUT
MLModelId -> (string)
A user-supplied ID that uniquely identifies the MLModel. This value should be identical to the value of the MLModelId in the request.

RealtimeEndpointInfo -> (structure)
The endpoint information of the MLModel.

PeakRequestsPerSecond -> (integer)
The maximum processing rate for the real-time endpoint for the MLModel, measured in incoming requests per second.

CreatedAt -> (timestamp)
The time that the request to create the real-time endpoint for the MLModel was received. The time is expressed in epoch time.

EndpointUrl -> (string)
The URI that specifies where to send real-time prediction requests for the MLModel.

NOTE:
The application must wait until the real-time endpoint is ready before using this URI.

EndpointStatus -> (string)
The current status of the real-time endpoint for the MLModel. This element can have one of the following values:

o NONE - Endpoint does not exist or was previously deleted.

o READY - Endpoint is ready to be used for real-time predictions.

o UPDATING - Updating/creating the endpoint.

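Since an application must wait for EndpointStatus to become READY before using the URI, here is a sketch that creates the endpoint, polls until READY, and prints EndpointUrl for use with predict's `--predict-endpoint`. The helper names are my own, and it assumes get-ml-model exposes the same endpoint information under an `EndpointInfo` key (not shown in this article). delete-realtime-endpoint is included to tear the endpoint down afterwards.

```shell
POLL_INTERVAL=${POLL_INTERVAL:-10}   # seconds between status checks
MAX_TRIES=${MAX_TRIES:-60}           # give up after this many checks

# endpoint_status MODEL_ID : print the model's EndpointStatus (assumed path)
endpoint_status() {
    aws machinelearning get-ml-model --ml-model-id "$1" \
        --query 'EndpointInfo.EndpointStatus' --output text
}

# enable_realtime_endpoint MODEL_ID
# Creates the endpoint, waits until it is READY, then prints EndpointUrl.
enable_realtime_endpoint() {
    aws machinelearning create-realtime-endpoint --ml-model-id "$1" > /dev/null || return 1
    tries=0
    until [ "$(endpoint_status "$1")" = READY ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$MAX_TRIES" ]; then
            echo "endpoint for $1 not READY after $tries checks" >&2
            return 1
        fi
        sleep "$POLL_INTERVAL"
    done
    aws machinelearning get-ml-model --ml-model-id "$1" \
        --query 'EndpointInfo.EndpointUrl' --output text
}

# disable_realtime_endpoint MODEL_ID : delete the endpoint when it is no longer needed
disable_realtime_endpoint() {
    aws machinelearning delete-realtime-endpoint --ml-model-id "$1"
}
```

The retry cap keeps the loop from spinning forever if the endpoint never leaves NONE or UPDATING.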