Trying Out the Extension That Makes It Easy to Manage Azure Machine Learning Resources from VS Code #VSCodejp #VSCode #AzureMachineLearning

Mr.Mo

2020.07.13

This article was published more than a year ago. The information may be out of date, so please keep that in mind.

Hello, this is Mr.Mo.

The VS Code extension for Azure Machine Learning was recently updated, and I had also read the article about that update below. Since I had the chance, in this entry I'll walk through how I used it.

https://devblogs.microsoft.com/python/enhance-your-azure-machine-learning-experience-with-the-vs-code-extension/

What is the VS Code extension for Azure Machine Learning?

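The Azure Machine Learning for Visual Studio Code extension lets you use the Azure Machine Learning service from the Visual Studio Code interface to easily build, train, and deploy machine learning models to the cloud or the edge. Earlier versions of this extension were released under the name Visual Studio Code Tools for AI. The Azure Machine Learning service lets you:

- Build and train machine learning models faster, and deploy them easily to the cloud or the edge.
- Use the latest open-source technologies such as TensorFlow, PyTorch, and Jupyter.
- Experiment locally, then quickly scale up or out using large GPU-enabled clusters in the cloud.
- Speed up data science with automated machine learning and hyperparameter tuning.
- Track experiments, manage models, and deploy easily with integrated CI/CD tools.

With this extension installed, you can run much of this workflow directly from Visual Studio Code.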

https://marketplace.visualstudio.com/items?itemName=ms-toolsai.vscode-ai

It's an excellent extension that lets you manage Azure Machine Learning resources from VS Code, the editor you already use every day.

Trying it out

This time I'll work through the following material.

https://github.com/microsoft/c9-dev-intro-data-science.git

Preparation

Note that I'm using Visual Studio Codespaces as the development environment, since it makes setup easy.

Once Visual Studio Codespaces has started, run the following commands in the Terminal to set up the required environment.

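# Download and run the Anaconda installer (follow the interactive prompts)
wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
bash Anaconda3-2020.02-Linux-x86_64.sh
# Reload the shell settings so conda is on the PATH (assuming you let the installer initialize conda)
source ~/.bashrc
# Create and activate a Python 3.7 environment with the data science packages
conda create -n my-azuremltest python=3.7 pandas jupyter seaborn scikit-learn keras tensorflow
conda activate my-azuremltest
# Remove the installer once it is no longer needed
rm Anaconda3-2020.02-Linux-x86_64.sh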
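To confirm that the activated environment actually contains the packages from the `conda create` line, a small check like the one below can be run inside it. This is a minimal sketch of my own, not part of the original material; the module names are the import names I assume for those conda packages (for example, scikit-learn is imported as `sklearn`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the packages installed by `conda create` above
# (scikit-learn is imported as "sklearn").
EXPECTED_MODULES = ("pandas", "seaborn", "sklearn", "keras", "tensorflow")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

`find_spec` only locates each module without importing it, so the check stays fast even for heavy packages such as TensorFlow.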

You will also need the following three extensions, including the Azure Machine Learning extension.

Creating a workspace

First, let's create a workspace. As shown below, this can be done on the VS Code screen using the Azure Machine Learning extension.

Note that the "Enterprise" setting is selected partway through the video below. AutoML, which comes up later, is only available in the Enterprise edition.

When creation completes successfully, the screen looks like the following.

Creating an experiment

Next, create an experiment. The steps are as follows.

Note that you can choose which experiment is active in VS Code, and this setting is synchronized with the configuration file used later in the code, so keep that in mind.
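The configuration file mentioned here is what `Workspace.from_config()` reads in the training code later on. As a rough illustration of that lookup, here is a hypothetical helper pair (`find_workspace_config` and `load_workspace_config` are my own names, not SDK functions) that searches the current directory and its parents for `config.json`, also checking `.azureml/` and `aml_config/` subdirectories, and verifies the three keys I assume the file holds. The SDK's actual search rules may differ, so treat this as a sketch only.

```python
import json
from pathlib import Path
from typing import Optional

# Keys assumed to be present in the workspace config.json.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")
# Assumed candidate locations, checked in this order in each directory.
CANDIDATES = ("config.json", ".azureml/config.json", "aml_config/config.json")


def find_workspace_config(start: Path) -> Optional[Path]:
    """Walk from `start` up to the filesystem root; return the first candidate file found."""
    for directory in [start, *start.parents]:
        for relative in CANDIDATES:
            candidate = directory / relative
            if candidate.is_file():
                return candidate
    return None


def load_workspace_config(path: Path) -> dict:
    """Parse the config file and fail loudly if any required key is missing."""
    config = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path} is missing keys: {missing}")
    return config


if __name__ == "__main__":
    found = find_workspace_config(Path.cwd())
    print(found if found else "No workspace config found")
```

Walking up the parent directories means a notebook in a subfolder of the project can still find a config file placed at the project root.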

Creating a compute cluster

Create the compute cluster used for training. This can also be done entirely from the VS Code screen.

Creating a dataset

This time, I'm using this data for the dataset.

Once the dataset has been created, you can also browse the data from VS Code.
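The training code later drops the `casual` and `registered` columns. Assuming this data follows the UCI Bike Sharing layout, where `cnt` is the sum of `casual` and `registered`, keeping those columns would leak the target into the features. A quick check like the hypothetical helper below can confirm that relationship on rows pulled from the dataset (for example with `df.to_dict("records")` on the DataFrame returned by `to_pandas_dataframe()`); the sample rows here are made up for illustration.

```python
from typing import Iterable, Mapping


def target_is_sum_of(rows: Iterable[Mapping[str, float]], target: str, parts: Iterable[str]) -> bool:
    """Return True if, in every row, `target` equals the sum of the `parts` columns."""
    parts = list(parts)
    return all(row[target] == sum(row[p] for p in parts) for row in rows)


# Made-up sample rows, not taken from the real dataset.
sample = [
    {"date": "2012-01-01", "casual": 100, "registered": 200, "cnt": 300},
    {"date": "2012-01-02", "casual": 50, "registered": 150, "cnt": 200},
]
print(target_is_sum_of(sample, "cnt", ["casual", "registered"]))  # True
```

If the check returns True on the real data, dropping both component columns (as `drop_column_names` does in the training code) is what keeps the forecast honest.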

Running the training

From here on, we'll also run some code. With the Python extension, you can use Jupyter Notebooks in VS Code. See the following entry for details.

[VS Code Python Extension] Mastering Jupyter Notebooks in VS Code by Working Through the Data Science Tutorial

Run the following code.

import azureml.core
import pandas as pd
import numpy as np
import logging

print("AzureML SDK Version: ", azureml.core.VERSION)

from azureml.core import Workspace, Experiment

# Requires aml_config or config.json (the workspace connection settings)
ws = Workspace.from_config()

from azureml.core import Dataset

time_column_name = 'date'

dataset = Dataset.get_by_name(workspace=ws,name='MyWorkSp-data').with_timestamp_columns(fine_grain_timestamp=time_column_name)
dataset.take(5).to_pandas_dataframe().reset_index(drop=True)

from datetime import datetime
train = dataset.time_before(datetime(2012,8,31), include_boundary=True)
train.to_pandas_dataframe().tail(5).reset_index(drop=True)

test = dataset.time_after(datetime(2012,9,1), include_boundary=True)
test.to_pandas_dataframe().head(5).reset_index(drop=True)

from azureml.core.compute import ComputeTarget

compute_target = ComputeTarget(workspace=ws, name='MyCluster')

from azureml.train.automl import AutoMLConfig
from azureml.train.automl.constants import Tasks, SupportedModels

target_column_name = 'cnt'

time_series_settings = {
    'time_column_name': time_column_name,
    'max_horizon': 14,
    'country_or_region': 'JP',
    'target_lags': 'auto',
    'drop_column_names': ['casual', 'registered']
}

automl_config = AutoMLConfig(task=Tasks.FORECASTING,
                            primary_metric='normalized_root_mean_squared_error',
                            blacklist_models=[SupportedModels.Classification.ExtraTrees],
                            experiment_timeout_minutes=30,
                            training_data=train,
                            label_column_name=target_column_name,
                            compute_target=compute_target,
                            enable_early_stopping=True,
                            n_cross_validations=3,
                            max_concurrent_iterations=4,
                            max_cores_per_iteration=-1,
                            verbosity=logging.INFO,
                            **time_series_settings)

experiment = Experiment(workspace=ws, name='MyWorkSp-exp')
remote_run = experiment.submit(automl_config, show_output=False)
remote_run

remote_run.wait_for_completion()
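In the code above, `time_before(datetime(2012,8,31), include_boundary=True)` and `time_after(datetime(2012,9,1), include_boundary=True)` split the dataset on the timestamp column. For daily data, cutting at those two dates with inclusive boundaries leaves neither a gap nor an overlap between train and test. The plain-Python sketch below mimics that split on made-up daily records; `split_by_date` is a hypothetical helper, not part of the Azure ML SDK.

```python
from datetime import date, timedelta
from typing import List, Tuple

Record = Tuple[date, int]  # (day, count)


def split_by_date(records: List[Record], train_end: date, test_start: date) -> Tuple[List[Record], List[Record]]:
    """Split into train (day <= train_end) and test (day >= test_start), both boundaries inclusive."""
    train = [r for r in records if r[0] <= train_end]
    test = [r for r in records if r[0] >= test_start]
    return train, test


# Made-up daily counts from 2012-08-25 to 2012-09-05.
start = date(2012, 8, 25)
records = [(start + timedelta(days=i), 100 + i) for i in range(12)]
train, test = split_by_date(records, date(2012, 8, 31), date(2012, 9, 1))
print(train[-1][0], test[0][0])                 # 2012-08-31 2012-09-01
print(len(train) + len(test) == len(records))   # True
```

With hourly or finer timestamps the same two cut points would behave differently (times after midnight on 2012-08-31 fall outside `time_before`), so the boundaries are worth rechecking if the data's granularity changes.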

After running the code, a link that opens the Azure Machine Learning Studio page is displayed.

Opening the link lets you see detailed information about the experiment you just ran there as well, so use it as needed. (I'd like to cover the features of Azure Machine Learning Studio another time.)

Registering the model

Next, register the model that performed best in training. We also get the model name (AutoMLb963fb5c833) from the best_run result.

best_run, fitted_model = remote_run.get_output()

best_run

model_name = best_run.properties['model_name']
model_name
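`get_output()` returns the best child run according to the `primary_metric` set in `AutoMLConfig`, here `normalized_root_mean_squared_error`, where lower is better. As I understand the Azure ML documentation, this metric is the RMSE divided by the range of the actual values. The sketch below computes it that way on made-up numbers to show what "best" means here; it is an illustration under that assumption, not the SDK's implementation.

```python
import math
from typing import Sequence


def normalized_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the range (max - min) of the actual values (assumed normalization)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("the range of y_true is zero")
    return math.sqrt(mse) / value_range


# Made-up actuals and forecasts from two candidate models: the lower score wins.
actual = [100.0, 200.0, 300.0, 400.0]
model_a = [110.0, 190.0, 310.0, 390.0]
model_b = [150.0, 150.0, 350.0, 350.0]
print(round(normalized_rmse(actual, model_a), 3))  # 0.033
print(round(normalized_rmse(actual, model_b), 3))  # 0.167
```

Normalizing by the range makes the score comparable across targets of different scales, which is convenient when reading results for a count target like `cnt`.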

Download model "73", which gave the best result.

Then register the downloaded model as the model to use this time.

Creating an endpoint

Finally, let's set up the endpoint as well. Thanks to the extension, a large part of the work so far could be done through the UI.
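When a model is deployed to an endpoint, Azure ML runs an entry (scoring) script that defines `init()`, called once when the service starts, and `run()`, called for each request. The sketch below shows only that shape. `_StandInModel` is a hypothetical placeholder; a real script would load the registered model file instead (for example from the directory in the `AZUREML_MODEL_DIR` environment variable), and the `{"data": [...]}` request format is also my assumption.

```python
import json
from typing import List

model = None  # set once in init(), reused by every run() call


class _StandInModel:
    """Hypothetical placeholder for the registered model: predicts the mean of each row."""

    def predict(self, rows: List[List[float]]) -> List[float]:
        return [sum(row) / len(row) for row in rows]


def init() -> None:
    """Called once when the service starts: load the model into a global."""
    global model
    # A real entry script would load the registered model file here instead.
    model = _StandInModel()


def run(raw_data: str) -> str:
    """Called per request: parse the JSON body, predict, and return a JSON string."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"predictions": model.predict(rows)})
    except (KeyError, ValueError, TypeError) as exc:
        # Return the error as JSON rather than crashing the request.
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    init()
    print(run(json.dumps({"data": [[1.0, 3.0], [2.0, 4.0]]})))  # {"predictions": [2.0, 3.0]}
```

Loading the model in `init()` rather than in `run()` avoids paying the load cost on every request.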

The code I used is available here.

Summary

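Being able to manage Azure Machine Learning resources from the familiar VS Code UI is really convenient. This extension seems to be updated almost every month, so it looks set to become even more useful. (I have no evidence for this, but) I expect development on it to keep ramping up!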
