Finding Your iPhone with Amazon Echo


横田慎介

2017.03.17


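Introduction

I added a Skill to Alexa so that I can ring my iPhone from an Amazon Echo. This post walks through how.

When I say to the Amazon Echo:

Alexa, ask iPhone finder

it reads out a list of my Apple devices. When I then answer with the number of the device I want to ring:

Number 4

that device (an iPhone, for example) plays a sound.

If I already remember which number maps to which device, I can say:

Alexa, ask iPhone finder where is number 0

to name the device up front, and it rings right away.
Let's look at how to build it.

For the basics of Alexa Skills, see these earlier posts if you like:
[For Alexa beginners] Breaking down the Alexa Skills Kit
Adding features to Amazon Echo with AWS Lambda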

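Overview

The overall architecture is as follows.

The Amazon Echo, the voice input/output device, receives the user's request to find an iPhone and sends the audio to Amazon Alexa, the voice analysis service.
Alexa transcribes the audio, works out the intent behind what the user said, and sends that intent to the endpoint, AWS Lambda.
AWS Lambda sends a request to Apple's iCloud to ring the specified device.
iCloud rings the device, and the user can find the iPhone.

What we build here is an Alexa Skill, which is an extension to Alexa, and a Lambda function that receives events from Alexa.
We start with the Lambda function.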

Configuring the endpoint (AWS Lambda)

I used AWS Lambda as the endpoint.
The Lambda function does two things:

Return the list of devices
Ring the specified device

Creating the Lambda function

We will write the code later and create the AWS Lambda function first.
Open the Lambda console in the N. Virginia region and create a function from an Alexa Blueprint.

The language is Python 2.7, so select the blueprint named "alexa-skills-kit-color-expert-python".

Make sure "Runtime" is set to Python 2.7.
Enter something descriptive for "Name" and "Description".
For "Lambda function code", leave the Blueprint default for now. We will upload a zip file once the code is written.

Next, set the environment variables that the code uses.

Key	Value	Required	Encrypted
APPLE_ID	{your Apple ID}	Required	Yes
APPLE_PASSWORD	{your Apple password}	Required	Yes
APPLICATION_ID	{the Alexa Skill's ID}	Required	Yes
TARGET_DEVICE_NAME	{default device name}	Optional	No

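"APPLICATION_ID" is assigned later when we configure the Alexa Skill, so leave it blank for now.
If "TARGET_DEVICE_NAME" is set, saying "Ask iPhone finder" rings the device whose name matches this variable's value. If it is not set, the device list is read out. Setting it is convenient when you already know which device you want to ring.
I would not want my Apple account credentials to leak, so turn on "Enable encryption helpers" to encrypt them.
If you do not have an AWS KMS key yet, follow the prompts to create one.
Click "Encrypt" for every variable except "TARGET_DEVICE_NAME".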
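
As a rough sketch of how the encrypted and plain variables in the table might be read in one place: the helper below, load_settings, is a hypothetical name of mine and not part of the blueprint. It takes the decryption step as a parameter, so the example runs locally with a stand-in (fake_decrypt, which just reverses bytes). On Lambda you would instead pass a function that wraps the KMS decrypt call used in the code later in this post.

```python
from __future__ import print_function
from base64 import b64decode, b64encode

# Variables marked "Encrypted: Yes" in the table above.
ENCRYPTED_KEYS = ('APPLE_ID', 'APPLE_PASSWORD', 'APPLICATION_ID')


def load_settings(environ, decrypt):
    """Decrypt the encrypted variables; pass TARGET_DEVICE_NAME through as-is.

    `decrypt` is any callable that maps ciphertext bytes to plaintext bytes.
    """
    settings = {}
    for key in ENCRYPTED_KEYS:
        # The console stores the ciphertext base64-encoded.
        settings[key] = decrypt(b64decode(environ[key]))
    # Optional and stored unencrypted, so no decryption is needed.
    settings['TARGET_DEVICE_NAME'] = environ.get('TARGET_DEVICE_NAME')
    return settings


def fake_decrypt(ciphertext):
    """Stand-in for KMS, for this example only: reverses the bytes."""
    return ciphertext[::-1]


# Made-up values that play the role of os.environ.
example_environ = {
    'APPLE_ID': b64encode(b'moc.elpmaxe@em').decode('ascii'),
    'APPLE_PASSWORD': b64encode(b'terces').decode('ascii'),
    'APPLICATION_ID': b64encode(b'ymmud').decode('ascii'),
}

settings = load_settings(example_environ, fake_decrypt)
print(settings['APPLE_ID'])
```

Keeping the decryption behind a parameter also makes it easy to try the rest of the function locally without access to KMS.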

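Next, set "Handler" and "Role".
Leave "Handler" at its default, "lambda_function.lambda_handler".
For "Role", choose "Create new role from template(s)" and attach "KMS decryption permissions", because the function has to decrypt the environment variables above.

That completes the Lambda function setup for now. We will revise part of it after the code and the Alexa Skill are in place.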

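Writing the code for the Lambda function

As mentioned, the language is Python 2.7.
To talk to iCloud, I use a library called pyicloud.
Create a working directory and install the required library.
The library has to be uploaded to Lambda together with the code, so add "-t ." to "pip install" to install it into the current directory.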

yokota.shinsuke% pyenv virtualenv 2.7.10 env27
yokota.shinsuke% mkdir iphone-finder
yokota.shinsuke% cd iphone-finder
yokota.shinsuke% pyenv local env27
(env27) yokota.shinsuke% pip install -t . pyicloud
Collecting pyicloud
  Downloading pyicloud-0.9.1.tar.gz
(output omitted)
  Building wheels for collected packages: pyicloud
  Running setup.py bdist_wheel for pyicloud ... done
  Stored in directory: /Users/yokota.shinsuke/Library/Caches/pip/wheels/ad/86/66/c4384bc3598b9a864ba178da21d64ced0a8a461b638fc14fae
Successfully built pyicloud
Installing collected packages: requests, keyring, keyrings.alt, click, six, pytz, tzlocal, certifi, bitstring, pyicloud
Successfully installed bitstring-3.1.5 certifi-2017.1.23 click-6.7 keyring-8.7 keyrings.alt-1.3 pyicloud-0.9.1 pytz-2016.10 requests-2.13.0 six-1.10.0 tzlocal-1.3

Create the file that the Lambda function runs and name it lambda_function.py.

(env27) yokota.shinsuke% cat lambda_function.py

# -*- coding: utf-8 -*-

from __future__ import print_function
from pyicloud import PyiCloudService
from base64 import b64decode
import boto3
import os

APPLICATION_ID = boto3.client('kms').decrypt(CiphertextBlob=b64decode(os.environ['APPLICATION_ID']))['Plaintext']
APPLE_ID = boto3.client('kms').decrypt(CiphertextBlob=b64decode(os.environ['APPLE_ID']))['Plaintext']
APPLE_PASSWORD = boto3.client('kms').decrypt(CiphertextBlob=b64decode(os.environ['APPLE_PASSWORD']))['Plaintext']

# --------------- Helpers that build all of the responses ----------------------

def build_speechlet_response(title, output, reprompt_text, should_end_session):
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': "SessionSpeechlet - " + title,
            'content': "SessionSpeechlet - " + output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }


def build_response(session_attributes, speechlet_response):
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }


# --------------- Functions that control the skill's behavior ------------------

def get_help_response(intent, session):
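    # 'attributes' may be missing or null on a new session
    session_attributes = session.get('attributes') or {}
    card_title = "Help"
    speech_output = "I will find your Apple device. Say 'List devices'"
    # The reprompt must be text to speak, not the attributes dict
    reprompt_text = speech_output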
    should_end_session = False

    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


def done_response(target_device):
    session_attributes = {}
    card_title = "Done"
    speech_output = "%s will sound soon" % target_device[1]
    reprompt_text = None
    should_end_session = True

    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


def no_device_response():
    session_attributes = {}
    card_title = "Done"
    speech_output = "No device on your account."
    reprompt_text = None
    should_end_session = True

    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


def select_device_response(intent, session):
    devices = get_devices(session)
    if not devices:
        return no_device_response()

    session_attributes = {'devices': devices}
    card_title = "Select your device"
    speech_output = "Tell me which device you want to find"
    for (index, device) in enumerate(devices):
        speech_output += ", %s is Number %i" % (device[1], index)
    reprompt_text = speech_output
    should_end_session = False

    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


def handle_session_end_request():
    session_attributes = {}
    card_title = "Session Ended"
    speech_output = "Have a nice day! "
    should_end_session = True
    reprompt_text = None

    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


def get_devices(session):
    """ Return a list of (id, name) pairs, reusing the list kept in the
    session attributes when there is one
    """
    attributes = session.get('attributes') or {}
    if 'devices' in attributes:
        return attributes['devices']

    api = PyiCloudService(APPLE_ID, APPLE_PASSWORD)
    return [(device_id, str(device)) for (device_id, device) in api.devices.items()]


def play_device(intent, session):
    devices = get_devices(session)
    if not devices:
        return no_device_response()

    target_device = None

    if intent is None:
        # Launched without an intent: use the device named in the environment variable
        target_name = os.environ.get('TARGET_DEVICE_NAME')
        candidate_devices = [d for d in devices if d[1] == target_name]
        if len(candidate_devices) > 0:
            target_device = candidate_devices[0]
    elif 'TargetDeviceNumber' in intent.get('slots', {}):
        try:
            target_index = int(intent['slots']['TargetDeviceNumber']['value'])
        except (KeyError, TypeError, ValueError):
            # The slot is empty or Alexa could not resolve it to a number
            target_index = -1
        if target_index >= 0 and target_index < len(devices):
            target_device = devices[target_index]

    if target_device is None:
        return select_device_response(intent, session)
    else:
        api = PyiCloudService(APPLE_ID, APPLE_PASSWORD)
        api.devices[target_device[0]].play_sound()
        return done_response(target_device)


# --------------- Events ------------------

def on_session_started(session_started_request, session):
    """ Called when the session starts """

    print("on_session_started requestId=" + session_started_request['requestId']
          + ", sessionId=" + session['sessionId'])


def on_launch(launch_request, session):
    """ Called when the user launches the skill without specifying what they
    want
    """
    print("on_launch requestId=" + launch_request['requestId'] +
          ", sessionId=" + session['sessionId'])

    # If the device to find is set in the environment variable, ring it right away
    # Otherwise return the list of devices
    if 'TARGET_DEVICE_NAME' in os.environ:
        return play_device(None, session)
    else:
        return select_device_response(None, session)


def on_intent(intent_request, session):
    """ Called when the user specifies an intent for this skill """

    print("on_intent requestId=" + intent_request['requestId'] +
          ", sessionId=" + session['sessionId'])

    intent = intent_request['intent']
    intent_name = intent_request['intent']['name']

    # Dispatch to your skill's intent handlers
    if intent_name == "TargetDeviceIsIntent":
        return play_device(intent, session)
    elif intent_name == "ListMyDevicesIntent":
        return select_device_response(intent, session)
    elif intent_name == "AMAZON.HelpIntent":
        return get_help_response(intent, session)
    elif intent_name == "AMAZON.CancelIntent" or intent_name == "AMAZON.StopIntent":
        return handle_session_end_request()
    else:
        raise ValueError("Invalid intent")


def on_session_ended(session_ended_request, session):
    """ Called when the user ends the session.

    Is not called when the skill returns should_end_session=true
    """
    print("on_session_ended requestId=" + session_ended_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    # add cleanup logic here


# --------------- Main handler ------------------

def lambda_handler(event, context):
    """ Route the incoming request based on type (LaunchRequest, IntentRequest,
    etc.) The JSON body of the request is provided in the event parameter.
    """
    print("event.session.application.applicationId=" +
          event['session']['application']['applicationId'])

    """
    Prevent someone else from configuring a skill that sends requests to this
    function.
    """
    if (event['session']['application']['applicationId'] != APPLICATION_ID):
        raise ValueError("Invalid Application ID")

    if event['session']['new']:
        on_session_started({'requestId': event['request']['requestId']},
                           event['session'])

    if event['request']['type'] == "LaunchRequest":
        return on_launch(event['request'], event['session'])
    elif event['request']['type'] == "IntentRequest":
        return on_intent(event['request'], event['session'])
    elif event['request']['type'] == "SessionEndedRequest":
        return on_session_ended(event['request'], event['session'])

The code is now ready, so zip it up and upload it to the Lambda function created earlier.

(env27) yokota.shinsuke% zip -r ~/Desktop/iphone-finder.zip *
  adding: __pycache__/ (stored 0%)
  adding: __pycache__/bitstring.cpython-35.pyc (deflated 69%)
  adding: __pycache__/six.cpython-35.pyc (deflated 60%)
  adding: bitstring-3.1.5.dist-info/ (stored 0%)
  adding: bitstring-3.1.5.dist-info/DESCRIPTION.rst (deflated 52%)
(output omitted)
  adding: tzlocal-1.3.dist-info/WHEEL (stored 0%)
  adding: tzlocal-1.3.dist-info/zip-safe (stored 0%)

Go back to the Lambda function's configuration page in the AWS console and upload the zip file.
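
If you would rather script the packaging step, here is a minimal sketch using only Python's standard zipfile module. The function name build_package is my own. It puts everything at the root of the archive, as the zip command above does, and skips compiled caches on the assumption that Lambda does not need them.

```python
from __future__ import print_function
import os
import zipfile


def build_package(source_dir, zip_path):
    """Zip the contents of source_dir so that files sit at the archive root."""
    source_dir = os.path.abspath(source_dir)
    zip_path = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(source_dir):
            # Prune __pycache__ directories in place so os.walk skips them.
            dirs[:] = [d for d in dirs if d != '__pycache__']
            for name in files:
                path = os.path.join(root, name)
                # Skip the archive itself (if it lives in source_dir) and .pyc files.
                if path == zip_path or name.endswith('.pyc'):
                    continue
                archive.write(path, os.path.relpath(path, source_dir))
    return zip_path
```

From inside the working directory, this could be called as build_package('.', os.path.expanduser('~/Desktop/iphone-finder.zip')).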

Testing the Lambda function

Use Lambda's test feature to fire a test event.
Choose "Configure test event" from "Action" at the top.

Register the following event and click "Save".

{
  "session": {
    "sessionId": "SessionId.daad8f9b-118e-466a-b6c3-95bdcd431a35",
    "application": {
      "applicationId": "dummy"
    },
    "attributes": {},
    "user": {
      "userId": "dummy"
    },
    "new": true
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "EdwRequestId.7ed3c419-398d-4c74-8cc0-9da6f3d4a51a",
    "locale": "en-US",
    "timestamp": "2017-03-17T06:18:03Z",
    "intent": {
      "name": "ListMyDevicesIntent",
      "slots": {}
    }
  },
  "version": "1.0"
}

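Set the Lambda function's "APPLICATION_ID" environment variable to "dummy" to match the test event, click "Encrypt", save, and then run the test.

If everything is configured correctly, "Execution Result" shows "Succeeded".

If you get an error, debug from there.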
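
When debugging, it helps to try other intents as well. The sketch below builds test events shaped like the one registered above. make_intent_event is a hypothetical helper of mine, and the session and request IDs are placeholders. Each slot is written as an object with "name" and "value" keys, and the value is a string, which is why the function code converts it with int().

```python
from __future__ import print_function
import json


def make_intent_event(intent_name, slots=None, application_id='dummy',
                      attributes=None):
    """Build a test event for an IntentRequest, mirroring the JSON above."""
    slot_objects = {}
    for name, value in (slots or {}).items():
        # Slot values arrive as strings, even for AMAZON.NUMBER.
        slot_objects[name] = {'name': name, 'value': str(value)}
    return {
        'session': {
            'sessionId': 'SessionId.test',          # placeholder
            'application': {'applicationId': application_id},
            'attributes': attributes or {},
            'user': {'userId': 'dummy'},
            'new': True,
        },
        'request': {
            'type': 'IntentRequest',
            'requestId': 'EdwRequestId.test',       # placeholder
            'locale': 'en-US',
            'timestamp': '2017-03-17T06:18:03Z',
            'intent': {'name': intent_name, 'slots': slot_objects},
        },
        'version': '1.0',
    }


# An event equivalent to saying "where is number 0".
event = make_intent_event('TargetDeviceIsIntent', {'TargetDeviceNumber': 0})
print(json.dumps(event, indent=2, sort_keys=True))
```

The printed JSON can be pasted into "Configure test event" as is.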

The endpoint is now ready, so next we configure the Alexa Skill.
This is the part that connects the voice input received from the Amazon Echo to the endpoint we just built.

Configuring the Alexa Skill

Alexa Skills are created in the Amazon Developer Console.

Sign in, or register, with the Amazon account that your Amazon Echo uses.
The Alexa Skill we create here will not be published, so it cannot be found in the Alexa Skills Store. However, an Amazon Echo registered to the same account as the Amazon Developer account can use unpublished Alexa Skills too.

Creating the Skill

After signing in, choose "Alexa Skills Kit" under "Alexa" at the top.

Create a new Skill.

First comes "Skill Information".
Choose "Custom Interaction Model" for "Skill Type", and set "Name" to "iPhoneFinder" and "Invocation Name" to "iPhone Finder".
"Name" is the name shown in the Alexa Skills Store, and "Invocation Name" is the name used to invoke the skill.
With this setting, saying "Alexa, ask iPhone Finder" to the Amazon Echo invokes the skill. Pick words that are easy to say.

Next comes "Interaction Model".
This is where you define how the endpoint and Alexa, and Alexa and the user, interact.

Start by defining the "Intent Schema".

{
  "intents": [
    {
      "intent": "TargetDeviceIsIntent",
      "slots": [
        {
          "name": "TargetDeviceNumber",
          "type": "AMAZON.NUMBER"
        }
      ]
    },
    {
      "intent": "ListMyDevicesIntent"
    }
  ]
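}

An "intent" is what the user wants the skill to do.
The Lambda function we set up earlier receives an "intent" from Alexa and acts on it.
The schema above defines two intents.
"TargetDeviceIsIntent" means the user wants to ring a particular device. To identify the device, it has a "slot" named "TargetDeviceNumber" that holds a number ("AMAZON.NUMBER"). The slot value is passed to the endpoint (the Lambda function) as part of the "intent".
The other intent, "ListMyDevicesIntent", means the user wants the list of their devices. It needs no extra information, so it has no "slots".
The Intent Schema is the definition of every "intent" that can be passed to the endpoint.
In other words, it defines the information exchanged between Alexa and the endpoint.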
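
To make the slot concrete, here is a small sketch of what the "intent" object looks like on the endpoint side and how the number might be read out of it defensively. parse_device_number is a hypothetical helper, not a function from the code in this post. It returns None when the slot is missing, empty, not a number, or out of range, so the caller can fall back to reading out the device list.

```python
from __future__ import print_function


def parse_device_number(intent, device_count):
    """Return the device index from TargetDeviceNumber, or None if unusable."""
    slots = (intent or {}).get('slots') or {}
    slot = slots.get('TargetDeviceNumber') or {}
    try:
        # The value arrives as a string such as "0"; it may be absent
        # or something that is not a number.
        index = int(slot.get('value'))
    except (TypeError, ValueError):
        return None
    if 0 <= index < device_count:
        return index
    return None


# Roughly what the endpoint receives for "where is number 0".
example_intent = {
    'name': 'TargetDeviceIsIntent',
    'slots': {
        'TargetDeviceNumber': {'name': 'TargetDeviceNumber', 'value': '0'},
    },
}

print(parse_device_number(example_intent, device_count=3))
```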

Next, list the "Sample Utterances".

TargetDeviceIsIntent number {TargetDeviceNumber}
TargetDeviceIsIntent find {TargetDeviceNumber}
TargetDeviceIsIntent find number {TargetDeviceNumber}
TargetDeviceIsIntent where is number {TargetDeviceNumber}
TargetDeviceIsIntent where is {TargetDeviceNumber}
ListMyDevicesIntent list
ListMyDevicesIntent list my iphone
ListMyDevicesIntent my iphone
ListMyDevicesIntent list my devices

"Utterance" literally means something said; here I take it to mean a spoken phrasing of an "intent".
For example, to express "ListMyDevicesIntent", the intent of wanting your device list, a user might just say "List" or might say "List my devices".
In the example above both of these utterances are mapped to "ListMyDevicesIntent", so Alexa interprets either one as "ListMyDevicesIntent" and passes that to the endpoint.
When actually talking to the Amazon Echo, the phrase has to take a form such as

Alexa, open iPhone Finder for list

or

Alexa, ask iPhone Finder list my devices

"Alexa" is the Amazon Echo's wake word, and "open" and "ask" are keywords that invoke a skill. The skill's "Invocation Name" follows the keyword, and the "utterance" defined here comes after that.
Many invocation keywords are provided so that you can pick one that sounds natural in context.
List of invocation keywords
"Sample Utterances" is the collection of these "utterances".
List as many as you can so that Alexa can pick up the user's intent in as many situations as possible.
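
Once the list grows, writing it by hand gets error-prone, and it is easy to enter the same line twice. As a sketch, the text for "Sample Utterances" could be generated from the phrases grouped by intent. build_sample_utterances is a hypothetical helper of mine. It emits one "IntentName phrase" line per phrase and drops duplicates.

```python
from __future__ import print_function


def build_sample_utterances(intent_phrases):
    """Turn (intent name, phrases) pairs into Sample Utterances lines."""
    lines = []
    seen = set()
    for intent_name, phrases in intent_phrases:
        for phrase in phrases:
            # Collapse stray whitespace so near-identical phrases compare equal.
            line = '%s %s' % (intent_name, ' '.join(phrase.split()))
            key = line.lower()
            if key not in seen:
                seen.add(key)
                lines.append(line)
    return lines


phrases = [
    ('TargetDeviceIsIntent', [
        'number {TargetDeviceNumber}',
        'find {TargetDeviceNumber}',
        'find number {TargetDeviceNumber}',
        'where is number {TargetDeviceNumber}',
        'where is {TargetDeviceNumber}',
    ]),
    ('ListMyDevicesIntent', [
        'list',
        'list my iphone',
        'my iphone',
        'list my devices',
        'list my iphone',   # repeated on purpose; it is dropped
    ]),
]

print('\n'.join(build_sample_utterances(phrases)))
```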

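On the next page, "Configuration", enter the ARN of the Lambda function created earlier.
Because the function was created in the N. Virginia region, choose "North America" as the region.

Alexa can now forward what it receives from the Amazon Echo to the Lambda function.

At this point the skill's Application Id has already been issued; you can check it on the "Skill Information" page.

Go to the Lambda function's configuration page in the AWS console, enter the Application Id, which starts with "amzn1", into the "APPLICATION_ID" environment variable, click "Encrypt", and save.

That completes the Lambda function setup.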