Voice User Interface design tips for Alexa: Developers.IO 2017 WORLD in VANCOUVER #cmdevio2017 #reinvent

2018.01.05


Classmethod Canada held a tech event Cloud & IoT Technologies - Developers.IO on December 18, 2017 in Vancouver.

In this session, 'Voice User Interface design tips for Alexa', I talked about the basic concepts of Voice User Interface (VUI) design and some tips for designing VUIs for Alexa. As you can see from the many report articles on this blog, various sessions at the last re:Invent dealt with Amazon Alexa.

This article describes the content of the following presentation.

Introduction

I have been building Alexa Skills since last spring and have released some simple skills for the US marketplace. I have also designed Voice User Interfaces for Alexa skills for Japanese enterprise clients.

In this session, I would like to talk about some basic elements of designing Voice User Interfaces.

  1. What is Alexa VUI Design
  2. Process of Designing VUI for Alexa skills
  3. VUI Design Tips for Alexa skills

When you develop an Alexa Skill, you will have to think about VUI design. I would like to share some ideas based on our experiences with such a situation.

1. What is Alexa VUI Design

VUI design is one of the tasks in system development. This is where we define the conversations between the user and Alexa. Ideally, we should make the conversation faster, easier, and more fun. In addition, we should anticipate the errors that can occur in the conversation and create routes to deal with them.

Look at the following conversation:

This is an example of an interaction between the user and Alexa. In this skill, Alexa provides an insurance fee estimate for a trip abroad. The user needs to tell the destination and time period of the trip to Alexa.

A user will invoke the skill by saying, 'Alexa, start ABC-Travel insurance.' Alexa will respond, 'Hi, I can provide an insurance fee for your trip. Where would you like to go?' The user can say, 'Korea,' then Alexa will ask her, 'Sounds nice. How long will you be there?' The user can answer, 'One week.'

Several parts of this conversation are subject to VUI design. First of all, there is the invocation name used to call the skill. Then there is the information the skill needs to achieve the user's purpose, and the kinds of utterances the skill accepts.

In summary, the VUI design is an activity to define the following things:

  • Invocation name of the skill
  • All routes that users can take to achieve a purpose
  • How to support users who are confused
  • How to improve the skill to make interactions more human

2. Process of Designing VUI for Alexa Skills

The VUI design process consists of five steps:

  1. Gathering user requirements
  2. Gathering system requirements
  3. Creating scripts
  4. Creating dialog flow
  5. Defining interaction models

Let's go through them in order.

1. Gathering user requirements

The first thing to do is to identify why people would use the skill. This is similar to the requirements definition that we do in general system development projects. We do this by writing out the requirements of the system's users in simple terms.

Answering the following questions helps you identify the user requirements:

  • Why do people want to use that skill?
  • What will people do before, during, and after using the skill?
  • What will people get by using the skill?

2. Gathering system requirements

The next step is gathering system requirements for the skill. This process includes defining the skill's functions. The most important thing is to define the functions while adhering to the user requirements. The skill must include the functions that fulfill the user's purpose. In other words, it should not contain functions that are unrelated to that purpose.

I will describe the following things in order:

  • Identify the skill scope
  • Defining Input / Output of the skill
  • Identify intersystem links

Identify the skill scope

Identify what the skill will or will not do, and decide the scope of the skill.

For example, this insurance skill includes the following functions:

  • Providing an insurance fee estimate
  • Changing the destination and the travel period to get another estimate

However, it does not include the following functions:

  • Presenting various insurance plans
  • Selling the insurance product that the estimate is for

Defining output / input

Write out items of information that the skill provides to users. This is the Alexa skill's output. Also, let's write out items of information that the skill needs to achieve the user's purpose. This is the Alexa skill's input.

In this example, the user gives Alexa the destination and duration of the trip as input. Also, after getting the estimate, the user will answer whether he or she would like another trip estimate. This is also one of the inputs. Alexa tells the user the estimated amount of insurance and asks the user whether to continue estimating for other trips.

Identify intersystem links

Also, let's check whether other systems interact with the skill. Examples of such systems are Amazon S3 for storing the skill's content files, a user authentication system for existing member services, and backend databases.

3. Creating Scripts

The third thing to do is create scripts. A script is a sequence of sentences that represents a dialogue between Alexa and a user. Writing scripts helps you design natural interactions.

In this step, you create scripts only for the simplest route to achieve the user's purpose. You do not need to cover all paths yet.

4. Creating Dialogue Flow

The fourth thing to do is to create a dialog flow. We will extend the script created in the previous step so that the skill can respond more widely to user utterances.

  • Support more varieties of inputs.
  • Provide more user guidance

Look at this example.

The path on the yellow background on the left is the main route that we already considered in the previous step. In this step, we will create routes with a blue background. For example, in the central route, we deal with the case where the user gives a travel destination and a period at the same time. In the route on the right, we are considering how the skill leads the user when he or she has not decided on the destination.
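The three routes described above can be sketched as a small function that picks Alexa's next prompt from the information collected so far. This is a minimal Python sketch under stated assumptions, not the skill's actual code: the function name `next_prompt`, its parameters, and the wording of the 'undecided' and final prompts are hypothetical and only illustrate the idea.

```python
from typing import Optional


def next_prompt(destination: Optional[str],
                duration: Optional[str],
                undecided: bool = False) -> str:
    """Choose Alexa's next line from what the user has said so far."""
    if destination is None and undecided:
        # Right-hand route: the user has not decided where to go,
        # so guide them instead of repeating the same question.
        # (Hypothetical wording.)
        return "No problem. Tell me a country you are thinking about."
    if destination is None:
        # Main route, first question.
        return "Where would you like to go?"
    if duration is None:
        # Main route, second question.
        return "Sounds nice. How long will you be there?"
    # The central route arrives here directly when the user gives both
    # values in one utterance. (Hypothetical wording.)
    return f"OK. Estimating {duration} in {destination}."
```

Because the function looks only at which values are still missing, a user who says the destination and the period in one utterance skips the second question automatically.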

5. Defining interaction models

The last thing to do is define Intents, Slots, and Sample Utterances (see note 1 at the end of this article).

Intents represent what users can ask the skill to do. Some generic intents are shared by many skills: for example, Help, Cancel, Stop, StartOver, and Repeat. In addition, the skill in this example needs Intents to set the trip destination and duration.

The SetDestination Intent has a slot for receiving the country name of the destination specified by the user.

Sample utterances are the phrases that the user can speak. It is recommended that you define more than 30 utterances per intent.
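To make the three terms concrete, here is a rough sketch of what a model for the travel insurance example might look like, written as a Python dictionary. It is not the model of a real skill. The key layout is assumed to follow the JSON interaction model format of the Alexa Skills Kit, and the intent names `SetDestinationIntent` and `SetDurationIntent`, the slot names, the invocation name, and the sample utterances are all hypothetical. The helper `intents_needing_samples` is also an assumption; it simply flags custom intents that do not yet have more than 30 samples.

```python
import json

# Hypothetical model for the travel insurance example.
INTERACTION_MODEL = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "abc travel insurance",
            "intents": [
                # Generic built-in intents need no samples of their own.
                {"name": "AMAZON.HelpIntent", "samples": []},
                {"name": "AMAZON.CancelIntent", "samples": []},
                {"name": "AMAZON.StopIntent", "samples": []},
                {
                    "name": "SetDestinationIntent",
                    "slots": [{"name": "destination",
                               "type": "AMAZON.Country"}],
                    "samples": [
                        "{destination}",
                        "I'll go to {destination}",
                        "I'd like to go to {destination}",
                    ],
                },
                {
                    "name": "SetDurationIntent",
                    "slots": [{"name": "duration",
                               "type": "AMAZON.DURATION"}],
                    "samples": [
                        "{duration}",
                        "for {duration}",
                        "I'll be there for {duration}",
                    ],
                },
            ],
        }
    }
}

RECOMMENDED_MORE_THAN = 30  # figure quoted in the text above


def intents_needing_samples(model):
    """Return custom intents that do not yet exceed the recommended count."""
    intents = model["interactionModel"]["languageModel"]["intents"]
    return [
        intent["name"]
        for intent in intents
        if not intent["name"].startswith("AMAZON.")
        and len(intent["samples"]) <= RECOMMENDED_MORE_THAN
    ]


def to_json(model):
    """Serialize the model so it can be pasted into a JSON editor."""
    return json.dumps(model, indent=2)
```

With only three samples each, both custom intents are reported by `intents_needing_samples(INTERACTION_MODEL)`, which is a reminder that the sketch is far from complete.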

3. VUI Design Tips for Alexa Skills

I will explain seven tips for VUI design in this session.

  1. Be concise, be precise
  2. Use spoken language
  3. Add variety
  4. Be friendly
  5. Use lists effectively
  6. Handle various problems
  7. Provide contextual help

These contents are based on best practices recommended by Amazon's official VUI design guide and our knowledge obtained from trial and error when developing customer skills.

I'll give good and bad examples for each tip.

1. Be concise, be precise

First, let's make Alexa's phrases concise and precise. Users don't want to wait for Alexa to finish speaking long lines. Since Alexa devices are hands-free, users might not be sitting in front of a computer and listening carefully when they use our skills. Long lines are also easy to misunderstand, and they tend to prompt utterances that developers did not expect.

Try to use the One-breath test. Try actually speaking the utterance and see whether it can be said within one breath. If you can say it in one breath, that's perfect.

A bad example

User: 'I'll go to Victoria.'

Alexa: 'Sorry, I won't be able to estimate insurance fee for the destination you designated because this skill is for a trip abroad so please try again with another destination.'

Alexa returns a long phrase that we cannot say in one breath.

A good example

User: 'I'll go to Victoria.'

Alexa: 'Sorry, domestic travel is not supported right now.'

Alexa simply says 'Domestic travel is not supported.'
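The one-breath test is meant to be done by actually speaking, but a crude automated check can catch obviously long prompts before that. The following is a minimal Python sketch. The word-count proxy and the limit of 20 words are arbitrary assumptions for illustration, not a figure from Amazon's guidance, and the function name `passes_one_breath_check` is hypothetical.

```python
# Arbitrary assumption: tune this limit by reading prompts aloud.
MAX_WORDS_PER_BREATH = 20


def passes_one_breath_check(prompt: str,
                            max_words: int = MAX_WORDS_PER_BREATH) -> bool:
    """Rough proxy for the one-breath test: count the words in a prompt."""
    return len(prompt.split()) <= max_words


BAD = ("Sorry, I won't be able to estimate insurance fee for the destination "
       "you designated because this skill is for a trip abroad so please try "
       "again with another destination.")
GOOD = "Sorry, domestic travel is not supported right now."
```

With this limit, the bad example above (29 words) fails the check and the good example (8 words) passes. A check like this only complements reading the prompt aloud; it does not replace it.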

2. Use spoken language

Use spoken language in Alexa's prompts for natural interactions. Do not make Alexa sound like an old robot.

Let Alexa interact in a human-like way. It helps users understand Alexa more easily and draws human-like responses from them.

A bad example

Alexa: 'Let's estimate your insurance fee. Say, "I'd like to go to a city name" to say where you'd like to go. Say, "I'll travel a specific number of days" to say how long you'll travel. What would you like to do?'

It is not natural to directly communicate to users the phrases expected by the skill.

A good example

Alexa: 'Let's estimate your insurance fee. Where would you like to go?'

In this case, the user can respond more freely.

3. Add variety

To spice things up, let's add variety and surprises to our skills. This is a separate matter from the principle that Alexa's phrases should be kept simple. By varying expressions and vocabulary, let your users experience non-mechanical conversations.

It is especially effective to add random phrases to the routes that users go through repeatedly. Examples are the first and last greetings of the skill, and the phrases in quiz skills that tell the user whether their answer is correct. Speechcons would be useful for that.

A bad example

This is dialogue from a quiz skill.

Alexa: 'Question. What is at the end of a rainbow?'

User: 'Doub-LU.'

Alexa: 'Correct. Next question. What kind of salad do honeymooners order at a restaurant?'

User: 'Lettuce alone.'

Alexa: 'Correct. Next question. …'

Whenever the user answers correctly, Alexa always responds with 'Correct.' Don't you think this is too mechanical?

A good example

These are some improved phrases.

Alexa: 'Question. What is at the end of a rainbow?'

User: 'Doub-LU.'

Alexa: 'Bingo. Next question. What kind of salad do honeymooners order at a restaurant?'

User: 'Lettuce alone.'

Alexa: 'Well done. Next question. …'

We randomized Alexa's words.
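Randomizing a phrase like this takes only a few lines. Here is a minimal Python sketch, assuming a hypothetical helper `pick_praise` and a phrase list of our own. 'Bingo.' and 'Well done.' come from the example above; the other phrases are added only for illustration. The helper also avoids repeating the phrase used immediately before, which is a design choice of this sketch.

```python
import random

# 'Bingo.' and 'Well done.' are from the example; the rest are illustrative.
PRAISES = ["Bingo.", "Well done.", "Correct.", "That's right."]


def pick_praise(previous=None, rng=random):
    """Pick a praise phrase at random, avoiding an immediate repeat."""
    candidates = [phrase for phrase in PRAISES if phrase != previous]
    return rng.choice(candidates)


def build_response(next_question, previous_praise=None):
    """Return (spoken text, praise used) for a correct answer."""
    praise = pick_praise(previous_praise)
    return f"{praise} Next question. {next_question}", praise
```

The caller keeps the returned praise (for example, in the session attributes) and passes it back on the next turn so that the same word is not used twice in a row.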

4. Be friendly

Avoid phrases that are too polite. Alexa is not a servant; it is a user-friendly assistant. Think of Alexa skills as plug-ins that extend Alexa's original functions. Skills train the original Alexa and increase what Alexa can do. Therefore, it is natural to make a skill blend seamlessly with Alexa's original personality.

If you do not know the original personality of Alexa, try some of the built-in functions. Ask her about the weather, or ask her to tell a joke or sing a song.

A bad example

This is the last phrase of the skill.

Alexa: 'Thank you for trying ABC-Travel insurance, and we look forward to your inquiry.'

Of course, the user can understand the meaning, but it seems like an email.

A good example

Instead, let's use a familiar spoken phrase.

Alexa: 'Have a safe trip. See you soon.'

This is simpler than the previous one.

5. Use lists effectively

If you want to present multiple choices to the user, consider using a list. When using a list, pay attention to the overall length. Limit the list to five items. If the list is too long, the user may have forgotten the first item by the time Alexa finishes reading the last one. Also, the items in the list should be nouns so that users can easily say them back.

A bad example

Alexa: 'I can estimate the insurance fee for your next trip. If a public alert has appeared in your destination, I can tell you. And if you want, I will show you the best places for vacations. What would you like to do?'

In this bad example, Alexa explains the function of the skill with a very long phrase. It is hard for users to know how to answer.

A good example

Alexa: 'I can provide the following information: insurance fee estimate, foreign country alerts, and popular vacation spots. What would you like to do?'

In this example, the functions of the same skill are expressed as a list. Using noun phrases in list items makes it easier for users to understand how to respond.
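A list prompt like the good example can be generated from a plain list of noun phrases, which also makes it easy to enforce the five-item limit. The following is a minimal Python sketch; the function name `list_prompt` and the choice to raise an error for overly long lists are assumptions of this sketch.

```python
MAX_ITEMS = 5  # limit suggested in the text above


def list_prompt(items):
    """Build a spoken list prompt from noun phrases, at most MAX_ITEMS long."""
    if not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"expected 1 to {MAX_ITEMS} items, got {len(items)}")
    if len(items) == 1:
        spoken = items[0]
    elif len(items) == 2:
        spoken = " and ".join(items)
    else:
        # Join with commas and put 'and' before the last item.
        spoken = ", ".join(items[:-1]) + ", and " + items[-1]
    return (f"I can provide the following information: {spoken}. "
            "What would you like to do?")
```

Failing loudly on a sixth item forces the designer to split the menu or drop an option, instead of silently producing a prompt that is too long to remember.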

6. Handle various problems

If the user deviates from the route expected by the skill, let's help get the conversation back on track with a natural and gentle phrase.

If Alexa was unable to interpret the user's utterance, tell them honestly and ask them to say it again.

  • GOOD: ‘I couldn’t understand you.’

  • BAD: ‘I couldn’t hear that.’

On the other hand, if the user requests an unsupported function, please give a concrete error message.

A bad example

Alexa: 'OK, your insurance fee would be $30.'

User: 'I'll take it.'

Alexa: 'Sorry, I can't hear that.'

User: 'I'LL TAKE IT, ALEXA!'

The user is trying to buy the insurance that Alexa provided. But this skill does not support the function to purchase insurance yet.

If Alexa responds that she could not hear the user, it only causes them to repeat the same request in a louder voice.

A good example

Alexa: 'OK, your insurance fee would be $30.'

User: 'I'll take it.'

Alexa: 'Sorry, I can't sell this product right now. Please visit the ABC-Travel website.'

User: 'OK.'

In this good example, Alexa specifically tells the user about the unsupported function.
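The difference between 'not understood' and 'understood but unsupported' can be sketched as a small lookup. This is a minimal Python sketch under stated assumptions: the intent name `PurchaseIntent` is hypothetical, and it assumes the skill defines an intent that catches purchase requests so that they can be answered specifically.

```python
from typing import Optional

# Requests the skill recognizes but does not support yet.
# 'PurchaseIntent' is a hypothetical intent name.
UNSUPPORTED_REPLIES = {
    "PurchaseIntent": ("Sorry, I can't sell this product right now. "
                       "Please visit the ABC-Travel website."),
}

# Honest message for utterances the skill could not interpret.
NOT_UNDERSTOOD = "Sorry, I couldn't understand you. Could you say that again?"


def problem_reply(intent_name: Optional[str]) -> str:
    """Give a concrete message for unsupported requests, else ask again."""
    if intent_name in UNSUPPORTED_REPLIES:
        return UNSUPPORTED_REPLIES[intent_name]
    return NOT_UNDERSTOOD
```

Recognizing the purchase request as its own intent is what makes the specific answer possible. Without it, the utterance would fall through to the generic message.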

7. Provide contextual help

Let's provide contextual help. A single help phrase is not appropriate in every situation. Think about which message effectively supports the user at each point on each route.

Avoid providing excessive extra information to users in help. Presenting unnecessary information at that time may cause additional problems for users.

A bad example

This is one of Alexa's help phrases.

Alexa: 'Would you like an estimate for another trip?'

User: 'Help.'

Alexa: 'I can estimate insurance fees based on the information you provide me.'

User: '??' -> time out

After providing an insurance fee estimate to the user, Alexa asked whether the user wants to continue estimating. When the user said 'help', Alexa only provided a general hint. It is just a description of the skill, which will not be able to help the user in this situation.

A good example

Alexa: 'Would you like an estimate for another trip?'

User: 'Help.'

Alexa: 'I can estimate insurance fees based on another destination or time period. Would you like a new estimate?'

User: 'I'm OK, thanks.'

In this conversation, Alexa provided the user with details that match the situation. As we can see, by preparing detailed help phrases according to the situation, we can create more user-friendly skills.
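One simple way to prepare help per situation is to key the help phrases by the question Alexa asked last. The following is a minimal Python sketch, assuming the skill keeps track of its current dialog state. The state names and the help phrases for the destination and duration questions are hypothetical; the phrase for `ASK_ANOTHER` is the one from the good example.

```python
# Fallback used only when the state is unknown.
GENERAL_HELP = ("I can estimate insurance fees based on the information "
                "you provide me.")

# Hypothetical dialog states mapped to help that fits each situation.
HELP_BY_STATE = {
    "ASK_DESTINATION": "Tell me the country you will visit, for example, Korea.",
    "ASK_DURATION": "Tell me how long you will stay, for example, one week.",
    "ASK_ANOTHER": ("I can estimate insurance fees based on another "
                    "destination or time period. "
                    "Would you like a new estimate?"),
}


def help_prompt(state: str) -> str:
    """Return the help phrase that matches the current dialog state."""
    return HELP_BY_STATE.get(state, GENERAL_HELP)
```

Each contextual phrase ends by telling the user what to say next, so the conversation can continue right after the help message.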

Recap

I first talked about what VUI design is. VUI design is the process of creating a dialogue between the user and Alexa, while making it faster, easier and more fun, and able to handle errors.

Next, I introduced how to design VUI for Alexa. There were five steps in that process.

Then, we looked through seven VUI design tips.

Reference

The following reference materials are available online. Please look at them if you want to learn more.

Remark

Since an Alexa skill is also a kind of system, we can apply the knowledge and techniques that we have gained from developing other systems. However, unless we already have experience in this area, we will probably have to learn Voice User Interface design by trial and error. I hope you will use these tips when developing Alexa skills.

See you,

Yuki Torigata @torazuka


  1. As another session on the same day 'Getting started with Amazon Alexa' provided much information about this, I skipped details of this topic.