Categories
alexa

The Perils of Alexa certification

I’ve got my first skill up on Amazon’s Alexa service. It’s called Travel Stories and it’s not very exciting. To be honest, it could have done with more work, but there’s an offer of free Alexa Echo Dots in return for submitting a new skill. So this post will be about two things: my experience with the certification process; and Amazon’s desperate drive to get developers producing new skills.

One of the biggest boasts about Alexa compared with other voice interfaces is that it has many more skills available – 15,000 compared with hundreds on Apple. The problem is that a lot of these skills are not very good. Amazon has templates for basic voice apps with the idea that developers will customise these. It means there are a lot of skills which recite pieces of trivia. Travel Stories is pretty much one of these in its first incarnation.

In a BBC News article, Amazon’s race to make Alexa smarter, Leo Kelion pointed out that “For all the promise of compelling new ways to control home appliances or on-demand news updates from major media brands, there seem to be a mountain of apps dedicated to delivering “fun facts”, gags, wacky noises and a vast range of ambient sounds.”

When Amazon’s representative was challenged on this, he said “I guess I would not agree with the thesis that some of the skills are not sticky – many of them are… You never know which one of those Cat Facts is going to turn into the next big thing. There are many examples of that out there.”

The lack of an example here, the absence of a well-known case where a cat skill became famous, underlines the issue. There are a lot of not-very-interesting skills for Alexa.

Alongside prizes, Amazon’s push for new skills also includes huge discounts on the AWS services needed. Amazon have also begun directly paying the makers of successful skills. There’s a lot of encouragement to work with the platform.

On a more positive note, Amazon have a fast, efficient certification service that aims to get skills moved through as quickly as possible. I originally submitted my new skill at the start of July. At the Amazon Workshop we’d been warned that 70% of skills failed because example phrases do not occur in the sample utterances. According to Amazon, I failed for two reasons:

* “The example phrases that you choose to present to users in the companion app must be selected from your sample utterances. These sample utterances should not include the wake word or any relevant launch phrasing.”
* “If the session closes after launching the skill, a core functionality must be completed without prompting users to speak.”

The first of these was clumsiness on my part – somehow I fell for a trap I’d been warned about. I’m still not sure why my sample utterances failed and the ones Amazon suggested worked. I’m sure this check could be automated in some way before certification. The response was short and to the point, so a few minutes’ work, editing the code and my sample utterances, was all I needed to get through the second time.
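As a rough idea of how that check might be automated, here is a minimal Java sketch. It is not Amazon’s certification logic: the class and method names are my own, the list of launch verbs is a deliberately short assumption, and the intent name StoryIntent is hypothetical. It strips the wake word and launch phrasing from an example phrase and then looks for what is left among the sample utterances, assuming each line of the utterances file has the form “IntentName utterance text”.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ExamplePhraseChecker {

    // Assumption: a short, incomplete list of launch verbs, for illustration only.
    private static final List<String> LAUNCH_VERBS = Arrays.asList("ask", "tell", "open");

    /** Lower-cases the text, replaces punctuation with spaces and collapses whitespace. */
    public static String normalise(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Removes one leading word or phrase, if present, and returns the rest. */
    public static String dropPrefix(String text, String prefix) {
        if (text.equals(prefix)) {
            return "";
        }
        return text.startsWith(prefix + " ") ? text.substring(prefix.length() + 1) : text;
    }

    /** Strips the wake word, one launch verb and the invocation name from the front of a phrase. */
    public static String stripLaunchPhrasing(String phrase, String wakeWord, String invocationName) {
        String rest = dropPrefix(normalise(phrase), normalise(wakeWord));
        for (String verb : LAUNCH_VERBS) {
            String stripped = dropPrefix(rest, verb);
            if (!stripped.equals(rest)) {
                rest = stripped;
                break;
            }
        }
        return dropPrefix(rest, normalise(invocationName));
    }

    /** Reads "IntentName utterance text" lines and returns the normalised utterances. */
    public static Set<String> parseUtterances(List<String> sampleUtteranceLines) {
        Set<String> utterances = new HashSet<>();
        for (String line : sampleUtteranceLines) {
            String trimmed = line.trim();
            int firstSpace = trimmed.indexOf(' ');
            if (firstSpace > 0) {
                utterances.add(normalise(trimmed.substring(firstSpace + 1)));
            }
        }
        return utterances;
    }

    /** True if the phrase only launches the skill, or if its request matches a sample utterance. */
    public static boolean isCovered(String examplePhrase, String wakeWord,
                                    String invocationName, List<String> sampleUtteranceLines) {
        String request = stripLaunchPhrasing(examplePhrase, wakeWord, invocationName);
        return request.isEmpty() || parseUtterances(sampleUtteranceLines).contains(request);
    }
}
```

Called with the phrase “Alexa, ask Travel Stories for a story”, the wake word “Alexa” and the invocation name “Travel Stories”, this compares “for a story” against the utterances. A check like this could run as part of a build, long before a skill reaches certification.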

You can now add my skill to your Alexa device, but I’m not sure why you would. Certification is more about quantity than quality. But I do have interesting plans for my skill, and I’ll be working on them soon.


A Day at an Alexa Workshop

Back at the start of July, I attended an Alexa workshop in Brighton. I planned to blog about it soon after, but client work took over my life a little, sending me to LA and Dublin in the process. The session was a good one and I wanted to note some of the interesting things I learned for future reference. It happened in the ballroom of a seafront hotel (note chandeliers, and those dangling lengths of cloth that probably have an actual name) and was run by David Low, an Alexa evangelist from Amazon, who’d previously worked on the Skyscanner Alexa skill.

I’m cautious about in-person training sessions, in case the pace is slower than a day’s reading and tinkering at home – particularly when some of the Amazon tutorials are so good. Here, I learned a lot of interesting information, and these are some of the highlights:

  • The example skills we looked at were written in Javascript, rather than Java, as I’m used to. It’s notable how much more concise the Javascript code was.
  • Previously, I knew little of the Echo device’s technical details and it was interesting to find out about how Alexa works. Apparently the seven directional microphones are set up so one listens to the speech, and the other six record the background noise and filter it out. This explains how good Alexa is at hearing me over music.
  • Amazon have given a lot of thought to how they would like people to think about Alexa. A session like this is good for communicating this vision. For example, “Alexa lives inside the Echo” or “Not building apps, building conversations”.
  • Apparently about 70% of skills fail certification because their example phrases do not occur in the sample utterances. This rule seems self-evident, but somehow people miss it. Even knowing this, my first attempt at creating a skill failed the certification stage. I’ll have more to say about that soon.
  • This is possibly a spoiler for the workshop, but we were told that Alexa had received 250,000 marriage proposals. It’s interesting to see that number, and that Amazon feel this is a positive thing. I feel uncomfortable about the gendering of Voice UIs and chatbots, something I want to spend more time thinking about.
  • Some services have 270,000 sample utterances, which is far larger than I had expected.
  • I sometimes feel lost in the Alexa skill store, so it was good to have a discussion of some successful use cases:
    • There is some interesting research around the benefits of Alexa for older people.
    • Alexa is working to help guests in hotels.
    • It was also interesting to hear how the Skyscanner app worked, allowing the user to pose questions such as “Where can I go for £100 this weekend?”
  • One of the interesting things about devices like Alexa is that functionality can be unlocked over time. Apparently there are several colours available in Alexa’s glowing ring, which may be used for different things. There are also possibilities such as additional voices. Push notifications/events are a more complicated issue, with Amazon trying to work through the privacy and interruption issues.

Alexa, Please

I sometimes feel uncomfortable giving orders to Alexa. I know that she is a series of scripts, and have a good idea of the technology involved, but I still dislike barking demands at her.

I’ve read a couple of articles about parents who were concerned about their children’s interactions with Alexa. In a post entitled, Amazon Echo Is Magical. It’s Also Turning My Kid Into an Asshole, Hunter Walk suggested Alexa needed a mode that required please and thank you, to help children learn manners. These words currently have no effect on how Alexa works, and are filtered out before a request is sent to a skill.

In a piece on “bot-mania”, Dan Grover looked at the recent excitement over bots, placing it into a historical context. It’s a fascinating piece, talking in detail about how freetext chat may not be the best option for most requirements. One particular passage jumped out at me:

This notion of a bot handling [tasks like ordering pizza] is a curious kind of skeumorphism. In the same way that a contact book app… may have presented contacts as little cards with drop shadows and ring holes… conversational UI, too, has applied an analog metaphor to a digital task and brought along details that, in this form, no longer serve any purpose. Things like the small pleasantries in the above exchange like “please” and “thank you”, to asking for various pizza-related choices sequentially and separately (rather than all at once). These vestiges of human conversation no longer provide utility (if anything, they impede the task).

A skeuomorph, as defined in Wikipedia, is “a derivative object that retains ornamental design cues (attributes) from structures that are inherent to the original“. As examples, it gives the swiping gesture for turning pages on tablets, and the shutter sound on digital cameras. However, these skeuomorphs sometimes have their own uses: for example, the shutter sound notifies people that a photograph has been taken.

In one discussion of the please/thank-you issue (Parents are worried the Amazon Echo is conditioning their kids to be rude) an investment firm founder called Manu Kumar explained why he felt it important to be nice to devices. “One of my metrics for determining how nice someone is is by watching how they interact with a waiter. In a similar way, even if the AI or tech doesn’t care about it, other people around us are going to experience how we interact with it.”

For a while I thought it would be good if Amazon gave discounts to people who are well-mannered to Alexa. Then it occurred to me that, despite the rigorous codes about thank-you in English society, this is not universal. If you look at basic phrases translated into Hindi (e.g. on Omniglot), the word for thank you is given as dhanyavād, but this misses a subtlety. Deepak Singh wrote an article in the Atlantic, ‘I’ve Never Thanked My Parents for Anything’, where he talked about the status of thank you in Hindi.

In India, people—especially when they are your elders, relatives, or close friends—tend to feel that by thanking them, you’re violating your intimacy with them and creating formality and distance that shouldn’t exist. They may think that you’re closing off the possibility of relying on each other in the future. Saying dhanyavaad to strangers helps initiate a cycle of exchange and familiarity. But with family and friends, dhanyavaad can instead chill relations because you are already intimate and in a cycle of exchange.

All of this discussion may seem obscure, but there is an interesting issue around the way we respond to devices. Alexa behaves with a personality and explicitly presents herself as female. Even if she is a batch of scripts, we are supposed to respond to her as an entity. There is a question of how we learn to behave with such creatures, and how we factor this into thinking about designing skills – where the skill is accessed via Alexa, at a strange remove – all requests involving Alexa being asked to pass the question on to the skill.

I still think it is important to be polite to Alexa. But I’m prepared to accept that it is irrelevant to her.


Placedreamer – a more interesting application

The problem with most Alexa apps is that they’re simple text bots with voice UIs.

Obviously, Alexa’s clever hacks make her skills a little more interesting than the same thing on the command line; such as the system used to match thousands of different phrases to a user’s intention. But a lot of the skills available are boring – particularly the ‘facts’ type of skill, where Alexa recites a random piece of information. What would be interesting would be an application that would not work outside of an Alexa device.

A more interesting problem

Take as an example the tarot app I built in my recent tutorial. It doesn’t do anything particularly novel – we could do the same with a twitter bot or a Bash script. As well as speech, Alexa provides the ability to play sounds, as well as some clever ways of handling streaming (something Tim O’Reilly praised in his celebration of Alexa). An interesting skill would make use of such things.

Rather than take the tarot skill further, I wanted to work on something more interesting. I asked friends on Facebook what they thought I should do. Tom suggested “Pipe in birdsong from the last distant place you travelled to. City sounds from another timezone.”

This sounded like a great idea. I like the idea of Alexa as a device that can occupy a strange, eidetic space – something to talk to when you can’t sleep. There are online field recordings available, and I’ve got photos which can be added to the response cards. This is a somewhat whimsical application, but that’s what attracted me to Alexa in the first place – a device that is placed in intimate, home spaces, and is always listening into conversations. (Although this is only for her name/wake-word, it can still prove disconcerting.)

A problem with invocation names

My previous skill had the invocation name ‘tarot’. This was OK for testing, but won’t pass Amazon’s requirements for invocation names, which state that “One-word invocation names are not allowed”.

Following this rule, I first set up the invocation name for this skill to be ‘Place dreamer’. Actually summoning this skill proved difficult. Place was too easily confused with Play, which Alexa saw as a more likely word, and would hear “Place Dreamer” as “Play Streamer”.
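Both problems could be caught with a quick pre-flight check before a name is ever spoken to a device. The sketch below is a heuristic of my own, not anything Amazon publishes: the list of command words is an assumption, and spelling-based edit distance is only a crude stand-in for how similar two words sound. It checks that the name has more than one word and flags a first word that is within a couple of letter edits of a command word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class InvocationNameCheck {

    // Assumption: a few command words Alexa listens for; this is not an official list.
    private static final List<String> COMMAND_WORDS =
            Arrays.asList("play", "ask", "tell", "open", "stop");

    /** Levenshtein edit distance between two strings, computed row by row. */
    public static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] current = new int[b.length() + 1];
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = previous[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            previous = current;
        }
        return previous[b.length()];
    }

    /** One-word invocation names are not allowed, so require at least two words. */
    public static boolean hasMultipleWords(String invocationName) {
        return invocationName.trim().split("\\s+").length > 1;
    }

    /** Returns the command word the first word might be confused with, or null if none is close. */
    public static String confusableWith(String invocationName, int maxDistance) {
        String firstWord = invocationName.trim().toLowerCase(Locale.ROOT).split("\\s+")[0];
        for (String command : COMMAND_WORDS) {
            if (editDistance(firstWord, command) <= maxDistance) {
                return command;
            }
        }
        return null;
    }
}
```

With a maximum distance of 2, ‘Place dreamer’ is flagged as confusable with ‘play’, since ‘place’ is two edits away. Short command words such as ‘ask’ will produce false alarms at that threshold, so the result is a prompt to test the name aloud rather than a verdict.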

Don’t Believe the Hype

VUIs have a huge potential for providing certain types of information. I like asking Alexa if it’s going to rain – it saves me having to grab my phone to look at the weather forecast while trying to leave the house. I can also see how great a VUI will be when I’m driving – I hate setting off in my car and realising I’ve misconfigured my satnav. But one of the big problems I’m having with VUIs is being told how this is the next big thing.

I’m personally not interested in bots for most applications. I find it hard to trust that constrained conversational pathways will be better than tools like Google. One book I’m reading about bots sounds the same as late-nineties books on the topic. Just because technologies have improved does not by itself mean this is definitely the era of VUIs and chatbots. Either the application needs to be appropriate or the interface very well crafted.


First Steps with Alexa

Amazon’s Alexa is a virtual assistant available on the Amazon Echo, as well as (with a little struggle) on the Raspberry Pi. I find voice interfaces fascinating – they’re a staple of sci-fi, and the natural way people communicate with one another. Email and IM are great, but look at how often people switch from these to picking up the phone for a discussion. Voice works.

One of the great things about Alexa is how easy it is to build a skill (as new Alexa abilities are called). There’s an Amazon tutorial that explains how to get a skill up and running in five minutes.  There are also lots of useful blog posts on the subject, of which this is another. It will be the first in a series of posts. This initial version simply returns a randomised String to demonstrate how easy it is to get Alexa to work.

All the code here is in a git repository. I’m going to draw attention to specific parts of the code, but not repeat the entire contents of the repo.

Pre-requisites

Working through this tutorial requires a basic Java development environment (with gradle and git) and an Amazon account with AWS and Amazon Developer access.

The Basic Application

At their simplest level, Alexa skills receive a String and return a String. Amazon handles the parsing of the voice, and even does some neat things to contextualise what it has heard – for example, adding metadata about dates and times in the request. The skill simply needs to use this input to generate a response (sometimes marked up with Speech Synthesis Markup Language metadata).
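To give a feel for what SSML-marked-up output looks like, here is a small sketch in plain Java. The class and method names are hypothetical and this is not code from the repository; it only builds the string, wrapping the text in a speak element and adding a half-second pause with a break element before the card name.

```java
public class SsmlSketch {

    /** Escapes the characters that would otherwise be read as XML markup. */
    public static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    /** Builds an SSML response that pauses briefly before naming the card. */
    public static String cardAnnouncement(String cardName) {
        return "<speak>Your card is <break time=\"500ms\"/>"
                + escapeXml(cardName)
                + ".</speak>";
    }
}
```

The escaping matters because SSML is XML: a card name containing an ampersand would otherwise produce an invalid document.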

The application I’m working on is a simple Tarot card reader. I became interested in tarot through the Hexen deck, which I saw at the V&A’s recent You Say You Want a Revolution exhibition. The cards are based on the history of the Internet and counter-culture:

I don’t believe in the tarot as a fortune-telling method, but I’m interested in it as, in Italo Calvino’s phrase, “a machine for telling stories”. And I like the idea of having a piece of technology like Alexa reading and interpreting the cards. I mean, that’s science fiction, right?

Building the Basic Code

The first tag in the repository is for a simple Java class returning a string. There is no Alexa code at this point – when this is added in the next step, it is simply as an interface for this simple code. We could as easily hook up the basic class to a twitter bot, or a web-server, or even an email auto-responder. The core code does not depend upon Alexa. To have a look at the basic code, clone the repository and then checkout the initial commit:

git clone git@github.com:orbific/alexa-tarot-skill.git
cd alexa-tarot-skill
git checkout basic-tarot-class

The first version of the TarotCardPicker creates a deck of 78 cards, then selects a random one. At this point we’re ignoring things like reversed cards and interpretations in favour of simplicity. We’re also ignoring lots of things like internationalisation, unit tests and javadoc which would be expected if this were to be used as production code.
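For readers who don’t want to clone the repository, the idea can be sketched in a few lines. This is an illustration rather than the repository’s TarotCardPicker: the class name TarotDeckSketch is made up, and it assumes the traditional card names (22 major arcana plus four suits of 14 cards) rather than anything specific to the Hexen deck.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TarotDeckSketch {

    private static final String[] MAJOR_ARCANA = {
        "The Fool", "The Magician", "The High Priestess", "The Empress", "The Emperor",
        "The Hierophant", "The Lovers", "The Chariot", "Strength", "The Hermit",
        "Wheel of Fortune", "Justice", "The Hanged Man", "Death", "Temperance",
        "The Devil", "The Tower", "The Star", "The Moon", "The Sun", "Judgement", "The World"
    };

    private static final String[] SUITS = {"Wands", "Cups", "Swords", "Pentacles"};

    private static final String[] RANKS = {
        "Ace", "Two", "Three", "Four", "Five", "Six", "Seven",
        "Eight", "Nine", "Ten", "Page", "Knight", "Queen", "King"
    };

    /** Builds the full deck: 22 major arcana plus 4 suits of 14 cards, 78 in total. */
    public static List<String> buildDeck() {
        List<String> deck = new ArrayList<>();
        Collections.addAll(deck, MAJOR_ARCANA);
        for (String suit : SUITS) {
            for (String rank : RANKS) {
                deck.add(rank + " of " + suit);
            }
        }
        return deck;
    }

    /** Picks one card at random; the Random is passed in so tests can use a fixed seed. */
    public static String pickCard(Random random) {
        List<String> deck = buildDeck();
        return deck.get(random.nextInt(deck.size()));
    }
}
```

Passing the Random in as a parameter is a small design choice that makes the picker repeatable in tests, even though unit tests are out of scope for this first version.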

It’s possible to compile and run this piece of code.

gradle build
java -jar build/libs/alexa-tarot-skill-1.0.jar

Running the two commands above from the project’s root directory results in a line of text stating the name of the card that has been picked. The next stage is to make this code available via Alexa.

Adding the Alexa/Lambda files

The easiest way to get going with a skill is to use Amazon Web Services (AWS) to host the code. The specific part of AWS that is used for Alexa is AWS Lambda. This is a serverless code environment and basically means a piece of code can run without having to worry about infrastructure. You don’t need to use Lambda, but it makes life a lot easier – running Alexa skills from web services means having to deal with certificates between Alexa and another environment.

To see the basic framework code, use the appropriate checkout:

git checkout basic-alexa-framework

There are five files needed to wrap Alexa around the simple Tarot code:

  1. SampleUtterances.txt – this file lists all the different ways a user might contact the service. These are grouped into ‘intents’. If you wanted to ask for a coffee you might say “Please give me a coffee”, “I’d like a coffee” or “I demand coffee”, all of which have the same Intent. This file maps statements to intents.
  2. IntentSchema.json – this describes the intents the skill expects to receive. We can also use in-built intents, which allow different apps to have similar functionality. For example, I might ask Alexa to tell me how the Tarot card functionality behaves, which would trigger a HELP intent.
  3. TarotSpeechletRequestStreamHandler.java – this class is a subclass of SpeechletRequestStreamHandler. The documentation explains “This class provides the handler required when hosting the service as an AWS Lambda function” – it’s a link between AWS Lambda and Alexa. It’s pretty much a piece of boilerplate code.
  4. TarotSpeechlet.java – this class defines the Alexa behaviour. It implements the SpeechletV2 interface, which is the main provider of the Alexa behaviours. In this case, it receives an intent and provides a response.
  5. log.properties – this file defines the logging for the Lambda function.
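The mapping from statements to intents can be pictured as a lookup table. The sketch below is a toy model, not how Alexa resolves speech: Alexa matches far more loosely than an exact lookup, the class name is my own, and CoffeeIntent is a hypothetical intent name. It assumes each line of the utterances file has the form “IntentName utterance text”.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class UtteranceIntentMap {

    private final Map<String, String> intentByUtterance = new HashMap<>();

    /** Builds the lookup table from "IntentName utterance text" lines. */
    public UtteranceIntentMap(List<String> sampleUtteranceLines) {
        for (String line : sampleUtteranceLines) {
            String trimmed = line.trim();
            int firstSpace = trimmed.indexOf(' ');
            if (firstSpace < 0) {
                continue; // no utterance text on this line
            }
            String intentName = trimmed.substring(0, firstSpace);
            String utterance = normalise(trimmed.substring(firstSpace + 1));
            intentByUtterance.put(utterance, intentName);
        }
    }

    /** Lower-cases the text, replaces punctuation with spaces and collapses whitespace. */
    public static String normalise(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Returns the intent name for a spoken phrase, or null if no sample utterance matches. */
    public String intentFor(String spokenPhrase) {
        return intentByUtterance.get(normalise(spokenPhrase));
    }
}
```

Several different phrasings land on the same intent name, which is all the skill’s own code ever sees.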

Code structure

As stated above, there are two Java classes. As the link between Alexa and the application code, TarotSpeechletRequestStreamHandler is fairly straightforward. It contains a static initializer which copies the APP_ID environment variable to a supportedApplicationIds variable, which can then be passed through the constructor to the superclass.

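import java.util.HashSet;
import java.util.Set;

import com.amazon.speech.speechlet.lambda.SpeechletRequestStreamHandler;

public class TarotSpeechletRequestStreamHandler extends SpeechletRequestStreamHandler {

    // Application IDs this handler will accept requests from
    private static final Set<String> supportedApplicationIds = new HashSet<String>();

    static {
        // The ID is read from the Lambda environment rather than hardcoded
        String appId = System.getenv("APP_ID");
        supportedApplicationIds.add(appId);
    }

    public TarotSpeechletRequestStreamHandler() {
        super(new TarotSpeechlet(), supportedApplicationIds);
    }
}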

Obviously, it would be possible to hardcode the APP_ID here, but reading it from the environment saves a little messing around. The reason for checking the application ID is to ensure that the request has come from our Alexa skill. Otherwise, someone who knew our endpoint could send it requests, using our code to do the hard work for them while we were billed for it. Every request from Alexa includes an application ID, which can then be checked.
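One thing the static initializer does not handle is a missing variable: if APP_ID is unset, System.getenv returns null and that null goes into the set. A slightly more defensive way to read the value might look like the sketch below. The AppIdConfig class is hypothetical, and accepting a comma-separated list is my own assumption rather than anything the SDK requires; the prefix check relies on skill IDs beginning with “amzn1.ask.skill”.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class AppIdConfig {

    // Skill application IDs are long strings that begin with this prefix.
    private static final String EXPECTED_PREFIX = "amzn1.ask.skill";

    /**
     * Parses the raw environment value into a set of application IDs.
     * A missing or empty value gives an empty set instead of a set containing null.
     */
    public static Set<String> parseApplicationIds(String rawValue) {
        Set<String> ids = new LinkedHashSet<>();
        if (rawValue == null) {
            return ids;
        }
        // Assumption: several IDs may be supplied, separated by commas.
        for (String part : rawValue.split(",")) {
            String id = part.trim();
            if (id.isEmpty()) {
                continue;
            }
            if (!id.startsWith(EXPECTED_PREFIX)) {
                throw new IllegalArgumentException("Unexpected application ID format: " + id);
            }
            ids.add(id);
        }
        return ids;
    }
}
```

In the handler this would be called as parseApplicationIds(System.getenv("APP_ID")), so a mistyped value fails loudly when the function starts rather than silently rejecting every request.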

The Alexa code, the interesting bit, is contained within the TarotSpeechlet. This particular class is extremely simple, but it demonstrates the basic behaviour of Alexa. There are four overridden methods from the interface which are described in the javadoc:

  • onLaunch is “Entry point for Speechlets for handling a speech initiated request to start the skill without providing an Intent”. We’re ignoring this functionality.
  • onSessionStarted, onSessionEnded are two callbacks for when a session begins or ends. Sessions are outside the scope of this particular post, but I’ll write about them later. This allows the creation of skills that include a series of interactions.
  • onIntent is called when a speech request linked to an intent is sent. Here we check for a single intent and if that is not found then an appropriate response is made.

The onIntent method is relatively simple:

@Override
public SpeechletResponse onIntent(SpeechletRequestEnvelope<IntentRequest> requestEnvelope) {
  Intent intent = requestEnvelope.getRequest().getIntent();
  String intentName = (intent != null) ? intent.getName() : null;
  if ("SingleCardIntent".equals(intentName)) {
    return getSingleCardResponse();
  } else {
    return getUnknownCommandResponse();
  }
}

Note that the response is created as a SimpleCard – this is used in the Alexa app to add contextual information to the speech. I’ve also stripped all of the logging to make it simpler.

Putting the application live

At this point, we have all the code that we need to put this application live. There is still a little bit of work required to get the new skill connected to Alexa. I’m not going to outline all of the steps in detail, but will list the basic steps as at the time of writing (late June 2017). I’m also going to assume that an AWS account is available. AWS Lambda is available on the free tier.

The first step is to produce a ‘fat jar’ locally. This is a jar file that contains all of the project’s dependencies. The build.gradle file contains a jar target that builds this fat jar.

gradle build

The resulting file, ./build/libs/alexa-tarot-skill-fat-1.0.jar, is 9.4MB.

The basic steps for the upload are described on Amazon’s page, Deploying a Sample Custom Skill to AWS Lambda. There are three stages to this process:

Creating the Lambda function

Log into AWS lambda, and create a new lambda function. Currently, this needs to be in one of two specific regions, US East or EU. The first option is to select a blueprint. Use a blank function.

The second stage is to create the appropriate trigger. From the list available, select “Alexa Skills Kit”. To generate the list, click on the dotted-rectangle to the left of the Lambda logo.

The bulk of the configuration is contained in the next screen, Configure function.

The required options are as follows:

  • Name and description are set as needed
  • The runtime drop down is changed to ‘Java 8’, which updates the options below in the form.
  • The Upload button is used to upload the fat jar from earlier to AWS
  • No environment variables are set at this point. We do not have the APP_ID that will be provided by Alexa, so the skill will not work properly yet.
  • The handler is the packaged classname for the SpeechletRequestStreamHandler. In this case, it will be com.riddlefox.tarot.TarotSpeechletRequestStreamHandler
  • The role needs to be set. If there is no existing role for the account, the drop-down opens a new page in the browser to set up a new role.
  • The Tags and Advanced Settings can be ignored.

Pressing next results in a summary page, and the Create Function button can be pressed to complete the work. This takes a little time to work. Once it is complete, look at the settings for the function to find the ARN. This will be needed in the next stage.

Linking Alexa to the Lambda function

The next task is to create an Alexa skill that connects to the newly-created Lambda function. This requires an account on the Alexa developer portal.  Log in to this and select the Alexa Skills Kit option. This has a button, Add a new skill. Press this to begin the process. The resulting form looks quite complicated, but not all of the tabs need to be filled in.

Skill Information

  • If necessary, change the language from English (US) to English (UK) as this can improve the information received from Alexa.
  • The name should be filled in as needed
  • Invocation name is the name that needs to be spoken to contact the skill.
  • Press Save

Interaction Model

This section defines the way in which Alexa interacts with the Lambda skill. It requires us to copy the information from the resource files. The intent schema requires the content of the IntentSchema.json file:

Further down the page we need to provide our sample utterances. These are in the SampleUtterances.txt file:

Press the save button to make sure everything is as expected, then press next.

Configuration

This tab defines the link to AWS Lambda.  Set the service endpoint type to be AWS, pick the region and add the ARN in the (untitled) text box:

Press save and next.

At this point, the application ID will be available. This is a long string that begins “amzn1.ask.skill”. This now needs to be copied across to the environment variables for the lambda function.

Adding the app ID to the lambda function

Returning to AWS lambda, open the configuration for the function that has been created. Add a new environment variable, called APP_ID, which contains the application ID from earlier. Save this, and the Alexa skill has been set up.

Demonstrating the Skill

Having set up the skill, how do we confirm that it works? Go back to the Alexa developer page and open the Skill in question. On the Test tab, there is a ‘Service Simulator’ section. Entering an utterance here and pressing the Ask Tarot Demo button lets us see the request and response:

The final stage is to test it on the Alexa device. Because this skill is still at the developer stage, it can only be tested on Alexa devices connected to the developer account. Saying “Alexa, ask Tarot, give me a card” will result in Alexa saying the name of a card.

If you do not have a physical Alexa-enabled device, it is also possible to test Alexa through a web browser at echosim.io. Note that this still requires a valid Amazon account.

Summary

This post has shown how to set up a very simple application in Alexa. There’s not much more to it, other than demonstrating the set-up, but future posts will focus on how to get Alexa to do more interesting things.

If you have any questions, or anything is not clear, then please leave a comment and I will update the text.


Getting Alexa working on the Raspberry Pi


I wasn’t that interested in the Amazon Alexa until I saw its product-placement appearance in Mr Robot. Grace Gummer’s character, Dominique DiPierro, lies in bed, unable to sleep, talking to the device.

It’s an odd scene, and the character’s isolation is probably not the best advert for a new piece of technology. The thing that fascinated me was the potential for empathy in the voice interface. DiPierro took some comfort from the machine. This might be a kinder interface than the command line (which a voice interface is, ultimately). This is possibly a strange place to start looking at a new piece of technology, but there you go.

The main reason I’d not looked at Alexa sooner was a disappointment with the command-line bots I’ve seen over the years. Most of them seemed stilted and artificial. Rather than being a natural interaction, I’d be fighting a parser, like I was playing an 80s-era text adventure. I assumed Alexa would be more of the same. Mr Robot suggested other possibilities. Voice is a natural interface – it’s one we use in daily life. We speak to people when we’re with them, rather than writing notes, because it is more efficient.

The Echo Dot costs about £44.99, but I had a Raspberry Pi at home to experiment with. This also offers the possibility of changing the wake word, rather than choosing between ‘Alexa’, ‘Echo’, ‘Amazon’ or ‘Computer’. The Alexa on Raspberry Pi tutorial contains most of the information needed. I’m not going to write a tutorial for a tutorial, but I wanted to note a few things:

  1. The tutorial lists monitor, mouse and keyboard as pre-requisites, but these aren’t needed if you SSH into the Pi. This worked perfectly as long as I remembered to plug the Pi into the router before turning it on. It even set itself up on the network with a name of raspberrypi.lan – which was very helpful, as I’m not that great at configuring networks.
  2. As Jez Nicholson pointed out to me, this is a “slightly hostile ‘frontier land’ coding environment”. The java client spits out stack traces to the command line in a way that’s a little surprising from a major company. And my first attempts to set up the Pi were frustrated by an unhelpful error when authenticating the device on the Amazon site – no indication what it was, just the assurance that “we’re already working on the problem”.
  3. I eventually found out what the issue was after googling the issue. It turned out I’d not copied my DeviceID exactly. If you’re writing an API, you have to give as much feedback as possible to the user. APIs really should return detailed errors or trigger an automated ticket creation (rather than claim you’re working on the problem). You could even email me when it is fixed.
  4. The default setup for kitt-ai meant that the wake word detection was far less sensitive than Alexa herself. It was a little difficult to get the device to acknowledge my repeated calls of its name, like I had a poorly-trained dog in the flat. I’m not sure what the neighbours make of my increasingly loud calls of “Alexa”.
  5. Alexa was happy to give me the weather – in Washington State. When asked “where am I”, it insisted I was in Seattle. I guess there are some settings that need altering somewhere.
  6. One thing that feels like it’s missing on voice interfaces (or, at least, this implementation) is a return key. There was a short pause after I finished speaking. When speaking to humans we can usually figure out when one of us is done talking. Also, the need to get information from servers prevents a flowing interaction between us.

Now I’ve got this working, which means I too can ask Alexa when the world will end. I now need to set up a simple Amazon skill. Which looks like it could be as much fun as setting up Alexa was in the first place.