All posts by admin

A Day at an Alexa Workshop

Back at the start of July, I attended an Alexa workshop in Brighton. I planned to blog about it soon after, but client work took over my life a little, sending me to LA and Dublin in the process. The session was a good one and I wanted to note some of the interesting things I learned for future reference. It happened in the ballroom of a seafront hotel (note chandeliers, and those dangling lengths of cloth that probably have an actual name) and was run by David Low, an Alexa evangelist from Amazon, who’d previously worked on the Skyscanner Alexa skill.

I’m cautious about in-person training sessions, in case the pace is slower than a day’s reading and tinkering at home – particularly when some of the Amazon tutorials are so good. Here, I learned a lot of interesting information, of which this is some of the highlights:

  • The example skills we looked at were written in Javascript, rather than Java, as I’m used to. It’s notable how much more concise the Javascript code was.
  • Previously, I knew little of the Echo device’s technical details and it was interesting to find out about how Alexa works. Apparently the seven directional microphones are set up so one listens to the speech, and the other six record the background noise and filter it out. This explains how good Alexa is at hearing me over music.
  • Amazon have given a lot of thought to how they would like people to think about Alexa. A session like this is good for communicating this vision. For example, “Alexa lives inside the Echo” or “Not building apps, building conversations”.
  • Apparently about 70% of skills fail certification because their example phrases do not occur in the sample utterances. This rule seems self-evident, but somehow people miss it. Even knowing this, my first attempt at creating a skill failed the certification stage. I’ll have more to say about that soon.
  • This is possibly a spoiler for the workshop, but we were told that Alexa had received 250,000 marriage proposals. It’s interesting to see that number, and that Amazon feel this is a positive thing. I feel uncomfortable about the gendering of Voice UIs and chatbots, something I want to spend more time thinking about
  • Some services have 270,000 sample utterances, which is far larger than I had expected.
  • I sometimes feel lost in the Alexa skill store, so it was good to have a discussion of some successful use cases:
    • There is some interesting research around the benefits of Alexa for older people.
    • Alexa is working to help guests in hotels.
    • It was also interesting to hear how the Skyscanner app worked, allowing the user to pose questions such as “Where can I go for £100 this weekend?”
  • One of the interesting things about devices like Alexa is that functionality can be unlocked over time. Apparently there are several colours available in Alexa’s glowing ring, which may be used for different things. There are also possibilities such as additional voices. Push notifications/events are a more complicated issue, with Amazon trying to work through the privacy and interruption issues.

Microservices for the monolith

I found this post in my drafts folder. It’s about three years old, and I was going to delete it; but I had a quick read and there were a few things that are still worth saying.

It dates back to when there was a lot of excitement about microservices and I was researching them for an employer. Ultimately, the technical demands of maintaining microservices were too much for a company that had bigger problems – but the principle involved (microservices as service-oriented architecture done right) were incredibly useful.

Foremost among these useful principles was the idea of treating all interactions as inherently asynchronous and unreliable. The principles in the Fallacies of Distributed Systems are a useful caution for almost all projects. Although I’m not convinced Conway’s Law is either valid or useful.

Now that we’ve got big data out of the way, it looks as if the next big hype will be microservices. The term has been around for a while, most notably in a January 2013 talk by James Lewis of Thoughtworks. The idea started to become popular around February 2014, with Martin Fowler publishing a post on the subject in March. Discussion is gathering, with a conference, Mu-Con, planned for the end of November and an O’Reilly book on the topic due for publication in March 2015.

As with Big Data, there is a similar lack of clarity about the definition. How tiny is micro? Definitions have ranged to a couple of screens of code, to a small amount of code that a single developer can understand in a few days. James Lewis suggested about 1000 lines long.

Regardless of the problematic definition, obvious characteristics of such services are emerging, with JSON and REST are forerunners for the protocols. Netflix have been open about their successes with this architecture, releasing some amazing tools and documentation.

A purist microservice architecture is not going to be helpful for most companies. In his early presentation, James Lewis mentioned the importance of Conway’s law. Microservices require highly-skilled developers, clearly defined structure and rigourous processes. Most companies are unlikely to have the structure, processes and calibre of developer to make full use of these architectures, let alone convert their existing architecture to fit in with this new way of working.

Despite this, microservices are still relevant to every Java developer, even if you’re working with a monolithic ball-of-mud architecture. It’s a rare system that doesn’t need to integrate with some external service. Microservices teach us to treat any external system as potentially unreliable and incompetent. If the service takes too long to reply, how does that effect our SLAs?

Netflix are so confident of the resilience of their server eco-system that they have introduced the chaos monkey. This turns off services randomly during business hours, to make sure that service continues uninterrupted. Because failures are going to happen so you should treat that as a fact of life. You system might be up 99.9% of the time, but it’s the 0.01% that gets remembered in your annual review.

The tools Netflix have produced to deal with these issues are designed to work with thousands of small services, but something like Hystrix is usable for a single integration. You need to be asking; how might this fail? When should I time out? And you can get this power with just a few Spring annotations.

These tools and architectures will be essential for every Java developer. When I first started writing software, everyone knew how important automated tests were, but test harnesses were difficult to produce. The creation of Junit made tests simple to write and this has altered the way Java software is developed, enabling refactoring, continuous delivery, and TDD.

Testing was important before Junit, and there are developers who still don’t use automated tests, but but Junit has revolutionised Java development. The software I am writing now is considerably more sophisticated that what I was writing 14 years ago. In part that is down to new tools like unit testing and Spring.

The tools designed for microservices are simple enough that they should be understood by every developer. The questions faced by Netflix should be considered for any integration: how do I handle failure, what do I do if this is not available?

The scale and size of the systems being build nowadays are incredible, and this has been enabled by the range and skill of the tools that have been open-sourced. Every developer needs to understand the tools created for microservices. Simple architectures should be as reliable as complicated ones.

The Complexity of Java

Java has become much more complicated over the years. I started working with the language back in 2000. I’d been a database developer for a couple of years, working with Oracle, but wanted to create more general applications. I learned enough to pass my first Java job interview with two books: Laura Lemay’s Teach Yourself Java in 21 days (which had been recently revised for Java 2!) and the first half of Wrox’s Professional Java Server Programming.

Together, these two books contained most of the Java knowledge I needed to do my job. I quickly picked up a lot of other things like CVS and Unix, but Java was definitely a lot simpler back then. I would say that the Java I needed as a professional developer back then included :

  • Core Java
  • JDBC
  • MySQL
  • HTML
  • CVS

The builds were done through  makefiles if they were scripted at all. It was fairly easy for a new developer to get working professionally. I mean, two books contained most of the information you needed – along information on how to do graphics, animation and applets. And the Wrox book also found time to cram in chapters on esoterica like Jini and Javaspaces. You could learn a lot of Java in 21 days.

(I wish I still had my old copies of these books. Living in Brighton involved moving frequently between small rooms and a lot of books had to be abandoned).

Over the last 15 years Core Java has become more complicated. The addition of things like generics and lambdas were much needed but make the language much more complicated. And the basic skills a developer needs where I’ve worked recently are much more complicated:

  • Core Java
  • Eclipse or equivalent IDE
  • Hibernate
  • MySQL
  • HTML
  • XML/JSON
  • REST
  • Junit and a mocking framework
  • git
  • Maven
  • Spring

The applications that can be built with modern Java are impressive and far beyond the scale of what would have been possible in 2000. I think it would be impossible now to write any large scale Java application without a decent IDE. And Java is much more complicated than before.

A lot has improved too, and it’s great to escape the horrors of classpath config, which has disappeared in place of easier options. But the point remains: I know a fair few people who learned to code under their own steam and ended up with successful careers. I imagine that is much more difficult nowadays. Back in 2000, applets were an easy way to learn to code and you could get going with notepad and a compiler. Modern Java is probably not a good beginner’s language.

Discipline vs. scheduling

One of the best statements I’ve read on developer discipline (which for me includes testing, documentation etc) came from Robert C. Martin:

“You know what you believe by observing yourself in a crisis. If in a crisis you follow your disciplines, then you truly believe in those disciplines. On the other hand, if you change your behaviour in a crisis, then you don’t truly believe in your normal behaviour” (from Chapter 11 in the Clean Coder)

There’s a slight subtlety here, in that you can sometimes gain time by dropping process, but this gain quickly evaporates as technical debt builds up. And, if you believe in these disciplines, you will schedule time to make up for this. As Martin explains:

“If you follow the discipline of Test Driven Development in non-crisis times but abandon it during a crisis, then you don’t really trust that TDD is helpful. If you keep your code clean during normal times but make messes in a crisis, then you don’t really believe that messes slow you down. If you pair in a crisis but don’t normally pair, then you believe pairing is more efficient than non-pairing. Choose disciplines that you feel comfortable following in a crisis. Then follow them all the time. Following these disciplines is the best way to avoid getting into a crisis.”

Discipline is often abandoned due to scheduling pressure. Martin discusses how to respond to such pressure in the second chapter of the Clean Coder. Developers often give an estimate only to be pressured to produce the output more quickly. The temptation is to give in to this pressure and promise to “try”. This is dangerous:

“If you are not holding back some energy in reserve, if you don’t have a new plan, if you aren’t going to change your behavior, and if you are reasonably confident in your original estimate, then promising to try is fundamentally dishonest. You are lying. And you are probably doing it to save face and to avoid a confrontation.”

Overtime is one response to this, but it can easily spiral out of control, since this is one of the few places where leverage can be applied. It also suffers from diminishing returns. There is good evidence that working more than 40 hours a week over a long time is harmful to projects.

So how can projects speed up? By the time an unrealistic deadline has become solid, it’s usually too late. Deadlines are often particularly problematic in scrum. It takes time for a scrum team to settled into a cadence where their work-rate becomes predictable. If the scope and team size have been fixed, then there is no way to hit the deadline without distorting the process.

What should you do when you have an urgent deadline that looks unachievable? In such a situation, failure has happened before development begins. Most times, this is not recognised until the developers are at work – resulting in crunch time, and estimates made to fit the deadline. In such situations, the best thing to do is to deal with the current project as best you can; and to look at future projects, making sure that these are not being scheduled without a good idea of the development needed.

The best answer to the factorial code interview question

Whenever I interview a developer, I always ask them to write code on a whiteboard. Nothing too complicated – I expect everyone uses an IDE these days and the candidate is also probably feeling a little nervous.

The big interview cliche is asking for a method producing a Fibonacci sequence. Even with telephone screening, this still eliminates more candidates than it should. A well-prepared interview candidate should have practised that one already, which means it’s still a useful test. Whether or not someone can do it, there are lots of interesting follow-up questions.

Some candidates get flustered trying to understand the Fibonacci sequence. I’d expect most people to know this already, but I wouldn’t want to reject a potentially-excellent candidate for lacking a bit of maths knowledge. So sometimes I would simply ask for a function to product factorials.

There are several different ways to do this, with the two main options being whether the solution is recursive or not. A simple solution would look something like this:

public int fact(int n)
{
    return n == 1 ? 1 : n * fact(n - 1);
}

One obvious follow-up question to this is what the limits of the function are – using an int means you’ll have overflow problems quite quickly. Does the developer know the class to use to avoid this?

The question simply asks for a piece of code. I would have given bonus points for any developer who mentioned testing before writing. But the perfect response would have been a candidate saying that the code would be different depending on its intended use.

Someone on github has produced an enterprise version of another interview classic, with Enterprise Fizzbuzz. Obviously that is going too far. But there are considerations for even the simplest piece of professional code:

  • Who else needs to work on this? What documentation/commenting is required?
  • How and where will it be deployed?
  • Is any error checking or exception handling required? In the example above – which I would have accepted as correct – there is no handling for the obvious overflow error.
  • Is something this simple suitable for the intended use? For example, a large number of similar requests might be better handled with some sort of question.

All of which is a complicated answer to a simple question. A dev raising these issues in an interview would still need to produce the code – but the discussion that followed would be very different.

Thomas Mann once claimed that “A writer is someone for whom writing is more difficult than it is for other people.” In the same way, the more I think about development, the harder it seems. A factorial example might seem almost insultingly simple – but it’s possible to have a very complicated conversation about it.

Alexa, Please

I sometimes feel uncomfortable giving orders to Alexa. I know that she is a series of scripts, and have a good idea of the technology involved, but I still dislike barking demands at her.

I’ve read a couple of articles about parents who were concerned about their children’s interactions with Alexa. In a post entitled, Amazon Echo Is Magical. It’s Also Turning My Kid Into an Asshole, Hunter Walk suggested Alexa needed a mode that required please and thank you, to help children learn manners. These words currently have no effect on how Alexa works, and are filtered out before a request is sent to a skill.

In a piece on “bot-mania”, Dan Grover looked at the recent excitement over bots, placing it into a historical context. It’s a fascinating piece, talking in detail about how freetext chat may not be the best option for most requirements. Once particular passage jumped out at me:

This notion of a bot handling [tasks like ordering pizza] is a curious kind of skeumorphism. In the same way that a contact book app… may have presented contacts as little cards with drop shadows and ring holes… conversational UI, too, has applied an analog metaphor to a digital task and brought along details that, in this form, no longer serve any purpose. Things like the small pleasantries in the above exchange like “please” and “thank you”, to asking for various pizza-related choices sequentially and separately (rather than all at once). These vestiges of human conversation no longer provide utility (if anything, they impede the task).

A skeumorph, as defined in wikipedia, is “a derivative object that retains ornamental design cues (attributes) from structures that are inherent to the original“. As an example, it gives the swiping gesture for turning pages on tablets, or the shutter sound on digital cameras. However, these skeumorphs sometimes have their own uses, for example the shutter sound notifies people that a photograph has been taken.

In one discussion of the please/thank-you issue (Parents are worried the Amazon Echo is conditioning their kids to be rude) an investment firm founder called Manu Kumar explained why he felt it important to be nice to devices. “One of my metrics for determining how nice someone is is by watching how they interact with a waiter. In a similar way, even if the AI or tech doesn’t care about it, other people around us are going to experience how we interact with it.

For a while I thought it would be good if Amazon gave discounts to people who are well-mannered to Alexa. Then it occurred to me that, despite the rigorous codes about thank-you in English society, this is not universal. If you look at basic phrases translated into Hindi (ie omniglot), the word for thank you is given as dhanyavād, but this misses a subtlety. Deepak Singh wrote an article in the Atlantic, ‘I’ve Never Thanked My Parents for Anything’, where he talked about the status of thank you in Hindi.

In India, people—especially when they are your elders, relatives, or close friends—tend to feel that by thanking them, you’re violating your intimacy with them and creating formality and distance that shouldn’t exist. They may think that you’re closing off the possibility of relying on each other in the future. Saying dhanyavaad to strangers helps initiate a cycle of exchange and familiarity. But with family and friends, dhanyavaad can instead chill relations because you are already intimate and in a cycle of exchange.

All of this discussion may seem obscure, but there is an interesting issue around the way we respond to devices. Alexa behaves with a personality and explicitly presents herself as female. Even if she is a batch of scripts, we are supposed to respond to her as an entity. There is a question of how we learn to behave with such creatures, and how we factor this into thinking about designing skills – where the skill is accessed via Alexa, at a strange remove – all requests involving Alexa being asked to pass the question on to the skill.

I still think it is important to be polite to Alexa. But I’m prepared to accept that it is irrelevant to her.

Placedreamer – a more interesting application

The problem with most Alexa apps is that they’re simple text bots with voice UIs.

Obviously, Alexa’s clever hacks make her skills a little more interesting than the same thing on the command line; such as the system used to match thousands of different phrases to a user’s intention. But a lot of the skills available are boring – particularly the ‘facts’ type of skill, where Alexa recites a random piece of information. What would be interesting would be an application that would not work outside of an Alexa device.

A more interesting problem

Take as an example the tarot app I built in my recent tutorial. It doesn’t do anything particularly novel – we could do the same with a twitter bot or a Bash script. As well as speech, Alexa provides the ability to play sounds, as well as some clever ways of handling streaming (something Tim O’Reilly praised in his celebration of Alexa). An interesting skill would make use of such things.

Rather than take the tarot skill further, I wanted to work on something more interesting. I asked friends on Facebook what they thought I should do. Tom suggested “Pipe in birdsong from the last distant place you travelled to. City sounds from another timezone.

This sounded like a great idea. I like the idea of Alexa as a device that can occupy a strange, eidetic space – something to talk to when you can’t sleep. There are online field recordings available, and I’ve got photos which can be added to the response cards. This is a somewhat whimsical application, but that’s what attracted me to Alexa in the first place – a device that is placed in intimate, home spaces, and is always listening into conversations. (Although this is only for her name/wake-word, it can still prove disconcerting)

A problem with invocation names

My previous skill had the invocation name ‘tarot’. Which was OK for testing, but won’t pass Amazon’s requirements for invocation names, which states that “One-word invocation names are not allowed”.

Following this rule, I first set up the invocation name for this skill to be ‘Place dreamer’. Actually summoning this skill proved difficult. Place was too easily confused with Play, which Alexa saw as a more likely word, and would hear “Place Dreamer” as “Play Streamer”.

Don’t Believe the Hype

VUIs have a huge potential for providing certain types of information. I like asking Alexa if it’s going to rain – it saves me having to grab my phone to look at the weather forecast while trying to leave the house. I can also see how great a VUI will be when I’m driving – I hate setting off in my car and realising I’ve misconfigured my satnav. But one of the big problems I’m having with VUIs is being told how this is the next big thing.

I’m personally not interested in bots for most applications. I find it hard to trust that constrained conversational pathways will be better than tools like google. One book I’m reading about bots sounds the same as late-nineties books on the topic. Just because technologies have improved does not by itself mean this is definitely the era of VUIs and chatbots. Either the application needs to be appropriate or the interface very well crafted.

June’s Brighton Java – Alexa and CQRS

We had two speakers at Brighton Java this week. I was the support act, kicking off with an introductory talk on Alexa in Ten Minutes. I enjoyed putting together such a brisk technical presentation, which came to 34 slides. I managed to finish dead on time and had some interesting questions afterwards. The slides are online,  along with a video of the evening. I also took my Echo Dot along, so that Alexa could speak with me:

Some links from the talk:

The second presentation, from David Ellis, was about CQRS and event sourcing. It was very timely for me as I’m reviewing the design of a platform I’m working on.

CQRS stands for Command/Query Responsibility Separation and is the idea of using a different data model for reads and writes. One way of doing this is event sourcing, which records a system’s full history as a series of immutable events. These become an append-only system of record for the system (as well as a full audit). Representations of state can then be built from this (including in-memory where speed is needed). You can also produce representations of specific times – a form of time travel. David ran through the basics and showed how event-sourcing worked as a means for CQRS.

While I’ve read about CQRS and event sourcing before, it’s great to hear someone talking about it. It also amused me that David’s examples were “written in Scala so that they fit on the slides”.

Given that the talk was at Brandwatch, there was some good discussion afterwards about the possibility of using Kafka. I’ve also been wondering today about combining CQRS and REST and hope to research that next week.

So, all-in-all, a good evening. My talk seemed to go well, and I learnt about CQRS. Thanks, once again, to Brandwatch, who hosted the event, as well as providing drinks and pizza. Luke and Adina helped set up the night, with Luke doing an amazing job with the tech side of things, handling sound, streaming and technical gremlins.

First Steps with Alexa

Amazon’s Alexa is a virtual assistant available on the Amazon Echo, as well as (with a little struggle) on the Raspberry Pi. I find voice interfaces fascinating – they’re a staple of sci-fi, and the natural way people communicate with one another. Email and IM are great, but look at how often people switch from these to picking up the phone for a discussion. Voice works.

One of the great things about Alexa is how easy it is to build a skill (as new Alexa abilities are called). There’s an Amazon tutorial that explains how to get a skill up and running in five minutes.  There are also lots of useful blog posts on the subject, of which this is another. It will be the first in a series of posts. This initial version simply returns a randomised String to demonstrate how easy it is to get Alexa to work.

All the code here is in a git repository. I’m going to draw attention to specific parts of the code, but not repeat the entire contents of the repo.

Pre-requisites

Working through this tutorial requires a basic Java development environment (with gradle and git) and an Amazon account with AWS and Amazon Developer access.

The Basic Application

At their simplest level, Alexa skills receive a String and return a String. Amazon handles the parsing of the voice, and even does some neat things to contextualise what it has heard – for example, adding metadata about dates and times in the request. The skill simply needs to use this input to generate a response (sometimes marked up with Speech Synthesis Markup Language metadata).

The application I’m working on is a simple Tarot card reader. I became interested in tarot through the Hexen deck, which I saw at the V&A’s recent You Say You Want a Revolution exhibition. The cards are based on the history of the Internet and counter-culture:

I don’t believe in the tarot as a fortune-telling method, but I’m interested in it as, in Italo Calvino’s phrase, “a machine for telling stories”. And I like the idea of having a piece of technology like Alexa reading and interpreting the cards. I mean, that’s science fiction, right?

Building the Basic Code

The first tag in the repository is for a simple Java class returning a string. There is no Alexa code at this point – when this is added in the next step, it is simply as an interface for this simple code. We could as easily hook up the basic class to a twitter bot, or a web-server, or even an email auto-responder. The core code does not depend upon Alexa. To have a look at the basic code, clone the repository and then checkout the initial commit:

git clone git@github.com:orbific/alexa-tarot-skill.git
cd alexa-tarot-skill
git checkout basic-tarot-class

The first version of the TarotCardPicker creates a deck of 78 cards, then selects a random one. At this points we’re ignoring things like reversed cards and interpretations in favour of simplicity. We’re also ignoring lots of things like internationalisation, unit tests and javadoc which would be expected if this were to be used as production code.

It’s possible to compile and run this piece of code.

gradle build
java -jar build/libs/alexa-tarot-skill-1.0.jar

Running the two commands above from the project’s root directory results in a line of text stating the name of the card that has been picked. The next stage is to make this code available via Alexa.

Adding the Alexa/Lambda files

The easiest way to get going with a skill is to use Amazon Web Services (AWS) to host the code. The specific part of AWS that is used for Alexa is AWS Lambda. This is a serverless code environment and basically means a piece of code can run without having to worry about infrastructure. You don’t need to use Lambda, but it makes life a lot easier – running Alexa skills from web services means having to deal with certificates between Alexa and another environment.

To see the basic framework code, use the appropriate checkout:

git checkout basic-alexa-framework

There are five files needed to wrap Alexa around the simple Tarot code:

  1. SampleUtterances.txt – the next file lists all the different ways a user might contact the service. These are grouped into ‘intents’. If you wanted to ask for a coffee you might say “Please give me a coffee”, “I’d like a coffee” or “I demand coffee”, all of which have the same Intent. This file maps statements to intents.
  2. IntentSchema.json this describes the intents the skill expects to receive. We can also use in-built intents, which allow different apps to have similar functionality. For example, I might ask Alexa to tell me how the Tarot card functionality behaves, which would trigger a HELP intent.
  3. TarotSpeechletRequestStreamHandler.java this class is a subclass of SpeechletRequestStreamHandler. The documentation explains “This class provides the handler required when hosting the service as an AWS Lambda function” – it’s a link between AWS lambda and Alexa. It’s pretty much a piece of boilerplate code.
  4. TarotSpeechlet.java the Speechlet is a simple interface that defines the Alexa behaviour: this implements the SpeechletV2 interface which is the main provider of the Alexa behaviours. In this case, it receives an intent and provides a response.
  5. log.properties this file defines the logging for the lambda function.

Code structure

As stated above, there are two Java classes. As the link between Alexa and the application code,  TarotSpeechletRequestStreamHandler is fairly straightforward. It contains a static initializer which copies the APP_ID environment variable to a supportedApplicationIds variable, which can then be passed through the constructor to the superclass.

public class TarotSpeechletRequestStreamHandler extends SpeechletRequestStreamHandler {

private static final Set<String> supportedApplicationIds = new HashSet<String>();

static {
String appId = System.getenv("APP_ID");
supportedApplicationIds.add(appId);
}

public TarotSpeechletRequestStreamHandler() {
super(new TarotSpeechlet(), supportedApplicationIds);
}
}

Obviously, it would be possible to have the APP_ID hardcoded here, but this saves a little messing around. The reason for checking the application ID is to ensure that the request has come from our Alexa skill. It would be possible for someone who knew our endpoints to try sending requests, using our code to do the hard work for them (and be billed for it). Every request for Alexa includes an application ID, that can then be checked.

The Alexa code, the interesting bit, is contained within the TarotSpeechlet. This particular class is extremely simple, but it demonstrates the basic behaviour of Alexa. There are four overridden methods from the interface which are described in the javadoc:

  • onLaunch is “Entry point for Speechlets for handling a speech initiated request to start the skill without providing an Intent”. We’re ignoring this functionality.
  • onSessionStarted, onSessionEnded are two callbacks for when a session begins or ends. Sessions are outside the scope of this particular post, but I’ll write about them later. This allows the creation of skills that include a series of interactions.
  • onIntent is called when a speech request linked to an intent is sent. Here we check for a single intent and if that is not found then an appropriate response is made.

The onIntent method is relatively simple:

@Override
public SpeechletResponse onIntent(SpeechletRequestEnvelope<IntentRequest> requestEnvelope) {
  Intent intent = requestEnvelope.getRequest().getIntent();
  String intentName = (intent != null) ? intent.getName() : null;
  if ("SingleCardIntent".equals(intentName)) {
    return getSingleCardResponse();
  } else {
    return getUnknownCommandResponse();
  }
}

Note that the response is created as a SimpleCard – this is used in the Alexa app to add contextual information to the speech. I’ve also stripped all of the logging to make it simpler.

Putting the application live

At this point, we have all the code that we need to put this application live. There is still a little bit of work required to get the new skill connected to Alexa. I’m not going to outline all of the steps in detail, but will list the basic steps as at the time of writing (late June 2017). I’m also going to assume that an AWS account is available. AWS Lambda is available on the free tier.

The first step is to produce a ‘fat jar’ locally. This is jar file that contains all of the projects dependencies. The build.gradle file contains a jar target that builds this far jar.

gradle build

The resulting file, ./build/libs/alexa-tarot-skill-fat-1.0.jar, is 9.4MB.

The basic steps for the upload are described on Amazon’s page, Deploying a Sample Custom Skill to AWS Lambda. There are three stages to this process:

Creating the Lambda function

Log into AWS lambda, and create a new lambda function. Currently, this needs to be in one of two specific regions, US East or EU. The first option is to select a blueprint. Use a blank function.

The second stage is to create the appropriate trigger. From the list available, select “Alexa Skills Kit”. To generate the list, click on the dotted-rectangle to the left of the Lambda logo.

The bulk of the configuration is contained in the next screen, Configure function.

The required options are as follows:

  • Name and description are set as needed
  • The runtime drop down is changed to ‘Java 8’, which updates the options below in the form.
  • The Upload button is used to upload the far jar from earlier to AWS
  • No environment variables are set at this point. We do not have the APP_ID that will be provided by Alexa, so the skill will not work properly yet.
  • The handler is the packaged classname for the SpeechletRequestStreamHandler. In this case, it will be com.riddlefox.tarot.TarotSpeechletRequestStreamHandler
  • The role needs to be set. If there is no existing role for the account, the drop-down opens a new page in the browser to set up a new role.
  • The Tags and Advanced Settings can be ignored.

Pressing next results in a summary page, and the Create Function button can be pressed to complete the work. This takes a little time to work. Once it is complete, look at the settings for the function to find the ARN. This will be needed in the next stage.

Linking Alexa to the Lambda function

The next task is to create an Alexa skill that connects to the newly-created Lambda function. This requires an account on the Alexa developer portal.  Log in to this and select the Alexa Skills Kit option. This has a button, Add a new skill. Press this to begin the process. The resulting form looks quite complicated, but not all of the tabs need to be filled in.

Skill Information

  • If necessary, change the language from English (US) to English (UK) as this can improve the information received from Alexa.
  • The name should be filled in as needed
  • Invocation name is the name that needs to be spoken to contact the skill.
  • Press Save

Interaction Model

This section defines the way in which Alexa interacts with the Lambda skill. It requires us to copy the information from the resource files. The intent schema requires the content of the IntentSchema.json file:

Further down the page we need to provide our sample utterances. These are in the SampleUtterances.txt file:

Press the save button to make sure everything is as expected, then press next.

Configuration

This tab defines the link to AWS Lambda.  Set the service endpoint type to be AWS, pick the region and add the ARN in the (untitled) text box:

Press save and next.

At this point, the application ID will be available. This is a long string that begins “amzn1.ask.skill”. This now needs to be copied across to the environment variables for the lambda function.

Adding the app ID to the lambda function

Returning to AWS lambda, open the configuration for the function that has been created. Add a new environment variable, called APP_ID, which contains the application ID from earlier. Save this, and the Alexa skill has been set up.

Demonstrating the Skill

Having set up the skill, how do we confirm that it works? Go back to the Alexa developer page and open the Skill in question. On the Test tab, there is a ‘Service Simulator’ section. Entering an utterance here and pressing the Ask Tarot Demo button lets us see the request and response:

The final stage is to test it on the Alexa device. Because this skill is still at the developer stage, it can only be tested on Alexa devices connected to the developer account. Saying “Alexa, ask Tarot,  give me a card” will result in Alexa saying the name of a card.

 If you do not have a physical Alexa-enabled device, it is also possible to test Alexa through a web browser at echosim.io. Note that this still requires a valid Amazon account.

Summary

This post has shown how to set up a very simple application in Alexa. There’s not much more to it, other than demonstrating the set-up, but future posts will focus on how to get Alexa to do more interesting things.

If you have any questions, or anything is not clear, then please leave a comment and I will update the text.

A Day in the life of a Functional Programmer

I sometimes think I should have used a different name to Brighton Java. Maybe Brighton JVM would have been more appropriate. We’ve certainly had a wider range of talks than the name implies. But the name Brighton Java does seem to draw people in, and nobody complains about the eclectic selection of talks.

IMG_20170531_202936

Last Wednesday we welcomed back Richard Dallaway of Underscore Consulting, who spoke about a day in the life of a functional programmer (slides here, with notes here) The talk was intended to communicate some of the exciting concepts available in Scala

You don’t come away from a talk like this knowing how to work with Scala – but it gives a clear demonstration that there are other ways in which code can work. While Java has certainly added many features in recent years, the language is a far less interesting and expressive one than Scala. It’s also interesting to see the way Scala supports different approaches to a same problem – Java, by comparison, tends to be more prescriptive about how to write code.

For Richard, Scala works better for modelling some things better than Java’s class hierarchies – the talk worked as a good companion to Danielle’s talk the month before.

We’re hoping to start a Scala workshop as part of Brighton Java, based on Functional Brighton‘s Learn You a Haskell group. It looks like we will base the first sessions on Underscores Essential Scala book. More news on that soon.

It was a good meeting, sponsored by Brandwatch. As well as providing the venue and AV equipment, they supported us with a great abundance of pizza. Thank you!

IMG_20170531_193004IMG_20170531_183443