The Perils of Alexa certification

I’ve got my first skill up on Amazon’s Alexa service. It’s called Travel Stories and it’s not very exciting. To be honest, it could have done with more work, but there’s an offer of free Alexa Echo Dots in return for submitting a new skill. So this post will be about two things: my experience with the certification process, and Amazon’s desperate drive to get developers producing new skills.

One of the biggest boasts about Alexa compared with other voice interfaces is that it has many more skills available – 15,000, compared with hundreds for Apple’s Siri. The problem is that a lot of these skills are not very good. Amazon provides templates for basic voice apps, with the idea that developers will customise them. As a result, there are a lot of skills which recite pieces of trivia. Travel Stories, in its first incarnation, is pretty much one of these.

In a BBC News article, Amazon’s race to make Alexa smarter, Leo Kelion pointed out that “For all the promise of compelling new ways to control home appliances or on-demand news updates from major media brands, there seem to be a mountain of apps dedicated to delivering ‘fun facts’, gags, wacky noises and a vast range of ambient sounds.”

When Amazon’s representative was challenged on this, he said: “I guess I would not agree with the thesis that some of the skills are not sticky – many of them are… You never know which one of those Cat Facts is going to turn into the next big thing. There are many examples of that out there.”

The lack of an example here, the absence of a well-known case where a cat skill became famous, underlines the issue. There are a lot of not-very-interesting skills for Alexa.

Alongside prizes, Amazon’s push for new skills also includes huge discounts on the AWS services needed. Amazon have also begun directly paying the makers of successful skills. There’s a lot of encouragement to work with the platform.

On a more positive note, Amazon have a fast, efficient certification service that aims to get skills through as quickly as possible. I originally submitted my new skill at the start of July. At the Amazon workshop we’d been warned that 70% of skills fail because their example phrases do not occur in the sample utterances. According to Amazon, I failed for two reasons:

* “The example phrases that you choose to present to users in the companion app must be selected from your sample utterances. These sample utterances should not include the wake word or any relevant launch phrasing.”
* “If the session closes after launching the skill, a core functionality must be completed without prompting users to speak.”

The first of these was clumsiness on my part – somehow I fell into a trap I’d been warned about. I’m still not sure why my sample utterances failed and the ones Amazon suggested worked, and I’m sure this check could be automated in some way before certification. The response was short and to the point, so a few minutes’ work, editing the code and my sample utterances, was all I needed to get through the second time.
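The check I’m sure could be automated might look something like this. It’s a minimal, hypothetical Java sketch, not anything Amazon provides: the class and method names are invented, the sample data is made up, and it assumes example phrases follow the pattern “Alexa, ask <invocation name> [to|for] <utterance>”. Real launch phrasing has more variants than this handles.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical pre-submission check: every example phrase, once the wake word
// and launch phrasing are stripped, should match one of the sample utterances.
public class ExamplePhraseCheck {

    // Lower-case and drop punctuation so "Tell me a story." matches "tell me a story"
    static String normalise(String s) {
        return s.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9' ]", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Assumes phrases look like "Alexa, ask <invocation name> [to|for] <utterance>"
    static String stripLaunchPhrasing(String phrase, String invocationName) {
        String p = normalise(phrase);
        String prefix = "alexa ask " + normalise(invocationName) + " ";
        if (p.startsWith(prefix)) {
            p = p.substring(prefix.length());
            if (p.startsWith("to ")) {
                p = p.substring(3);
            } else if (p.startsWith("for ")) {
                p = p.substring(4);
            }
        }
        return p;
    }

    // Returns the example phrases that have no matching sample utterance
    static List<String> missingPhrases(List<String> examples, List<String> samples, String invocationName) {
        Set<String> known = samples.stream()
                .map(ExamplePhraseCheck::normalise)
                .collect(Collectors.toSet());
        return examples.stream()
                .filter(e -> !known.contains(stripLaunchPhrasing(e, invocationName)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> samples = List.of("tell me a story", "give me a travel story");
        List<String> examples = List.of(
                "Alexa, ask Travel Stories to tell me a story",
                "Alexa, ask Travel Stories for a random story");
        System.out.println(missingPhrases(examples, samples, "Travel Stories"));
    }
}
```

Run as-is, it prints the one example phrase with no matching utterance: `[Alexa, ask Travel Stories for a random story]`. Normalising case and punctuation first means trivial differences, like a trailing full stop, aren’t reported as mismatches.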

You can now add my skill to your Alexa device, but I’m not sure why you would. Certification seems to be more about quantity than quality. But I do have interesting plans for my skill, and I’ll be working on them soon.

Some notes on remote working

  • Maybe there’s an irony to reading the 37 Signals book Remote while on a business trip. I’ve been remote working for my clients since finishing at Intel. It seems a more humane way to work. I’m also convinced it’s better for the companies involved. But, obviously, there is still some contact time required.
  • For me, the biggest argument for remote work is removing the link between geography and employment. That means people can live where they want with no need to commute. They don’t need to be near an office, removing pressure on housing.
  • We should definitely be trying to use technology to end commuting. All that travel is bad for the environment and produces stress. It’s not possible for carers or the disabled to make long journeys every day. And, while I’ve had good days commuting, no commute beats quiet time with family and friends.
  • I personally like remote working as I have a nice flat, and that flat is better for me than any office I’ve worked in. Sure, dotcoms sometimes offer laundry services – but I already have that. The coffee is excellent, and there’s a chaise longue when I need to read long reports. I also have flexibility – to work from friends’ houses, or a library, or a co-working centre like the Skiff. Research has shown that the longer people’s commutes, the less happy they tend to be with their lives. I’ve never been happier than I have since getting more control over my working schedule and conditions.
  • Remote work is also better for companies. Many start-ups I’ve worked for have ended up as oral cultures, with documentation and processes passed on by word of mouth. Remote working forces companies to make things explicit. This makes their work more efficient and repeatable, and reduces the risk of knowledge disappearing when employees leave.
  • (There are some interesting related questions about transparency within companies. Why should my email account not be visible as documentation for all my colleagues? If another employee seeing what I write is a problem, then I shouldn’t be writing it.)
  • Remote working requires much more written communication. Many companies already worry about the amount of email, with some even instituting email-free Fridays. I’m not convinced that email is the evil some people make it out to be – rather, like meetings and managers, it is a good thing that gets misused.
  • (One of the best email strategies I saw was in the early days at Brandwatch. That company had the only functional corporate wiki I’ve ever seen. One reason for this is that they used MediaWiki, which is a much more fluent tool than, say, Confluence. Brandwatch also had a rule that questions should not be answered by email – instead you had to write the answer in the wiki and send a link. After a while, the wiki answered a lot of questions, cutting down on repetitive emails.)
  • Another cargo cult of management is the open-plan office. These environments are toxic for knowledge work, but companies still go for them, despite decades of research. Often it’s because managers want to stop employees slacking off. This leads to a culture where people are rewarded for turning up, not for the work that is done. Remote working forces companies to be aware of what is actually happening. 37 Signals talk about the vigilance needed for remote work – small problems must be addressed rather than allowed to grow. This is not just in terms of work done, but also employee satisfaction and engagement. If employees are slacking off, it could be because they are not engaged. Forcing companies to be proactive about these things can only be positive.
  • There are excellent tools to support remote working, but these take some getting used to. Conference calls are tricky (as this site demonstrates). Slack has its own learning curve: am I posting in the right channel? What should be a DM and what should be public? I’m also finding it difficult to know how best to work with remote reports.
  • But remote working is a skill, and I am still learning. For example, I’ve discovered how much I rely on quick, informal chats rather than explicit briefings. But, as I get better at remote working, I become a better employee all round.

A Day at an Alexa Workshop

Back at the start of July, I attended an Alexa workshop in Brighton. I planned to blog about it soon after, but client work took over my life a little, sending me to LA and Dublin in the process. The session was a good one and I wanted to note some of the interesting things I learned for future reference. It happened in the ballroom of a seafront hotel (note chandeliers, and those dangling lengths of cloth that probably have an actual name) and was run by David Low, an Alexa evangelist from Amazon, who’d previously worked on the Skyscanner Alexa skill.

I’m cautious about in-person training sessions, in case the pace is slower than a day’s reading and tinkering at home – particularly when some of the Amazon tutorials are so good. This one was worthwhile, though, and I learned a lot. Here are some of the highlights:

  • The example skills we looked at were written in JavaScript, rather than Java, which I’m used to. It’s notable how much more concise the JavaScript code was.
  • Previously, I knew little of the Echo device’s technical details and it was interesting to find out about how Alexa works. Apparently the seven directional microphones are set up so one listens to the speech, and the other six record the background noise and filter it out. This explains how good Alexa is at hearing me over music.
  • Amazon have given a lot of thought to how they would like people to think about Alexa. A session like this is good for communicating this vision. For example, “Alexa lives inside the Echo” or “Not building apps, building conversations”.
  • Apparently about 70% of skills fail certification because their example phrases do not occur in the sample utterances. This rule seems self-evident, but somehow people miss it. Even knowing this, my first attempt at creating a skill failed the certification stage. I’ll have more to say about that soon.
  • This is possibly a spoiler for the workshop, but we were told that Alexa had received 250,000 marriage proposals. It’s interesting to see that number, and that Amazon feel this is a positive thing. I feel uncomfortable about the gendering of voice UIs and chatbots, something I want to spend more time thinking about.
  • Some services have 270,000 sample utterances, which is far larger than I had expected.
  • I sometimes feel lost in the Alexa skill store, so it was good to have a discussion of some successful use cases:
    • There is some interesting research around the benefits of Alexa for older people.
    • Alexa is working to help guests in hotels.
    • It was also interesting to hear how the Skyscanner app worked, allowing the user to pose questions such as “Where can I go for £100 this weekend?”
  • One of the interesting things about devices like Alexa is that functionality can be unlocked over time. Apparently there are several colours available in Alexa’s glowing ring, which may be used for different things. There are also possibilities such as additional voices. Push notifications and events are more complicated, with Amazon still working through the privacy and interruption issues.

Microservices for the monolith

I found this post in my drafts folder. It’s about three years old, and I was going to delete it; but I had a quick read and there were a few things that are still worth saying.

It dates back to when there was a lot of excitement about microservices and I was researching them for an employer. Ultimately, the technical demands of maintaining microservices were too much for a company that had bigger problems – but the principles involved (microservices as service-oriented architecture done right) were incredibly useful.

Foremost among these useful principles was the idea of treating all interactions as inherently asynchronous and unreliable. The Fallacies of Distributed Computing are a useful caution for almost all projects. I’m less convinced that Conway’s Law is either valid or useful.

Now that we’ve got big data out of the way, it looks as if the next big hype will be microservices. The term has been around for a while, most notably in a January 2013 talk by James Lewis of Thoughtworks. The idea started to become popular around February 2014, with Martin Fowler publishing a post on the subject in March. Discussion is gathering, with a conference, Mu-Con, planned for the end of November and an O’Reilly book on the topic due for publication in March 2015.

As with Big Data, there is a lack of clarity about the definition. How tiny is micro? Definitions have ranged from a couple of screens of code to a small amount of code that a single developer can understand in a few days. James Lewis suggested about 1,000 lines.

Regardless of the problematic definition, common characteristics of such services are emerging, with JSON and REST the front-runners for protocols. Netflix have been open about their successes with this architecture, releasing some amazing tools and documentation.

A purist microservice architecture is not going to be helpful for most companies. In his early presentation, James Lewis mentioned the importance of Conway’s law. Microservices require highly-skilled developers, clearly defined structure and rigorous processes. Most companies are unlikely to have the structure, processes and calibre of developer to make full use of these architectures, let alone convert their existing architecture to fit in with this new way of working.

Despite this, microservices are still relevant to every Java developer, even if you’re working with a monolithic ball-of-mud architecture. It’s a rare system that doesn’t need to integrate with some external service. Microservices teach us to treat any external system as potentially unreliable and incompetent. If the service takes too long to reply, how does that affect our SLAs?
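In plain Java, the simplest defence is to give every external call a time budget and a fallback. A minimal sketch, assuming Java 9+ (for `CompletableFuture.completeOnTimeout`); the exchange-rate service, its values and the timeouts are hypothetical stand-ins for a real integration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around a slow or unreliable external call.
public class ExternalCallWithTimeout {

    // Stand-in for a remote service; the delay simulates a slow reply.
    static String slowExchangeRateService(long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "1.17";
    }

    // Give the call a fixed time budget; if it is exceeded, or the call throws,
    // return a fallback value instead of making our own caller wait.
    static String rateWithFallback(long serviceDelayMillis, long timeoutMillis, String fallback) {
        return CompletableFuture
                .supplyAsync(() -> slowExchangeRateService(serviceDelayMillis))
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> fallback)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(rateWithFallback(50, 500, "1.15"));   // fast reply: live value
        System.out.println(rateWithFallback(2000, 200, "1.15")); // slow reply: fallback
    }
}
```

Note that `completeOnTimeout` only stops our caller waiting: the slow call itself keeps running in the background, so a real integration would also want a timeout on the underlying HTTP or database client.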

Netflix are so confident of the resilience of their server ecosystem that they have introduced Chaos Monkey, which turns off services at random during business hours to make sure that service continues uninterrupted. Failures are going to happen, so you should treat them as a fact of life. Your system might be up 99.9% of the time, but it’s the 0.1% that gets remembered in your annual review.
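The arithmetic behind a 99.9% target is worth spelling out. A year has 365 × 24 = 8,760 hours, so 99.9% availability still allows 0.1% of that, about 8.76 hours of downtime a year. A small sketch of the calculation (the class name is hypothetical, and leap years are ignored):

```java
import java.time.Duration;

// Converts an availability target into the downtime it still allows per year.
public class DowntimeBudget {

    static Duration yearlyDowntime(double availabilityPercent) {
        double hoursPerYear = 365 * 24; // ignoring leap years
        double downHours = hoursPerYear * (100.0 - availabilityPercent) / 100.0;
        return Duration.ofSeconds(Math.round(downHours * 3600));
    }

    public static void main(String[] args) {
        for (double a : new double[] {99.0, 99.9, 99.99}) {
            System.out.println(a + "% up -> " + yearlyDowntime(a) + " down per year");
        }
    }
}
```

For 99.9% this gives `PT8H45M36S`, or 8 hours 45 minutes 36 seconds; each extra nine cuts the budget by a factor of ten.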

The tools Netflix have produced to deal with these issues are designed to work with thousands of small services, but something like Hystrix is usable for a single integration. You need to be asking: how might this fail? When should I time out? And you can get this power with just a few Spring annotations.
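Hystrix itself needs the Netflix libraries (and, with Spring, their annotation support), so rather than reproduce that setup, here’s the core circuit-breaker idea in plain Java. It’s a deliberately simplified sketch, not Hystrix’s API or behaviour: the class name, failure threshold and cool-down are hypothetical, and there’s no timeout or metrics handling.

```java
import java.util.function.Supplier;

// Simplified circuit breaker: after a number of consecutive failures the
// breaker "opens" and callers get the fallback immediately, until a cool-down
// has passed and a trial call is allowed through again.
public class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long coolDownMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the breaker is closed

    public SimpleCircuitBreaker(int failureThreshold, long coolDownMillis) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
    }

    // synchronized keeps the sketch simple, at the cost of serialising calls
    public synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // don't even try the failing service
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            openedAt = -1; // a success closes the breaker again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis(); // (re)open the breaker
            }
            return fallback.get();
        }
    }

    public synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < coolDownMillis;
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(3, 10_000);
        for (int i = 0; i < 5; i++) {
            String reply = breaker.call(
                    () -> { throw new IllegalStateException("service down"); },
                    () -> "cached reply");
            System.out.println(reply + " (open=" + breaker.isOpen() + ")");
        }
    }
}
```

In the demo, the first three calls actually try the failing service; the third failure opens the breaker, and the remaining calls return the fallback without touching the service at all. After the cool-down, one failed trial call is enough to reopen it, because the failure count isn’t reset until a call succeeds.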

These tools and architectures will be essential for every Java developer. When I first started writing software, everyone knew how important automated tests were, but test harnesses were difficult to produce. The creation of JUnit made tests simple to write, and this has altered the way Java software is developed, enabling refactoring, continuous delivery and TDD.

Testing was important before JUnit, and there are developers who still don’t use automated tests, but JUnit has revolutionised Java development. The software I am writing now is considerably more sophisticated than what I was writing 14 years ago. In part, that is down to new tools like unit testing frameworks and Spring.

The tools designed for microservices are simple enough that they should be understood by every developer. The questions faced by Netflix should be considered for any integration: how do I handle failure, what do I do if this is not available?

The scale and size of the systems being built nowadays are incredible, and this has been enabled by the range and quality of the tools that have been open-sourced. Every developer needs to understand the tools created for microservices. Simple architectures should be as reliable as complicated ones.

The Complexity of Java

Java has become much more complicated over the years. I started working with the language back in 2000. I’d been a database developer for a couple of years, working with Oracle, but wanted to create more general applications. I learned enough to pass my first Java job interview with two books: Laura Lemay’s Teach Yourself Java in 21 Days (which had been recently revised for Java 2!) and the first half of Wrox’s Professional Java Server Programming.

Together, these two books contained most of the Java knowledge I needed to do my job. I quickly picked up a lot of other things like CVS and Unix, but Java was definitely a lot simpler back then. I would say that the skills I needed as a professional developer at the time were:

  • Core Java
  • JDBC
  • MySQL
  • HTML
  • CVS

Builds were done with makefiles, if they were scripted at all. It was fairly easy for a new developer to get working professionally. I mean, two books contained most of the information you needed – alongside chapters on graphics, animation and applets. The Wrox book even found time to cram in chapters on esoterica like Jini and JavaSpaces. You could learn a lot of Java in 21 days.

(I wish I still had my old copies of these books. Living in Brighton involved moving frequently between small rooms and a lot of books had to be abandoned).

Over the last 15 years, Core Java has become more complicated. The addition of features like generics and lambdas was much needed, but made the language much more complicated. And the basic skills a developer needs in the places I’ve worked recently go much further:

  • Core Java
  • Eclipse or equivalent IDE
  • Hibernate
  • MySQL
  • HTML
  • XML/JSON
  • REST
  • Junit and a mocking framework
  • git
  • Maven
  • Spring
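To make the point about generics and lambdas concrete, here’s an illustrative comparison, not taken from either book: the same small task written in 2000-era style and in modern Java. The modern version is shorter, but reading it means understanding generics, streams, lambdas and method references. The class name is hypothetical, and `List.of` assumes Java 9+.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

// The same task, "upper-case the names longer than three letters",
// in the style of 2000-era Java and in modern Java.
public class OldAndNewJava {

    // Java 2 style: raw collections, an explicit iterator and casts.
    // (@SuppressWarnings didn't exist then; it just silences modern compilers.)
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List oldStyle(List names) {
        List result = new ArrayList();
        Iterator it = names.iterator();
        while (it.hasNext()) {
            String name = (String) it.next();
            if (name.length() > 3) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Java 8+ style: generics, streams, lambdas and method references.
    static List<String> newStyle(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Ada", "Grace", "Linus", "Ken");
        System.out.println(oldStyle(names)); // [GRACE, LINUS]
        System.out.println(newStyle(names)); // [GRACE, LINUS]
    }
}
```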

The applications that can be built with modern Java are impressive and far beyond the scale of what would have been possible in 2000. I think it would be impossible now to write any large scale Java application without a decent IDE. And Java is much more complicated than before.

A lot has improved too, and it’s great to escape the horrors of classpath config, which has given way to easier options. But the point remains: I know a fair few people who learned to code under their own steam and ended up with successful careers. I imagine that is much more difficult nowadays. Back in 2000, applets were an easy way to learn to code, and you could get going with Notepad and a compiler. Modern Java is probably not a good beginner’s language.

Discipline vs. scheduling

One of the best statements I’ve read on developer discipline (which for me includes testing, documentation etc) came from Robert C. Martin:

“You know what you believe by observing yourself in a crisis. If in a crisis you follow your disciplines, then you truly believe in those disciplines. On the other hand, if you change your behaviour in a crisis, then you don’t truly believe in your normal behaviour” (from Chapter 11 in the Clean Coder)

There’s a slight subtlety here, in that you can sometimes gain time by dropping process, but this gain quickly evaporates as technical debt builds up. And, if you believe in these disciplines, you will schedule time to make up for this. As Martin explains:

“If you follow the discipline of Test Driven Development in non-crisis times but abandon it during a crisis, then you don’t really trust that TDD is helpful. If you keep your code clean during normal times but make messes in a crisis, then you don’t really believe that messes slow you down. If you pair in a crisis but don’t normally pair, then you believe pairing is more efficient than non-pairing. Choose disciplines that you feel comfortable following in a crisis. Then follow them all the time. Following these disciplines is the best way to avoid getting into a crisis.”

Discipline is often abandoned due to scheduling pressure. Martin discusses how to respond to such pressure in the second chapter of the Clean Coder. Developers often give an estimate only to be pressured to produce the output more quickly. The temptation is to give in to this pressure and promise to “try”. This is dangerous:

“If you are not holding back some energy in reserve, if you don’t have a new plan, if you aren’t going to change your behavior, and if you are reasonably confident in your original estimate, then promising to try is fundamentally dishonest. You are lying. And you are probably doing it to save face and to avoid a confrontation.”

Overtime is one response to this, but it can easily spiral out of control, since this is one of the few places where leverage can be applied. It also suffers from diminishing returns. There is good evidence that working more than 40 hours a week over a long time is harmful to projects.

So how can projects speed up? By the time an unrealistic deadline has become solid, it’s usually too late. Deadlines are often particularly problematic in Scrum. It takes time for a Scrum team to settle into a cadence where their work rate becomes predictable. If the scope and team size have been fixed, then there is no way to hit the deadline without distorting the process.

What should you do when you have an urgent deadline that looks unachievable? In such a situation, failure has happened before development begins. Most times, this is not recognised until the developers are at work – resulting in crunch time, and estimates made to fit the deadline. In such situations, the best thing to do is to deal with the current project as best you can; and to look at future projects, making sure that these are not being scheduled without a good idea of the development needed.

The best answer to the factorial code interview question

Whenever I interview a developer, I always ask them to write code on a whiteboard. Nothing too complicated – I expect everyone uses an IDE these days and the candidate is also probably feeling a little nervous.

The big interview cliche is asking for a method producing a Fibonacci sequence. Even with telephone screening, this still eliminates more candidates than it should. A well-prepared interview candidate should have practised that one already, which means it’s still a useful test. Whether or not someone can do it, there are lots of interesting follow-up questions.
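For reference, one acceptable whiteboard answer is a simple iterative version. A minimal sketch, assuming the sequence starts 0, 1 (some people start at 1, 1) and that returning the first n terms as a list is what was asked for; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Iterative Fibonacci: returns the first n terms, starting 0, 1, 1, 2, ...
public class Fibonacci {

    static List<Long> firstTerms(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        List<Long> terms = new ArrayList<>();
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            terms.add(a);
            long next = a + b; // terms beyond F(92) overflow a long
            a = b;
            b = next;
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(firstTerms(10)); // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    }
}
```

It raises the same follow-up questions as any other answer: what happens with negative input, and where the numeric limits are.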

Some candidates get flustered trying to understand the Fibonacci sequence. I’d expect most people to know this already, but I wouldn’t want to reject a potentially-excellent candidate for lacking a bit of maths knowledge. So sometimes I would simply ask for a function to produce factorials.

There are several different ways to do this, with the two main options being whether the solution is recursive or not. A simple solution would look something like this:

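public int fact(int n)
{
    // n <= 1 (rather than n == 1) stops fact(0) recursing forever;
    // the int result still overflows for n > 12
    return n <= 1 ? 1 : n * fact(n - 1);
}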

One obvious follow-up question to this is what the limits of the function are – using an int means you’ll have overflow problems quite quickly. Does the developer know the class to use to avoid this?
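The class I’d hope to hear mentioned is `java.math.BigInteger`. A minimal sketch of a factorial built on it, written iteratively so that large values of n can’t cause a `StackOverflowError`; the class name is hypothetical.

```java
import java.math.BigInteger;

// Iterative factorial using BigInteger, so the result cannot overflow.
public class BigFactorial {

    static BigInteger fact(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("factorial is undefined for negative n: " + n);
        }
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fact(13)); // 6227020800, already too big for an int
        System.out.println(fact(25));
    }
}
```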

The question simply asks for a piece of code. I would have given bonus points for any developer who mentioned testing before writing. But the perfect response would have been a candidate saying that the code would be different depending on its intended use.

Someone on GitHub has produced an enterprise version of another interview classic: Enterprise FizzBuzz. Obviously, that is going too far. But there are considerations for even the simplest piece of professional code:

  • Who else needs to work on this? What documentation/commenting is required?
  • How and where will it be deployed?
  • Is any error checking or exception handling required? In the example above – which I would have accepted as correct – there is no handling for the obvious overflow error.
  • Is something this simple suitable for the intended use? For example, a large number of similar requests might be better handled with some sort of question.
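If the `int` signature has to stay, the error checking mentioned in the list above can be added cheaply. A sketch, assuming that throwing on negative input and on overflow is the behaviour we want; `Math.multiplyExact` (Java 8+) does the overflow detection, and the class name is hypothetical.

```java
// The int version with error handling: reject negative input and fail
// loudly on overflow instead of silently returning a wrong answer.
public class CheckedFactorial {

    static int fact(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        int result = 1;
        for (int i = 2; i <= n; i++) {
            // throws ArithmeticException if the product overflows an int
            result = Math.multiplyExact(result, i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fact(12)); // 479001600, the largest factorial that fits in an int
        try {
            fact(13);
        } catch (ArithmeticException e) {
            System.out.println("13! overflows an int: " + e.getMessage());
        }
    }
}
```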

All of which is a complicated answer to a simple question. A dev raising these issues in an interview would still need to produce the code – but the discussion that followed would be very different.

Thomas Mann once claimed that “A writer is someone for whom writing is more difficult than it is for other people.” In the same way, the more I think about development, the harder it seems. A factorial example might seem almost insultingly simple – but it’s possible to have a very complicated conversation about it.

Alexa, Please

I sometimes feel uncomfortable giving orders to Alexa. I know that she is a series of scripts, and have a good idea of the technology involved, but I still dislike barking demands at her.

I’ve read a couple of articles about parents who were concerned about their children’s interactions with Alexa. In a post entitled “Amazon Echo Is Magical. It’s Also Turning My Kid Into an Asshole”, Hunter Walk suggested Alexa needed a mode that required please and thank you, to help children learn manners. These words currently have no effect on how Alexa works, and are filtered out before a request is sent to a skill.

In a piece on “bot-mania”, Dan Grover looked at the recent excitement over bots, placing it in a historical context. It’s a fascinating piece, discussing in detail how free-text chat may not be the best option for most requirements. One particular passage jumped out at me:

This notion of a bot handling [tasks like ordering pizza] is a curious kind of skeumorphism. In the same way that a contact book app… may have presented contacts as little cards with drop shadows and ring holes… conversational UI, too, has applied an analog metaphor to a digital task and brought along details that, in this form, no longer serve any purpose. Things like the small pleasantries in the above exchange like “please” and “thank you”, to asking for various pizza-related choices sequentially and separately (rather than all at once). These vestiges of human conversation no longer provide utility (if anything, they impede the task).

A skeuomorph, as defined in Wikipedia, is “a derivative object that retains ornamental design cues (attributes) from structures that are inherent to the original”. As examples, it gives the swiping gesture for turning pages on tablets, and the shutter sound on digital cameras. However, skeuomorphs sometimes have their own uses: the shutter sound, for example, tells people that a photograph has been taken.

In one discussion of the please/thank-you issue (Parents are worried the Amazon Echo is conditioning their kids to be rude), an investment firm founder called Manu Kumar explained why he felt it was important to be nice to devices: “One of my metrics for determining how nice someone is is by watching how they interact with a waiter. In a similar way, even if the AI or tech doesn’t care about it, other people around us are going to experience how we interact with it.”

For a while I thought it would be good if Amazon gave discounts to people who are well-mannered to Alexa. Then it occurred to me that, despite the rigorous codes about thank-you in English society, this is not universal. If you look at basic phrases translated into Hindi (on Omniglot, for example), the word for thank you is given as dhanyavād, but this misses a subtlety. Deepak Singh wrote an article in the Atlantic, ‘I’ve Never Thanked My Parents for Anything’, in which he talked about the status of thank you in Hindi.

In India, people—especially when they are your elders, relatives, or close friends—tend to feel that by thanking them, you’re violating your intimacy with them and creating formality and distance that shouldn’t exist. They may think that you’re closing off the possibility of relying on each other in the future. Saying dhanyavaad to strangers helps initiate a cycle of exchange and familiarity. But with family and friends, dhanyavaad can instead chill relations because you are already intimate and in a cycle of exchange.

All of this discussion may seem obscure, but there is an interesting issue around the way we respond to devices. Alexa behaves with a personality and explicitly presents herself as female. Even if she is a batch of scripts, we are supposed to respond to her as an entity. There is a question of how we learn to behave with such creatures, and of how we factor this into designing skills, which are accessed via Alexa at a strange remove: every request involves asking Alexa to pass the question on to the skill.

I still think it is important to be polite to Alexa. But I’m prepared to accept that it is irrelevant to her.

Placedreamer – a more interesting application

The problem with most Alexa apps is that they’re simple text bots with voice UIs.

Obviously, Alexa’s clever hacks, such as the system used to match thousands of different phrases to a user’s intention, make her skills a little more interesting than the same thing on the command line. But a lot of the skills available are boring – particularly the ‘facts’ type of skill, where Alexa recites a random piece of information. What would be interesting is an application that would not work outside of an Alexa device.

A more interesting problem

Take as an example the tarot app I built in my recent tutorial. It doesn’t do anything particularly novel – we could do the same with a Twitter bot or a Bash script. As well as speech, Alexa can play sounds, and has some clever ways of handling streaming (something Tim O’Reilly praised in his celebration of Alexa). An interesting skill would make use of such things.

Rather than take the tarot skill further, I wanted to work on something more interesting. I asked friends on Facebook what they thought I should do. Tom suggested: “Pipe in birdsong from the last distant place you travelled to. City sounds from another timezone.”

This sounded like a great idea. I like the idea of Alexa as a device that can occupy a strange, eidetic space – something to talk to when you can’t sleep. There are online field recordings available, and I’ve got photos which can be added to the response cards. This is a somewhat whimsical application, but that’s what attracted me to Alexa in the first place – a device that is placed in intimate, home spaces, and is always listening to conversations. (Although she is only listening for her wake word, it can still prove disconcerting.)
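As a rough illustration of what such a skill sends back: a custom skill’s response can combine SSML speech containing an `<audio>` tag with a Standard card carrying an image. This sketch hand-builds the JSON response body rather than using an SDK; every URL, piece of text and the class name are placeholders, and Alexa only plays audio files that meet its format restrictions.

```java
// Builds the JSON body a custom skill might return to play a field recording,
// with a Standard card showing a photo. All URLs and text are hypothetical.
public class PlaceDreamerResponse {

    // Minimal escaping for values embedded in a JSON string literal
    // (backslashes and double quotes only; enough for this sketch).
    static String json(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String buildResponse(String intro, String audioUrl, String title, String imageUrl) {
        // A single-quoted attribute avoids escaping double quotes inside the SSML
        String ssml = "<speak>" + intro + "<audio src='" + audioUrl + "'/></speak>";
        return "{"
                + "\"version\":\"1.0\","
                + "\"response\":{"
                + "\"outputSpeech\":{\"type\":\"SSML\",\"ssml\":" + json(ssml) + "},"
                + "\"card\":{\"type\":\"Standard\",\"title\":" + json(title) + ","
                + "\"text\":" + json(intro) + ","
                + "\"image\":{\"smallImageUrl\":" + json(imageUrl)
                + ",\"largeImageUrl\":" + json(imageUrl) + "}},"
                + "\"shouldEndSession\":true"
                + "}}";
    }

    public static void main(String[] args) {
        System.out.println(buildResponse(
                "Here is the sound of a morning in Lisbon.",
                "https://example.com/recordings/lisbon-morning.mp3",
                "Lisbon, morning",
                "https://example.com/photos/lisbon.jpg"));
    }
}
```

Setting `shouldEndSession` to true suits a skill that plays one recording and stops; a skill that offered a choice of places would leave the session open and ask a question instead.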

A problem with invocation names

My previous skill had the invocation name ‘tarot’, which was OK for testing but won’t pass Amazon’s requirements for invocation names, which state that “One-word invocation names are not allowed”.

Following this rule, I first set up the invocation name for this skill to be ‘Place dreamer’. Actually summoning this skill proved difficult. Place was too easily confused with Play, which Alexa saw as a more likely word, and would hear “Place Dreamer” as “Play Streamer”.

Don’t Believe the Hype

VUIs have a huge potential for providing certain types of information. I like asking Alexa if it’s going to rain – it saves me having to grab my phone to look at the weather forecast while trying to leave the house. I can also see how great a VUI will be when I’m driving – I hate setting off in my car and realising I’ve misconfigured my satnav. But one of the big problems I’m having with VUIs is being told how this is the next big thing.

I’m personally not interested in bots for most applications. I find it hard to trust that constrained conversational pathways will be better than tools like Google. One book I’m reading about bots sounds the same as late-nineties books on the topic. Improved technology does not by itself mean this is definitely the era of VUIs and chatbots. Either the application needs to be appropriate or the interface very well crafted.

June’s Brighton Java – Alexa and CQRS

We had two speakers at Brighton Java this week. I was the support act, kicking off with an introductory talk, Alexa in Ten Minutes. I enjoyed putting together such a brisk technical presentation, which came to 34 slides. I managed to finish dead on time and had some interesting questions afterwards. The slides are online, along with a video of the evening. I also took my Echo Dot along, so that Alexa could speak with me.


The second presentation, from David Ellis, was about CQRS and event sourcing. It was very timely for me as I’m reviewing the design of a platform I’m working on.

CQRS stands for Command Query Responsibility Segregation, and is the idea of using a different data model for reads and writes. One way of doing this is event sourcing, which records a system’s full history as a series of immutable events. These become an append-only system of record (as well as a full audit trail). Representations of state can then be built from this log (including in memory, where speed is needed). You can also produce representations of the state at specific times – a form of time travel. David ran through the basics and showed how event sourcing worked as a means for CQRS.
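A minimal sketch of the idea in Java, assuming a toy bank-account domain (not David’s examples, which were in Scala): commands are validated and recorded as immutable events, and state, current or historical, is a projection built by replaying them. All names are hypothetical, and it needs Java 16+ for records and pattern matching.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Event sourcing in miniature: an append-only log of immutable events is the
// system of record, and any view of state is derived by replaying it.
public class EventSourcingSketch {

    interface Event { Instant at(); }
    record Deposited(Instant at, long amountPence) implements Event {}
    record Withdrawn(Instant at, long amountPence) implements Event {}

    // Append-only store: events are never updated or deleted.
    static final class EventStore {
        private final List<Event> events = new ArrayList<>();
        void append(Event e) { events.add(e); }
        List<Event> all() { return Collections.unmodifiableList(events); }
    }

    // Read model: replay events up to a given instant. Passing Instant.MAX gives
    // the current balance; an earlier instant gives "time travel".
    static long balanceAt(List<Event> events, Instant when) {
        long balance = 0;
        for (Event e : events) {
            if (e.at().isAfter(when)) continue; // ignore events after the requested time
            if (e instanceof Deposited d) balance += d.amountPence();
            else if (e instanceof Withdrawn w) balance -= w.amountPence();
        }
        return balance;
    }

    // Write model: validate a command against current state, then record an event.
    static void withdraw(EventStore store, Instant at, long amountPence) {
        if (amountPence > balanceAt(store.all(), Instant.MAX)) {
            throw new IllegalStateException("insufficient funds");
        }
        store.append(new Withdrawn(at, amountPence));
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        Instant t0 = Instant.parse("2017-06-01T10:00:00Z");
        store.append(new Deposited(t0, 10_000));
        withdraw(store, t0.plusSeconds(3600), 2_500);

        System.out.println(balanceAt(store.all(), Instant.MAX));        // 7500
        System.out.println(balanceAt(store.all(), t0.plusSeconds(60))); // 10000
    }
}
```

It prints 7500 for the current balance and 10000 for the balance as of 10:01, before the withdrawal. A real system would persist the log (Kafka came up in the discussion afterwards) and keep projections up to date incrementally rather than replaying everything for each query.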

While I’ve read about CQRS and event sourcing before, it’s great to hear someone talking about it. It also amused me that David’s examples were “written in Scala so that they fit on the slides”.

Given that the talk was at Brandwatch, there was some good discussion afterwards about the possibility of using Kafka. I’ve also been wondering today about combining CQRS and REST and hope to research that next week.

So, all-in-all, a good evening. My talk seemed to go well, and I learnt about CQRS. Thanks, once again, to Brandwatch, who hosted the event, as well as providing drinks and pizza. Luke and Adina helped set up the night, with Luke doing an amazing job with the tech side of things, handling sound, streaming and technical gremlins.