Making Alexa Even Smarter with Other AWS Services

Part 1: An overview of the project.

This article series is about building an app that lets a user upload images to S3 and then ask Alexa for information about those images after they have been processed by AWS's image analysis service, Rekognition.

I wanted to learn. To me, AWS is literally a cloud: something right in front of me, yet forever changing and moving. My goal with this post/project is to understand it a little more, to tap in and explore capabilities that exist outside of just what I use for work.

We will touch on tools from the AWS ecosystem: the Alexa Skills Kit, CloudWatch, DynamoDB, Lambda functions, Rekognition, and S3. Using all of these technologies together, we will look to not only build an Alexa skill, but to enhance its capabilities with image analysis from Rekognition.

If you have ever been interested in how Alexa works, in building your own Alexa skill, or in demystifying some AWS services, this post can perhaps help with that.

That’s okay, just be curious =)

[Image: “Amazon ecosystem”]

With these AWS services in mind, it feels like a natural step to build some kind of voice-enabled image analysis to fully utilize and learn them. Theoretically, a user could interact with their Alexa device, asking for information that can be provided by Rekognition.

At the top of this post was a crude drawing of what a basic Alexa skill flow looks like. We have already discussed what the Backend Processing layer looks like for this project (Lambda), but we haven’t said much about the Interface layer. In Part 3 of this series, we will implement the Interface layer of an Alexa skill, but for now just know that the technology in this layer is controlled entirely by the Alexa team. Basically, it is an Alexa GUI where you provide sample sentences in text form (“utterances”) that map to the functions (“intents”) you want called.
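To make that mapping concrete, here is a rough, hypothetical sketch of what an interaction model boils down to. The invocation name, intent name, and sample utterances below are mine for illustration, not the project's, and the Alexa developer console stores this structure as JSON rather than as a Python dict:

```python
# Hypothetical sketch of an Alexa interaction model: sample utterances
# ("samples") map to an intent name that the backend Lambda will receive.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "image analyzer",   # hypothetical invocation name
            "intents": [
                {
                    "name": "GetImageInfoIntent",  # hypothetical intent name
                    "slots": [],
                    "samples": [
                        "what is in my latest image",
                        "describe the photo I uploaded",
                        "tell me about my picture",
                    ],
                },
                # Built-in intents Alexa expects every skill to handle.
                {"name": "AMAZON.HelpIntent", "samples": []},
                {"name": "AMAZON.StopIntent", "samples": []},
            ],
        }
    }
}
```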

To build this skill, let’s just give Alexa a URL, have the Lambda function connect to Rekognition, and then tell us what it can find.

With this project flow, there are a couple of problems right off the bat. One, Alexa is one of the primary technologies in this project, and using speech to input a URL sounds like a nightmare. Two, giving Alexa a URL would require your Lambda function to, at runtime, make an external call to pull the image, make another call to Rekognition, wait for the analysis, and then return the result. That experience would move at a glacial pace.

To create a snappy experience, we’ll have to alter the project flow and make this a two-step process. In order for Alexa to pull image information immediately after a user interacts with it, the information has to already be there, waiting.

A persistence layer needs to be added, and of course AWS has an answer for that too! DynamoDB is AWS’s NoSQL database service. I typically use relational databases, so getting up and running with DynamoDB was a bit of a struggle. Like any NoSQL database, it is very flexible, so table writes were incredibly simple; table reads were not.
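To show what I mean (the table name and attributes below are assumptions for illustration, not the project's actual schema), a write is a single call with whatever item you have, a read by key is just as short, but anything beyond the key pushes you toward a scan or a secondary index:

```python
import boto3
from boto3.dynamodb.conditions import Attr

# Assumed table: partition key "image_key" (the S3 object key).
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ImageLabels")  # hypothetical table name

# Writes are easy: hand DynamoDB any item that includes the key.
table.put_item(Item={"image_key": "uploads/beach.jpg", "labels": ["Beach", "Ocean"]})

# Reads by key are also a one-liner...
by_key = table.get_item(Key={"image_key": "uploads/beach.jpg"}).get("Item")

# ...but filtering on anything that is not the key means a full table scan
# (or designing a secondary index up front), which is where the struggle lives.
beach_images = table.scan(
    FilterExpression=Attr("labels").contains("Beach")
)["Items"]
```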

Now the project flow has a persistence layer to store image information. Once a user interacts with Alexa, a Lambda function can look for an image stored in DynamoDB and pull the relevant information without needing to call an external source or Rekognition.

[Diagram: the “Vision” step]
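A minimal sketch of what that Alexa-facing Lambda could look like, under the same assumed table and the hypothetical GetImageInfoIntent from earlier (the hard-coded image key is only there to keep the sketch short):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ImageLabels")  # hypothetical table name


def lambda_handler(event, context):
    """Handle an Alexa request for the hypothetical GetImageInfoIntent."""
    request = event["request"]

    if request["type"] == "IntentRequest" and request["intent"]["name"] == "GetImageInfoIntent":
        # Look up labels already written by the "Categorize" step.
        result = table.get_item(Key={"image_key": "uploads/beach.jpg"})  # assumed key
        item = result.get("Item")
        if item:
            speech = "Your image appears to contain " + ", ".join(item["labels"]) + "."
        else:
            speech = "I could not find any analyzed images yet."
    else:
        speech = "Ask me what is in your latest image."

    # Plain Alexa Skills Kit response envelope.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```

Because the labels are already sitting in DynamoDB, the only work at request time is a single key lookup, which keeps the spoken response snappy.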

This new flow also adds another opportunity to use Lambda. Instead of giving Alexa a URL, a user will first upload the images they want analyzed into S3. Upon upload, a Lambda function is called that passes the image to Rekognition for analysis. When Rekognition returns the relevant information, the function stores it in DynamoDB for extraction at a later time.

[Diagram: the “Categorize” step]
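And a sketch of that upload-side Lambda under the same assumptions (the table name, key, and label cap are mine, not the project's): the S3 upload event triggers it, Rekognition reads the object straight from S3, and the labels land in DynamoDB for the “Vision” step to pick up later:

```python
from urllib.parse import unquote_plus

import boto3

rekognition = boto3.client("rekognition")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ImageLabels")  # hypothetical table name


def lambda_handler(event, context):
    """Triggered by an S3 upload; analyze the image and persist the labels."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # S3 URL-encodes keys

        # Rekognition can read the object directly from S3; no download needed.
        result = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=10,  # arbitrary cap for this sketch
        )
        labels = [label["Name"] for label in result["Labels"]]

        # Store the labels so the Alexa-facing Lambda can read them instantly.
        table.put_item(Item={"image_key": key, "labels": labels})
```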

Using the “Categorize” step to gather information into the database and then the “Vision” step to extract it is a full data transformation from image to speech. Although I admit there is not much practical use for this project in its current state (I gave myself a week’s worth of free time to work on this), it is amazing to think that, using AWS’s free tier, someone can gain this kind of computational power and create this capability.

Parts 2 and 3 will be a walk-through of how to implement this Alexa skill, with links to AWS and my GitHub repo, while Part 4 is a dive into the code.

Continue to Part 2.
