🛫 The Smart Speaker Market is About to get Noisy

China’s Alibaba, Samsung and Facebook are reportedly the latest tech giants to announce their intention to join the smart speaker market. The market, originally pioneered by Amazon, is expected to hit $13bn by 2024.

The smart speaker timeline

Let’s take a closer look at some of the major players in the smart speaker market.

 

Smart speaker timeline

Amazon was the first player on the market, introducing Alexa and the Amazon Echo back in November 2014. The company enjoyed a monopoly for quite some time and unveiled the Echo Dot in March 2016. Then, in November 2016, Google unveiled Google Home, its smart speaker to rival Amazon’s; Chinese company LingLong also launched its smart speaker, DingDong, that same month. In 2017, Amazon revealed the Echo Look in April and, more recently, the Echo Show in June. Apple announced that its smart speaker, HomePod, would be available at the end of the year, and Orange has teamed up with Deutsche Telekom to create Djingo, the first French smart speaker, due for release in 2018! Reports revealed that Samsung was working on a Bixby-powered smart speaker, and Facebook and Alibaba are also reportedly planning to join the market.

Amazon vs Google

To win over developers, the Internet giants are waging a merciless war. Despite Alexa’s head start, Google Assistant is catching up and offers pretty much the same functionality today.

Comparing Google Assistant and Alexa

In terms of geographical coverage, we can see in the map below that Assistant is already ahead.


Geographical coverage – Assistant vs Alexa

However, Amazon remains far ahead in terms of the number of voice applications available on its store: more than 15,000 vs 378.

Voice applications – Alexa vs Assistant

Finally, in terms of product range, Amazon is in front with a solid base of innovative products like the Echo Show and Echo Look. Interestingly, those last two devices are more voice-first than voice-only devices, opening new UX opportunities.

Accelerators and brakes

The rise of smart homes, along with companies wanting to improve consumer experience and convenience, is among the major factors driving the rise of smart speakers. Indeed, today smart speakers are a lot more than a gadget for amusement, and they can do a lot more than order a Hawaiian pizza! Amazon Alexa, for example, has official skills from the banking, tourism and connected home sectors. Privacy concerns, owing to the fact that the devices are connected to the internet and can store voice data, as well as connectivity range and compatibility, are all potential brakes on this otherwise fast-growing market.

This is just the beginning

Voice is one of our primary and most natural methods of communication. Now, thanks to technological advancements, it has become a major interface, transforming how we interact with technology. Touchscreens represented the last major shift in the way humans interact with machines; however, the leap to vocal interactions with machines is far more significant, particularly thanks to all the possibilities opened up by third-party applications. With an increasing number of players announcing their intention to join the smart speaker race, it is clear that this market isn’t going to slow down anytime soon, and it will indeed be fascinating to watch everything unfold. Smartly.AI has been an advocate of vocal technology since 2012, and we are delighted to see its mainstream adoption, which encourages us to work even harder to support companies in this voice-first revolution.

👀 Are smart speakers putting your privacy at risk?

Voice is becoming a primary interface. In our home appliances, cars, mobile apps… voice is everywhere. We can turn off the lights, order takeout, buy our weekly groceries or listen to our favorite album, all by using one of the most natural interfaces of all: our voice! This is made possible thanks to smart speakers such as Amazon Echo and Google Home! The convenience and fun these devices can bring is boundless; however, just how safe is it to have these unassuming devices sitting on our bedside tables or in our living rooms, listening to our every word?

Smart speakers and privacy

What are smart speakers?

Voice recognition technology, like Apple’s Siri, has been around for a while. However, smart speakers such as Amazon’s Echo and Google’s Google Home are game changers. These speakers want to be your virtual assistant and transform the way you interact with your home, other devices, even your favorite brands. Based on voice-activated artificial intelligence, smart speakers can be connected to third-party Internet of Things devices, such as your thermostat or car doors, enabling you to order and control things using your voice! Smart speakers are equipped with a web-connected microphone that is constantly listening for their trigger word. When a user activates a smart speaker to make a request, the device sends a recorded or streamed audio clip of the command to a server, where the request is processed and a response is formulated. The audio clips are stored remotely, and with both Amazon’s and Google’s devices you can review and delete them online. However, it is not clear whether the data stays on servers after being deleted from the account. Furthermore, at the moment the devices only record requests; however, as they advance and we are able to do more with them, such as dictating emails to be sent, where will this data be stored?
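To make that flow concrete, here is a minimal sketch of the listen/record/send loop described above. Everything in it (the class, the placeholder cloud call) is invented for illustration and is not Amazon’s or Google’s actual implementation:

# A deliberately simplified model of the smart speaker loop.
# All names here are illustrative, not a real vendor API.
WAKE_WORD = "alexa"

class FakeMicrophone:
    """Stands in for the always-on, on-device microphone."""
    def __init__(self, chunks):
        self.chunks = list(chunks)

    def next_chunk(self):
        return self.chunks.pop(0) if self.chunks else None

def send_to_cloud(audio: str) -> str:
    """Placeholder for the encrypted upload and server-side processing."""
    return f"(response to: {audio!r})"

def listen_loop(mic: FakeMicrophone) -> None:
    awake = False
    while (chunk := mic.next_chunk()) is not None:
        if not awake:
            # Matching happens locally: audio is discarded unless
            # it contains the wake word.
            awake = WAKE_WORD in chunk.lower()
        else:
            # Only the command following the wake word is uploaded;
            # this is the clip you can later review and delete online.
            print(send_to_cloud(chunk))
            awake = False

listen_loop(FakeMicrophone(["dinner chatter", "Alexa", "what time is it?"]))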

 

Your voice is only cloud-processed if you say a specific trigger word


As smart speakers are designed to wake up and record as soon as they hear one of their activation words, there could be instances where conversations get stored without you even knowing! One prosecutor even issued a search warrant to see if a suspect’s Echo contained evidence in a murder case. As smart speakers cannot yet differentiate between voices, anyone can activate them. This was something that Burger King took advantage of in its recent TV ad, which has just won the prestigious Cannes Lions Grand Prix award. At the end of the ad, the actor triggers Google Home to wake up and cite the Whopper burger’s Wikipedia description by saying “OK Google, what is the Whopper burger?”. All this leads us to ask: just how private can a home with voice-activated microphones really be?

Your privacy at risk?

So, can hackers exploit a backdoor in these devices and listen to what you’re saying? Well, nothing is impossible, but both Google and Amazon have taken the necessary precautions to stop wiretapping. Furthermore, the audio file that is sent to their data centers is encrypted, meaning that even if your network were compromised, it is unlikely that smart speakers could be used as listening devices. Someone getting hold of your Amazon or Google password and seeing your interactions is the biggest risk, so make sure you use a strong password; you could even consider two-factor authentication!

What can you do?

If the thought of your smart speaker being able to listen in at any moment makes you uneasy, you can mute it manually or change your account settings to make the device even more secure, for example by password-protecting the purchase options available with the speaker or by making the device play an audible tone when it is active and recording. You can also log into your Amazon or Google account and delete your voice history (either individually or in bulk).

To do this for your Google device, head over to myactivity.google.com, click the three vertical dots in the “My Activity” bar, and hit “Delete activity by” in the drop-down menu. Click the “All Products” drop-down menu, choose “Voice & Audio,” and click delete. For Amazon’s speaker, go to amazon.com/myx, click the “Your Devices” tab, select your Alexa device, and click “Manage voice recordings.” A pop-up message will appear, and all you need to do is click “Delete”. However, please note that deleting your history may affect the personalisation of your experience. Check out this handy screencast for further instructions on deleting your Amazon Alexa account history.

Developers could also use privacy-by-design assistants, such as Snips, which process voice locally. However, their use may be limited because these kinds of assistants have no internet connection.

The privacy / convenience tradeoff

At the rate the smart speaker and IoT industries are evolving, it looks like they are going to become more and more present in our daily lives; therefore, it is essential to understand how they work and what you can do to prevent them from breaching your privacy. In conclusion, yes, theoretically smart speakers could pose a threat to privacy. However, they are not terribly intrusive, as they only record when woken by a trigger word, and the likelihood of them picking up a conversation they aren’t supposed to, and then someone intercepting it, is very slight. Google, Amazon and other sites have been logging our web activity for years; now it is starting to happen with voice snippets. In the pursuit of convenience, privacy is sometimes sacrificed, and in this particular trade-off, convenience comes out on top for us!

🔔 Toward a fully Context-aware Conversational Agent

I was recently asked by my friend Bret Kinsella from voicebot.ai for my predictions on AI and voice. You can find my two cents in the post 2017 Predictions From Voice-first Industry Leaders.

In this contribution, I mentioned the concept of speech metadata that I want to detail with you here.

As a voice app developer, when you have to deal with voice inputs coming from an Amazon Echo or a Google Home, the best you can get today is a transcription of the text pronounced by the user.

While it’s cool to finally have access to efficient speech-to-text engines, it’s a bit sad that, in the process, so much valuable information is lost!

The reality of a conversational input is much more than just a sequence of words. It’s also about:

  • the people — is it John or Emma speaking?
  • the emotions — is Emma happy? angry? excited? tired? laughing?
  • the environment — is she walking on a beach or stuck in a traffic jam?
  • local sounds — a door slam? a fire alarm? some birds tweeting?
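To make this concrete, here is a sketch of what a transcription enriched with such speech metadata could look like. The field names are invented for illustration; no speech-to-text engine exposes this today:

# Hypothetical enriched speech-to-text payload (all field names invented).
speech_input = {
    "transcript": "turn on the lights",
    "speaker": {"id": "emma", "confidence": 0.92},      # the people
    "emotion": {"label": "tired", "confidence": 0.71},  # the emotions
    "environment": "indoors, quiet",                    # the environment
    "local_sounds": ["door_slam"],                      # local sounds
}

# A context-aware agent could then adapt its behavior to the metadata:
if speech_input["emotion"]["label"] == "tired":
    reply = "Sure Emma, dimming the lights. Sleep well!"
else:
    reply = "Lights on!"
print(reply)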

Now imagine the possibilities, and the intelligence of the conversations, if we could have access to all this information: huge!

But we could go even further.

It’s a known fact in communication that while interacting with someone, non-verbal communication is as important as verbal communication.

So why are we sticking to the verbal side of the conversation while interacting with voice apps?

Speech metadata is all about this non-verbal information, which is, in my opinion, the submerged part of the iceberg and thus the most interesting to explore!

A good example of speech metadata is the combination of vision and voice processing in the movie Her.

With the addition of the camera, new conversations can happen, such as discussing the beauty of a sunset, the origin of an artwork or the composition of a chocolate bar!

Asteria is one of the many startups starting to offer this kind of rich interaction.

I think this is the way to go, and a tremendous number of innovative apps will be unleashed by the availability of conversational metadata.

In particular, I hope Amazon, Google and Microsoft will release some of this data in 2017 so that we developers can work on fully context-aware conversational agents.

🔊 Introducing Audicons™

The way we interact with our digital world will be completely changed by the rise of voice assistants such as Alexa or Assistant.

We created Smartly.AI to make this transition easier for developers while pushing the horizons of Conversational AI.

The Problem
Currently, if you want to build a rich message for your bot, you can use a language called SSML to mix voice synthesis and audio sounds.
With SSML you can do pretty amazing things (change the pitch and tone of the voice, add silences, …). You can check the documentation on Alexa’s SSML support. But the issue is that SSML has a tricky syntax that makes it quite hard for a new developer to master.
As an illustration, let’s see what I have to do to build an answer to this question with SSML:

“Alexa, ask PlaneWatcher: Where is the plane DC-132?”

<speak>
    <audio src="https://server.com/audio/plane.mp3"/>
    <s>Welcome to Plane Watcher.</s>
    <audio src="https://server.com/audio/sad.mp3"/>
    <s>The plane DC-132 is currently delayed by 30 minutes!</s>
</speak>

Wait, another XML-like grammar to deal with… 🤔
Come on, this has to be fixed!

Our solution
As we overuse emoticons in our Slack channel, we couldn’t resist trying to transpose this awesome language to the voice world!
After a few experiments, we are happy to present our latest creation:
the Audicons!

✈ Welcome to Plane Watcher ☹ The plane DC-132 is currently delayed by 30 minutes!

Audicons are a set of standardized audio files that can be easily recognized and associated with specific meanings. Audicons will soon be open-sourced so you can reuse them in your own projects. Stay tuned 😀
In most cases, we think Audicons can replace SSML.
Audicons have the potential to evolve into a standardized audio set used in ALL voice interfaces.
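To give an idea of how simple the developer experience could become, here is a minimal sketch of an Audicon-to-SSML generator. The emoji-to-clip mapping and the URLs are placeholders, not our actual implementation:

# Hypothetical mapping from Audicons (emoji) to standardized audio clips.
AUDICONS = {
    "✈": "https://server.com/audio/plane.mp3",
    "☹": "https://server.com/audio/sad.mp3",
}

def to_ssml(message: str) -> str:
    """Expand each Audicon into an SSML <audio> tag; plain text is
    left for the voice synthesis engine to speak."""
    parts = []
    for char in message:
        if char in AUDICONS:
            parts.append(f'<audio src="{AUDICONS[char]}"/>')
        else:
            parts.append(char)
    return "<speak>" + "".join(parts) + "</speak>"

print(to_ssml("✈ Welcome to Plane Watcher ☹ The plane DC-132 is currently delayed by 30 minutes!"))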

A short example you may want to create for weather forecasts:

😃 Tomorrow is gonna be sunny ☀🕶!
😩 Tomorrow is gonna be rainy ☔⛈!

Hear our first Audicons in the demo below.

Cool, isn’t it? 😃
Which ones do you prefer?

You can already use Audicons in your Alexa skill if you build it with Smartly.AI, but we plan to open-source them soon, along with our SSML generator.

Now it’s up to you to make your beloved Alexa more expressive!

💪🏻 Are you ready to hire an AI?

Let’s face it – the super-intelligent AI takeover that many are fearing is not for today.

We may all lose against Watson at Jeopardy, and AlphaGo is the champion when it comes to Go, but… those cool marketing campaigns are far from the holy grail of so-called general AI.

According to the most authoritative voices in the space, the singularity may well occur at some point in the 2040s. Until then, I can’t imagine having a smart, meaningful and pleasant conversation with Siri, Alexa or Cortana for more than 5 minutes. When it comes to open conversations, there is no match for humans.

 Human > general AI

Things get less clear-cut when we narrow the conversation to a specific topic. A specialized AI can be much better at managing user requests because it has been designed for a unique purpose.
The Turing test is easier to pass for those AIs.
To illustrate this, you can try Amy, the virtual assistant created by x.ai to perform a single task: scheduling meetings for you. She does so by email: demo here.
Amy is so good at doing this that most people think she is a real assistant.
When it comes to narrow conversations, AI has the advantage of dealing with big data volumes, while humans are more accurate. It’s 1-1 here.

Human ~ specialized AI

How is Amy doing such a great job?
Well, as explained here, a key part of the process relies on supervised machine learning. AI trainers teach Amy how humans express times, locations, contact names… Amy then uses this knowledge to work better.
It’s a virtuous circle. 🙂

Facebook M is relying even more on humans to teach the AI how to complete tasks. “M can purchase items, get gifts delivered to your loved ones, book restaurants, make travel arrangements, appointments and way more.”

In a recent project at Smartly.ai, we tried this “hybrid AI” approach.
The results were stunning: the AI was able to manage 80% of the requests!
While the AI successfully dealt with the simple questions, the operators had more time to engage in a meaningful way with the customers who had complex requests.
Our AI excelled at narrow and repetitive requests; humans excelled at complex and particular ones.
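The routing logic behind such a hybrid setup can stay very simple. Here is a minimal sketch, assuming an intent classifier that returns a confidence score; the threshold, names and canned examples are illustrative, not our production code:

# Minimal hybrid AI routing sketch (names and threshold are illustrative).
CONFIDENCE_THRESHOLD = 0.8  # below this, a human takes over

def classify(request: str) -> tuple[str, float]:
    """Placeholder for a trained intent classifier."""
    known = {
        "what are your opening hours?": ("faq_hours", 0.95),
        "i want to dispute this refund": ("billing_dispute", 0.42),
    }
    return known.get(request.lower(), ("unknown", 0.0))

def route(request: str) -> str:
    intent, confidence = classify(request)
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"AI answers '{intent}' automatically"
    # Low confidence: hand over to a human operator and log the case
    # so it can later be used to retrain the model (the virtuous circle).
    return "escalated to a human operator"

print(route("What are your opening hours?"))   # handled by the AI
print(route("I want to dispute this refund"))  # escalated to a human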

The magic of hybrid AI is that it works and scales at the same time!
Peter Thiel’s Palantir is another powerful demonstration of hybrid AI solving big challenges of today’s world: fraud and terrorism.

So, basically:

Human + specialized AI > Human

At Smartly.ai, we are committed to empowering humans with AI assistants.
We have awesome demos; book yours now by dropping us an email! 😉


👍 Congratulations on your chatbot, Mr. President!

Yesterday, the White House released a brand new chatbot. 🙂

Why?

One remarkable habit of @POTUS has been reading 10 letters a day since he was elected.
This allows him to take the pulse of the nation from the inside.
I did the math, and if that’s true, it comes to quite an impressive number of letters:

As of today, Obama has been President for 7 years and 204 days.
(7*365 + 204) * 10 = 27,590… That’s a lot of letters!

But wait, who is writing letters anymore?
Are those letters representative of generations X, Y and Z?
They are probably more used to emails, SMS or Facebook Messenger.

So, according to the White House, 2016 is the year of Messenger!
With 60B daily messages and 900M users worldwide, it’s probably a safe bet.

How?

Now, let’s see how the bot experience is delivered.


The experience is focused on getting your message to Obama, collecting your name and email address… and that’s it, until Mr. President decides to answer you. 🙂

The purpose is simple, and the edge cases are well managed.
At the end, you get an emoji and a cool video.

Still, we may regret that the bot doesn’t show any kind of intelligence.
In fact, you could have sent your message to the President 10 times faster using the contact form…

It would have been more fun if the bot had been an automated version of Obama! Some gamification around his job, or even some interactive polls on his next actions, travels, outfits…

You can try this bot by yourself here.
The operation is also described on the White House website, here.

We hope to see more bots used by political figures, but they should be aware that a poorly designed bot will inevitably flop.

And if President Hollande needs a bot, we’ll be happy to build one for him. 😉


🍏 Testing out the Siri SDK

 

Finally, it is live!
Last week at WWDC, Apple released the new Siri SDK.

At VocalApps, we were dying to try it out and find out the pros and cons of this new Apple feature.

The following video demonstrates how we successfully created an iOS app that can be launched directly within Siri.

Although this first version is limited to a predefined list of usages, it still allows developers to create some interesting use cases:

  • starting audio or video calls in your VoIP app
  • sending or searching text messages in your messaging app
  • searching pictures and displaying slideshows in your photo app
  • sending or requesting a payment in your payment app
  • starting, pausing, ending a workout in your sports app
  • listing and booking rides in your ride-sharing app
  • controlling your car music if your app is CarPlay-compatible


The SDK was designed so that Siri listens to the user, tries to understand what they mean, and, if all goes well, transfers the user’s request to your app.
Then you can engage in a conversation, display some custom data and process the request with your own web services.

This is really nice since it supports, out of the box, all of Siri’s languages and everything Siri knows about you (where you are, who your sister is, your name…).

For instance, if you want to send money to Sarah using your VocalApps app, you just have to tell Siri:

“Hey Siri, send $10 to Sarah using VocalApps.”

Siri understands that you want to send money, that the amount is $10, that the recipient is named “Sarah” and that the app you want to use is “VocalApps.” So it calls a sendPayment method in your app with all these arguments.
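Conceptually, Siri turns the utterance into a structured intent before handing it to your app. The sketch below models that hand-off; the real interface is Apple’s Intents framework in Swift, and every name here is invented purely for illustration:

# Purely illustrative model of the SiriKit hand-off (not Apple's API).
from dataclasses import dataclass

@dataclass
class SendPaymentIntent:
    amount: float
    currency: str
    recipient: str
    app: str

def resolve_intent(utterance: str) -> SendPaymentIntent:
    """Stands in for Siri's built-in speech understanding, which fills
    the intent's slots before your app ever sees the request."""
    # For "Hey Siri, send $10 to Sarah using VocalApps." Siri would resolve:
    return SendPaymentIntent(amount=10.0, currency="USD",
                             recipient="Sarah", app="VocalApps")

def send_payment(intent: SendPaymentIntent) -> None:
    # Your app only receives structured arguments; the payment itself
    # is processed by your own web services.
    print(f"{intent.app}: sending {intent.amount:.2f} {intent.currency} "
          f"to {intent.recipient}")

send_payment(resolve_intent("Hey Siri, send $10 to Sarah using VocalApps."))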

Currently, Siri is included in iPhone 4S, iPhone 5, iPhone 5C, iPhone 5S, iPhone 6, iPhone 6 Plus, iPhone 6s, iPhone 6s Plus, iPhone SE, 5th generation iPod Touch, 6th generation iPod Touch, 3rd generation iPad, 4th generation iPad, iPad Air, iPad Air 2, all iPad Minis, iPad Pro, Apple Watch, and Apple TV.

It’s gigantic, it’s the future and it’s only the beginning.

Do you have a mobile app that you would like to connect to Siri?
If so, we can definitely help you; just start chatting with us 🙂 !


🎵 Alexa Skill update – Blind Test

Hey Alexa fans!

Introducing the Blind Test
Blind Test is a fun game that you can play to test your musical knowledge.
The concept is simple. Listen to the song extract and find the name of the artist!


So, what’s in this updated version? Well…

New songs 
Hundreds of new tracks to discover!
How many will you recognize?

New features 
Blind Test calculates your score so you can challenge your friends. 🙂

So if you don’t have it yet on your Alexa,
grab it now and enjoy the music!

📈 A new monitoring tool for Alexa Skills!

Hi there,

Once we published Music Quiz, our first Alexa skill, we wanted to see how it was performing. We quickly discovered that we had to put a logging system in place, and then that navigating through all the data generated by an Alexa skill was a nightmare.

To get more transparency and true actionable insights,
we decided to build a tool that would allow us to:

⇒ know exactly what’s going on between our skills and our users,
⇒ find and fix bugs, and
⇒ enhance the user experience

After weeks of work, here is the dashboard we have finally built:

 

The Logs section, which allows you to search for specific sessions.

User logs

We are also bringing out specialized analytics for conversational apps.
You can see it as “Google Analytics” for Alexa.



Awesome, but… can I use it for my skills?
Sure! All you have to do is log in to Alexa Designer and install a small tracker code in your lambda function, along the lines of the sketch below.
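For a rough idea of what such a tracker can look like, here is a minimal sketch of an Alexa skill’s Lambda handler wrapped to forward each request/response pair as JSON. The endpoint and decorator are invented for illustration and are not our actual tracker code:

# Minimal logging tracker sketch (endpoint and names are illustrative).
import functools
import json
import urllib.request

TRACKER_URL = "https://logs.example.com/collect"  # hypothetical endpoint

def track(handler):
    """Wrap a Lambda handler so every request/response pair is logged."""
    @functools.wraps(handler)
    def wrapper(event, context):
        response = handler(event, context)
        payload = json.dumps({"request": event, "response": response}).encode()
        req = urllib.request.Request(
            TRACKER_URL, data=payload,
            headers={"Content-Type": "application/json"})
        try:
            urllib.request.urlopen(req, timeout=1)  # fire-and-forget
        except OSError:
            pass  # logging must never break the skill itself
        return response
    return wrapper

@track
def lambda_handler(event, context):
    # Your existing skill logic stays unchanged.
    return {"version": "1.0",
            "response": {"outputSpeech": {"type": "PlainText",
                                          "text": "Hello!"}}}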

Cheers
The Vocal Apps Team 

PS: If you have privacy concerns, contact us and you can have everything installed on your own server.


😎 Alexa Designer: Announcing upcoming update

Good news: A big update is coming soon!

Since we launched our private beta of Alexa Designer in January, we have received tons of feedback from the Alexa developer community.

Well, we took your feedback into account and reviewed and improved many points in this version.

Long story short, here is what the v2 is all about:

  • New dialog algorithm
  • Improved UX & better onboarding
  • In-depth user analytics
  • Automated testing

Those features will come as part of an opt-in beta in April.
A stable version is expected in May.
Feel free to test it and send us your feedback and suggestions.

Cheers,
The Vocal Apps Team