🔔 Toward a Fully Context-Aware Conversational Agent

My friend Bret Kinsella from voicebot.ai recently asked me for my predictions on AI and voice. You can find my two cents in the post "2017 Predictions From Voice-first Industry Leaders".

In that contribution, I mentioned the concept of speech metadata, which I want to explore in more detail here.

As a Voice App developer dealing with voice inputs from an Amazon Echo or a Google Home, the best you can get today is a transcription of what the user said.

While it’s great to finally have access to efficient speech-to-text engines, it’s a bit sad that so much valuable information is lost in the process!

A conversational input is much more than just a sequence of words. It’s also about:

  • the people: is it John or Emma speaking?
  • the emotions: is Emma happy? angry? excited? tired? laughing?
  • the environment: is she walking on a beach or stuck in a traffic jam?
  • local sounds: a door slamming? a fire alarm? birds tweeting?
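To make the idea more concrete, here is a minimal sketch of what a voice input enriched with speech metadata might look like, plus a toy reply policy that uses it. This is purely hypothetical: only the transcript reflects what voice platforms give developers today, and the class name, field names and reply rules are illustrative assumptions, not any real Alexa or Google Assistant API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpeechInput:
    """Hypothetical voice input enriched with speech metadata.

    Only `transcript` matches what voice platforms hand developers today;
    every other field is an illustrative assumption.
    """
    transcript: str                       # the words the user said
    speaker: Optional[str] = None         # who is speaking, e.g. "Emma"
    emotion: Optional[str] = None         # e.g. "happy", "tired"
    environment: Optional[str] = None     # e.g. "beach", "traffic_jam"
    sound_events: List[str] = field(default_factory=list)  # e.g. ["fire_alarm"]


def build_reply(speech: SpeechInput) -> str:
    """Toy reply policy showing how metadata could change the answer."""
    # A safety-relevant background sound takes priority over the request itself.
    if "fire_alarm" in speech.sound_events:
        return "I can hear a fire alarm. Are you safe?"
    name = speech.speaker or "there"
    # The speaker's mood changes how the agent responds.
    if speech.emotion == "tired":
        return f"You sound tired, {name}. Want me to keep it short?"
    # The surroundings suggest a more useful follow-up.
    if speech.environment == "traffic_jam":
        return f"Stuck in traffic, {name}? I can read your messages aloud."
    # Without metadata, all we can do is react to the words.
    return f"Hi {name}, you said: {speech.transcript}"
```

For example, `build_reply(SpeechInput("play some music", speaker="Emma", emotion="tired"))` returns "You sound tired, Emma. Want me to keep it short?", whereas the same words with no metadata can only produce a generic answer.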

Now imagine the possibilities, and how much smarter conversations could become, if we had access to all this information. Huge!

But we could go even further.

It’s widely accepted in communication that, when interacting with someone, non-verbal communication is as important as verbal communication.

So why are we sticking to the verbal side of the conversation when interacting with Voice Apps?

Speech metadata is all about this non-verbal information, which is, in my opinion, the submerged part of the iceberg and thus the most interesting part to explore!

A good example of speech metadata is the combination of vision and voice processing in the movie Her.

With the addition of a camera, new conversations can happen, such as discussing the beauty of a sunset, the origin of an artwork or the ingredients of a chocolate bar!

Asteria is one of many startups beginning to offer this kind of rich interaction.

I think this is the way to go, and that the availability of conversational metadata will unleash a tremendous number of innovative apps.

In particular, I hope Amazon, Google & Microsoft will release some of this data in 2017 so that we developers can work toward a fully context-aware conversational agent.

đŸ’ȘđŸ» Are you ready to hire an AI?

Let’s face it: the super-intelligent AI takeover that many fear is not coming anytime soon.

We may all lose to Watson at Jeopardy!, and AlphaGo may be the champion of Go, but… those cool marketing campaigns are far from the holy grail of so-called general AI.

According to some of the most authoritative voices in the field,
the singularity might occur sometime in the 2040s.
Until then, I can’t imagine having a smart, meaningful and pleasant conversation with Siri, Alexa or Cortana for more than 5 minutes.
When it comes to open conversations, nothing matches humans.

Human > general AI

The picture is more balanced when we narrow the conversation to a specific topic. A specialized AI can be much better at handling user requests because it has been designed for a single purpose.
The Turing test is easier to pass for such AIs.
To illustrate this, you can try Amy, the virtual assistant created by x.ai to perform a single task: scheduling meetings for you. She does so by email (demo here).
Amy is so good at this that many people assume she is a human assistant.
In narrow conversations, AI has the advantage of handling large volumes of data, while humans are more accurate. It’s a 1-1 tie here.

Human ~ specialized AI

How does Amy do such a great job?
Well, as explained here, a key part of the process relies on supervised machine learning: AI trainers teach Amy how humans express times, locations, contact names… and Amy then uses this knowledge to work better.
It’s a virtuous circle. 🙂

Facebook M relies even more on humans to teach the AI how to complete tasks: “M can purchase items, get gifts delivered to your loved ones, book restaurants, make travel arrangements, appointments and way more.”

In a recent project at Smartly.ai, we tried this “hybrid AI” approach.
The results were stunning: the AI was able to handle 80% of the requests!
While the AI handled the simple questions,
the operator had more time to engage meaningfully with customers who had complex requests.
Our AI excelled at narrow, repetitive requests; humans excelled at complex, unusual ones.
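The hybrid approach can be sketched as a simple confidence-threshold router: the AI answers when it is confident, otherwise the request goes to a human operator, whose answer is logged as a new training example (the same virtuous circle described for Amy). This is not Smartly.ai’s implementation; the threshold value, the keyword-based "classifier" standing in for a trained model, and every name below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical signature for a trained intent model: request -> (answer, confidence).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class HybridAgent:
    """Route each request either to the AI or to a human operator.

    The threshold and all names are illustrative assumptions.
    """
    classify: Classifier
    ask_human: Callable[[str], str]
    threshold: float = 0.8
    training_log: List[Tuple[str, str]] = field(default_factory=list)
    stats: Dict[str, int] = field(default_factory=lambda: {"ai": 0, "human": 0})

    def handle(self, request: str) -> str:
        answer, confidence = self.classify(request)
        if confidence >= self.threshold:
            # Narrow, repetitive request the AI is confident about.
            self.stats["ai"] += 1
            return answer
        # Complex or unusual request: escalate to the operator...
        self.stats["human"] += 1
        human_answer = self.ask_human(request)
        # ...and keep the pair so the model can be retrained on it later.
        self.training_log.append((request, human_answer))
        return human_answer


# Toy stand-in for a real model: a tiny FAQ matched by keyword (hypothetical data).
FAQ = {"opening hours": "We are open 9am to 6pm, Monday to Friday."}


def toy_classifier(request: str) -> Tuple[str, float]:
    for keyword, answer in FAQ.items():
        if keyword in request.lower():
            return answer, 0.95
    return "", 0.0


agent = HybridAgent(
    classify=toy_classifier,
    ask_human=lambda r: f"[operator reply to: {r}]",
)
```

Calling `agent.handle("What are your opening hours?")` is answered by the AI, while `agent.handle("My order arrived damaged")` is escalated and ends up in `agent.training_log`. The threshold is the main design knob: raising it sends more traffic to humans but reduces wrong automated answers.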

The magic of hybrid AI is that it just works and scales at the same time!
Peter Thiel’s Palantir is another strong demonstration of hybrid AI tackling some of today’s biggest challenges: fraud and terrorism.

So, basically:

Human < Human + specialized AI

At Smartly.ai,
we are committed to empowering humans with AI assistants.
We have awesome demos:
book yours now by dropping us an email! 😉