GOOGLE DUPLEX, an Artificial Intelligence System


“The key to artificial intelligence has always been the representation.”

Jeff Hawkins

The simulation of human intelligence processes by machines, particularly computer systems, is called Artificial Intelligence (AI). These processes can be divided into three major categories:

  • Learning
  • Reasoning
  • Self-Correction

AI is commonly used in expert systems, speech recognition and machine vision.

Many scientists have shed their blood, sweat and tears to let people hold a real conversation with computers, as they would with one another. In recent years, Google Voice Search and WaveNet have been a few examples of applications of deep neural networks. Yet people often find computerized voices frustrating: they sometimes struggle to recognize simple words and commands, and they often force the caller to adjust to the system rather than the system adjusting to the caller.

The answer to such frustrating situations is GOOGLE DUPLEX, an Artificial Intelligence System for accomplishing real-world tasks over the phone. It is an extension of Google Assistant, Google's existing voice-based digital assistant, that places phone calls to local businesses on a user's behalf and books appointments by speaking with the staff at the other end.

The keynote of this feature is that the system makes the conversation as natural as possible, so the person at the other end can speak freely without having to adapt to a machine. Google Duplex gives the impression of a real person, using natural pauses and speech disfluencies such as “hmm” and “umm”, which people generally use to gather their thoughts. As a result, the Assistant doesn't sound like a bot on those calls.

Another noteworthy feature is that the Assistant understands the context of the conversation, so even when a call doesn't go as expected, it still gets the job done. It is also useful for people with hearing impairments or phone anxiety, and it is an added advantage for travelers who don't speak the local language well.

To train the system in a new domain, Google uses real-time supervised training. In the Duplex system, experienced operators act as instructors: they monitor the system as it makes phone calls in a new domain and adjust its behavior in real time as needed. This continues until the system performs at the desired quality level, at which point supervision stops and the system can make calls autonomously.
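The supervision loop described above can be sketched as a toy simulation in Python. It is purely illustrative and not Google's training code: it assumes a hypothetical `Agent` whose call quality is a single integer score out of 100, an operator correction that raises that score by a fixed amount, and a hypothetical rule that supervision ends after three consecutive calls at or above the quality threshold. None of these details come from Google.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for the dialogue system.

    Call quality is modeled as one integer score out of 100 (an assumption
    made only for illustration).
    """
    skill: int = 50
    corrections: int = 0

    def make_call(self) -> int:
        # Toy model: the quality of a call simply equals the current skill level.
        return self.skill

    def learn_from(self, feedback: int) -> None:
        # An operator correction raises skill by a fixed amount, capped at 100.
        self.skill = min(100, self.skill + feedback)
        self.corrections += 1


def supervised_training(agent: Agent, threshold: int = 90, window: int = 3,
                        feedback: int = 5, max_calls: int = 1000) -> int:
    """Supervise calls until `window` consecutive calls reach `threshold`.

    Returns the number of supervised calls made before the agent is released
    to call autonomously. threshold, window and feedback are hypothetical values.
    """
    streak = 0
    for call in range(1, max_calls + 1):
        quality = agent.make_call()
        if quality >= threshold:
            streak += 1
            if streak >= window:
                return call  # desired quality reached: supervision stops
        else:
            streak = 0
            agent.learn_from(feedback)  # operator steps in during the call
    raise RuntimeError("desired quality level never reached")


if __name__ == "__main__":
    agent = Agent()
    print(supervised_training(agent), agent.corrections)  # prints: 11 8
```

With the defaults, an agent starting at a score of 50 receives eight operator corrections and is released after its 11th supervised call. Requiring a streak of good calls, rather than a single one, is one simple way to express "performs at the desired quality level" before supervision is withdrawn.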

At the core of DUPLEX is a Recurrent Neural Network (RNN), designed to meet the challenges of understanding, interacting, timing and speaking, and built using TensorFlow Extended (TFX). To achieve a high degree of precision, the network takes as input the output of Google's Automatic Speech Recognition (ASR) technology, along with features such as the audio and the history of the conversation. Hyperparameter optimization from TFX is also used to further improve the model.
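To show what "recurrent" means here, the following is a minimal sketch of a generic Elman-style RNN cell in plain Python. It is not Duplex's model and does not use TFX; the two input features per conversational turn, the hidden size of 2 and all weight values are hypothetical. The point it illustrates is that the hidden state is carried from one step to the next, so the final state depends on the whole history of the conversation, which is how an RNN can keep track of context.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(x: Sequence[float], h_prev: Sequence[float],
             W_x: Matrix, W_h: Matrix, b: Sequence[float]) -> Vector:
    """One Elman RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    from_input = matvec(W_x, x)        # contribution of the current turn
    from_memory = matvec(W_h, h_prev)  # contribution of everything before it
    return [math.tanh(a + r + c) for a, r, c in zip(from_input, from_memory, b)]


def run_rnn(inputs: Sequence[Sequence[float]], W_x: Matrix, W_h: Matrix,
            b: Sequence[float]) -> Vector:
    """Apply rnn_step over a sequence of turns, starting from a zero state."""
    h = [0.0] * len(b)
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)
    return h


if __name__ == "__main__":
    # Hypothetical weights: 2 input features per turn, 2 hidden units.
    W_x = [[0.5, -0.3], [0.2, 0.8]]
    W_h = [[0.1, 0.4], [-0.2, 0.3]]
    b = [0.0, 0.0]
    # Turn 1 carries a signal, turn 2 is empty; the final state still reflects turn 1.
    print(run_rnn([[1.0, 0.0], [0.0, 0.0]], W_x, W_h, b))
```

If the recurrent weights `W_h` are set to zero, the same two-turn input ends in an all-zero state: without the recurrent connection, the network forgets the first turn entirely.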

We are entering an age in which conversations with computers are becoming remarkably natural, but there is still a lot of room for improvement. For now, this technology is limited to simple, narrowly defined interactions such as booking appointments, although the capability for deeper interaction may not be far behind.

This technical advancement will surely contribute to our daily interactions with computers.



B.Tech (IT) 

Manipal University, Jaipur

