I had thought of titling this article GUI vs VUI but then I realized that using these terms would leave out many people who don't know their meaning and perhaps wouldn't even bother to read the article.
First of all, let me explain each concept. GUI is the abbreviation for "graphical user interface", while a VUI ("voice user interface") is a conversational interface, that is, a form of communication between a human and a machine in which the medium of interaction is not a screen but the voice.
The beginnings of the graphical interface date back to the late 1970s, so by comparison we could say that the conversational interface is still in its infancy.
To make this concept clearer, the goal of using a voice user interface is to allow the user to interact with the system or machine simply by using their voice. Instead of scrolling on a screen or touching a keyboard, the user gives voice commands, so they don't need to use their hands at all.
During the research phase, designing a voice user interface is similar to designing a graphical user interface (GUI) because you have to consider who will use the interface, what they will use it for, and on what device. The goal remains the same: to communicate the necessary information to the user in the most effective way possible.
In the conversational interface, the user does not navigate through the different layers to find the option they are looking for, but asks for something specific, providing all the necessary information at once.
This is a big difference from the graphical interface: instead of going through several clicks or interactions before reaching our goal, we shorten the navigation process to a single voice command aimed at getting specific information, which the assistant then returns.
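To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `MENU` structure, the `navigate` and `ask` functions, and the flight data. The keyword check in `ask` is only a stand-in for real language understanding, not how an actual assistant works.

```python
# Hypothetical app menu: in a GUI the user walks down it one layer at a time.
MENU = {
    "Trips": {
        "Upcoming": {
            "Flight to Sofia": "Leaves today at 9:55 p.m.",
        },
    },
}


def navigate(menu, path):
    """GUI style: one click per layer until the option is reached."""
    node = menu
    for label in path:
        node = node[label]
    return node, len(path)  # the answer and the number of clicks it took


def ask(utterance):
    """VUI style: a single spoken request carries everything needed.

    The keyword check below is an assumption standing in for real
    natural language understanding.
    """
    words = utterance.lower()
    if "flight" in words and "time" in words:
        return MENU["Trips"]["Upcoming"]["Flight to Sofia"], 1
    return None, 1


gui_answer, gui_steps = navigate(MENU, ["Trips", "Upcoming", "Flight to Sofia"])
vui_answer, vui_steps = ask("What time is my flight?")
print(f"GUI: {gui_steps} interactions, VUI: {vui_steps} interaction")
```

Both paths reach the same information; the difference lies in how many interactions the user needs to get there.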
“Google Home sitting on table” by NDB Photos is licensed under CC BY-SA 4.0.
Let me give you an example of a user asking her Google Home assistant: “Okay, Google, what time is my flight?” Each assistant starts listening when it hears a specific voice command; in this case, it is "Okay, Google". The device will then answer: “The next flight, Wizz Air's 4402 from Madrid to Sofia, leaves today at 9:55 p.m.”
The only learning curve here is that, as a user, you need to know what voice command you have to say for each action you want your assistant to perform.
Every voice assistant has artificial intelligence built in. If the words you use are not among those the artificial intelligence has been built to understand, it will tell you that it cannot help you and that you should try another voice command. It may even offer you recommendations.
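The behavior described above (wake phrase, known commands, a fallback message and a recommendation) could be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `KNOWN_COMMANDS` table and the `respond` function are hypothetical, and real assistants rely on trained language models rather than exact string matching. The similarity search uses `difflib` from Python's standard library.

```python
import difflib

WAKE_PHRASE = "okay, google"  # the activation command from the example above

# Hypothetical commands this toy assistant was built to understand.
KNOWN_COMMANDS = {
    "what time is my flight": "Your next flight leaves today at 9:55 p.m.",
    "what is the weather today": "It is sunny today.",
}


def respond(utterance):
    """Return the assistant's reply, or None if it was not activated."""
    text = utterance.lower().strip().rstrip("?.!")
    if not text.startswith(WAKE_PHRASE):
        return None  # no wake phrase, so the assistant is not listening
    command = text[len(WAKE_PHRASE):].strip(" ,")
    if command in KNOWN_COMMANDS:
        return KNOWN_COMMANDS[command]
    # Unknown command: say so and, if possible, recommend the closest one.
    suggestions = difflib.get_close_matches(
        command, list(KNOWN_COMMANDS), n=1, cutoff=0.5
    )
    if suggestions:
        return f"Sorry, I can't help with that. Did you mean '{suggestions[0]}'?"
    return "Sorry, I can't help with that. Please try another command."


print(respond("Okay, Google, what time is my flight?"))
print(respond("Okay, Google, when is my flight?"))
```

The second call uses wording the toy assistant does not know, so it falls back to an apology plus a recommendation, which is the learning curve mentioned earlier seen from the machine's side.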
Do you know what artificial intelligence or AI is and where it comes from? Here's a little background history.
Although research into artificial intelligence has been going on since the 1950s, it wasn't until 2007 that it advanced enough to become what we know today as voice assistants.
In 1936, Alan Turing described a machine capable of carrying out any calculation that had been formally defined, the essential cornerstone for a device to be able to handle different scenarios and "reasoning". Does the Turing Test ring a bell? Turing proposed it in 1950 as a way to judge whether a machine can be considered intelligent, and it defends the possibility of emulating human thought through computing.
We say that an AI has passed the test if the users involved think that on the other side there is a person instead of a machine.
The 1970s gave way to what became known as the winter of artificial intelligence. This occurred after the mathematician James Lighthill presented a report stating his view that machines would only be capable of an "experienced amateur" level of chess and that common sense reasoning and supposedly simple tasks such as facial recognition would always remain beyond their capacity.
This report, together with the lack of progress in the field of AI, caused funding for the industry to be drastically reduced. But in the 1980s, researchers decided that instead of trying to create artificial intelligence by simulating that of humans, it would be better to create "expert systems" that focused on much more limited tasks.
That meant they only needed to be programmed with the rules of a very particular problem. And voilà, this is when the first steps in the advancement of artificial intelligence began, and why today it's possible for you to talk to your smartphone, your car, or an Alexa or Google Home device.
Speech recognition was a different story. Although it seemed simple and was one of AI's key objectives, decades of investment had never managed to raise its accuracy above 80 %.
In 1997, the Deep Blue supercomputer, created by IBM, beat the world chess champion, Garry Kasparov. So much for the conclusion of Lighthill's report in the 1970s that this could never happen. It should be noted that this was its second match against Kasparov: Deep Blue lost the first one and had to be upgraded.
“Garry Kasparov à Linares en 2005” by Owen Williams, The Kasparov Agency, is licensed under CC BY-SA 3.0.
Another milestone for IBM came in 2011, when its artificial intelligence, called "Watson", won the TV quiz show Jeopardy! against two of the show's most successful contestants. The contest featured questions about culture and general knowledge. Watson not only understood the questions and answers provided during the show but was also able to make intelligent moves when weighing the choice of categories.
Since then, IBM Watson has become the standard for cognitive systems, natural language processing as well as automatic reasoning and learning. This technology is currently being used to assist in cancer treatments, e-commerce, the fight against cybercrime and international banking.
I have to add that I am particularly fond of this artificial intelligence because I was able to work with it during my project at WatsomApp, in which we used this AI to make a robot help detect bullying in schools. I also presented this project during a Women in Voice event at the Google Campus in Madrid.
Google also started researching AI and ended up pioneering a new approach: It connected thousands of powerful computers, running parallel neural networks, learning to detect patterns in the large volumes of data transmitted by the many users of Google. At first, it was quite inaccurate, but after years of learning and improvement, Google today claims that its speech recognition is 92 % accurate.
But it wasn't until May 2016 that Google unveiled Google Assistant, as part of the Google Allo messaging application and of its Google Home assistant device. The Siri voice assistant had been released a few years earlier, in 2010, as an iOS application, and was later integrated into the iPhone 4S at its launch in October 2011.
Another well-known voice assistant is Amazon Alexa, which arrived in Spain in the summer of 2018, although it had already been working in the United States for a few years. There are many others, like Cortana from Microsoft or Aura from Telefónica.
How does this new technology fit into the world of user experience design? To design voice interfaces, companies are asking for UX design experts who specialize in conversation design. This new profile called "conversation designer" has just arrived on the Spanish job market and we are already seeing some job offers, although still very few.
This profile requires knowledge not only of UX design but also of how to design dialogue between a human and a machine, taking into account all possible voice interactions between the assistant and the user. The logic that works for a graphical interface will almost never work for a conversational interface, so designers need to learn this new way of interacting with a voice assistant.
Do you see yourself designing voice interfaces in the future? I still feel like a student of the subject today, and I find it fascinating. If you live in Seattle, Madrid, London, or Mexico City and want to learn more about the subject, I recommend going to the talks on designing conversational interfaces by Women in Voice.
At their event in December 2019 in Madrid, I talked about my experience designing a conversational interface with IBM's artificial intelligence in the WatsomApp project. It is a very pleasant and necessary project, which aims to help schools detect bullying by using a robot that communicates with the students.
I would like to end this article by saying that the best possible interface in human-machine communication is voice, as it is our natural means of communication. This is one of the reasons why the voice interface is also called the invisible interface. And what better interface than the one that allows us to communicate in the most natural way possible?
Can you imagine a future where all the interaction you have with your digital devices is by voice? I have no doubt that this technology will advance more and more, and within one or two decades it will be very common to talk to machines instead of interacting with them through a graphical user interface.
What do you think about the future of conversation design and VUIs? If you have any questions, just leave a comment below.
Originally published in Spanish in “Píldoras UX”.