SoundHound CFO: The Future of Human-Computer Interaction

Nitesh Sharan, CFO of voice AI company SoundHound, explains to OPTO Sessions why he sees AI as currently being in its ‘adolescent’ stage, and why it is more likely to augment human labor than to displace it altogether.

Keyboard Obsolescence

Whenever artificial intelligence (AI) is discussed, displacement is on the agenda. As long ago as 2020, the World Economic Forum anticipated 85 million jobs would be displaced by automation over the next five years — and this was long before generative AI entered the chat.

Companies like SoundHound [SOUN] broaden the scope of what AI can potentially displace.

Take the keyboard: SoundHound’s voice-enabled AI software, and its ambition to increase the role of spoken conversation in human-machine interactions, threatens a piece of hardware that has become a staple over the last century.

However, as SoundHound’s Chief Financial Officer Nitesh Sharan observes, it is not quite as simple as that. The QWERTY keyboard itself is a symbol of how sticky certain behaviors can be. The layout was originally designed to stop typewriter keys from jamming when commonly paired letters were struck in quick succession, yet it persists even though the mechanical need for it has long since passed.

QWERTY keyboards have survived successive innovations such as the computer mouse, the graphical user interface (GUI) and touch screens.

“These just build and it creates new waves,” Sharan tells OPTO Sessions. “It creates more efficient ways of engaging.”

AI’s Adolescence

Questions of how humans communicate with computers are at the heart of SoundHound’s philosophy.

“As an end state, we want voice to be increasingly involved in all ways we interact with technology,” says Sharan.

He compares the evolution of machine communication to a child growing up.

“When, a decade ago, you got your Alexa in your kitchen, you were blown away. It was like a toddler: it speaks, wonderful! But you found quickly that it was limited in its utility.”

In this analogy, modern generative AI tools like ChatGPT are akin to teenagers.

“You can engage in wholly different levels of conversation — but it does hallucinate. You’re amazed, thinking ‘where did you learn this? You’re learning beautiful things in school, and yet you’re making stuff up sometimes.’

“We are going to continue to evolve this technology to greater and greater maturity, into adulthood.”

However, Sharan believes that this future development will take place “on top of the GUI, and on top of the keyboard”.

Human-Computer Coalescence

Sharan believes that democratization and personalization will define the future of AI.

“I think there is a common shared responsibility, because AI has the real promise of being great, and the risk of causing trouble.”

In Sharan’s view, AI’s various stakeholders, from companies such as SoundHound that are building the technology to regulatory bodies, academia and the consumer base, must ensure that its future plays out responsibly.

If that happens, Sharan believes that AI has the potential to reverse a global trend of increasingly concentrated wealth, knowledge and access. From personal tutors through to vocational training, he believes that AI can help people learn and increase access to information, and that it can bring vital services like finance and healthcare to historically underserved areas.

It will, however, naturally lead to some displacement of human work.

“There will be some automation of human-based activities,” says Sharan. “That’s happening right now.” However, just as successive technologies have been layered on top of the keyboard rather than replacing it, he expects much future development to use automation and AI to assist human workers.

Data from the tools that SoundHound has already put into practice shows that, even in its adolescence, AI is capable of improving on the work that humans can do alone.

Take the results of its voice-enabled automatic ordering system in White Castle’s drive-thru restaurants.

“We saw some data that showed that post-pandemic, human accuracy is actually only about 85%. You might order a cheeseburger, ‘hold the pickles’, but you still get the pickles.”

White Castle, however, found that order accuracy rose to 90% after only a few months of working with SoundHound.
