Nitesh Sharan is Chief Financial Officer at SoundHound [SOUN], a conversational and voice artificial intelligence (AI) company.

“We believe that natural conversations are the easiest way to interact,” explains Sharan. “Think about our human history of conversing with one another for hundreds of thousands of years. Conversation has been the primary modality.”

However, over the last century the rise of computer technology has shifted how people interact with machines toward the QWERTY keyboard, the mouse and the graphical user interface.

“We really believe that voice will be the next horizon of how humans will increasingly interact with technology.”

Before joining SoundHound, Sharan was CFO at Nike [NKE]; he has also worked at Hewlett Packard Enterprise [HPE] and Accenture [ACN].

SoundHound’s Three Pillars

SoundHound’s product suite can be categorized into three key pillars, which between them reflect the company’s view on how use cases for voice AI will develop.

The first pillar consists of software products. These bring voice AI capabilities into other products, especially cars.

“We are the voice power engine when a driver or passenger interacts with their car and says, ‘Turn up the AC, close the windows, what’s the weather, what’s traffic like?’”

SoundHound’s software also powers voice interaction in products such as TVs and other IoT devices.

The second pillar consists of voice-enabled services.

“Think of this as customer service,” says Sharan. Food ordering is one example: at drive-thru restaurants, SoundHound enables customers to order directly by voice without a human taking the order.

Commercially, pillar two operates as a subscription-based model. “A customer will sign up for our service, and on a monthly basis, they’ll pay us for the ability to take orders.”

The third pillar represents the company’s long-term vision and combines voice-enabled products and services.

“Imagine you’re driving into work and you want coffee. It’s early. You’re tired, it’s dark outside, it’s rainy, but you can just seamlessly talk to your car and say ‘Hey, Kia [KIMTF], I’d like coffee’.

“Your Kia knows exactly where you’re going. It knows there’s a Starbucks [SBUX] one mile down the road, a five-minute detour. It orders your cappuccino for pickup.”

The advantages of the third pillar run deeper than simple convenience. It allows for what Sharan calls “a new wave of interaction for discovery and transaction”.

For example, there are lead generation opportunities: the coffee shop in Sharan’s example might be willing to pay for the additional customer.

Breaking the Chain

Until last year, the first pillar, voice-enabled products, accounted for approximately 90% of SoundHound’s business.

This year, however, Sharan says that the majority of growth is appearing in the second pillar, voice-enabled services.

“In the near term, we think customer service is going to be the greatest growth engine,” he says. In particular, he predicts that SoundHound’s voice-enabled services will displace labyrinthine customer service call systems. “The traditional legacy systems, where it says ‘Press one to go here, press two to go here’, you have to listen to a menu for a while.

“That frustrates a lot of people. They want to scream ‘Operator, get me out of this chain!’ That’s going to be displaced, and we think that is happening now and will happen at a pretty rapid, increasing pace.”

Over the medium term, he adds, all three pillars are expected to be growth levers, and in five years’ time the three could contribute roughly equally to revenue.

Over the same period, Sharan also anticipates geographical expansion.

“I think a lot of the outpaced growth will happen in the US,” he says. He adds, however, that many of SoundHound’s current partners offer opportunities to expand organically into new geographies, and that Europe is a likely market for near-term expansion.

“Our ultimate vision is that we want to bring conversation and voice in how humans interact with technology. Well, we’re not excluding anybody then, right? That’s a global opportunity for us.”

Talking Business

“If you extend this to voice interactions, voice commerce, advertising opportunities and all that we put out, we think it’s a $140bn market,” says Sharan.

To Sharan, it is self-evident that voice AI carries this kind of potential: in effect, it will correct a recent aberration brought about by the rise of computer technology.

“Over the past 10 years, we’ve learned to put our neck down and use our thumbs to type really fast,” he says. “Well, kudos to us for learning quickly. But we don’t have to learn how to talk. Once the technology catches up, which it absolutely is doing now, the use cases are going to grow.”

As such, Sharan believes that the total addressable market for voice AI products is “massive”.

SoundHound is currently focusing on nurturing this market while capturing enough of it to become self-sufficient.

“We’ve been in investment mode, and you can see it in our GAAP financials. We’ve been losing money, and we needed capital to support us.”

However, he adds, “we are moving to a break-even profile. And I think with the major growth opportunities and the return on capital we’re seeing, investing in growth is really important, but I also have messaged that we want to get to break-even.”

SoundHound is targeting break-even next year, and Sharan believes this will reduce its need to raise capital in the future.

Talking Business: SoundHound’s CFO on the Three Pillars of Voice AI