OPTO Sessions

Scale AI’s CTO on Building ‘Trustworthy’ AI from a Data-Centric Engine

Vijay Karunamurthy, Field Chief Technology Officer (CTO) at Scale AI, joins OPTO Sessions to explain how his company has evolved from labelling data for autonomous driving to enabling a range of applications for companies, including OpenAI. He also discusses how artificial intelligence (AI) is evolving and why modelling using advanced data sources is the key to building trust.

Blame it on questionable ChatGPT results, but when it comes to AI’s trustworthiness, some of us are still dubious.

However, things are only going to get better, says Vijay Karunamurthy, Field CTO at pioneering start-up Scale AI, where he works at the cutting edge of generative AI and large language models (LLMs).

If anyone knows AI and where it could be heading, it’s Karunamurthy.

He was previously Director of Engineering at Apple [AAPL], where he advanced the iPhone maker’s machine learning for content discovery while preserving user privacy. He’s known for his pioneering early work at YouTube (including through its acquisition by Google [GOOGL]), where he developed game-changing algorithms that match viewers with must-see personalised video content.

With a master’s degree in computer science from Stanford University and an MBA from the University of California, Berkeley, he also co-founded the now-defunct foodie app Nom, which let users create and share recipes.

Scale AI partners with a range of stakeholders, from governments to financial firms and major tech players like Meta [META].

The Evolution of ‘Trustworthy’ AI

The secret weapon is simple: good data.

“We started with our founder Alexandr Wang’s vision of how to build more trustworthy AI models,” Karunamurthy told OPTO Sessions this week. “And he really focused on the data part of the equation first — what sort of data makes AI models more reliable, and how can you monitor the changes in that data and how that impacts model performance over time?”

Karunamurthy cites the technology behind self-driving cars — where Scale AI’s business began — to illustrate his point. “If you want a car to reliably stop when there’s a pedestrian in a crosswalk, the data you need to back that up might be pedestrians of all sorts — people walking their bikes across, all different sorts of real-world scenarios — to help that model reach that safety milestone.

“That technique, called reinforcement learning, ended up being a really critical piece of most of the AI models that people use today, whether those are LLMs, image models or multimodal encoding models.”

Reinforcement learning helps make AI models ever more nuanced: “The idea is you can train a model, not just by directly telling the model, ‘Hey, go here and do this’. You actually train the teacher that can train the model, which oftentimes is another model just as capable as the model itself.”
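What Karunamurthy describes matches the widely used pattern of reinforcement learning from human feedback, in which a learned reward model serves as the teacher that scores the main model’s outputs. The article does not detail Scale AI’s implementation, so the sketch below is purely illustrative: the keyword-based reward, the canned candidate answers and the best-of-n selection are all hypothetical stand-ins for learned components.

```python
import random

def reward_model(prompt: str, response: str) -> float:
    """The 'teacher': a stand-in for a learned model that scores answers."""
    score = 0.0
    if "pedestrian" in prompt and "stop" in response.lower():
        score += 1.0                        # reward safety-relevant behaviour
    score -= 0.01 * len(response.split())   # penalise rambling answers
    return score

def policy(prompt: str) -> str:
    """The model being trained: here it just samples canned candidates."""
    candidates = [
        "Stop and yield until the crosswalk is clear.",
        "Maintain speed; the pedestrian will probably clear in time.",
        "Stop, yield, then proceed slowly while watching for other pedestrians.",
    ]
    return random.choice(candidates)

def rl_step(prompt: str, n_samples: int = 8) -> str:
    """One training signal: sample answers and let the teacher pick the best.
    A real trainer would nudge the policy's weights toward this winner."""
    samples = [policy(prompt) for _ in range(n_samples)]
    return max(samples, key=lambda r: reward_model(prompt, r))

print(rl_step("A pedestrian is in the crosswalk ahead. What should the car do?"))
```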

Also important for this process are ‘embodied agents’. “An embodied agent has access to a car or a robot and can navigate; [the question is] what kind of data helps that agent understand the world around it,” says Karunamurthy.

For self-driving cars: “We pioneered techniques to help identify whether models need more data about the snow and how vehicles look different in snow, or whether it needs more data about open highways versus highways that might be weaving through city traffic — all sorts of interesting scenarios that could help these models improve.”
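Scale AI has not published the exact technique here, but the underlying idea can be sketched simply: bucket a model’s evaluation failures by scenario tag and flag the scenarios where more training data would help most. The tags, records and 50% threshold below are hypothetical examples.

```python
from collections import Counter

# Hypothetical evaluation records: one per driving clip, tagged by scenario.
evaluation_log = [
    {"scenario": "snow",          "correct": False},
    {"scenario": "snow",          "correct": False},
    {"scenario": "snow",          "correct": True},
    {"scenario": "open_highway",  "correct": True},
    {"scenario": "open_highway",  "correct": True},
    {"scenario": "urban_weaving", "correct": False},
]

errors = Counter(r["scenario"] for r in evaluation_log if not r["correct"])
totals = Counter(r["scenario"] for r in evaluation_log)

# Flag scenarios whose error rate suggests the model needs more data there.
for scenario, total in totals.items():
    error_rate = errors[scenario] / total
    if error_rate > 0.5:
        print(f"Collect more data for: {scenario} (error rate {error_rate:.0%})")
```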

Building the AI Data Pipeline for Financial Services

In the eight years since its inception, Scale AI has evolved from labelling data for autonomous driving to enabling a range of applications — its mega clients include OpenAI, Microsoft [MSFT] and Meta.

Much of Scale’s work involves developing AI models, but it also adapts the technology for industry use cases ranging from financial services to healthcare and education.

Scale AI’s data-centric approach is especially important in financial services like insurance, says Karunamurthy. “Companies have realised a lot of the value you have is all of this data you keep internally within your organisation.”

“That could be information about your customers and how you want to talk to them about what they should be doing — when they’re saving for retirement or thinking about wealth planning, for example.”

Once an AI data pipeline is established, companies can use this to drive operations. “We’re leveraging a lot of this data that’s inherent in these enterprises in a very private, secure way to make these models answer questions that would have been difficult or time-consuming for human operators to do.”
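The article does not name the architecture behind this, but one common pattern consistent with the description is retrieval over internal documents with access controls, so the model only ever sees data the requester is permitted to view. Everything in the sketch below, from the documents to the keyword-overlap retriever, is a hypothetical stand-in.

```python
# Hypothetical internal documents, each tagged with the roles allowed to see it.
INTERNAL_DOCS = [
    {"text": "Client risk profiles are reviewed quarterly.",
     "roles": {"adviser", "support"}},
    {"text": "Retirement plan contributions are matched up to 5%.",
     "roles": {"adviser"}},
]

def retrieve(question: str, user_role: str, k: int = 2) -> list[str]:
    """Return the k most relevant documents this user is allowed to see."""
    allowed = [d for d in INTERNAL_DOCS if user_role in d["roles"]]
    words = question.lower().split()
    allowed.sort(key=lambda d: sum(w in d["text"].lower() for w in words),
                 reverse=True)
    return [d["text"] for d in allowed[:k]]

def build_prompt(question: str, user_role: str) -> str:
    """Assemble the prompt an LLM would receive; only permitted context is included."""
    context = "\n".join(retrieve(question, user_role))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How often are risk profiles reviewed?", "support"))
```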

It sounds complex, but ultimately, the goal is simple: “to have AI models give more accurate, actionable advice back to humans”.

In financial services, a testing and evaluation question might be: “Can you explain to a 65-year-old why they should be concerned about rising interest rates?

“We might ask that same question not just as a 65-year-old, but a 40- or 20-year-old… not just in the US, but in the UK or Canada. Or not just to someone who has a lot of holdings in US dollars or British pounds, but in euros and other currencies, and see whether the model changes its answer.”
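That persona sweep is straightforward to express as an evaluation harness: generate every combination of age, country and currency, ask the same templated question, and compare the answers. The sketch below assumes a hypothetical `query_model` call standing in for the model under test.

```python
from itertools import product

AGES = [20, 40, 65]
COUNTRIES = ["the US", "the UK", "Canada"]
CURRENCIES = ["US dollars", "British pounds", "euros"]

TEMPLATE = (
    "Explain to a {age}-year-old in {country}, whose holdings are mostly in "
    "{currency}, why they should be concerned about rising interest rates."
)

def query_model(prompt: str) -> str:
    """Hypothetical stand-in; swap in a real call to the model under test."""
    return f"[model answer to: {prompt!r}]"

def run_sweep() -> dict:
    """Ask the same question across every persona and collect the answers."""
    answers = {}
    for age, country, currency in product(AGES, COUNTRIES, CURRENCIES):
        prompt = TEMPLATE.format(age=age, country=country, currency=currency)
        answers[(age, country, currency)] = query_model(prompt)
    return answers

# Downstream checks would compare answers across personas for consistency
# and appropriateness: does the advice sensibly differ for a 65-year-old?
results = run_sweep()
print(len(results), "persona variants evaluated")
```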

The AI Bucket List

When it comes to Scale AI’s own customer base, Karunamurthy describes three broad buckets.

First are the model providers. “It starts with the really large research labs. We had an amazing partnership with OpenAI leading up to ChatGPT.”

Reinforcement learning models emerged from that partnership and “have ended up being critical for this industry”. Another “huge field” is ‘alignment’: “an industry term that means if you ask a question of the model… it understands what actions it should take.”

Second, Scale is “helping customers start with testing and evaluation of the AI systems” to overcome any bottlenecks blocking their rollout.

The third bucket of customers is the public sector — after all, says Karunamurthy, it’s probably “the most important technology any of us will deal with in our lifetimes”.

“We spend a lot of time finding ways to [deploy] AI responsibly, to try to get ahead in terms of how the public sector can think about this technology and to help citizens understand it. [...] They have really important considerations around safety, given public sector data is often really sensitive. So we spend a lot of time with public sector clients on the safety guardrails that need to be in place.”

Where do humans stand in all this? Well, says Karunamurthy, they can help train LLMs to do a better job. For example, a model might be overexplaining the concept of photosynthesis to biology students.

“The model may start off giving you a pretty decent answer. But you realise halfway through, the model got off track, it started explaining all these other concepts that a high schooler doesn’t need to know. [...] You can use biology PhDs, people with domain expertise, to help you understand these subjects. We incorporate a lot of those findings into how we make models better.”

Human or non-human, one thing is clear: we are only at the beginning of AI and its investment opportunities. Whatever happens next, Scale AI is not just looking ahead of the curve — it is modelling it.
