Incept AI
Why We Invested

Rally Ventures
April 30, 2025

Why We Invested: Incept AI

Tackling the Last Mile Problem in Voice AI

Voice AI refers to artificial intelligence technologies that enable machines to understand, process, and respond to human speech. It powers applications like virtual assistants (Siri, Alexa), automated customer service, and voice order-taking systems. Despite its rise during the pandemic, voice AI has struggled to meet its full potential, particularly in restaurant order-taking, where operators seek to optimize labor productivity while maintaining a great guest experience.

Incept AI is changing that with its pioneering audio neural networks and innovative approach, and we are thrilled to lead its $3 million pre-seed round. Below is a short Q&A with Incept AI Co-Founder and CEO Umut Isik and Rally Venture Partner and Incept AI Board Member Ben Fried. Welcome to the portfolio, Incept AI!


1. What is Incept AI and what core problem are you solving?

Umut Isik, Incept AI Co-Founder and CEO: Incept AI is tackling the “last mile” problem in voice AI—making it work in real-world, noisy environments like crowded spaces and outdoor settings. While voice AI has the potential to automate and enhance customer service, existing systems struggle with challenges like background noise, acoustic echo, and interfering speech. Foundation models have advanced speech recognition significantly, but these audio issues remain a major hurdle. Incept AI is solving this problem, starting with drive-thru and phone ordering for restaurants.

Ben Fried, Rally Venture Partner and Incept AI Board Member: I led teams responsible for key components of Google Meet, and spent years tackling complex audio challenges in conference rooms with varying acoustics. What seemed like a simple problem of ensuring clear communication required a multi-pronged approach and deep expertise. A key contributor to the solution had spent years working on something like submarine detection in the Swedish Navy, which highlights the level of specialized skill needed.

When I saw similar challenges in other contexts, particularly in drive-thru and phone ordering, it resonated. Post-COVID, the quick-service restaurant industry in the U.S. has shifted heavily toward delivery and drive-thru, creating a massive and growing market. Umut and his co-founder Justin Foster stood out as uniquely qualified to tackle this overlooked problem.

2. What differentiates your business from other solutions on the market? And why have other solutions fallen short?

Umut: We’re in the third generation of this technology, which is roughly broken into three components: audio processing, speech recognition, and what was previously called intent modeling, now known as language modeling or action-oriented LLMs. Intent modeling refers to interpreting not just what is being said, but why it is being said (essentially determining the speaker’s intent).

In the past, intent modeling was so difficult that even perfect audio processing wouldn’t have made much difference. The dominant approach relied on tree-and-branch structures: rule-based systems that function like a decision tree, where each user input follows a predefined path based on rigid logic and branching rules. However, this method led to slow and clunky interactions, and it failed to handle more than 75% of interactions without human intervention.

The biggest shift came with the rise of ChatGPT and large language models. Unlike tree-and-branch systems, LLMs can make decisions and infer multiple simultaneous intents using the entire conversational context as they generate responses and actions—making interactions far more fluid and adaptable. As we build on LLMs to successfully handle complex food-ordering interactions, we focus on audio processing as the key last-mile problem to offer reliability and accuracy that will make this technology widely used.

Ben: This technology is particularly relevant for drive-thrus. On the phone, audio quality is generally stable, aside from occasional network issues. But in a drive-thru, there are numerous challenges: background noise, microphone distance, multiple voices, and even feedback from the speaker. All of these factors make it difficult to capture a clean audio signal of the order, which is crucial for integrating emerging technologies effectively.

3. Tell me about your customers. Why does this technology matter for them and how does it impact their business?

Umut: Our restaurant and hospitality customers are navigating labor shortages, rising costs, and increasing pressure to maintain high-quality service while keeping expenses low. We help address these challenges by reducing labor demands overall and enabling businesses to reallocate staff to higher-skilled roles, ultimately improving efficiency and profitability.

AI-powered voice technology enhances the drive-thru experience. Only about 70% of drive-thru interactions are rated as friendly. AI can elevate customer experience by ensuring consistently pleasant interactions, better listening, and more effective suggestive selling—similar to an attentive waiter who makes suggestions based on what they’ve learned about you and greatly enhances your dining experience. This improves customer satisfaction and boosts revenue.

Order accuracy is another key benefit. Currently, even the best chains only achieve 85–93% accuracy, which is frustrating for customers and wasteful for restaurants. We believe modern machine learning can greatly improve order accuracy, reducing errors and ensuring customers receive exactly what they ordered.

4. What’s the hardest part of starting and growing a company?

Umut: There are countless challenges in entrepreneurship. I think the most telling phrase in entrepreneurship is, “How hard can it be?” Well, it’s always harder than expected, but I’m loving every minute of it. Working with my co-founder Justin and our team, and with the backing of Rally, we’re starting to make real progress, which is so exciting.

Starting out in my career, I was a pure mathematician and all I cared about was doing elegant science. Then I moved into deep learning, applying science to real-world problems. At Amazon, I refined this further—ensuring my work was not only elegant and real-world applicable, but also so useful that you could reach it by working backwards from the customer.

The biggest leap in becoming an entrepreneur was realizing that, in entrepreneurship, elegant and useful applied science isn’t enough. It must also be able to stand on its own in the market and lead to a long-term business with pricing power. Making that shift was a huge unlock for me.

5. Ben, what makes Umut and the team uniquely suited to build this product?

Ben: When Tom Peterson (Rally Venture Partner) and I discussed Incept AI, we both saw it as the quintessential tech startup dream. Umut is a true scientist and inventor—someone who is incredibly skilled in a highly specialized, critical area of science and technology—who has demonstrated an enormous ability to continuously learn and stay at the cutting edge. Umut has years of expertise in building specialized deep learning models for audio processing, and he recognized that his expertise could solve a specific business problem.

Then there’s Justin, a highly regarded leader with deep connections and expertise in quick-service restaurant technology and voice technology. Together, they form an ideal team, both deeply committed to tackling a major problem in a market with vast potential.

Startups succeed when technology and business expertise come together, but it’s rare to see that convergence as purely as in Justin and Umut. Time and again, we’ve seen proof that their skills and experience are real, and they are already making a significant impact at Incept AI.

Building a company is always hard. It is said that “0 to 1 is impossible, 1 to 10 is improbable, 10 to 100 is inevitable.” Right now, we’re in the so-called impossible stage, but these are exactly the people you bet on to break through it.
