The Hidden Complexities of Enterprise Voice

Apr 16, 2026
00:43:54


Show Notes

In this episode, host Rishi Ahluwalia interviews Ankur Edkie, CEO of Murf AI, about the hidden complexities of scaling enterprise voice AI. Ankur traces the journey from pre-ChatGPT text-to-speech to building hyper-efficient, human-like voice systems. He unpacks why voice demos feel magical but production deployments often fail, highlighting the critical gap between controlled lab settings and the variability of real-world acoustics and devices. The conversation covers the challenges of the cascading stack (ASR, LLM, TTS), the importance of turn-taking and consistent latency over raw speed, and Murf's "compute acquisition" approach with the Falcon architecture, which drives down costs and enables global data residency. Ankur concludes with a hard lesson for builders: enterprises buy holistic outcomes and trust, not isolated API calls.

Key Topics Covered

  • Voice AI works in labs but fails in the wild.
  • Cascading stacks break without shared audio context.
  • Consistent latency matters more than pure speed.
  • Enterprises value trust and professionalism over hyper-realism.
  • Sell end-to-end outcomes, not isolated API calls.

Episode Chapters & Transcript

0:00:00

From Early TTS to the Turing Test

Ankur shares Murf's pre-ChatGPT origins, the early limitations of IVR-era speech, and the team's push to make AI voice pass real human quality bars.

0:05:40

Why Voice Demos Break in Production

The conversation explores the gap between lab success and enterprise reality, including fragmented ASR-LLM-TTS stacks and poor cross-layer context sharing.

0:10:39

Latency, Jitter, and Turn-Taking Constraints

Ankur explains why consistency beats raw speed, how jitter affects call quality, and why turn-taking remains understudied despite strict timing budgets.

0:19:41

Compute Acquisition and the Falcon Advantage

Ankur unpacks Murf's Falcon architecture: compute acquisition, cost efficiency, concurrency on commodity GPUs, and the implications for global data residency.

0:29:00

Enterprise Priorities Beyond Voice Quality

The episode highlights trust, reliability, and end-to-end outcome testing, while challenging the assumption that the most hyper-realistic voice is always best.

0:36:44

What Builders Should Learn from Enterprise AI

Ankur closes with future predictions and a practical enterprise lesson: customers buy holistic outcomes and long-term trust, not isolated API features.



Tags

#enterprise voice ai #murf ai #ankur edkie #rishi ahluwalia #voice ai #text-to-speech #tts #speech-to-text #asr #llm #cascading stack #turn-taking #latency #compute acquisition #falcon architecture #data residency #enterprise ai #conversational ai #agora