Convo AI World
A podcast from Agora
Explore voice-first conversational AI through honest conversations with practitioners. Hear from AI builders, infra engineers, product strategists, and more for the latest insights on what it takes to build best-in-class conversational AI experiences.
Listen on Your Favorite Platform
Available on all major podcast platforms
Real-Time Avatars, Translation, and Visual Storytelling with Akool's Jeff Lu
In this episode of the Convo AI World podcast, Hermes Frangoudis interviews Jeff Lu from Akool, a company revolutionizing video generation technology. They discuss Akool's origins, its innovative approach to visual storytelling, and the many use cases of its technology in marketing, internal communications, and more. Jeff shares insights on the challenges of balancing quality and cost in video generation, the importance of real-time inference, and advancements in video translation. The conversation also touches on Akool's strategy for staying ahead in the rapidly evolving generative AI landscape and the future of creativity in content creation.
AI at the Edge: 6G, Arabic LLMs & the Middle East’s AI Leap with Mérouane Debbah
In this episode of the Convo AI World podcast, we dive deep into the future of AI, telecom, and the evolving role of conversational interfaces with Prof. Mérouane Debbah, Founding Director of the Khalifa University 6G Research Center and one of the leading minds behind the Arab world's first large language models: Noor and Falcon.
The Voice AI and VR Revolution in Heavy Machinery with Carbon Origins' Amogha
In this episode of the Convo AI World podcast, Hermes Frangoudis interviews Amogha Srirangarajan, Co-founder and CEO of Carbon Origins. They discuss Carbon Origins' evolution from last-mile delivery robots to heavy machinery teleoperation, how voice AI and VR improve the operator experience, and the future of robotics in construction and space mining. Amogha shares insights on labor shortages in critical industries and how Carbon Origins aims to address them through innovative technology and partnerships. The conversation also touches on ambitious plans for energy solutions and space exploration, highlighting the potential of robotics to shape the future of human civilization.
Open-Source Voice Activity Detection with TEN Framework's Ziyi Lin
Ziyi Lin, speech engineer on the TEN Framework team, joins the Convo AI World podcast to discuss the design and impact of a new open-source Voice Activity Detection (VAD) model. The episode covers the shortcomings of existing VAD solutions, the importance of high-quality training data, and the design choices that led to improved performance metrics. Ziyi explains how VAD functions as a critical component in conversational AI, how it handles real-time processing under tight latency constraints, and the advantages of deploying it on edge devices.
Building AI Community with Voice AI Space
Thibault Mardinli (T-Bot) from Voice AI Space joins to discuss the evolution of Voice AI communities and ecosystems. Hermes and Thibault explore his journey from building a Voice AI startup to creating an open resource platform, the challenges of discoverability in the fragmented Voice AI landscape, and the democratization of AI expertise through visual interfaces. The conversation covers the spectrum of Voice AI companies from infrastructure to UX-focused products, adoption in emerging markets, privacy considerations, and the future of voice-first interfaces. Thibault shares insights on building global communities, curating quality resources, and the grassroots movement powering Voice AI innovation.
The Science Behind AI Speech Recognition with Deepgram's Andrew Seagraves
Deepgram's VP of Research Andrew Seagraves joins to explore the science and engineering behind modern speech recognition systems. Hermes and Andrew dive deep into why speech recognition isn't a solved problem, the two-stage training process of speech-to-text models, and the challenges of balancing real-time latency with accuracy. The conversation covers Deepgram's origins from dark matter research, power laws in speech data, buffer-based architectures for real-time transcription, and frontier challenges like multilingual code-switching, emotion detection, and conversational dynamics. Andrew shares insights on model deployment, customer use cases from NASA to food ordering, and the future of self-adapting speech models.
AI Content Moderation with Google's Ninny Wan
Google's Ninny Wan, Product Lead for AI Content Safety, joins to discuss the evolution of AI content moderation in the age of GenAI. The conversation covers Google's approach to semantic understanding, multilingual moderation across 140+ languages, synthetic data generation for training, and the balance between user freedom and safety. Ninny shares insights on transformer models, human-in-the-loop processes, cross-functional safety reviews, and Google's on-device privacy-compliant features like sensitive content warnings.
Interactive Digital Avatars with Trulience's Richard Bowdler
Trulience's Head of Growth Richard Bowdler joins to discuss the world of interactive digital avatars and conversational AI. Hermes and Richard explore how Trulience creates lifelike avatars, the technology behind real-time client-side rendering, multilingual support, and real-world applications from healthcare to customer service. The conversation covers the evolution from capture cages to modern avatar creation, competitive advantages in scalability, and the democratization of AI expertise through visual interfaces.
Real-Time Translation with Palabra's Artem Kukharenko and Ivan Kuzin
In this episode, Palabra's Artem Kukharenko (Co-Founder) and Ivan Kuzin (Head of Business Development) join to discuss Palabra's real-time speech-to-speech translation technology, the inspiration behind the company, common misconceptions about AI translation, the balance between latency and accuracy, and the challenges of voice cloning and intonation. The conversation also covers the applications of their technology, user feedback, differentiation in a competitive market, privacy and data security, benchmarking, developer experience, and future advancements in AI and speech translation.
Introduction to Conversational AI with Agora's Ben Weekes
Agora's Ben Weekes joins to discuss the world of voice-first conversational AI. Hermes and Ben delve into the differences between voice and chat-based systems, explore the real-world applications of conversational AI, and break down the technology stack involved in creating effective voice agents. The conversation also touches on virtual avatars, infrastructure challenges, and the various conversational AI frameworks available for developers.
Convo AI Newsletter
Subscribe to stay up to date on what's happening in conversational and voice AI.