Introduction to Conversational AI with Agora's Ben Weekes

Jul 11, 2025
Runtime: 00:31:08


Show Notes

Agora's Ben Weekes joins Hermes to discuss voice-first conversational AI. They cover how voice-based systems differ from chat-based ones, explore real-world applications of conversational AI, and break down the technology stack behind effective voice agents. The conversation also touches on virtual avatars, infrastructure challenges, and the conversational AI frameworks available to developers.

Key Topics Covered

  • Voice-first conversational AI vs. chat-based systems
  • Real-world applications of conversational AI
  • Technology stack for creating effective voice agents
  • Virtual avatars and infrastructure challenges
  • Conversational AI frameworks for developers

Episode Chapters

00:00

Introduction to Conversational AI

Opening remarks by Hermes and Ben on the goals of the podcast and the focus on voice-first AI.

01:22

Defining Voice-First Conversational AI

Ben explains the difference between chat-based, voice-first, and multimodal AI, emphasizing the importance of uninterrupted audio.

02:37

Real-World Applications of Voice AI

Examples of voice agents handling restaurant bookings, and how voice AI is improving the call center experience.

03:59

Understanding the Voice Pipeline

A technical breakdown of cascading vs. real-time pipelines, and how audio flows through speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS).

09:25

The Role of Virtual Avatars

Exploring the visual layer of conversational AI and its emotional impact on user engagement.

10:17

Infrastructure Challenges in Voice AI

Discussing latency, packet loss, and the need for reliable real-time communication (RTC) networks such as Agora's SD-RTN (Software-Defined Real-Time Network).

14:29

Frameworks for Conversational AI

Overview of LiveKit, Pipecat, and TEN, comparing the trade-offs between Python-based and C-based implementations.

16:59

Cascading vs. Real-Time Models

Comparison of model flexibility, SDK support, performance, and transparency between cascading and voice-to-voice approaches.

19:22

Function Calling and Tools in AI

How function calling, tools, and structured metadata enable richer interactions with avatars and multimodal agents.

21:40

Cost Implications of Voice Models

Analyzing pricing dynamics and memory considerations when choosing between voice-to-voice and cascading architectures.

24:08

Demo of Multimodal Interaction

A demo featuring a sales avatar in a live-shopping scenario, showcasing how it handles interruptions by text and by voice.

26:51

Conclusion and Future Outlook

Final thoughts from Hermes and Ben on the future of voice AI, and an invitation to join future episodes.



Tags

#conversational ai, #voice ai, #voice to voice ai, #agora, #ben weekes, #voice agents, #virtual avatars, #real-time streaming, #speech recognition