Rishi Ahluwalia (00:02) Hey everyone, this is Rishi Ahluwalia. I am the Director of Solutions and Success at Agora India, and welcome to the Convo AI World Podcast, where I have with me today the Chief Technology Officer at Eloelo, Sagar Gaonkar. Welcome to the podcast, Sagar. A brief introduction: Sagar is the tech mind driving Eloelo Group, one of the fastest-growing live entertainment platforms, spanning multiple applications such as Eloelo, Connecto and Story TV. Today we will be diving deep into how he's scaling it, reimagining the creator economy and shaping the future of live entertainment in the era of conversational AI. So welcome again to the podcast, Sagar.

Sagar Gaonkar (00:57) Thank you, Rishi. Hello everyone, and thank you for having me here. Happy to be here and looking forward to this podcast.

Rishi Ahluwalia (01:06) Perfect, we are looking forward to it as well. So let's dive in, Sagar, and start with your personal journey and your path to conversational AI leadership. You've had a fascinating 16-plus-year journey through the video streaming ecosystem: from Ittiam Systems to Disney+ Hotstar, then ShareChat, and now Eloelo. How has your journey through these high-scale streaming and social platforms shaped your current approach to conversational AI?

Sagar Gaonkar (01:45) It's been a wonderful journey, Rishi. I think it has covered the breadth and depth across industries and use cases. At Ittiam, I started with a grassroots-level understanding of how video and streaming work, applying it to industrial and enterprise use cases. Then at Hotstar, I worked with live content, long-form content, and content for small and large screens alike, with more focus on security of the content, since the content becomes the primary IP. Now at Eloelo, in some sense, it all comes together nicely across the collection of use cases we have. We are building mobile-first and very Bharat (India) first. Across the three use cases at Eloelo, we have the primary Eloelo app, which is interactive live streaming; then Connecto, which is one-on-one voice calling; and the new entry, Story TV, which is a micro-drama OTT application. To your question of how conversational AI fits into this and how it has helped improve all three: being part of an industry where you understand how streaming works at the grassroots level, what latency costs, that you don't build everything in-house, that you have third-party integrations, and that things that work in the lab don't always work in the field, this journey has helped me understand what not to do, and then experiment live on the use cases, which enables improvements across Eloelo, Connecto, Story TV and so on.

Rishi Ahluwalia (03:44) I think you mentioned a great point there, Sagar: what not to do, and that things sometimes work in the lab but not in the field. That's a great point of view and a great understanding from all your experience, so thank you. I'll move to the next one. Eloelo has reshaped the way India does live entertainment, and obviously we have been associated with you right from the beginning, powering the live streaming and the multiple use cases.
But we have seen Eloelo grow to serve 90+ million users across eight Indian languages, with more than 150,000 creators, in less than a year of monetization. With all of this happening, how does conversational AI fit into the vision of becoming the digital third place for Bharat? Obviously you've captured the first and the second, but with the emergence of conversational AI, how do you envision Eloelo's approach being powered by it?

Sagar Gaonkar (05:05) Yeah, I'll start with the vision first, Rishi, which has three parts. One is, of course, building for Bharat: all our use cases address a Bharat-first audience. Then comes enabling creators to earn a livelihood, helping them make money. And then comes enabling users to get social, helping them connect when in need, and enabling passive content watching as well. Our approach to conversational AI is exactly about enabling this vision and expanding each of these areas. To double-click on that: when I say improving our vision of building for Bharat, how do I enable more languages? How do I enable cross-language connections? More languages alone will only solve things in pockets, so how can I enable conversations between somebody who speaks, let's say, Hindi and somebody who cannot? That is part one. The earlier creators can onboard onto the platform, the earlier they can learn what to do, what not to do, what works, how to engage, how to connect with people who are not so proficient in their language, and how to bring the regionality back. Conversational AI helps me with that. For me as a platform, how do I make the platform safe for creators? Moderation, understanding what is okay and what is not okay in Indian languages and slang, helps me create a safer environment so creators can be more confident being on the platform. And for users, how do I enable content consumption in their language? How do I make someone available for socializing when they need it, and not just when everybody else is available? How do I enable both live and offline consumption, and give users somebody who understands their interests? I might not understand politics or sports or cricket, but I would want to connect with somebody who does and who can converse and socialize around my interests. Conversational AI helps enable all of this, and that is our approach towards it.

Rishi Ahluwalia (08:06) Absolutely, Sagar. You touched on a very complex subject in India: the changing languages, the multiple cultures, and being available as a creator on a social live platform. Hold on to that thought; we'll dive deeper into it in a later question. But thanks for sharing your thoughts on that. The next thing I wanted to understand, and the audience here as well: conversational AI is a relatively fresh topic, and I'm sure there are multiple challenges with respect to integration and adoption that a platform such as Eloelo might face. When it comes to integration challenges, Eloelo integrates live video, audio, and chat-driven engagement.
So are there any unique challenges that you have faced, or might face, when layering conversational AI over a real-time, high-volume ecosystem such as the multiple applications that you serve?

Sagar Gaonkar (09:19) Yeah. First of all, the India spread is a challenge. What I mean by that is there is a lot of dynamic content and a wide spread of languages across India. People also switch languages within a conversation or interaction, whether it is audio or text, and people prefer different languages across audio and chat. For example, I might speak Hindi, but when I'm typing I might prefer English, or Hindi, or a mix of both, Hinglish, and I can switch interchangeably: start with one and end with another. That becomes one major problem. Second comes the regionality of it. Language is only one part; Hindi is spread across multiple states, so a Hindi speaker from Punjab can sound different from somebody from MP. The local slang, the regionality, the tonality, the nouns and pronouns we use for addressing things all change within languages and depend heavily on the regionality of the creator or the user. Creators and users carry a mix of these influences too; it's not as if they stick to one regionality. They may have been brought up in multiple regions, and that influences how they interact. As difficult as this is for us to consume, AI will find it even more difficult to consume and to converse with. So that becomes a very difficult problem. Next comes the user persona. What I mean is that no two creators are alike, and like you mentioned, we have a 150K+ creator platform. Imagine the complexity, the variation in persona, across these creators. Of course you can group them into pockets, but the number of pockets will still be huge. You cannot just say these 150K creators are all the same; sticking to a persona becomes very important for any use case, whether it's translation or anything else. The other challenge, or maybe a good thing also, is that our audience is still very much an AI-unaware audience, speaking of tier-two and tier-three cities. Good or bad, that is a consideration we need to keep in mind. Then there is the fact that because it's a live, very social interaction, you need to be up to date on current affairs. What that means is: say there has been some mishap, or some festival is happening today or tomorrow; the AI has to be aware of it. It can't be based on just past learnings; it has to somehow connect to what is happening today, what happened this morning, what happened yesterday. Regionality, cultural awareness and current-affairs awareness are very important. Those are what I would call the India-landscape challenges. Then there is a second level of technical challenges. One is the cost factor. Then there are the challenges with respect to latency, because it is live. If I am interacting with you in another language, I can't wait 10 seconds and say, "Okay, let me understand that translation and then get back to you."
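To make that latency point concrete, here is a minimal back-of-the-envelope sketch of a per-turn budget for a cascaded voice pipeline. The stage figures are illustrative assumptions, not measurements from Eloelo or any particular vendor.

```python
# Illustrative per-turn latency budget for a cascaded voice pipeline.
# The stage numbers below are assumptions for the sake of the example,
# not measurements from Eloelo or any specific provider.

TURN_BUDGET_MS = 1200  # rough upper bound before a reply stops feeling "live"

stage_latency_ms = {
    "capture_and_upload": 150,   # mic capture + network hop to the media server
    "speech_to_text": 300,       # streaming ASR, time to a stable partial result
    "llm_response": 450,         # time to the first tokens of the reply
    "text_to_speech": 200,       # time to the first synthesized audio chunk
    "playback_delivery": 100,    # network back to the listener + jitter buffer
}

total = sum(stage_latency_ms.values())
print(f"end-to-end: {total} ms (budget {TURN_BUDGET_MS} ms)")

for stage, ms in stage_latency_ms.items():
    print(f"  {stage:<20} {ms:>4} ms  ({ms / total:.0%} of the turn)")

if total > TURN_BUDGET_MS:
    print("over budget: trim the slowest stage or overlap stages by streaming")
else:
    print(f"headroom: {TURN_BUDGET_MS - total} ms")
```

Running it simply shows how quickly a handful of reasonable-looking stage latencies consume a one-second-ish budget, which is why waiting for each stage to finish fully, as in the ten-second scenario above, breaks the live feel.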
So giving it a live-interaction feel becomes very important: the latency becomes very important and the cost-effectiveness becomes very important. Another challenge is that...

Rishi Ahluwalia (13:16) Absolutely.

Sagar Gaonkar (13:29) ...this is a very evolving field. It's not done and dusted, like "Okay, A, B, C, D, do this and you're sorted." New things come up every day. So all of these are, in general, challenges that I think not just Eloelo but any organization building for Bharat would face.

Rishi Ahluwalia (13:50) Those are absolutely valid points, Sagar. Language will obviously continue to be a challenging aspect in the adoption of conversational AI, although there are vendors coming up with data models that are local to India and cover many aspects of conversational AI, such as speech-to-text or text-to-speech. But I think the LLM is going to play a big role here: whether that language model has been trained on enough data. There are multiple languages in the south of India, from Kannada to Tamil, so we need local models that understand the different dialects, because the dialect changes every 20 kilometers in India, and so does the food, and that affects the overall journey in conversational AI. You also brought up a great point about language switching: in India we have this tendency of switching between languages, Hindi to English, or a local language and English. That's another major challenge conversational AI will face, though I'm sure it will eventually be resolved. I also wanted to get your opinion on one more thing, which is data scarcity and model training. India is so rich in tradition and in the multiple languages that people speak, and I think there is still some hesitation among startups and enterprises about whether there are enough data models that understand Indian languages in a live conversational context. How do you approach model training, specifically with respect to transfer learning, synthetic data sets and linguistic expertise? At the scale you are at, Sagar, and I'm sure it's going to grow even bigger, how do you anticipate adopting these language models given the data scarcity that exists at the moment? I'm sure there is some strategy that might be effective, but I think the viewers would like to understand how you envision tackling this.

Sagar Gaonkar (16:26) Yeah, this is a problem. However, it's something that is getting better every day. Our approach has been to understand pockets of these languages. For example, some South Indian languages tend to be very similar, or at least similar enough that cross-learning can happen. And this has to go beyond prompting; we can't rely on vendors to support everything. Luckily, we have good data on which we can train new things: we have actual creators and users conversing in these languages day in, day out. So there is a data problem that we can supplement. We can take an 80%-ready data model, train it in-house, and work with vendors. In some sense, this is where the vendors also benefit: they get access to a lot of training material by working with us.
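As one hedged illustration of the "take an 80%-ready model and tune it on our own data" idea, the sketch below packages a couple of made-up in-house chat lines into a generic JSONL instruction format. The schema, field names, file path and examples are all hypothetical; real vendors and models expect their own layouts.

```python
import json
from pathlib import Path

# Hypothetical in-house records: region, language tag, source line, target line.
# In practice these would come from consented, anonymized platform transcripts.
samples = [
    {"region": "Punjab", "lang": "hi-Latn",
     "source": "kiddan bro, stream kab start hoga?",
     "target": "Hey, when does the stream start?"},
    {"region": "MP", "lang": "hi",
     "source": "aaj ka topic kya hai didi?",
     "target": "What is today's topic?"},
]

out_path = Path("finetune_data.jsonl")  # hypothetical output file

with out_path.open("w", encoding="utf-8") as f:
    for s in samples:
        record = {
            # A generic instruction-tuning layout; real schemas vary by model.
            "instruction": (
                f"Translate this {s['lang']} chat message to English, "
                f"keeping the informal tone of a {s['region']} speaker."
            ),
            "input": s["source"],
            "output": s["target"],
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

print(f"wrote {len(samples)} records to {out_path}")
```

The point of the sketch is only the shape of the workflow: platform conversations become labeled pairs, and the same records can be re-exported into whatever format a given base model or vendor pipeline expects.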
And of course, it can't all be done in one go; it's a very iterative approach. You pick languages, you pick use cases, and you train for them. Most of the training we also do in-house so that we can train with our data. Not every data set will match us, because, like I said, the spread of user personas for us varies a lot. So we will train with our data, we will tune with our data, and that is how we are approaching the problem. It's still not a solved problem, but it's getting better, and I would say it has to be a joint vendor-and-customer solution to make it better. It can't be just the vendor or just the customer solving it by themselves. That's my take on it.

Rishi Ahluwalia (18:31) That's another great take, Sagar, because it brings me to the topic of RAG, Retrieval Augmented Generation, where customers have their own data, as in the case of Eloelo. The good thing with Eloelo is that since you are spread across Bharat and the various states, you have that data, both good data and bad data. And you made a very good point at the beginning: content moderation is how creators feel safe on the platform. That can also be a great differentiator when training data models in-house: this is the positive aspect of the data, and this is what needs to be avoided in order to make the platform safer for creators, especially the women using your platform. Perfect. So, Sagar, we have discussed some of these challenges. Let's explore some of the innovative use cases you might be bringing to the table or strategizing on right now, and what you think could be the next big disruptor in the adoption of conversational AI. One thing that came to my mind: are you exploring, or have you implemented, AI-hosted sessions, let's say hosting bots, or a co-pilot for the content creator, or a chat-driven emcee that monitors the interaction and keeps flags ready for content moderation, so that AI is helping the host, the actual creator, engage more with the audience? And last but not least, since creators cannot be available 24-7 on the platform, are you also exploring use cases such as AI-driven creators who, let's say, speak multiple languages and adopt the same dialect as the person joining the conversation or the audience listening to it? These are fascinating use cases; I'm sure they have their own challenges, but I'd like to get your opinion on them.

Sagar Gaonkar (21:09) Rishi, I don't want to divulge the exact use cases, so I thought I could give a brief on the direction of what we're thinking, and maybe then pick one or two examples of what you have mentioned. Is that okay?

Rishi Ahluwalia (21:22) You can, that's fine; the direction is what matters.

Sagar Gaonkar (21:35) I'll tell you why: all of this is work in progress, and these are fairly generic use cases, so it's not as if nobody else could do them. I won't mention specifics, but I'll mention the problems and what we're trying to do towards enabling them using conversational AI.
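Picking up Rishi's earlier mention of retrieval augmented generation, the toy sketch below shows the shape of a RAG step: embed a small knowledge base, retrieve the closest entries for a question, and build a grounded prompt. The embedding function is a random stand-in (so the ranking here is arbitrary), the knowledge-base entries are invented, and a real system would call an actual embedding model and LLM.

```python
import numpy as np

# Toy retrieval-augmented generation step: embed, retrieve, build a prompt.
# embed() is a stand-in; with random vectors the ranking is arbitrary and
# only the overall flow is meaningful.

def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

knowledge_base = [  # invented entries for illustration
    "Community guideline: personal contact details may not be shared on stream.",
    "Festival note: Diwali specials run this week across Hindi and Marathi rooms.",
    "Creator FAQ: payouts are processed every Monday.",
]
kb_vectors = np.stack([embed(d) for d in knowledge_base])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = kb_vectors @ embed(query)          # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]
    return [knowledge_base[i] for i in top]

question = "When do creators get paid?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to whichever LLM is in use
```

The design point is simply that the platform's own data stays outside the model and gets pulled in per question, which is one way the "good data and bad data" Rishi mentions can ground answers without retraining anything.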
Rishi Ahluwalia (22:38) Sagar, thanks for your inputs on the challenges and the integration worries that come with the adoption of conversational AI. Now let's dive into some of the innovative use cases and future directions that you have in mind. In terms of use-case direction with the adoption of conversational AI, would you like to give our audience some insight into the possibilities you can unlock?

Sagar Gaonkar (23:16) Yeah. I think one of the most applied, most talked-about use cases is autonomous chatbots, which can enable interactions between creator and user when, let's say, the creator is unavailable or out of your time zone, for example when we go across countries. In that case, the creator can fill users in on what went on through the day and in her previous live session, and she doesn't have to do it herself: we can have chatbots that handle summarization and enable creator-persona-style interactions with users, so a user feels, "Okay, she's not available right now, but I can still interact with her, and she can let me know when she's available next so that I can join her stream." That could be one of the use cases. The other thing, which helps both us and creators, is the whole process of onboarding, training and moderation, which generally takes time. This can be handled with conversational-AI-enabled onboarding, so that I know the creator's language, what she speaks, whether she is fluent in something, and I can train her on how to talk and what not to talk about. Right now this is very operations-driven; it could all be enabled in-app. You just sign up, go through the training and onboarding session, and then you can go live, as we call it, as soon as possible. The creator can do it in her own time, without depending on when the moderator or the onboarding team is available, and can have multiple retakes, not at the cost of somebody else's time. So this enables faster creator onboarding and faster go-lives. Then you also have the use case you touched upon: cross-language interactions. I can speak in one language, and the other person can consume it in the language he or she understands, as close to real time as possible; ideally without having to wait for me to stop, and bidirectionally, so the other person responds in his or her language and the creator understands it in hers. It's a very tough problem to solve, but a very impactful one if we can solve it nicely. Real-time audio makes it even harder, so it's very important that you still feel the naturalness of the conversation.
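A minimal sketch of the chunk-by-chunk relay such a cross-language exchange implies: each partial utterance is transcribed, translated and spoken without waiting for the full sentence. All three stages are stubs standing in for real streaming ASR, translation and TTS services, and the example "audio" is plain text; none of this reflects Eloelo's actual pipeline.

```python
import asyncio

# Sketch of chunk-by-chunk cross-language relay: translate and speak partial
# utterances instead of waiting for the whole sentence. All stages are stubs.

async def transcribe_chunk(audio_chunk: bytes) -> str:
    await asyncio.sleep(0.05)            # pretend streaming-ASR latency
    return audio_chunk.decode()          # stub: "audio" chunks are text here

async def translate(text: str, target_lang: str) -> str:
    await asyncio.sleep(0.05)            # pretend machine-translation latency
    return f"[{target_lang}] {text}"     # stub translation

async def speak(text: str) -> None:
    await asyncio.sleep(0.05)            # pretend TTS + playback latency
    print("listener hears:", text)

async def relay(audio_chunks, target_lang: str) -> None:
    for chunk in audio_chunks:
        text = await transcribe_chunk(chunk)
        translated = await translate(text, target_lang)
        # Fire playback without blocking the next chunk, so the listener starts
        # hearing output while the speaker is still talking.
        asyncio.create_task(speak(translated))
    await asyncio.sleep(0.2)             # let the pending playback tasks finish

chunks = [b"namaste sab log", b"aaj hum cricket pe baat karenge"]
asyncio.run(relay(chunks, target_lang="ta"))
```

The only point being illustrated is the overlap: stages run on partial utterances rather than whole turns, which is what keeps the exchange from degenerating into the "talk, wait, translate, respond" pattern described next.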
It cannot be that I talk, then you wait for the translation, and only then respond. The naturalness of the conversation matters, because these are not one-off translations or one-off exchanges: I speak a line, you respond in one or two lines, and the faster and the more real, the more humanized the response, the better, so that the conversation quality is maintained. Doing this in real time becomes a challenge. Video is a bigger challenge, audio slightly less so, and chat is more or less a solved problem; the complexity scales with the real-timeness across chat, audio and video. And I would like to add one more thing. For the new micro-drama segment that we have, conversational AI can also create a good impact: voiceovers if we need to add them, cross-language voiceovers and dubbing, and sometimes marketing, when you need to generate ads across languages, with translations that maintain character consistency. You do not want a dialogue dubbed into a local language where the tonality varies across scenes; character consistency becomes very important, and the emotions should stay intact. Because these are not plain conversations, these are dramas: if you lose the emotion of the sentence, it becomes a real problem. So this is also an area where we have been applying some things, but there is a lot more to do in these directions.

Rishi Ahluwalia (28:22) Absolutely, Sagar. The tonality, the sentiment, and preserving the emotion of the conversation are obviously key to a successful conversational AI environment. And thanks for making the point that you are treating conversational AI as a supplementary tool for creator support: content creators need the ability to skip some daily tasks, or, like you said, get real-time translation of conversations, with everything feeling very human and very natural. The cascading flow of Agora's conversational AI engine resolves that to a very high degree, but it obviously involves multiple elements, like you mentioned: the accuracy of your speech-to-text transcriptions, the latency coming from the LLM or the RAG integration you're using, and finally the text-to-speech element that drives the conversation. I also like that you said it's not going to replace content creators; it's going to act as supplementary support to what they're already doing and help them create more content that is more suitable for their audiences.

Sagar Gaonkar (29:47) Yes, because we have to be true to our vision of enabling creators and enabling users, and that is where I feel conversational AI will supplement the growth for us, for creators, for users and for Bharat in general.

Rishi Ahluwalia (30:04) Absolutely. We are all in for the creators of Bharat, especially on the Eloelo platform, so I hope that grows exponentially. But there is one more thing I wanted to bring up as part of today's discussion. Since these are still early days for conversational AI, Sagar, and you are also evaluating the ROI it will eventually deliver against the investments you might be making, how do you see this industry evolving in the next three to five years?
Not just at Eloelo, but across Bharat's digital entertainment. What capabilities do you think will be most exciting and most fascinating, apart from the use cases we've already discussed?

Sagar Gaonkar (31:00) I think accuracy and language adaptiveness will get a lot better. I also feel the broken-up approach of stitching multiple components together, speech-to-text, LLM, TTS, will soon get merged into a single component, or is already getting there, and that merged component will keep getting better. However, I feel we'll soon move into a model where one size doesn't fit all. What I mean is that you'll get into very custom adaptations of the solution. The solution might share a base, but right now we're trying to fit one thing to most use cases; slowly, as we understand the use cases more and more, whether across languages or across application areas, I feel we will diverge and have solutions that address one small use case or a group of use cases, rather than trying to fit one size to everyone.

Rishi Ahluwalia (32:32) That's absolutely right. Just one comment here, though I think you had more to add, Sagar: you made it very clear that a generic model might not serve all use cases, and that it will take niche models serving niche use cases to monetize what the platform is looking for. So thank you for bringing that up.

Sagar Gaonkar (33:00) Yes. Generic models will work well in the lab; when you take them to the field, that is when you'll see the difference. It could be solution providers, it could be integrators, it could be either of them solving this, but that is the direction in which I see it eventually shaping up and getting field-tested.

Rishi Ahluwalia (33:27) That's true, and with India being such a big landscape with so many languages, things will obviously get better, and this will probably push us towards being the second or third largest economy in the next five years, as envisioned by our Prime Minister as well. One more thing towards the end here, Sagar: any advice and key takeaways you would want to give back to the community that is building conversational AI teams? One thing is strategizing on use cases and building something, but the whole exercise of building a team that actually works on data science and powers conversational AI use cases seems to be a big headache in itself. With companies like Meta throwing money at deep research right now, we've never seen those kinds of compensation packages in the industry, and I'm sure you are facing those challenges in building conversational AI teams as well, because you need to structure them across ML engineering, linguistics, product and infrastructure. So, any advice for the community on how they should approach building such teams?

Sagar Gaonkar (35:00) Yeah, I'll go with the approach we have, in some sense, taken. This being a very new and niche field, it is first of all very important that an awareness, a learning philosophy, is built across the board and not just with one or two engineers: what is out there in the field, and what of it is applicable to us?
So the awareness, the curiosity, has to be with everyone, not just a small core team: across the board, leadership, your PMs, your engineers and so on. That is very important. It is also important that not everybody solves the same problem. What I mean by that is you'll need a team focused on assessing what is already available in terms of data models; teams that build pipelines to learn and experiment, which could be your ML team or your platform teams; and teams that think about what the use case is and how it gets integrated. For example, in our case we use Agora for our streaming: how can these use cases be enabled in real time with our vendors, and how do you build a framework for that? As important as the data science and ML team is, it is equally important how your systems and product teams take it live. Choosing the right experimentation base, as we call it, one that is independent of the product, matters; Agora gives us a good solution in terms of how you can integrate your own LLM models or speech-to-speech models, for instance. Use that, build an experimentation base, and in parallel build a team that takes care of integrating and field-testing it, because not everything can be tested in the lab, like I said. How do you run these as experiments? How do you make them non-disruptive in terms of user experience? The user should feel it is a natural transition; it can't be a jarring one. I can't suddenly be hearing Rishi talking and then suddenly a robotic voice talking to me. It is very important that the transition is very smooth. So try things out smoothly, and gauge how the user is responding and communicating. Coming back to your question: it's very important that you have dedicated teams. A core team that assesses available models and keeps learning, another team that builds the framework that helps us learn and deploy, and a product team that takes care of use-case identification, integration and field testing. That is how we have been doing it. It's a work in progress, of course, but it has worked well for us.

Rishi Ahluwalia (38:14) Absolutely, Sagar, and that also aligns with how your own journey has been: you need to be open to change, because a change can lead you towards a chance, something disruptive in nature that can help the platform and the content creators monetize better. You also brought up a great point about everything feeling natural, not sounding natural at one point and robotic at another. There are other things the community can experiment with too, such as voice cloning and AI-based video avatars that talk like the creators and look like them, giving a more natural feel to the viewers they cater to. But it's the teams that are open to change, open to transitioning, open to experimenting, that will probably be more successful, and that's how you build a team that can bring a major difference to your product strategy as well.
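One hedged sketch of the non-disruptive, experiment-first rollout described here: a deterministic hash puts a small, stable cohort of users on the AI-assisted experience, with a silent fallback when latency would break the live feel. The names and thresholds are illustrative assumptions, not an actual Eloelo or Agora mechanism.

```python
import hashlib

# Sketch of a non-disruptive rollout gate: a deterministic hash gives every
# user a stable bucket, so a small cohort gets the AI-assisted experience
# while everyone else keeps the existing one. Values are illustrative only.

ROLLOUT_PERCENT = 5          # start small, widen as the metrics hold up
LATENCY_CEILING_MS = 1500    # fall back if the live feel would break

def in_experiment(user_id: str, feature: str = "ai_cohost") -> bool:
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100   # stable 0-99 bucket per user per feature
    return bucket < ROLLOUT_PERCENT

def choose_experience(user_id: str, observed_latency_ms: float) -> str:
    if in_experiment(user_id) and observed_latency_ms <= LATENCY_CEILING_MS:
        return "ai_assisted"
    return "baseline"                # silent fallback keeps the session smooth

for uid, latency in [("user-101", 900), ("user-102", 2100), ("user-103", 700)]:
    print(uid, "->", choose_experience(uid, latency))
```

Because the bucket is derived from the user ID, the same users stay in the cohort across sessions, which keeps the experiment measurable while the fallback path keeps the transition invisible to everyone else.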
Perfect. The other thing, Sagar, is about what happens while you build those teams. Since you've been building this for about three years at Eloelo, and for about sixteen years in your past roles across multiple organizations, there are common pitfalls the viewers might want to understand: what should we avoid when adopting or exploring the conversational AI space, or AI in general? And the other part is success metrics. One thing is avoiding the mistakes and pitfalls others have encountered; the other is building for success. Is it the time to first byte across the multiple phases of conversational AI? Is it the naturalness you pointed out? Any thoughts on that?

Sagar Gaonkar (40:35) On the pitfalls, I think one is the approach: we can't keep building in the lab and never test it in the real field. The iterative approach is very important, where you build something small, do a POC, then a field test, gauge your metrics and how it is performing, and then iterate. Second, this is a very evolving landscape. For example, until six months back the de facto approach was speech-to-text, LLM, text-to-speech, and you might have over-indexed on that approach and only optimized and trained the LLM. Latency, of course, becomes a problem there. Now there is a newer approach, direct speech-to-speech, which some of the newer models are providing, and it's very hard to adapt, because your LLM learnings don't flow into it naturally. So it's very important that the base is set up to be open to change, because this industry is evolving; what works today won't necessarily keep working two weeks down the line. Build and train things in such a way that the learnings can be carried over to a new approach. That is one of the things we have faced; the first, of course, was about testing in the field. The third, which I mentioned earlier and which we have also realized, is that because not all languages are the same and not all use cases are the same, it is important to find mixed solutions instead of trying to find one solution for all. Something that works for Hindi-English will definitely not be the best fit for, say, Malayalam: the pronunciation differs, and the tokenization works very differently, even in terms of cost. Even if the effectiveness is perfect, southern languages are more complex and will result in more tokens and more cost. So assuming that if it fits one it will work well everywhere is not correct. Many people, us included, realized that the hard way, but it has helped us shape our solutions differently by language, by use case, and by audio versus text versus video. It is very important that you separate out the use cases and not try to build very generic solutions across all of them. Coming to the second part of your question, how do you measure success: one measure is definitely user comfort and engagement. If this drives more user engagement, that will be a byproduct of users feeling it is real and useful to their time; they can converse when they like, how they like...

Rishi Ahluwalia (43:22) Yeah.
Sagar Gaonkar (43:51) ...and with whom they like. If none of that gets broken, user engagement increases. The creators also feel safer; it enables them to onboard fast, to go live faster, and to earn more. For us, these are simple use cases and simple success metrics. Of course, these are the end metrics; they will have smaller upstream metrics we'll have to measure. But these are ultimately the final, north-star metrics that guide you on whether your experiments are going well or not, and if not, whether you can tweak them.

Rishi Ahluwalia (44:29) Very well summarized. The eventual goal is to have a simplified metric system that gets affected by AI: if engagement time on the platform increases with the adoption of AI and it unlocks more possibilities for monetization, that is good for both the platform and the content creator. I'd also like to touch on your other point, the difference between the cascading flow of conversational AI and the adoption of speech-to-speech models. Both have their own sets of advantages and disadvantages, and we also need to look at it from a developer-experience standpoint: whether teams want to get locked into a particular vendor, which, like you said, is a risk with speech-to-speech, whereas with the cascading flow the developer experience and the flexibility increase. The only caveat is that if the latency is as low as, say, one second, or if it simply feels real, then those seconds do not matter. It's all about which approach the teams are following from a user-experience standpoint and how much of an impact it has on the success metrics. Perfect. This was outstanding, Sagar; I'm loving the insights from you and what you have been building for the industry. So, to wrap this up, thank you so much for all the information our viewers are consuming today; coming directly from a person who is actually building this in the field and serving Bharat, it is even more precious. Any final insights or key takeaways you would want to give back to the community building conversational AI products, on how they can be successful in this endeavor and turn their platforms AI-first?

Sagar Gaonkar (48:10) Okay, I think experimentation is key. It's a two-part answer, I would say. First, build a very strong experimentation base; there are multiple providers and multiple ways of adapting learnings into data models. So build the experimentation base, and then try out solutions without sticking to one approach.
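The "don't stick to one approach" point is easier to see as a sketch: keep the cascaded pipeline and an end-to-end speech-to-speech model behind one small interface, so either can be field-tested without rewriting the product. The classes below are illustrative stand-ins, not real vendor SDKs or anything Eloelo has described.

```python
from typing import Protocol

# Sketch of keeping two turn-level approaches behind one interface so the
# "approach one / approach two" choice stays swappable. Everything here is a
# stand-in: real engines would wrap actual ASR/LLM/TTS or speech-to-speech APIs.

class VoiceTurnEngine(Protocol):
    def respond(self, audio_in: bytes) -> bytes: ...

class CascadedEngine:
    """speech-to-text -> LLM -> text-to-speech: three separately tunable stages."""
    def respond(self, audio_in: bytes) -> bytes:
        text = audio_in.decode()          # stand-in for the ASR stage
        reply = f"echo: {text}"           # stand-in for the LLM stage
        return reply.encode()             # stand-in for the TTS stage

class SpeechToSpeechEngine:
    """A single end-to-end model mapping input audio straight to reply audio."""
    def respond(self, audio_in: bytes) -> bytes:
        return b"(s2s reply audio for) " + audio_in

def handle_turn(engine: VoiceTurnEngine, audio_in: bytes) -> bytes:
    # The caller never sees which approach sits behind the interface,
    # so a new approach can be trialled without touching product code.
    return engine.respond(audio_in)

for engine in (CascadedEngine(), SpeechToSpeechEngine()):
    print(engine.__class__.__name__, handle_turn(engine, b"namaste"))
```

The design choice being illustrated is simply the seam: as long as the product talks to the interface rather than to a specific provider, the experimentation base can swap approaches, or add a new one, without a disruptive rewrite.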
There is approach one, there is approach two, and there will always be new approaches coming, so it's about having that fine balance across cost versus latency versus product, because ultimately it has to fit into a live use case. The latency becomes very important and the cost becomes very important, so a perfect solution might still not go to the field if it doesn't meet the latency constraints, for example, or the Bharat constraints: low-network considerations, cases where your users don't really have good 1 Mbps or 2 Mbps connections, so a lot of video cannot be streamed at the utmost quality. A tier-one-first solution won't work everywhere. So it's important to remember that the field is very different from how the lab works, there are new models and new approaches coming up, and you should be open to them. That has been our learning over the last few months, and that's the takeaway I would want others to have as well.

Rishi Ahluwalia (50:15) Absolutely. The need to keep evolving in this space and to adopt new, natural-feeling solutions is one of the biggest takeaways, and the future outlook will definitely be shaped by it. So thank you so much, Sagar, for your time today. It was indeed a very intriguing session, and I think everybody who tuned in will gain a lot from this discussion. Thank you again, and we look forward to building more solutions and helping Eloelo along the way.

Sagar Gaonkar (51:00) Thank you, Rishi. It was a good one.

Rishi Ahluwalia (51:05) Thank you. Appreciate it, Sagar. Thank you so much. And thank you to everyone who tuned in.

Sagar Gaonkar (51:06) Thank you, dhanyavaad.