Hermes Frangoudis (00:35) Hi everyone, welcome to the Convo AI World podcast, where we interview the teams and builders forging ahead in the voice-first AI world. Today I'm so excited to have with me Jeff Lu from Akool. Thanks for joining me today, Jeff. Jeff Lu (00:53) Yeah, thank you for inviting me to the podcast. I'm excited to share something that everyone will like. Hermes Frangoudis (01:03) Let's get into it. Can you share the origin story of Akool? What was the core problem or catalyst that led you to found the company? Jeff Lu (01:13) Yeah. Akool was founded about four years ago, and at that time we noticed that a lot of video generation technology was becoming mature, but we didn't see many products out on the market. We believed video generation could help many organizations and teams do a lot of work, because creating video is very hard, and using video for interaction is very hard. So that's how we got started: making video creation much easier and more personalized, making communication easier, and also making it more interactive and fun. Hermes Frangoudis (01:58) That's so cool. Let's dive in and unpack that. With what you do at Akool, you do a lot of visual storytelling, is that correct? So when you think about visual storytelling, how does Akool do these things that no one else seems to be able to do? Jeff Lu (02:11) Yeah, that's right. So in video storytelling, I think the person is the most important thing in the story. We really focused on the people, the characters in the story, and tried to generate photorealistic people, to the point that it's very hard to tell whether it's AI or not. People really like that. Meanwhile, a very unique piece is that we do a lot of this live, in real time, and running on edge devices rather than depending on cloud compute. That changes things a lot.
Just imagine you are able to interact with live characters, persons in the video, in real time. I think that's a different level of experience. Hermes Frangoudis (03:23) Yeah, that face-to-face interaction really changes the feeling that you get. Jeff Lu (03:33) Yeah, that's right. Hermes Frangoudis (03:35) So you talk about not only running in the cloud, but running on the edge. What does that mean for reducing those sorts of technical barriers? Jeff Lu (03:46) Yeah, so a lot of AI models now are pretty big, and you need to run them in the cloud with high energy use and compute. What we do is a lot of optimization of the compute to make the models run on edge devices, especially devices such as your laptop, your mobile phone, and so on. There are several advantages. First, it greatly reduces cost. Imagine you are doing live video interactions concurrently with thousands of people and all the computing happens in the cloud: that's extremely expensive. If it happens on the edge device, it's very doable; you could even serve millions of people at the same time. Also, latency on device is much better, as are security and privacy. So there are many reasons to make it happen on the device. Hermes Frangoudis (04:59) Yeah, bringing that computation to the end user, right? On their device if possible, the ultimate edge, as someone said. Jeff Lu (05:10) Yeah, that's right. Hermes Frangoudis (05:12) So speaking of customers, how do Akool's customers use the tools today? What do you see as some of the most common use cases versus maybe some of the more surprising? Jeff Lu (05:25) Yeah. In terms of use cases, there are quite a few different ones.
We started with marketing and advertisements, then moved into things like film production, and then expanded further into areas such as powering other AI agent companies with a face, plus a lot of internal company communications. So yeah, those are the common use cases. Hermes Frangoudis (06:05) I think you touched on some really interesting use cases. I'd love to dive a little into each of those. You said you started with marketing and advertising. That makes sense, right? They're the hungriest for content and probably the most strapped for budget when it comes to creating big content. You said you also do internal communications. Can you tell me a little about that use case? Jeff Lu (06:34) Yeah, that one is really about translation. We are able to translate videos into multiple languages, from either static video or live video, and we can do that in 150-plus languages. For many large organizations, the leadership team might only be able to speak one or two languages, but we can translate their speech into tons of different languages and distribute it to their employees globally. It applies to a lot of live meetings as well. Hermes Frangoudis (07:23) That's so cool. I think that's a very interesting use case, and it definitely changes internal business communications, which can struggle when things get lost in translation. How do you balance the needs of these different users? You said there are marketing users, internal communications, and you even mentioned film and cinema. These have to be very different styles of creators. So how do you balance their needs? Jeff Lu (07:53) Yeah, for a lot of those needs, we definitely prioritize the use cases that can generate more revenue.
As a startup, we think that's a metric that's very important for us to check: which use case has a bigger market, which is more of a growing market rather than an existing, competitive one, and how much market share we can get. We consider all of those things. Meanwhile, we have our own vision about where the technology should go and where we are strong. So it's a balance between that vision and the market needs. Hermes Frangoudis (08:58) Yeah, you've got to meet the customer need, but not try to please everyone in a way that derails you from your core mission. I totally get that. And your core mission is delivering these video avatars, right? Can you explain a little about that? Jeff Lu (09:19) Yeah. Our core mission is to help different organizations and businesses create video significantly more easily and integrate video into their workflows and interactions. Hermes Frangoudis (09:40) So as they generate video, what do you think it is about this visual AI that makes it more sticky? We've seen it with pre-recorded video: it's easier to follow than written text. But how do you see it now as it takes this next leap into the world of AI? How does that make it even more sticky? Jeff Lu (10:04) Yeah, you can think of communication at different levels. The first level is text only: you read books and so on in text. The next level is probably voice, like audiobooks. And the next level is video: movies and videos. Definitely, video conveys significantly more information than voice or text, and it's much more engaging as well. With video, you can tell much more engaging and convincing stories, and you can make people understand things significantly more easily. That's how I understand the situation.
Hermes Frangoudis (10:54) Totally makes sense. It's the ability to connect with more of the senses, right? The written word is one sense, audio is another, but video is audio and visual at the same time. It really brings it together. I want to talk a little about your technology stack. What makes Akool's model architecture and data training unique? Jeff Lu (11:25) Yeah, on the technology side, we really focus on developing as many technologies in-house as possible. A lot of what we do, we try to build ourselves. Meanwhile, we do leverage some open-source models, mainly on the foundation model side. Our core work there is to make the learning faster under tighter resource constraints and to fine-tune with our own data. So we're a pretty tech-heavy team, and we believe tech differentiation is very important in this market, and we need to keep improving on it. Hermes Frangoudis (12:12) Super cool to hear. So you use a combination of open-source foundation models and also your own models that you're training specifically for this, right? Jeff Lu (12:23) Yeah, the core models are mainly our own, especially for avatars. For avatars, we actually developed the whole tech stack. But we also have some other things that are more on the video foundation model side, not directly related to the avatar business; those are mainly built and optimized on top of open source. Hermes Frangoudis (12:49) Cool. So when we're talking about the stack, and you say you've built a lot of this in-house, what part of your stack is maybe the hardest to scale, technically or in terms of adoption? What is one of the pain points you've felt along the way?
Jeff Lu (13:14) Yeah, the pain points we felt along the way are mainly around how to create super high video quality, how to get very precise control, and how to reduce cost. Those are problems we have been constantly working on. On the cost side, it's definitely very important because the market is competitive, and it's what makes real-time operation possible. On the other side, result quality is very important as well. A lot of customers, especially B2B customers, expect very good result quality to be able to use it, and they are less price sensitive on the B2B side. Hermes Frangoudis (14:17) That leads me into my next question. What kinds of trade-offs do you face between speed, realism, and controllability in the generation? It seems like it's speed versus price sensitivity, but are there other levers you have to twist and turn? Jeff Lu (14:36) Yeah, there are several interrelated factors, and we need to tune them into the ideal state. The first piece is definitely result quality; most customers expect the best result quality. The next piece is how to run it faster and more efficiently: reduce the cost, increase the speed, all those things. Then the next piece is ease of use. If your tool is too complex, it's very hard to use and people get confused. If your tool is too simple and doesn't give much freedom, then professional users have a lot of difficulty. There's also flexibility: how much flexibility we want to give users in the APIs and so on, which comes back to ease of use. So quite a few factors combine, and there's a lot of balancing. Ideally, we want to let users choose.
So give them more options and let them choose what is best for them. Hermes Frangoudis (16:05) So the real balance is deciding what to expose to the user to give them freedom without giving them so much freedom that they mess it up. Jeff Lu (16:17) Yeah, that's right. We definitely have a lot going on internally with the various layers of models and capabilities, and we want to make sure users get what they need but are not overwhelmed. Hermes Frangoudis (16:37) That makes sense, and super cool. Giving users that sort of freedom is so important to getting what they feel is the perfect output, right? The more control you have, the more comfortable you feel with what you're getting out of it. In terms of what you're getting out of it, what role does real-time inference play in products like your streaming avatars and live camera? Jeff Lu (17:04) Yeah, with streaming avatars there's quite a lot going on. We do a lot around AI agents: we integrate into many AI agent platforms, and also with hardware, to make things happen. Translation is also a very interesting one, for translating meetings and everything else in real time, live. Hermes Frangoudis (17:45) Yeah, so you have two options, right? You could be live translating a person, or communicating in real time with a completely generative avatar, correct? Jeff Lu (17:59) Yeah. Hermes Frangoudis (18:02) Those feel like wildly different directions, but under the hood, technologically, is it basically solving the same problem, or are there different nuances in each approach? Jeff Lu (18:20) I think the underlying technologies are connected. It's just about how they are tuned and adjusted, and how the system is designed.
But for a lot of the technology we're talking about, the underlying framework is similar. We definitely try to develop new solutions based on similar foundations; otherwise the workload on our side would be extremely high. So we've expanded a family of features that are interconnected. When we launch new features, we don't need to write everything from scratch; we can leverage our existing work to amplify our product features. Hermes Frangoudis (19:21) That's awesome, how everything builds on itself and you're never really starting over from scratch. Can you walk us through what makes your video translation more advanced than typical dubbing or lip-sync style technology? Jeff Lu (19:45) Yeah. There are several things we do in video translation. First, we clone the voice, we translate the language, we make sure it fits the video length, and we support 150-plus languages. Second, we do whole-face reanimation rather than just the mouth region, so we reanimate a lot of the person's face. We also have a very well-designed workflow that lets people easily input a video, get everything cleaned up, and output the results. We give users a lot of freedom to post-edit the results to make sure the translation is accurate. And what's more, we are the only ones able to run the whole pipeline live in real time, which means real-time interaction with translation. That works in all the different settings, including webinars and meetings. So definitely that's a lot. Hermes Frangoudis (21:09) That's huge for those massive global town halls.
Now you can have your senior leadership speaking in their native languages, but have it reach the audience in their own native language, with all the proper reanimation of the face to make it feel realistic and not like a cheap dub. Jeff Lu (21:30) Yeah, that's right. Hermes Frangoudis (21:34) I think our producers for this show are going to want to dive into that to see how we can bring even our show to different audiences in their own languages, right? Because you can reanimate, and people could be listening and watching along in their own native language without having to listen to the original in English, but still get a first-class experience. Jeff Lu (21:56) Right. That's right. Hermes Frangoudis (21:58) So as the generative AI market evolves rapidly, how does Akool stay ahead without chasing every trend? You mentioned building your own in-house technology. Is that really the core, or are there other pieces you use to stay ahead of things? Jeff Lu (22:18) Yeah. There are quite a few things we do. I know there are lots of new things coming out, and people might want to chase every trend. For us, we limit the scope of what we do. We don't want to chase every trend on the market, but we do want to keep up with our core offerings and make sure we are leading there. If something is not a core offering, we slow down a little until the market opportunity becomes clearer; then, if we think it's good value, we go. So we have different strategies: when we decide something is core, we need to go and lead it, and when we decide it's not core, we tend to watch how the market develops a little longer and then decide whether to do it.
Hermes Frangoudis (23:26) Okay, so with the core it's really about delivering on the mission and understanding very clearly where you're going. But for the other pieces, it's hearing the customer needs, gauging the maturity of the market opportunity, and deciding what makes sense to bring in. Do you bring it into the core, or does it stay as a peripheral thing that only gets love based on how much market opportunity there is? Jeff Lu (23:56) Yeah, we run things a little bit like labs. For new features where we think there might be opportunity, we launch a lightweight version first and see how much traction we get. We value user feedback. If we decide it's getting a huge amount of traction, then we move it into our core offering and increase its priority. Hermes Frangoudis (24:35) Makes sense. You roll it out almost like a beta, test the water, and see how much interest there is in the market, right? Jeff Lu (24:47) Yeah, that's right. Hermes Frangoudis (24:49) In that sense, how do you balance experimentation with stability? When you roll out a new feature, is it fairly early, or do you try to make it mature enough that when you roll it out, it's not going to cause issues? Jeff Lu (25:09) We wait until the feature is mature, then we roll out. Before that, we might put it in beta where people can test it, or we might invite parties to come and use it. But our official launches are all mature products, just to make sure people have a consistent expectation of the product and the quality of the overall platform. Hermes Frangoudis (25:39) Nice. So in terms of the platform and your customers, where are you seeing more interest? Is it the real-time live stream overlays and real-time avatars, or is it post-production re-dubbing?
Jeff Lu (25:59) Yeah, we think a lot of things in the market have opportunities. It's just a question of which opportunity we can grab more easily and faster, and which opportunity is bigger moving forward. So we're constantly evaluating. Currently, we believe the live avatars and live-video generation side has more opportunities. Hermes Frangoudis (26:32) That real-time side, we see a lot of opportunity there too. So thinking about the broader industry, what are some of the recent trends that have genuinely impressed you and maybe changed your mind about what you thought was possible? Jeff Lu (26:37) Yeah. First of all, recent developments in video foundation models have definitely increased quality a lot. We used to believe that very high-quality video was still very far away for generative AI video models, but it's becoming true; many of them are producing pretty high-quality results. They are not at movie quality or real-video quality yet, but I think they are getting there. There's definitely a lot of excitement around the video-quality improvements. On the other side, we see quite a lot happening with world models as well; a lot of improvement is ongoing there. It seems more related to video games, but it's also a very exciting piece. And in terms of precise control, we also see quite a few interesting things happening, like Nano Banana, which gives you very precise control over how images are edited and updated. Hermes Frangoudis (28:18) Yeah, Nano Banana, I think, really blew the market open. So speaking of synthetic media and really being able to alter what we see, I think this is huge. But what's your stance on watermarking and authenticity?
How do you balance "this is the source of truth" versus "this is the digital"? Jeff Lu (28:49) Yeah. How do we handle source of truth versus digital? We do quite a few things. First, we have a watermark system that can embed watermarks into videos; they can be visible or invisible, and they tell users the content is AI-generated. Second, we have a system to detect and judge whether content is AI-generated or not. Third, there are lots of content moderation and safety-related rules we apply on our platform to make sure users are creating appropriate content. So quite a few things are applied to ensure the security and safety of the platform. Hermes Frangoudis (29:47) That's huge. Having people feel safe and secure when using the tools is very important for the end user. Jeff Lu (29:54) Yeah. And we have one more thing called Jarvis, a content moderator that can moderate any type of content. If it's a burger brand, it can auto-detect whether people throw the burger on the ground, or anything like that. So there's definitely a lot of interesting work ongoing there; quite a lot of effort, very interesting. Hermes Frangoudis (30:22) It's super interesting. It's almost like a way to correct or double-check the outcome, right? Jeff Lu (30:30) Yeah, that's right. Hermes Frangoudis (30:33) So what's your take on multimodal foundation models? Do you think they're there yet, or is it still a long way off? Jeff Lu (30:43) Yeah, I think they are definitely much better than before, and there's still a lot of room for improvement. We can wait and see where the trend goes. For us, we believe the progress of these models will ultimately benefit the ecosystem, and a lot of companies will be able to leverage them to provide better products and solutions.
We are not so directly tied to multimodal foundation models yet, but we do see the potential to use them in the future. Hermes Frangoudis (31:33) Okay, so they're not quite there yet; they're something to keep an eye on. Okay. Jeff Lu (31:38) Yeah, I mean, it depends on your application. It's not quite there yet for us, but for other applications or other places, it's probably already there. On the language side, they're probably already very good if your main use case is language. But if the main use case is on the video side, I think they still need some time. Hermes Frangoudis (32:04) Crawling there. But it's good, because that keeps you in the lead right now. Jeff Lu (32:09) Yeah. Hermes Frangoudis (32:12) So in terms of developments in generative AI, there's so much going on. What do you think is underhyped at the moment? Something that's not getting enough attention, yet is super interesting and maybe more people should be looking at it. Jeff Lu (32:28) Yeah, in gen AI, I think quite a few underhyped things are being ignored, especially in the application layer. On the foundation model layer, we think the real-time piece of video could get much more attention, as well as reducing the cost of compute. You know, right now everyone is very proud of using a lot of compute to train their models or run inference; the more resources they use, the prouder they are. But that might not be the direction to go in the future, right? So that's also a very interesting trend. We've also noticed quite a few of these models being used in various applications in various places, and we believe they could be much more helpful, for example with legal work or with a lot of document-heavy, boring tasks. I mean, they've already gotten attention, but they could play a significantly bigger role. They are really good.
Even on the legal side, right? They can do very good research and legal work. Hermes Frangoudis (34:02) I think that's a very interesting point. The underhyped part is a lot of these workflows that are very time-consuming, which is why they generally require a lot of human effort, versus an AI that can really dive into it and go forward. But you touched on something else I thought was interesting: it's underhyped that people are overhyped about their compute, and the cost of compute has almost been subsidized because of the race to train the best and biggest models. I think we're going to see a reckoning on that. Do you agree? Jeff Lu (34:50) Yeah. I think on that side it's hard to tell, because people are already overexcited about pushing the best model out. ROI, return on investment, is not the most important indicator for them anymore. It's just a different game. Hermes Frangoudis (35:23) Makes sense. Right now it's even a game of pricing, right? The token output: even if a model has better pricing per token in and out, some of these reasoning models just take more tokens, so the actual cost is almost obscured from the user. Like you said, the ROI is not the same. So maybe the benchmark of what's considered leading is not so much the best return on your investment, but just the best model, right? It's a zero-sum game for some companies? Jeff Lu (36:09) Yeah, that's right. Hermes Frangoudis (36:13) So how do you think all of this is going to evolve human creativity? Do you think anyone off the street will be able to make movie-quality content from a single prompt?
Or do you think it's just going to change the style in which people use the tools? Like, it's not that anyone can do it, but more that anyone who's trained can do it. Jeff Lu (36:42) I think the goal of this industry is to make it so everyone can do it, and I believe that's going to happen. The evolution of technology keeps lowering the bar for more and more things, and making everyone able to generate movie-quality videos is one of the goals of the industry. Hermes Frangoudis (37:11) So we're getting towards the end of my questions here, but one question I always save for the end is a bit of a wild card. If you weren't building Akool, what area of AI would you find most exciting and be chasing right now? Jeff Lu (37:29) Yeah, I've actually been in the video space for a very long time, so my passion has always been video. If I weren't building Akool, I think I would still be in the AI video space, doing great stuff with AI video, whether with the models we build or in some other direction, but staying deep in AI video for a long time. Hermes Frangoudis (38:06) So no matter what direction you go, it's all AI video for you, right, Jeff? Jeff Lu (38:10) Yeah. I think that's the most interesting thing. It fits the tech trend, it fits the movement of history, and it also fits my interests. So I think it will be on the AI video side. Hermes Frangoudis (38:30) I think we're just at the very beginning of this AI video journey. The industry is so young and so exciting. Jeff Lu (38:40) Yeah, definitely. We see lots of exciting things here. Hermes Frangoudis (38:45) Well, Jeff, I really want to thank you for your time today, and thank everyone for following along, watching live, and listening. Jeff Lu (38:56) All right, yeah, thank you to everyone who showed up and listened to the podcast.
Hermes Frangoudis (39:03) So for everyone following: like, subscribe, do the social media thing, and we'll see you in the next one. Thank you. Jeff Lu (39:14) All right, thank you.