
D-ID Introduces Advanced AI Video Translation with Voice Cloning and Lip Sync

AI video creation platform D-ID is the latest company to ship a tool for translating videos into other languages using AI technologies. However, in this case, D-ID also clones the speaker’s voice and changes their lip movements to match the translated words as part of the AI editing process.

The technology stems from D-ID’s earlier work, which you may recall from the viral trend a few years ago in which users animated their older family photos and, later, made those photos speak. On the back of that success, the startup closed a $25 million Series B round in 2022 with an eye on serving its increasing number of enterprise customers in the U.S. who were using its technology to make AI-powered videos.

With the company’s now-launched AI Video Translate tech, currently being offered to D-ID subscribers for free, creators can automatically translate their videos into other languages to help them expand their reach. In total, there are 30 languages currently available, including Arabic, Mandarin, Japanese, Hindi, Spanish and French, among others. A D-ID subscription starts at $56 per year for its cheapest plan, which includes the smallest number of credits to use toward AI features, and goes up to $1,293 per year before shifting to enterprise pricing.

D-ID suggests the new AI video technology could help customers save on localization costs when scaling their campaigns to a global audience in areas like marketing, entertainment, and social media. The technology will compete with other solutions for both dubbing and AI video.

For years, dubbing technologies have made it easier for video viewers to listen to audio in their own language but were often inaccessible to smaller creators. That’s been changing as companies improved access to technology. For example, YouTube released a multi-language audio feature designed to help its creators connect with a wider audience by translating their videos into other languages. Well-known creator MrBeast (Jimmy Donaldson) was among the early adopters, having used the tech to bring several of his popular videos to 11 more languages.

With AI, the ability to create, translate, or clone voices is also expanding. Microsoft this year announced it would use AI to translate and dub YouTube videos, and others, while you watch. In July, creator platform Vimeo unveiled tools to translate audio and captions and to do so by replicating the speaker’s voice with AI technology. Numerous companies also offer voice cloning or AI translation tools (or sometimes both), including those from Descript, ElevenLabs, Speechify, Veed, Camb.ai, Captions.ai, and Akool, to name a few, as well as tools that let you create videos using AI avatars that can speak dozens of languages, like those from HeyGen, Deepbrain AI and others.

Dubbing and lip sync AI libraries, like Wav2Lip, have also made it easier for startups to build these sorts of tools, which they pitch to creators as an easier, and perhaps more affordable, way to use AI technology. (D-ID’s newly developed proprietary model called Rosetta-1 powers AI Video Translate.)

D-ID says its new Video Translation technology will be available through D-ID Studio and its API. A one-month trial is being offered and further demos are on its website.

The company says videos can be between 10 seconds and 5 minutes in length, and file size should be under 2GB. The feature works with only one person in the frame and, for best results, they should be facing the camera with their face visible at all times.
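For creators who want to screen footage before uploading, those stated limits could be expressed as a simple pre-flight check. The sketch below is illustrative only: it does not call D-ID's API, the `VideoInfo` type and `check_video` function are hypothetical names, and it assumes "2GB" means 2 billion bytes, which the company has not specified.

```python
from dataclasses import dataclass
from typing import List

# Limits as reported by the company.
MIN_DURATION_S = 10
MAX_DURATION_S = 5 * 60
# Assumption: "2GB" is read as decimal gigabytes (2 * 10^9 bytes).
MAX_SIZE_BYTES = 2 * 1000**3


@dataclass
class VideoInfo:
    """Hypothetical container for metadata gathered by the caller."""
    duration_s: float
    size_bytes: int
    people_in_frame: int


def check_video(info: VideoInfo) -> List[str]:
    """Return a list of problems; an empty list means the video fits the stated limits."""
    problems = []
    if not (MIN_DURATION_S <= info.duration_s <= MAX_DURATION_S):
        problems.append("duration must be between 10 seconds and 5 minutes")
    if info.size_bytes >= MAX_SIZE_BYTES:
        problems.append("file size must be under 2GB")
    if info.people_in_frame != 1:
        problems.append("exactly one person must be in the frame")
    return problems


if __name__ == "__main__":
    clip = VideoInfo(duration_s=90, size_bytes=350_000_000, people_in_frame=1)
    print(check_video(clip) or "looks eligible")
```

The check covers only the hard limits the company lists; whether the speaker faces the camera throughout is a quality recommendation and would need separate review.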
