OpenAI Unveils Realtime API and Exciting Features for AI Developers

It’s been a tumultuous week for OpenAI, full of executive departures and major fundraising developments, but the startup is back at it, trying to convince developers to build tools with its AI models at its 2024 DevDay. The company announced several new tools Tuesday, including a public beta of its “Realtime API”, for building apps with low-latency, AI-generated voice responses. It’s not quite ChatGPT’s Advanced Voice Mode, but it’s close.

In a briefing with reporters ahead of the event, OpenAI chief product officer Kevin Weil said the recent departures of chief technology officer Mira Murati and chief research officer Bob McGrew would not affect the company’s progress.

“I’ll start with saying Bob and Mira have been awesome leaders. I’ve learned a lot from them, and they are a huge part of getting us to where we are today,” said Weil. “And also, we’re not going to slow down.”

As OpenAI undergoes yet another C-suite overhaul – a reminder of the turmoil following last year’s DevDay – the company is trying to convince developers that it still offers the best platform to build AI apps on. Leaders say the startup has more than 3 million developers building with its AI models, but OpenAI is operating in an increasingly competitive space.

OpenAI noted it had cut costs for developers to access its API by 99% in the last two years, though it was likely forced to by competitors such as Meta and Google continuously undercutting its prices.

One of OpenAI’s new features, dubbed the Realtime API, will give developers the chance to build nearly real-time, speech-to-speech experiences in their apps, with the choice of using six voices provided by OpenAI. These voices are distinct from those offered for ChatGPT, and developers can’t use third-party voices, in order to prevent copyright issues. (The voice ambiguously based on Scarlett Johansson’s is not available anywhere.)

During the briefing, OpenAI’s head of developer experience, Romain Huet, shared a demo of a trip planning app built with the Realtime API. The application allowed users to speak with an AI assistant about an upcoming trip to London and get low-latency responses. The Realtime API also has access to a number of tools, so the app was able to annotate a map with restaurant locations as it answered.

At another point, Huet showed how the Realtime API could speak on the phone with a human to inquire about ordering food for an event. Unlike Google’s infamous Duplex, OpenAI’s API can’t call restaurants or shops directly; however, it can integrate with calling APIs like Twilio to do so. Notably, OpenAI is not adding disclosures so that its AI models automatically identify themselves on calls like this, despite the fact that these AI-generated voices sound quite realistic. For now, it seems to be the developers’ responsibility to add this disclosure, something that could be required by a new California law.

As part of its DevDay announcements, OpenAI also introduced vision fine-tuning in its API, which will let developers use images, as well as text, to fine-tune their applications of GPT-4o. This should, in theory, help developers improve the performance of GPT-4o for tasks involving visual understanding. OpenAI’s head of product API, Olivier Godement, tells TechCrunch that developers will not be able to upload copyrighted imagery (such as a picture of Donald Duck), images that depict violence, or other imagery that violates OpenAI’s safety policies.

OpenAI is racing to match what its competitors in the AI model licensing space already offer. Its prompt caching feature is similar to the feature Anthropic launched several months ago, allowing developers to cache frequently used context between API calls, reducing costs and improving latency. OpenAI says developers can save 50% using this feature, whereas Anthropic promises a 90% discount for it.
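To put those two discount figures in perspective, here is a rough back-of-the-envelope sketch in Python. It is not either company's billing logic: the per-token price, the request size, and the share of tokens served from cache are all hypothetical values chosen for illustration, and the function name `input_cost` is made up. It also ignores any extra charge for writing to the cache and any difference in base prices between providers.

```python
def input_cost(total_tokens, cached_tokens, price_per_million, cached_discount):
    """Dollar cost of input tokens when cached tokens are billed at a discount.

    All arguments are illustrative assumptions, not published pricing.
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached_tokens = total_tokens - cached_tokens
    discounted_price = price_per_million * (1 - cached_discount)
    total = uncached_tokens * price_per_million + cached_tokens * discounted_price
    return total / 1_000_000


PRICE_PER_MILLION = 1.00  # hypothetical base price in dollars
TOTAL_TOKENS = 10_000     # hypothetical prompt size
CACHED_TOKENS = 8_000     # hypothetical portion served from cache

no_cache = input_cost(TOTAL_TOKENS, 0, PRICE_PER_MILLION, 0.0)
half_off = input_cost(TOTAL_TOKENS, CACHED_TOKENS, PRICE_PER_MILLION, 0.5)
ninety_off = input_cost(TOTAL_TOKENS, CACHED_TOKENS, PRICE_PER_MILLION, 0.9)

for label, cost in [("50% discount", half_off), ("90% discount", ninety_off)]:
    saving = 1 - cost / no_cache
    print(f"{label}: ${cost:.4f} per request, {saving:.0%} below uncached")
```

Under these assumptions, with 80% of the prompt cached, the 50% discount trims the input bill by 40% and the 90% discount by 72%. The headline discount applies only to the cached portion, so the real saving depends on how much of each request is actually reused.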

Lastly, OpenAI is offering a model distillation feature to let developers use larger AI models, such as o1-preview and GPT-4o, to fine-tune smaller models such as GPT-4o mini. Running smaller models generally provides cost savings compared to running larger ones, but this feature should let developers improve the performance of those small AI models. As part of model distillation, OpenAI is launching a beta evaluation tool so developers can measure their fine-tune’s performance within OpenAI’s API.

DevDay may make bigger waves for what it didn’t announce. For instance, there wasn’t any news about the GPT Store announced during last year’s DevDay. Last we’ve heard, OpenAI has been piloting a revenue share program with some of the most popular creators of GPTs, but the company hasn’t shared much since then.

Also, OpenAI says it’s not releasing any new AI models during DevDay this year. Developers waiting for OpenAI o1 (not the preview or mini version) or the startup’s video generation model, Sora, will have to wait a little longer.

Published by
admage
