Monday, May 13, 2024

ChatGPT-4o allows real-time audio-video conversations with an “emotional” AI chatbot


Abstract multicolored waveform (credit: Getty Images)

On Monday, OpenAI debuted GPT-4o (the "o" stands for "omni"), a major new AI model that can reportedly converse using speech in real time, reading emotional cues and responding to visual input. It operates faster than OpenAI's previous best model, GPT-4 Turbo, and will be free for ChatGPT users and available as a service through the API, rolling out over the next few weeks.
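For developers curious what the API access looks like in practice, here is a minimal sketch of a text-only request using the OpenAI Python SDK and the "gpt-4o" model name; the system and user messages are illustrative placeholders, and the realtime audio-video capabilities shown in the demos are not part of this basic call.

    from openai import OpenAI

    # The client reads the OPENAI_API_KEY environment variable by default.
    client = OpenAI()

    # A simple chat completion request against the GPT-4o model.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize what GPT-4o can do in one sentence."},
        ],
    )

    # Print the assistant's reply.
    print(response.choices[0].message.content)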

OpenAI revealed the new audio conversation and vision comprehension capabilities in a YouTube livestream titled "OpenAI Spring Update," presented by OpenAI CTO Mira Murati and employees Mark Chen and Barret Zoph, which included live demos of GPT-4o in action.

OpenAI claims that GPT-4o responds to audio inputs in roughly 320 milliseconds on average, similar to human response times in conversation, according to a 2009 study. With GPT-4o, OpenAI says it trained a brand new AI model end-to-end on text, vision, and audio so that all inputs and outputs "are processed by the same neural network."


Reference: https://ift.tt/sQfLped

