Thursday, September 22, 2022

AI model from OpenAI automatically recognizes speech and translates it to English


A pink waveform on a blue background, suggesting a visual depiction of audio. (credit: Benj Edwards / Ars Technica)

On Wednesday, OpenAI released a new open source AI model called Whisper that recognizes and translates audio at a level that approaches human recognition ability. It can transcribe interviews, podcasts, conversations, and more.

OpenAI trained Whisper on 680,000 hours of audio data and matching transcripts in 98 languages collected from the web. According to OpenAI, this open-collection approach has led to "improved robustness to accents, background noise, and technical language." It can also detect the spoken language and translate it to English.

OpenAI describes Whisper as an encoder-decoder transformer, a type of neural network that can use context gleaned from input data to learn associations that can then be translated into the model's output.
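Whisper's exact architecture is detailed in OpenAI's paper; as a generic illustration of the attention mechanism at the heart of any encoder-decoder transformer, the toy NumPy sketch below shows how each output position builds a context-weighted mix of the input. All names and shapes here are illustrative, not Whisper's actual code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy attention: each query row mixes the value rows V, weighted by
    how well it matches the key rows K (softmax of scaled dot products)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V  # context-weighted combination of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))  # 2 decoder (output) positions
K = rng.standard_normal((3, 4))  # 3 encoder (input) positions
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one 4-dimensional context vector per query: (2, 4)
```

In a full transformer this operation is stacked in many layers with learned projections, which is how "context gleaned from input data" ends up shaping the model's output tokens.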


Reference: https://ift.tt/IKxVvbj

