Thursday, March 27, 2025

OpenAI’s new AI image generator is potent and bound to provoke


The arrival of OpenAI's DALL-E 2 in the spring of 2022 marked a turning point in AI when text-to-image generation suddenly became accessible to a select group of users, creating a community of digital explorers who experienced wonder and controversy as the technology automated the act of visual creation.

But like many early AI systems, DALL-E 2 struggled with consistent text rendering, often producing garbled words and phrases within images. It also had limitations in following complex prompts with multiple elements, sometimes missing key details or misinterpreting instructions. These shortcomings left room for improvement that OpenAI would address in subsequent iterations, such as DALL-E 3 in 2023.

On Tuesday, OpenAI announced new multimodal image generation capabilities that are directly integrated into its GPT-4o AI language model, making it the default image generator within the ChatGPT interface. The integration, called "4o Image Generation" (which we'll call "4o IG" for short), allows the model to follow prompts more accurately (with better text rendering than DALL-E 3) and respond to chat context for image modification instructions.

Read full article

Comments

Reference : https://ift.tt/UmfKbXo

No comments:

Post a Comment

"The Doctor Will See Your Electronic Health Record Now"

Cheryl Conrad no longer seethes with the frustration that threatened to overwhelm her in 2006. As described in IEEE Spectrum , Cheryl’...