OpenAI’s new AI image generator pushes the limits in detail and prompt fidelity

20 September 2023 News

On Wednesday, OpenAI announced DALL-E 3, the latest version of its AI image synthesis model that features full integration with ChatGPT. DALL-E 3 renders images by closely following complex descriptions and handling in-image text generation (such as labels and signs), which challenged earlier models. Currently in research preview, it will be available to ChatGPT Plus and Enterprise customers in early October.

Like its predecessor, DALL-E 3 is a text-to-image generator that creates novel images based on written descriptions called prompts. Although OpenAI released no technical details about DALL-E 3, the AI model at the heart of previous versions of DALL-E was trained on millions of images created by human artists and photographers, some of them licensed from stock websites like Shutterstock. It’s likely DALL-E 3 follows this same formula, but with new training techniques and more computational training time.

Judging by the samples provided by OpenAI on its promotional blog, DALL-E 3 appears to be a radically more capable image synthesis model than anything else available in terms of following prompts. While OpenAI’s examples have been cherry-picked for their effectiveness, they appear to follow the prompt instructions faithfully and convincingly render objects with minimal deformations. Compared to DALL-E 2, OpenAI says that DALL-E 3 refines small details like hands more effectively, creating engaging images by default with “no hacks or prompt engineering required.”

A DALL-E 3 image provided by OpenAI with the prompt: “An illustration of an avocado sitting in a therapist’s chair, saying ‘I just feel so empty inside’ with a pit-sized hole in its center. The therapist, a spoon, scribbles notes.”
A DALL-E 3 image provided by OpenAI with the prompt: “A vast landscape made entirely of various meats spreads out before the viewer. tender, succulent hills of roast beef, chicken drumstick trees, bacon rivers, and ham boulders create a surreal, yet appetizing scene. the sky is adorned with pepperoni sun and salami clouds.”
A DALL-E 3 image provided by OpenAI with the prompt: “A minimap diorama of a cafe adorned with indoor plants. Wooden beams crisscross above, and a cold brew station stands out with tiny bottles and glasses.”
A DALL-E 3 image provided by OpenAI with the prompt: “Close-up photograph of a hermit crab nestled in wet sand, with sea foam nearby and the details of its shell and texture of the sand accentuated.”
A DALL-E 3 image provided by OpenAI with the prompt: “A paper craft art depicting a girl giving her cat a gentle hug. Both sit amidst potted plants, with the cat purring contentedly while the girl smiles. The scene is adorned with handcrafted paper flowers and leaves.”
A DALL-E 3 image provided by OpenAI with the prompt: “Pixel art scene of Coit Tower standing tall on Telegraph Hill, with a panoramic view of the city below and birds flying around.”
A DALL-E 3 image provided by OpenAI with the prompt: “Tiny potato kings wearing majestic crowns, sitting on thrones, overseeing their vast potato kingdom filled with potato subjects and potato castles.”
A DALL-E 3 image provided by OpenAI with the prompt: “An illustration of a human heart made of translucent glass, standing on a pedestal amidst a stormy sea. Rays of sunlight pierce the clouds, illuminating the heart, revealing a tiny universe within. The quote ‘Find the universe within you’ is etched in bold letters across the horizon.”
A DALL-E 3 image provided by OpenAI with the prompt: “A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.”

In comparison, Midjourney, a competing AI image synthesis model from another vendor, renders photorealistic details well, but it still requires a great deal of counter-intuitive tinkering with prompts to gain any control over the image output.

DALL-E 3 also appears to handle text within images in a way that its predecessor couldn’t, although some competing models like Stable Diffusion XL and DeepFloyd are getting better at it. For example, a prompt that included the words, “An illustration of an avocado sitting in a therapist’s chair, saying ‘I just feel so empty inside’ with a pit-sized hole in its center,” created a cartoon avocado with the character quote perfectly encapsulated in a speech bubble.

Notably, OpenAI says that DALL-E 3 has been “built natively” on ChatGPT and will arrive as an integrated feature of ChatGPT Plus, allowing users to refine images conversationally and use the AI assistant as a brainstorming partner. It also means that ChatGPT will be able to generate images based on the context of the current conversation, which may lead to novel capabilities. Microsoft’s Bing Chat AI assistant, also built on technology from OpenAI, has been able to generate images in conversation since March.

https://arstechnica.com/?p=1969719
