On Friday, researchers from Nvidia announced Magic3D, an AI model that can generate 3D models from text descriptions. After entering a prompt such as, “A blue poison-dart frog sitting on a water lily,” Magic3D generates a 3D mesh model, complete with colored texture, in about 40 minutes. With modifications, the resulting model can be used in video games or CGI art scenes.
In its academic paper, Nvidia frames Magic3D as a response to DreamFusion, a text-to-3D model that Google researchers announced in September. Similar to how DreamFusion uses a text-to-image model to generate a 2D image that then gets optimized into volumetric NeRF (neural radiance field) data, Magic3D uses a two-stage process that takes a coarse model generated in low resolution and optimizes it to higher resolution. According to the paper’s authors, the resulting Magic3D method can generate 3D objects two times faster than DreamFusion.
Magic3D can also perform prompt-based editing of 3D meshes. Given a low-resolution 3D model and a base prompt, it is possible to alter the text to change the resulting model. Also, Magic3D’s authors demonstrate preserving the same subject throughout several generations (a concept often called coherence) and applying the style of a 2D image (such as a cubist painting) to a 3D model.
Nvidia did not release any Magic3D code along with its academic paper.
The ability to generate 3D from text feels like a natural evolution in today’s diffusion models, which use neural networks to synthesize novel content after intense training on a body of data. In 2022 alone, we’ve seen the emergence of capable text-to-image models such as DALL-E and Stable Diffusion and rudimentary text-to-video generators from Google and Meta. Google also debuted the aforementioned text-to-3D model DreamFusion two months ago, and since then, people have adapted similar techniques to work as an open source model based on Stable Diffusion.
As for Magic3D, the researchers behind it hope that it will allow anyone to create 3D models without the need for special training. Once refined, the resulting technology could speed up video game (and VR) development and perhaps eventually find applications in special effects for film and TV. Near the end of their paper, they write, “We hope with Magic3D, we can democratize 3D synthesis and open up everyone’s creativity in 3D content creation.”
https://arstechnica.com/?p=1899233