Nvidia claims a new AI audio generator can make sounds never heard before

Nvidia says its new AI music editor can create “sounds never heard before” — like a trumpet that meows. The tool, called Fugatto, is capable of generating music, sounds, and speech using text and audio inputs it’s never been trained on.

As shown in a demo video, this allows Fugatto to put together songs based on wild prompts, like “Create a saxophone howling, barking then electronic music with dogs barking.”

Some other examples shared by the company include the ability to produce unique sound effects based on a description, like “Deep, rumbling bass pulses paired with intermittent, high-pitched digital chirps, like the sound of a massive sentient machine waking up.”

It can even transform the sound of someone’s voice, changing their accent or giving them a different tone, like angry or calm. It can edit music, too: Fugatto can isolate the vocals in a song, add instruments, and even change up a melody by swapping out a piano for an opera singer.

A paper released with the announcement lists the datasets Nvidia says Fugatto was trained on, including a library of sound effects from the BBC.

To build Fugatto, Nvidia says researchers had to put together a dataset with millions of audio samples. They then created instructions “that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.” Nvidia doesn’t say when — or if — the tool will be widely available.

https://www.theverge.com/2024/11/25/24305584/nvidia-fugatto-ai-audio-generator-music