NVIDIA launches Fugatto, a new AI model that can modify existing sounds and generate new ones

NVIDIA has launched a new AI model called Fugatto, designed to generate and modify music and audio for music, film, and video game production. The model can create music snippets from text and audio files, change the accent and emotion of voices, and even produce entirely new sounds. The full version of Fugatto has 2.5 billion parameters and was trained on NVIDIA DGX systems; work on the model took more than a year. The technology may compete with similar products from companies such as Meta.

According to Zhitong Finance APP, NVIDIA (NVDA.US) has launched a new artificial intelligence (AI) model for generating music and audio, aimed at people who create music, movies, and video games.

According to NVIDIA, the model is named Fugatto (Foundational Generative Audio Transformer Opus) and can generate or modify music and sounds using any combination of text and audio files.

For example, the model can create music snippets from text prompts, add instruments to or remove them from existing songs, change the accent or emotion of a voice, and even produce sounds that have never been heard before.

Rafael Valle, an audio research manager at NVIDIA who is also a conductor and composer, said, "We want to create a model that understands and produces sound like a human."

NVIDIA pointed out that advertising agencies could use Fugatto to quickly localize existing advertisements for multiple regions, applying different accents and emotions to voiceovers. Video game developers, meanwhile, could use the model to modify prerecorded audio assets in their games so that the sound adapts as players' actions change during gameplay.

Fugatto can also make a trumpet bark like a dog or a saxophone meow like a cat. The company added that after fine-tuning the model with a small amount of singing data, researchers found it could handle tasks it had not been trained for, such as generating high-quality singing from text.

NVIDIA stated that the full version of Fugatto has 2.5 billion parameters and was trained on NVIDIA DGX systems with a combined 32 NVIDIA H100 Tensor Core GPUs. Work on the model took more than a year in total.

Fugatto may compete with similar technologies from startups such as Runway and large companies such as Meta Platforms (META.US). In October, Meta unveiled an AI model called Movie Gen, which can create realistic video and audio clips from user prompts.

In February of this year, OpenAI, the maker of ChatGPT, unveiled Sora, a model that can create realistic and imaginative scenes from text instructions. The Microsoft (MSFT.US)-backed company has yet to release its text-to-video model to the public.