Alphabet-C releases MusicFX, an AI tool for music creation: Generate a song with just one sentence
The emergence of MusicFX may disrupt the music industry and lower the barriers to music creation. However, it also brings challenges in terms of copyright, ownership, and how to verify the originality of AI-generated content.
Alphabet-C continues to push into the music field, bringing us one step closer to an era in which anyone can compose music.
On December 14th, Alphabet-C launched an AI music creation tool called "MusicFX", which allows users to generate original music with just a few sentences.
According to Alphabet-C, MusicFX combines the company's previously released MusicLM model with DeepMind's watermarking technology, SynthID. This makes it possible to identify whether a piece of music was created by AI, going some way toward addressing creators' concerns about copyright.
The release of MusicFX is seen as an important milestone for AI, opening up new possibilities for musicians, producers, and music enthusiasts to experiment with and create many kinds of music:
MusicFX gives music creators a rich collection of sound effects and audio material. Users can create music in a wide range of genres, adjust pitch, rhythm, and volume, and add effects such as reverb and echo. Whether you want a soothing mood or a tense, adventurous one, MusicFX can deliver.
Currently, MusicFX can only be accessed through Alphabet-C's AI experimental product website, AI Test Kitchen. The platform was set up to let users try the latest AI technology as early as possible and provide early feedback, a collaborative approach that helps Alphabet-C improve its technology and adhere to ethical standards.
Media analysis suggests that the release of MusicFX not only provides a new tool for music generation but also represents a trend in the development of AI. Users' role in helping companies improve and shape artificial intelligence is becoming increasingly important. By involving users in the early stages, Alphabet-C not only enhances its technology but also proactively addresses potential ethical issues.
In addition, the emergence of MusicFX may lower the barrier to entry for music creation, allowing more enthusiasts without professional music training to participate.
However, the release of MusicFX is not without controversy. Some argue that there is still no answer to how AI-generated content affects copyright, ownership, and musical originality. Alphabet-C's decision to watermark AI-generated music shows it is concerned about these issues, but it does not fully resolve them; whether AI-generated content can be considered original remains an open question.

As for next steps, Alphabet-C says it will continue to improve MusicFX based on user feedback. MusicFX has the potential to redefine how music is created and interacted with, and AI Test Kitchen could become a model for future AI development, pushing toward a responsible new era in which technology keeps pace with social values and norms.
How powerful is MusicLM?
Earlier this year, Alphabet-C introduced MusicLM, a powerful model that can generate music directly from text and images. It can generate music in various styles, covering almost any genre.
MusicLM is a text-conditioned audio generation model that produces high-fidelity music from textual descriptions. It adopts a hierarchical sequence-to-sequence approach, which allows it to generate music that stays consistent over several minutes.
MusicLM uses three models to extract the representations that condition autoregressive music generation: SoundStream (acoustic tokens), w2v-BERT (semantic tokens), and MuLan (a joint music-text embedding that links the prompt to the audio).
AudioLM can be seen as the predecessor of MusicLM: MusicLM reuses AudioLM's multi-stage autoregressive modeling as its generation backbone. From a textual description, it can generate audio at a 24 kHz sampling rate that remains coherent over several minutes.
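To make that staging concrete, here is a structural sketch in Python of how text conditioning could flow through a semantic-token stage and an acoustic-token stage before a neural codec decodes 24 kHz audio. The module internals, codebook sizes, and token rates are placeholder assumptions, not Alphabet-C's implementation.

```python
# Structural sketch of MusicLM's staged pipeline (NOT Alphabet-C's code):
# a MuLan text embedding conditions a semantic-token stage (w2v-BERT vocabulary),
# whose output conditions an acoustic-token stage (SoundStream vocabulary);
# a neural codec decoder then produces a 24 kHz waveform.
import torch
import torch.nn as nn

SEMANTIC_VOCAB = 1024   # assumed size of the semantic (w2v-BERT) codebook
ACOUSTIC_VOCAB = 1024   # assumed size of one acoustic (SoundStream) codebook
SAMPLE_RATE = 24_000    # MusicLM decodes audio at 24 kHz


class StageLM(nn.Module):
    """Stand-in for one autoregressive stage: conditioning sequence -> token sequence."""

    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, vocab_size)

    def generate(self, conditioning: torch.Tensor, steps: int) -> torch.Tensor:
        # The real stages sample tokens one at a time with a Transformer decoder;
        # here we pool the conditioning and emit argmax tokens just to keep the
        # data flow runnable end to end.
        pooled = self.proj(conditioning).mean(dim=1, keepdim=True)   # (B, 1, dim)
        logits = self.head(pooled).expand(-1, steps, -1)             # (B, steps, vocab)
        return logits.argmax(dim=-1)                                 # (B, steps)


def generate_music(mulan_text_embedding: torch.Tensor, seconds: int = 10) -> torch.Tensor:
    """mulan_text_embedding stands in for a MuLan text embedding, shape (B, T, 256)."""
    semantic_lm = StageLM(SEMANTIC_VOCAB)        # stage 1: MuLan cond. -> semantic tokens
    acoustic_lm = StageLM(ACOUSTIC_VOCAB)        # stage 2: semantic -> acoustic tokens
    semantic_embed = nn.Embedding(SEMANTIC_VOCAB, 256)

    semantic_tokens = semantic_lm.generate(mulan_text_embedding, steps=25 * seconds)
    acoustic_tokens = acoustic_lm.generate(semantic_embed(semantic_tokens), steps=50 * seconds)

    # A SoundStream decoder would turn acoustic_tokens into audio; random samples
    # stand in for that final neural-codec step.
    batch = acoustic_tokens.shape[0]
    return torch.randn(batch, SAMPLE_RATE * seconds)


if __name__ == "__main__":
    waveform = generate_music(torch.randn(1, 16, 256), seconds=2)
    print(waveform.shape)   # torch.Size([1, 48000])
```

In the actual system, each stage is a Transformer trained on AudioLM-style token sequences; the sketch only preserves the data flow between stages.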
Compared with AudioLM, MusicLM was trained on far more data: a music dataset of 280,000 hours. To address the lack of evaluation data for text-to-music generation, the research team also introduced MusicCaps, the first evaluation dataset designed specifically for the task, containing 5,500 music-text pairs with captions written by professionals.
However, earlier media analysis suggests MusicLM is still far from perfect. Some samples have quality issues, and although the model can technically generate vocals, including harmonies, there is plenty of room for improvement: most of the "lyrics" are broken English or outright gibberish, sung by synthesized voices into strange "mixtures".
Copyright Risks of AI-generated Music: Is it considered original?
Like humans, AI occasionally takes shortcuts and directly plagiarizes its training material. How can copyright be protected?
In an experiment, Alphabet-C researchers found that roughly 1% of the music generated by the system was copied directly from songs in its training data, an issue serious enough to make them hesitant to release MusicLM too early. Training AI on collected material also raises questions of copyright infringement, and there is already precedent: in 2020, American rapper Jay-Z's record label issued a copyright warning to the YouTube channel Vocal Synthesis for using AI to create songs such as a Jay-Z "cover" of Billy Joel's "We Didn't Start the Fire".
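Alphabet-C has not published the exact procedure behind that 1% figure, but a memorization check of this general kind can be sketched by comparing the discrete token sequence of a generated clip against the training corpus and flagging long verbatim runs. The tokenization and run-length threshold below are illustrative assumptions, not the researchers' actual methodology.

```python
# Minimal sketch of a memorization check (not Alphabet-C's procedure):
# flag a generated clip if it shares a long contiguous token run with any
# training clip, i.e. a verbatim copy of a training excerpt.
from typing import List, Sequence


def longest_common_run(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest contiguous token run shared by sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)          # DP over suffix-match lengths, O(len(a)*len(b))
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def is_memorized(generated: Sequence[int],
                 training_corpus: List[Sequence[int]],
                 min_run: int = 125) -> bool:
    # 125 tokens corresponds to a few seconds of audio under this sketch's
    # assumed token rate; the threshold is illustrative, not a published figure.
    return any(longest_common_run(generated, clip) >= min_run
               for clip in training_corpus)


if __name__ == "__main__":
    training = [[1, 2, 3, 4, 5] * 50, [9, 8, 7] * 80]
    copied = training[0][:200]            # verbatim chunk of a training clip
    novel = [6, 0, 2, 4] * 60
    print(is_memorized(copied, training))  # True
    print(is_memorized(novel, training))   # False
```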
A white paper written by Eric Sunray of the American Music Publishers Association argues that AI music generators like MusicLM infringe on the reproduction rights of the U.S. Copyright Act by "absorbing coherent audio from the training database".
Furthermore, although AI-generated music is nominally "original", it often sounds like a blend of different musicians' works, which exposes it to accusations of plagiarism or even counterfeiting.
Alphabet-C's use of DeepMind's watermarking technology SynthID reflects its attention to these copyright issues. The company says every generated track carries a digital watermark that is inaudible to the human ear and does not affect audio quality; it is embedded mainly by converting the audio wave into a two-dimensional visualization. Even after destructive operations such as adding noise, compressing the audio, or changing playback speed, the watermark can still be detected in the song.
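SynthID's audio scheme is proprietary, and Alphabet-C describes it as working on a two-dimensional view of the audio; the toy spread-spectrum example below instead adds a key-derived pattern directly to the waveform, purely to illustrate why a low-amplitude, inaudible mark can remain detectable by correlation even after noise is added. Every name and number in it is an assumption for illustration.

```python
# Toy spread-spectrum audio watermark (NOT SynthID): embed a key-derived,
# low-amplitude pseudo-random pattern, then detect it by correlation.
import numpy as np

SAMPLE_RATE = 24_000


def watermark_pattern(key: int, length: int) -> np.ndarray:
    """Deterministic +/-1 chip sequence derived from a secret key."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=length)


def embed(audio: np.ndarray, key: int, strength: float = 2e-3) -> np.ndarray:
    # Add the pattern at an amplitude far below the signal level.
    return audio + strength * watermark_pattern(key, len(audio))


def detect(audio: np.ndarray, key: int) -> float:
    # Normalized correlation with the key's pattern: watermarked audio gives a
    # clearly positive score, unmarked audio hovers near zero.
    pattern = watermark_pattern(key, len(audio))
    return float(np.dot(audio, pattern) / np.sqrt(len(audio)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    music = 0.1 * rng.standard_normal(SAMPLE_RATE * 5)        # stand-in for a 5 s clip
    marked = embed(music, key=42)
    noisy = marked + 0.01 * rng.standard_normal(len(marked))  # destructive edit

    print("unmarked:", round(detect(music, key=42), 3))   # near 0
    print("marked:  ", round(detect(marked, key=42), 3))  # clearly positive
    print("noisy:   ", round(detect(noisy, key=42), 3))   # still clearly positive
```

Production systems such as SynthID are engineered to also survive compression and speed changes, which this toy example does not attempt.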
However, some analysts point out that although Alphabet-C's watermark can show that a piece of music was created by AI, it does not solve the fundamental problem: whether music generated by AI systems can count as an original work and compete on the same stage as "human-made music".
With attention and controversy, perhaps in the not-too-distant future, these questions will have clear answers.