Music

AI music generators leverage the power of artificial intelligence, specifically Deep learning. These tools are typically trained on massive datasets of music, allowing them to identify patterns and recreate them based on your prompts.

Let's peel back the layers and explore the technical aspects of how these AI music generators work:

  • Deep Learning Core:
    • Architecture: These tools likely utilize deep neural network architectures suited to audio generation, such as LSTMs (Long Short-Term Memory networks) or Transformers. These networks excel at capturing long-term dependencies within music data.
    • Training Data: The AI has been trained on colossal amounts of music data. This data encompasses various genres, instruments, and musical styles. The data is pre-processed, segmented, and fed into the neural network.
    • Learning Process: During training, the network learns to identify patterns and relationships within the music data. It can recognize how melodies progress, harmonies interact, and rhythms develop across different musical styles.
  • Text-to-Music Translation:
    • Text Encoding: When you provide a text prompt, it's converted into a numerical representation the AI can understand. This might involve techniques like word embedding, where each word is mapped to a unique vector based on its meaning and context within the prompt.
    • Conditional Generation: The encoded prompt acts as a condition for the music generation process. The AI leverages the prompt information alongside its knowledge of music patterns learned from the training data.
    • Music Generation Loop: The AI iteratively generates musical elements like notes, rhythms, and instrument timbres. At each step, it evaluates the generated sequence against the prompt and the learned music patterns, refining its output until a coherent musical piece emerges (a minimal end-to-end sketch of this pipeline follows this list).
  • Beyond the Basics: While these are the core functionalities, there's likely more happening under the hood. These AI models might also incorporate techniques for:
    • Music Style Transfer: Transferring stylistic elements from a reference piece of music you provide.
    • Melody and Harmony Generation: Independently generating unique melodies and chord progressions that align with the prompt and genre.
    • Real-time Music Generation: Potentially enabling interactive music generation where the AI responds to user input in real-time.
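
To make the pipeline above concrete, here is a minimal sketch in PyTorch of text-conditioned, autoregressive music-token generation. Everything in it is a hypothetical stand-in (the toy vocabulary sizes, the PromptEncoder and MusicGenerator modules, and the sampling loop), not the actual architecture of Udio, Suno, or any other product:

  # Toy sketch of text-conditioned music generation -- hypothetical, untrained.
  import torch
  import torch.nn as nn

  TEXT_VOCAB = 1000   # toy text vocabulary size (assumption)
  MUSIC_VOCAB = 512   # toy discrete music-token vocabulary (assumption)
  D_MODEL = 128

  class PromptEncoder(nn.Module):
      """Text encoding: map a tokenized prompt to one conditioning vector."""
      def __init__(self):
          super().__init__()
          self.embed = nn.Embedding(TEXT_VOCAB, D_MODEL)  # word embeddings

      def forward(self, prompt_ids):                 # (batch, prompt_len)
          return self.embed(prompt_ids).mean(dim=1)  # mean-pool the words

  class MusicGenerator(nn.Module):
      """Conditional generation: a tiny causal Transformer over music tokens,
      with the prompt vector prepended as the first sequence position."""
      def __init__(self):
          super().__init__()
          self.embed = nn.Embedding(MUSIC_VOCAB, D_MODEL)
          layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4,
                                             batch_first=True)
          self.body = nn.TransformerEncoder(layer, num_layers=2)
          self.head = nn.Linear(D_MODEL, MUSIC_VOCAB)

      def forward(self, music_ids, prompt_vec):
          x = self.embed(music_ids)                       # (b, t, d)
          x = torch.cat([prompt_vec.unsqueeze(1), x], 1)  # prepend condition
          mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
          h = self.body(x, mask=mask)                     # causal attention
          return self.head(h[:, -1])                      # next-token logits

  @torch.no_grad()
  def generate(prompt_ids, steps=64, temperature=1.0):
      """Music generation loop: sample one token per step, conditioned on
      the prompt and on everything generated so far."""
      encoder, generator = PromptEncoder(), MusicGenerator()  # untrained
      prompt_vec = encoder(prompt_ids)
      tokens = torch.zeros(1, 1, dtype=torch.long)  # start token (id 0)
      for _ in range(steps):
          logits = generator(tokens, prompt_vec) / temperature
          next_tok = torch.multinomial(logits.softmax(-1), num_samples=1)
          tokens = torch.cat([tokens, next_tok], dim=1)
      return tokens  # a real system decodes tokens to audio with a codec

  music = generate(torch.randint(0, TEXT_VOCAB, (1, 8)))
  print(music.shape)  # torch.Size([1, 65])

Real systems differ in important ways: they operate on learned audio-codec tokens or spectrograms rather than toy IDs, train the encoder and generator jointly on large music corpora, and decode the token sequence back to a waveform with a neural codec or vocoder.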



Udio

Udio promises powerful creation tools and high-fidelity audio. It allows you to describe the kind of music you want through prompts, similar to giving instructions to a musical genie. The site also lets you browse music by genre, like songs, and create playlists. Some of the most popular tracks on Udio include "Lorem Ipsum Dolor Sit Amet" by SirBitesalot, "I Hate You With All Your Heart" by jakemarsh, and "My Man Sh*% In His Pants At Work" by Mitch Burger And The Fries.

Suno.ai

Suno boasts a loyal following for its ability to craft impressive music in various styles. It excels at understanding your prompts and translating them into cohesive pieces.

"It’s one of the most popular AI music tools available. And more often than not, it’s my first choice when it comes to music creation. Suno allows you to input your own lyrics (or have ChatGPT write you some) and it lets you select the music style, which you can then customise. That’s more than enough to create a decent AI song! If you’re looking to create unique compositions and experiment with different musical translations, this one’s ideal for you!", AI Andy

Stable Audio

Stable Audio is a music generation tool from Stability AI that uses latent diffusion to create high-quality, 44.1 kHz music for commercial use. Latent diffusion is a generative technique in which noise is gradually added to a compressed (latent) representation of real audio during training, and the model learns to reverse that corruption; at generation time it starts from pure noise and iteratively denoises it into new audio. Stable Audio's latent diffusion model is conditioned on text metadata as well as audio file duration and start time, which allows it to generate audio of a specified length and style while keeping the output musically coherent (a minimal sketch of the denoising loop appears after the feature list below). Stable Audio is still under development, but it has already been used to generate music for a variety of projects, including video games, films, and commercials. Here are some of the key features of Stable Audio:

  • High-quality music: Stable Audio can generate music comparable in quality to human-composed music.
  • Control over the content and length: Users can specify the desired style, mood, and length of the generated music.
  • Ease of use: Stable Audio has a simple and intuitive web interface.
  • Commercial use: Stable Audio is designed for commercial use, and users can generate and download tracks for commercial projects.
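
To illustrate the latent diffusion process described above, here is a minimal, hypothetical denoising loop in PyTorch. The Denoiser module, the conditioning scheme (a text embedding plus duration and start-time scalars), and the crude update rule are illustrative stand-ins, not Stability AI's actual model:

  # Toy sketch of conditional latent diffusion sampling -- hypothetical.
  import torch
  import torch.nn as nn

  LATENT_DIM = 64   # size of the audio latent (assumption)
  COND_DIM = 32     # size of the text embedding (assumption)
  STEPS = 50        # number of reverse-diffusion steps

  class Denoiser(nn.Module):
      """Predicts the noise in a latent, given the timestep and the
      conditioning: text embedding, duration, and start time."""
      def __init__(self):
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(LATENT_DIM + COND_DIM + 3, 256),  # +3: t, dur, start
              nn.ReLU(),
              nn.Linear(256, LATENT_DIM),
          )

      def forward(self, z, t, text_emb, duration_s, start_s):
          cond = torch.cat([z, text_emb, t, duration_s, start_s], dim=1)
          return self.net(cond)  # predicted noise, same shape as z

  @torch.no_grad()
  def sample(text_emb, duration_s=30.0, start_s=0.0):
      """Start from pure noise and iteratively denoise it into a latent."""
      model = Denoiser()  # untrained here; real models train at scale
      z = torch.randn(1, LATENT_DIM)   # pure noise
      dur = torch.tensor([[duration_s]])
      start = torch.tensor([[start_s]])
      for step in reversed(range(STEPS)):
          t = torch.tensor([[step / STEPS]])
          eps = model(z, t, text_emb, dur, start)
          z = z - eps / STEPS  # crude update; real schedulers (DDPM/DDIM) differ
      return z  # a real system decodes this latent to 44.1 kHz audio

  latent = sample(torch.randn(1, COND_DIM))
  print(latent.shape)  # torch.Size([1, 64])

In practice the denoiser is a large U-Net or Transformer over time-frequency latents, and a separately trained autoencoder maps the final latent back to a waveform.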

MusicLM


Google also shows off MusicLM's "long generation" (creating five-minute music clips from a simple prompt), "story mode" (which takes a sequence of text prompts and turns it into a morphing series of musical tunes), "text and melody conditioning" (which takes a human humming or whistling audio input and changes it to match the style laid out in a prompt), and generating music that matches the mood of image captions. ... MusicLM: Google AI generates music in various genres at 24 kHz | Benj Edwards - Ars Technica


An example MusicLM prompt: "Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive."
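
As a rough illustration of the "text and melody conditioning" mentioned above, here is a hypothetical sketch that combines a melody embedding (extracted from hummed or whistled audio) with a text embedding into a single conditioning signal. MusicLM's real system is built from components such as MuLan embeddings and hierarchical audio tokens; everything below is a simplified stand-in:

  # Toy sketch of joint text-and-melody conditioning -- not MusicLM itself.
  import torch
  import torch.nn as nn

  D = 64  # shared embedding size (assumption)

  class MelodyEncoder(nn.Module):
      """Stand-in for a model that embeds hummed/whistled audio."""
      def __init__(self, n_mels=80):
          super().__init__()
          self.proj = nn.Linear(n_mels, D)

      def forward(self, mel_frames):            # (batch, frames, n_mels)
          return self.proj(mel_frames).mean(1)  # pool frames -> (batch, D)

  class TextEncoder(nn.Module):
      """Stand-in for a text tower like MuLan's."""
      def __init__(self, vocab=1000):
          super().__init__()
          self.embed = nn.Embedding(vocab, D)

      def forward(self, ids):                   # (batch, len)
          return self.embed(ids).mean(1)        # pool words -> (batch, D)

  # The generator would attend to both: the melody embedding pins down
  # *what* tune to follow, the text embedding pins down the *style*.
  melody = MelodyEncoder()(torch.randn(1, 100, 80))    # fake mel spectrogram
  text = TextEncoder()(torch.randint(0, 1000, (1, 6)))
  condition = torch.cat([melody, text], dim=1)         # (1, 2*D)
  print(condition.shape)  # torch.Size([1, 128])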


Neurorack

The first deep-AI-based synthesizer. We developed the first musical audio synthesizer combining the power of deep generative models with the compactness of the Eurorack format. The current prototype relies on the NVIDIA Jetson Nano. The goal of this project is to design the next generation of musical instruments, providing a new tool for musicians while enhancing their creativity. It proposes a novel approach to thinking about and composing music, and we deeply believe that AI can be used to achieve this quest. The Eurorack hardware and software have been developed by our team, with equal contributions from Ninon Devis, Philippe Esling and Martin Vert.

OpenAI Jukebox

Making Music

  • Boomy AI ... select the genre, choose the mood, and create original songs in seconds

Drums

Text-to-Song

Siraj Raval

Past