I Tried Generating Audio with the ElevenLabs Music v2 API by Specifying Style and Length in a Composition Plan

2026.06.19

Hi! I'm Yuji Nishimura from the Operations Department!

The ElevenLabs Changelog entry for June 15, 2026 announced the Music v2 API.

With music_v2, in addition to generating songs from a prompt alone as before, you can use composition_plan to specify song sections, duration, lyrics, and style chunk by chunk.

In this article, I'll use the music_v2 composition plan to generate a short original J-pop demo and a 45-second orchestral piece.

What Changed in the Music v2 API

According to the Changelog, by specifying music_v2 in model_id, you can use the new model for Generate music, Stream music, Generate music detailed, and Upload music.

The major change is the chunk-based composition plan using GenerationChunk and AudioRefChunk. The official documentation explains that a music_v2 plan is treated as an ordered list of chunks, where each chunk can have lyrics, style, and duration for each section.

Simply put, here's how to choose between them:

Usage | Best for
Generate with prompt only | Quickly testing the vibe first
Generate with composition_plan | Specifying song structure, lyrics, and duration of each section
Generate after creating a plan with /v1/music/plan | Having AI draft the JSON rather than writing it from scratch yourself

Note that, according to the official guide for composition plans, prompt and composition_plan are not used together: you specify one or the other.
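
As a minimal sketch of that either/or rule, the helper below builds a request body containing exactly one of the two fields. The function name build_music_body is hypothetical (it is not part of any ElevenLabs SDK), and the check happens only on the client side; how the API itself responds when both fields are sent is not something I tested.

```python
import json


def build_music_body(prompt=None, composition_plan=None, model_id="music_v2"):
    """Return a JSON request body containing either prompt or composition_plan.

    Hypothetical helper: it only enforces the "one or the other" rule locally.
    """
    # True when both are given or both are missing.
    if (prompt is None) == (composition_plan is None):
        raise ValueError("specify exactly one of prompt or composition_plan")
    body = {"model_id": model_id}
    if prompt is not None:
        body["prompt"] = prompt
    else:
        body["composition_plan"] = composition_plan
    # ensure_ascii=False keeps Japanese lyrics readable in the JSON text.
    return json.dumps(body, ensure_ascii=False)
```

Passing both arguments, or neither, raises ValueError before any request is built.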

Testing Environment

The test environment was as follows:

  • Date of execution: 2026-06-17
  • OS: macOS
  • API: ElevenLabs Music API
  • Model: music_v2
  • Execution method: curl + jq
  • Output format: mp3_48000_192

The API key is set in the environment variable ELEVENLABS_API_KEY. The key value itself is not shown in this article.

export ELEVENLABS_API_KEY="<YOUR_ELEVENLABS_API_KEY>"

First, Generate a Composition Plan

To start, use POST /v1/music/plan to generate a composition plan.

This endpoint returns the song structure as JSON rather than the audio itself. According to the official Create composition plan reference, this endpoint can be used without consuming credits, but it is subject to rate limits based on your plan.

For this test, I used a prompt asking for a 10-second city pop-style instrumental.

# Request a plan, save it for the audio generation step, and pretty-print it.
jq -n \
  --arg prompt 'Create a 10-second instrumental city pop loop for a product demo. Structure it as a soft intro, a brighter main groove, and a clean ending. Use 95 BPM, electric piano, muted guitar, light bass, tight drums, and no vocals.' \
  '{prompt:$prompt,music_length_ms:10000,model_id:"music_v2"}' \
| curl -sS \
  -X POST 'https://api.elevenlabs.io/v1/music/plan' \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H 'Content-Type: application/json' \
  --data-binary @- \
| tee /tmp/elevenlabs_music_v2_plan.json \
| jq .

The returned plan was a JSON object with a chunks array. It was divided into three sections: Soft Intro, Brighter Main Groove, and Clean Ending.

{
  "chunks": [
    {
      "text": "[Soft Intro]",
      "duration_ms": 3000,
      "positive_styles": [
        "city pop",
        "instrumental",
        "95 bpm",
        "electric piano",
        "muted guitar"
      ],
      "negative_styles": [
        "vocals",
        "aggressive",
        "heavy metal"
      ],
      "context_adherence": "high"
    },
    {
      "text": "[Brighter Main Groove]",
      "duration_ms": 4500,
      "positive_styles": [
        "bright funky muted guitar riffing",
        "tight snare drum and hi-hat groove",
        "funky walking bassline"
      ],
      "negative_styles": [
        "distorted lead guitar",
        "vocals"
      ],
      "context_adherence": "high"
    },
    {
      "text": "[Clean Ending]",
      "duration_ms": 3000,
      "positive_styles": [
        "staccato electric piano stabs",
        "clean guitar tail",
        "snare hit ending on beat"
      ],
      "negative_styles": [
        "fade out",
        "reverb crash",
        "vocals"
      ],
      "context_adherence": "high"
    }
  ]
}

Some API reference examples in the official documentation appear to mix the music_v1 and music_v2 plan shapes, but the response I got when running with model_id: "music_v2" was chunks-based.
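
To check how a returned plan adds up, a small sketch like the following sums duration_ms across the chunks. The function name total_duration_ms is hypothetical, and the dictionary copies only the section names and durations from the response above. For this plan the three chunks total 10,500 ms, slightly more than the 10,000 ms requested in music_length_ms.

```python
def total_duration_ms(plan):
    """Sum duration_ms over every chunk in a chunks-based plan."""
    return sum(chunk["duration_ms"] for chunk in plan["chunks"])


# Durations copied from the plan returned above (style fields omitted).
city_pop_plan = {
    "chunks": [
        {"text": "[Soft Intro]", "duration_ms": 3000},
        {"text": "[Brighter Main Groove]", "duration_ms": 4500},
        {"text": "[Clean Ending]", "duration_ms": 3000},
    ]
}

print(total_duration_ms(city_pop_plan))  # 10500
```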

Generating Music from a Composition Plan

Next, pass the generated plan to POST /v1/music to generate the audio file.

In the Compose music API, you can generate a song by specifying either prompt or composition_plan. The default output format for music_v2 is mp3_48000_192, but this time I specified it explicitly as a query parameter.

jq -n --slurpfile plan /tmp/elevenlabs_music_v2_plan.json '{
  model_id: "music_v2",
  composition_plan: $plan[0]
}' \
| curl -sS \
  -X POST 'https://api.elevenlabs.io/v1/music?output_format=mp3_48000_192' \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H 'Content-Type: application/json' \
  --data-binary @- \
  -o elevenlabs_music_v2_citypop.mp3

The result was HTTP 200, and an MP3 file was generated.

Content-Type: audio/mpeg
MPEG Layer III, 192 kbps, 48 kHz, Joint Stereo
Duration: 10.512 seconds

I was able to confirm that the entire flow from plan generation to audio generation works using only the REST API.
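
The same audio request can also be sketched without jq. The following builds the POST /v1/music request with Python's standard library, using the endpoint, header, and query parameter from the curl command above. The function names are hypothetical, and I only ran the curl version for this article, so treat this as an untested equivalent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io"


def build_compose_request(plan, api_key, output_format="mp3_48000_192"):
    """Build (but do not send) the POST /v1/music request for a plan."""
    query = urllib.parse.urlencode({"output_format": output_format})
    body = json.dumps(
        {"model_id": "music_v2", "composition_plan": plan},
        ensure_ascii=False,
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/music?{query}",
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


def save_audio(request, out_path):
    """Send the request and write the response body (the MP3) to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Loading the saved plan with json.load and passing the result of build_compose_request to save_audio would correspond to the curl command above.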

The generated audio is below.
https://app.box.com/s/w4p0vcd8244bcq9ihpun38wuev3wu7eq

Creating a J-pop Demo

Now for the main event.

To create a short J-pop demo, I had AI write a composition plan for a J-pop song with lyrics. It is a completely original demo that does not imitate any existing artist or song, and it is divided into 5 chunks: Intro, Verse, Pre-Chorus, Chorus, and Outro.

{
  "model_id": "music_v2",
  "composition_plan": {
    "chunks": [
      {
        "text": "[Intro]\n{instrumental hook}",
        "duration_ms": 4000,
        "positive_styles": [
          "modern J-pop",
          "bright female vocal production",
          "128 BPM",
          "A major",
          "sparkling electric piano",
          "clean electric guitar",
          "tight pop drums",
          "warm synth pad",
          "catchy instrumental hook",
          "polished radio mix"
        ],
        "negative_styles": [
          "existing artist imitation",
          "cover song",
          "dark",
          "heavy metal",
          "lo-fi",
          "distorted vocals"
        ],
        "context_adherence": "high"
      },
      {
        "text": "[Verse 1]\n夜明け前のホームで\nほどけた夢を結んだ",
        "duration_ms": 8000,
        "positive_styles": [
          "gentle verse",
          "clear Japanese female vocals",
          "natural Japanese pronunciation",
          "spacious phrasing",
          "soft electric piano",
          "light guitar arpeggios",
          "subtle sidechain synth",
          "steady pop groove"
        ],
        "negative_styles": [
          "rap",
          "shouting",
          "spoken word",
          "a cappella",
          "crammed lyrics",
          "rushed syllables",
          "overly dramatic vibrato"
        ],
        "context_adherence": "high"
      },
      {
        "text": "[Pre-Chorus]\n風が背中を押す",
        "duration_ms": 6000,
        "positive_styles": [
          "rising pre chorus",
          "clear Japanese female vocals",
          "natural Japanese pronunciation",
          "building drums",
          "layered harmonies",
          "uplifting synth lift",
          "snare build",
          "emotional but hopeful"
        ],
        "negative_styles": [
          "dropout",
          "quiet ending",
          "dark chord change",
          "crammed lyrics",
          "rushed syllables",
          "screaming"
        ],
        "context_adherence": "high"
      },
      {
        "text": "[Chorus]\n走れ 朝焼けライン\n胸の奥まで照らして\n新しい今日へ",
        "duration_ms": 11000,
        "positive_styles": [
          "big J-pop chorus",
          "clear Japanese female vocals",
          "natural Japanese pronunciation",
          "spacious singable topline",
          "catchy topline",
          "full band",
          "bright synth strings",
          "driving drums",
          "wide vocal harmonies",
          "uplifting melody",
          "radio-ready hook"
        ],
        "negative_styles": [
          "sad ballad",
          "minimal arrangement",
          "low energy",
          "crammed lyrics",
          "rushed syllables",
          "spoken voice"
        ],
        "context_adherence": "high"
      },
      {
        "text": "[Outro]\n{short instrumental tag}\n朝焼けライン",
        "duration_ms": 3000,
        "positive_styles": [
          "short outro",
          "clean final chord",
          "sparkling synth tail",
          "memorable pop ending"
        ],
        "negative_styles": [
          "long fade out",
          "abrupt cut",
          "dark ending"
        ],
        "context_adherence": "high"
      }
    ]
  }
}

This plan was passed directly to POST /v1/music for generation.

curl -sS \
  -X POST 'https://api.elevenlabs.io/v1/music?output_format=mp3_48000_192' \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H 'Content-Type: application/json' \
  --data-binary @composition-plan.json \
  -o elevenlabs_jpop_asayake-line.mp3

The generation results are as follows.

HTTP 200
Content-Type: audio/mpeg
File size: 769,005 bytes
Format: MP3, 48 kHz, 192 kbps
Duration: 32.040 seconds

The generated song is below.
There are some rough spots, but the song came out nicely.
https://app.box.com/s/51wq9sd7px5dhpajza133gbij5ab3aji

Creating a 45-Second Piece by Changing the Style and Length

The composition plan is also easy to work with when you want to change the style or length.

As a separate experiment from the J-pop demo, I also tried creating a 45-second orchestral piece that skips the intro and starts with the main melody on the very first beat. The premise is completely original fantasy RPG BGM, not a recreation of any existing game or composer's work.

The full plan is long, so the listing below shows only the key points.

{
  "model_id": "music_v2",
  "composition_plan": {
    "chunks": [
      {
        "text": "[Main Theme - immediate start]\n{full heroic orchestral theme begins on the first beat}",
        "duration_ms": 12000,
        "positive_styles": [
          "original fantasy RPG orchestral music",
          "immediate main theme",
          "no intro",
          "grand symphonic arrangement",
          "full strings",
          "heroic brass fanfare",
          "timpani and cymbal swells",
          "cinematic game soundtrack",
          "majestic and adventurous",
          "140 BPM"
        ],
        "negative_styles": [
          "existing game franchise imitation",
          "cover song",
          "recognizable melody",
          "slow intro",
          "fade in",
          "lyrics"
        ],
        "context_adherence": "high"
      },
      {
        "text": "[Rising Adventure]\n{strings drive forward while brass answers the main theme}",
        "duration_ms": 10000,
        "positive_styles": [
          "driving ostinato strings",
          "French horns answer melody",
          "snare and timpani rhythm",
          "heroic journey feeling",
          "fantasy world map energy"
        ],
        "context_adherence": "high"
      },
      {
        "text": "[Heroic Lift]\n{main theme modulates upward with brighter brass and soaring strings}",
        "duration_ms": 10000,
        "positive_styles": [
          "soaring violin melody",
          "bright brass counter melody",
          "wordless choir texture",
          "uplifting modulation",
          "large fantasy battle theme"
        ],
        "context_adherence": "high"
      },
      {
        "text": "[Climax]\n{full orchestra peaks with percussion hits and triumphant cadence}",
        "duration_ms": 10000,
        "positive_styles": [
          "full orchestra climax",
          "triumphant brass",
          "rapid string runs",
          "crash cymbals",
          "timpani rolls"
        ],
        "context_adherence": "high"
      },
      {
        "text": "[Final Hit]\n{short decisive orchestral ending with a resonant final chord}",
        "duration_ms": 3000,
        "positive_styles": [
          "decisive final chord",
          "orchestral hit",
          "short reverb tail",
          "complete ending",
          "no fade out"
        ],
        "context_adherence": "high"
      }
    ]
  }
}

This plan is passed to the same POST /v1/music endpoint.

curl -sS \
  -X POST 'https://api.elevenlabs.io/v1/music?output_format=mp3_48000_192' \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H 'Content-Type: application/json' \
  --data-binary @fantasy-rpg-plan.json \
  -o elevenlabs_fantasy-rpg_45s_no-intro.mp3

The generation results are as follows.

HTTP 200
Content-Type: audio/mpeg
File size: 1,080,621 bytes
Format: MP3, 48 kHz, 192 kbps
Duration: 45.024 seconds

Both the 32-second J-pop demo and the 45-second grand orchestral piece came from the same API; only duration_ms and the style specifications in the JSON changed. Because the style, duration, and how the piece begins can all be included in the request, it seems well suited to quickly creating short BGM assets or demo tracks.

The generated song is below.
https://app.box.com/s/ow3pc4qmjc9pux8jm74jmh3z6c630nm6

What I Learned from Actually Trying It

Being Able to Handle Song Structure as JSON Is Convenient

When generating songs with a prompt alone, even if you write something like "include a verse, pre-chorus, and chorus," how well the actual timing and section breaks are reflected tends to be left up to the model.

With a composition plan, you can at least state your intent explicitly in the request: for example, a 4-second Intro, an 8-second Verse, a 6-second Pre-Chorus, an 11-second Chorus, and a 3-second Outro.

According to the documentation, music_v2 always respects section durations, and respect_sections_durations is described as a parameter for music_v1. That makes things clearer when you want to generate songs to a fixed structure.
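
As a rough check against my own results, the sketch below compares each plan's total (the sum of its duration_ms values) with the measured MP3 length for the three generations in this article. The numbers are taken from the plans and results above; the function name is hypothetical.

```python
def overshoot_ms(planned_ms, measured_seconds):
    """Return how many milliseconds the audio runs longer than the plan."""
    return round(measured_seconds * 1000) - planned_ms


# (label, sum of duration_ms in the plan, measured MP3 duration in seconds)
runs = [
    ("city pop", 3000 + 4500 + 3000, 10.512),
    ("J-pop", 4000 + 8000 + 6000 + 11000 + 3000, 32.040),
    ("fantasy RPG", 12000 + 10000 + 10000 + 10000 + 3000, 45.024),
]

for label, planned, measured in runs:
    extra = overshoot_ms(planned, measured)
    print(f"{label}: planned {planned} ms, overshoot {extra} ms")
```

In these three runs the file was longer than the planned total by 12 to 40 ms.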

Japanese Lyrics Can Be Passed As-Is

In this J-pop demo, I entered Japanese lyrics directly into text. In my brief test, no API error occurred and an MP3 was generated.

However, the intelligibility of the lyrics and the naturalness of the pronunciation depend on the style, duration, and amount of lyrics. For short demos, results seem easier to manage if you don't pack too many lyrics into a single chunk. Kanji are also sometimes mispronounced, so minor adjustments such as converting lyrics to hiragana may still be necessary.
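
One way to spot overloaded chunks before generating is to estimate lyric density. The sketch below counts lyric characters per second for a chunk, ignoring [Section] tags and {...} directions. Treating bracketed and braced text as non-sung is my assumption based on the plans in this article, and the function name is hypothetical; no density limit is documented, so any threshold you compare against is a matter of trial and error.

```python
import re


def lyric_chars_per_second(chunk):
    """Estimate the lyric density of one chunk in characters per second.

    Assumption: text inside [...] or {...} is a tag or direction, not sung.
    """
    lyrics = re.sub(r"\[[^\]]*\]|\{[^}]*\}", "", chunk["text"])
    # Whitespace and line breaks are not counted as lyric characters.
    char_count = len(re.sub(r"\s+", "", lyrics))
    return char_count / (chunk["duration_ms"] / 1000)


# Verse 1 from the J-pop plan above.
verse = {
    "text": "[Verse 1]\n夜明け前のホームで\nほどけた夢を結んだ",
    "duration_ms": 8000,
}
print(f"{lyric_chars_per_second(verse):.2f} chars/s")
```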

You Can Experiment While Changing Style and Length

For the J-pop demo, I generated a 32.040-second MP3 structured as Intro, Verse, Pre-Chorus, Chorus, and Outro. Then, using the same endpoint, I passed a plan in a completely different direction (no intro, 45 seconds, a grand fantasy RPG-style orchestral piece) and got a 45.024-second MP3.

Being able to produce this difference through API requests alone is convenient. Instead of describing style and length loosely in text, you can write chunk-by-chunk instructions such as "12 seconds here," "build up from here," and "wrap up in 3 seconds at the end," which suits short BGM tailored to a specific use.
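
Because the lengths are plain numbers in JSON, changing the overall length can also be scripted. The sketch below scales every chunk proportionally to a new total; scale_plan is a hypothetical helper, and I did not check whether the API enforces minimum or maximum chunk lengths, so the result may still need manual adjustment.

```python
import copy


def scale_plan(plan, target_ms):
    """Return a copy of plan with chunk durations scaled to total target_ms.

    Durations are scaled proportionally and rounded to whole milliseconds;
    any rounding remainder is added to the last chunk.
    """
    scaled = copy.deepcopy(plan)
    chunks = scaled["chunks"]
    total = sum(c["duration_ms"] for c in chunks)
    for c in chunks:
        c["duration_ms"] = round(c["duration_ms"] * target_ms / total)
    chunks[-1]["duration_ms"] += target_ms - sum(c["duration_ms"] for c in chunks)
    return scaled


# Section durations from the J-pop plan above (other fields omitted).
jpop = {"chunks": [{"duration_ms": d} for d in (4000, 8000, 6000, 11000, 3000)]}
half = scale_plan(jpop, 16000)
print([c["duration_ms"] for c in half["chunks"]])  # [2000, 4000, 3000, 5500, 1500]
```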

negative_styles Also Helps with Fine-Tuning

In these plans, I put items such as existing artist imitation, cover song, a cappella, and spoken voice in negative_styles.

In music generation, being able to state what you want to avoid, not just what you want to include, makes trial and error easier. For original songs in particular, it helps not only to keep references to existing artists or songs out of the positive styles, but also to list them explicitly on the negative side.
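
When the same exclusions should apply to every chunk, they can be merged in with a few lines of code. In the fantasy RPG listing above, only the first chunk carries negative_styles; whether later chunks inherit styles from earlier ones is not something I verified, so this sketch simply repeats them. The function name add_negative_styles is hypothetical.

```python
import copy


def add_negative_styles(plan, styles):
    """Return a copy of plan where every chunk also lists the given styles."""
    updated = copy.deepcopy(plan)
    for chunk in updated["chunks"]:
        existing = chunk.setdefault("negative_styles", [])
        for style in styles:
            if style not in existing:  # avoid duplicate entries
                existing.append(style)
    return updated


plan = {
    "chunks": [
        {"text": "[Main Theme]", "duration_ms": 12000,
         "negative_styles": ["cover song", "lyrics"]},
        {"text": "[Final Hit]", "duration_ms": 3000},
    ]
}
shared = add_negative_styles(plan, ["cover song", "recognizable melody"])
for chunk in shared["chunks"]:
    print(chunk["text"], chunk["negative_styles"])
```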

I'd Like to Try the Detailed Endpoint Separately

This time I received an MP3 using the standard POST /v1/music.

With Generate music detailed, a multipart/mixed response containing JSON metadata and a binary audio file is returned, so it seems convenient if you also want to handle the metadata of the generated result.

However, since that response seemed better handled with an SDK or a small parsing script than with curl alone, I kept this article to standard audio generation.

Notes

Plan Generation Also Failed Without an API Key

/v1/music/plan is described as not consuming credits, but when I called it without an API key, it returned HTTP 401.

{
  "status": "needs_authorization",
  "message": "Neither authorization header nor xi-api-key received, please provide one."
}

It seems better to treat "not consuming credits" and "usable without authentication" as separate things.
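
A script can fail early instead of waiting for the 401. The sketch below checks for the key before any request is built; require_api_key is a hypothetical helper, and rejecting values that start with "<" is just my way of catching the placeholder from the export line earlier in this article.

```python
import os


def require_api_key(env=os.environ, name="ELEVENLABS_API_KEY"):
    """Return the API key from env, or raise before any request is sent."""
    key = env.get(name, "").strip()
    # An empty value or an unreplaced "<YOUR_...>" placeholder is treated as missing.
    if not key or key.startswith("<"):
        raise RuntimeError(f"{name} is not set; /v1/music/plan also requires it")
    return key
```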

Check Your Plan and Terms for Public or Commercial Use

In this article, I generated short demos for technical verification purposes.

If you plan to publish the generated content or incorporate it into videos, advertisements, or products, please check your current plan, commercial use conditions, and how to handle outputs on the official Pricing page and terms of service.

Summary

Using the ElevenLabs Music v2 API's composition plan, I generated a short original J-pop demo and a 45-second orchestral piece.

Within the scope I tested, the composition_plan in music_v2 was convenient for cases where you want to handle song structure as JSON. In particular, being able to separately specify the duration, lyrics, desired styles, and styles to avoid for each section makes it easier to organize your intent compared to single-prompt generation.

On the other hand, the final intelligibility of the lyrics and the impression of the melody need to be judged by listening to the generated results. It seems easiest to first create a composition draft with /v1/music/plan, generate a short version, and then adjust the plan while changing the style and length.

Next, I'd like to handle responses with metadata using POST /v1/music/detailed and also compare the differences between prompt-only generation and composition plan specification.

I hope this is helpful to someone.

