Audio

Music generation, text-to-speech, and speech-to-text. Three endpoints, each gated by the x402 payment protocol and settled in USDC. The same endpoints are also available through the MCP server for agent-to-agent use.
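This page does not spell out the x402 handshake. As a rough sketch, assuming the common x402 pattern in which an unpaid request is answered with HTTP 402 plus payment requirements and the client retries with an `X-PAYMENT` header, a client loop might look like this. The `send` and `sign_payment` callables are hypothetical stand-ins for an HTTP client and a wallet signer; neither is part of this API.

```python
from typing import Callable, Dict, Tuple

# (status_code, parsed_body) as returned by the hypothetical transport.
Response = Tuple[int, dict]
Send = Callable[[Dict[str, str]], Response]


def call_with_payment(send: Send, sign_payment: Callable[[dict], str]) -> Response:
    """Try the request unpaid; on HTTP 402, sign a payment and retry once."""
    status, body = send({})  # first attempt carries no payment header
    if status != 402:
        return status, body
    # Assumption: the 402 body describes the payment the server accepts.
    payment = sign_payment(body)
    return send({"X-PAYMENT": payment})
```

Keeping the transport and the signer as injected callables lets the retry logic be exercised without a wallet or a network connection.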

ElevenLabs Music

AI music generation using ElevenLabs Music (via Replicate). Pricing is dynamic: $1 per 120 seconds of audio, with a maximum length of 3 minutes ($1.50).

POST /audio/elevenlabs-music

Price: $1/120s (max $1.50)

Network: Solana

Payment protocol: x402
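The $1.50 cap follows from the rate: 180 seconds at $1 per 120 seconds. A small sketch of that arithmetic, assuming the charge scales linearly with length (the page does not say whether shorter clips are prorated or rounded up). `estimate_music_price_usd` is an illustrative name, not part of the API.

```python
from decimal import Decimal

USD_PER_120_SECONDS = Decimal("1")
MAX_LENGTH_MS = 180_000  # 3-minute cap stated on this page


def estimate_music_price_usd(music_length_ms: int) -> Decimal:
    """Estimate the price in USD, assuming linear scaling with length."""
    if not 0 < music_length_ms <= MAX_LENGTH_MS:
        raise ValueError("music_length_ms must be between 1 and 180000")
    seconds = Decimal(music_length_ms) / Decimal(1000)
    price = USD_PER_120_SECONDS * seconds / Decimal(120)
    return price.quantize(Decimal("0.0001"))
```

Under this assumption a 30-second clip (`music_length_ms=30000`) would come to $0.25. The amount the server actually requests is authoritative.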

Request parameters

Parameter	Type	Required	Description
`prompt`	`textarea`	Yes	Description of the music to generate
`output_format`	`string`	No	Output format (e.g. mp3_standard)
`music_length_ms`	`number`	No	Length of the generated music in milliseconds (max 180000, i.e. 3 minutes). Priced at $1 per 120 seconds.
`force_instrumental`	`boolean`	No	Force instrumental output
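The parameters above can be checked client-side before paying for a request. A minimal sketch, assuming the server rejects an empty `prompt` and lengths beyond the 3-minute cap; `build_music_body` is a hypothetical helper, not something the API provides.

```python
import json
from typing import Optional

MAX_LENGTH_MS = 180_000  # 3-minute cap stated on this page


def build_music_body(prompt: str,
                     music_length_ms: Optional[int] = None,
                     output_format: Optional[str] = None,
                     force_instrumental: Optional[bool] = None) -> str:
    """Return a JSON request body, leaving out optional fields that are unset."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if music_length_ms is not None and not 0 < music_length_ms <= MAX_LENGTH_MS:
        raise ValueError("music_length_ms must be between 1 and 180000")
    optional = {
        "output_format": output_format,
        "music_length_ms": music_length_ms,
        "force_instrumental": force_instrumental,
    }
    body = {"prompt": prompt}
    body.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(body)
```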

Response

Returns `music_url` (CDN URL), `duration_seconds`, and `metadata`.

Example

curl -X POST "https://api.xona-agent.com/audio/elevenlabs-music" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
  "prompt": "Upbeat electronic dance music",
  "output_format": "mp3_standard",
  "music_length_ms": 30000,
  "force_instrumental": false
}'
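For callers not using curl, the same request can be assembled with the Python standard library. This is a sketch only: `build_request` is an illustrative helper, and the payment payload stays a placeholder because producing it depends on the x402 client in use.

```python
import json
import urllib.request

BASE_URL = "https://api.xona-agent.com"


def build_request(path: str, params: dict, payment_payload: str) -> urllib.request.Request:
    """Build (but do not send) a paid POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(params).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-PAYMENT": payment_payload,
        },
        method="POST",
    )


req = build_request(
    "/audio/elevenlabs-music",
    {"prompt": "Upbeat electronic dance music", "music_length_ms": 30000},
    "<x402-payment-payload>",
)
# Sending is left out here; urllib.request.urlopen(req) would perform the call.
```

Building the request separately from sending it makes the URL, headers, and body easy to inspect before any payment is attached.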

Speech-to-Text

Transcribe audio from an HTTPS URL using OpenAI GPT-4o Transcribe (via Replicate).

POST /audio/speech-to-text

Price: $0.02

Network: Solana

Payment protocol: x402

Request parameters

Parameter	Type	Required	Description
`audio_file`	`string`	Yes	HTTPS URL of the audio file to transcribe (e.g. MP3)
`language`	`string`	No	Language code (default: en)
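Since `audio_file` must be an HTTPS URL, it may be worth rejecting other schemes before sending a paid request. A minimal sketch; `build_transcription_params` is a hypothetical helper, and the `en` default simply mirrors the table above.

```python
from urllib.parse import urlparse


def build_transcription_params(audio_file: str, language: str = "en") -> dict:
    """Validate the audio URL and return the request parameters."""
    parsed = urlparse(audio_file)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("audio_file must be an HTTPS URL")
    return {"audio_file": audio_file, "language": language}
```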

Response

Returns `text` (transcript) and `metadata` (model, language).
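The exact response shape is not shown on this page. The sketch below assumes a JSON object with top-level `text` and `metadata` keys, where `metadata` holds `model` and `language`. The sample body is made up for illustration and is not real API output.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    text: str
    model: Optional[str]
    language: Optional[str]


def parse_transcript(raw: str) -> Transcript:
    """Parse a response body, assuming `text` and `metadata` are top-level keys."""
    data = json.loads(raw)
    metadata = data.get("metadata") or {}
    return Transcript(
        text=data["text"],
        model=metadata.get("model"),
        language=metadata.get("language"),
    )


# Hypothetical response body, for illustration only.
sample = '{"text": "Hello world", "metadata": {"model": "example-model", "language": "en"}}'
transcript = parse_transcript(sample)
```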

Example

curl -X POST "https://api.xona-agent.com/audio/speech-to-text" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
  "audio_file": "https://example.com/recording.mp3",
  "language": "en"
}'

X Text-to-Speech

Convert text to speech (MP3) using X.AI text-to-speech.

POST /audio/x-text-to-speech

Price: $0.0001

Network: Solana

Payment protocol: x402

Request parameters

Parameter	Type	Required	Description
`text`	`textarea`	Yes	Text to convert to speech
`voice_id`	`select`	No	Voice for speech synthesis. One of: `Eve`, `Ara`, `Leo`, `Rex`, `Sal`.
`output_format`	`string`	No	Output format options (codec, sample_rate, bit_rate)
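Because `voice_id` accepts only the five listed voices, a client can catch typos locally. A minimal sketch; `build_tts_params` is a hypothetical helper, and `output_format` is left out because its expected syntax is not documented here.

```python
from typing import Optional

ALLOWED_VOICES = ("Eve", "Ara", "Leo", "Rex", "Sal")


def build_tts_params(text: str, voice_id: Optional[str] = None) -> dict:
    """Validate the inputs and return the request parameters."""
    if not text.strip():
        raise ValueError("text is required")
    params = {"text": text}
    if voice_id is not None:
        if voice_id not in ALLOWED_VOICES:
            raise ValueError("voice_id must be one of: " + ", ".join(ALLOWED_VOICES))
        params["voice_id"] = voice_id
    return params
```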

Response

Returns `audio_url` (CDN URL), `duration_seconds`, and `metadata`.

Example

curl -X POST "https://api.xona-agent.com/audio/x-text-to-speech" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
  "text": "Hello! This is a text-to-speech demo.",
  "voice_id": "Eve",
  "output_format": "your_output_format_here"
}'
