Audio
Music generation, text-to-speech, and speech-to-text. Three endpoints, each gated by the x402 payment protocol and settled in USDC. The same endpoints are also available through the MCP server for agent-to-agent use.
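Because every endpoint is x402-gated, a client typically sends the request, and if the server answers HTTP 402 (Payment Required), it builds a payment payload and resends the same request with the `X-PAYMENT` header used in the examples below. The following is a minimal sketch of that retry loop. It assumes that the 402 response body describes the accepted payment requirements, as the x402 protocol does. `call_paid_endpoint`, the `post` transport, and the `make_payment` wallet callback are hypothetical names. Producing a real signed Solana USDC payload is left to an x402 client library.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: post(url, json_body, headers) -> (HTTP status, parsed JSON body).
Post = Callable[[str, Dict[str, Any], Dict[str, str]], Tuple[int, Any]]
# Hypothetical wallet hook: turns the 402 response body into an X-PAYMENT header value.
MakePayment = Callable[[Any], str]


def call_paid_endpoint(post: Post, url: str, body: Dict[str, Any],
                       make_payment: MakePayment) -> Any:
    """POST to an x402-gated endpoint, attaching a payment only if the server asks for one."""
    headers = {"Content-Type": "application/json"}
    status, payload = post(url, body, headers)
    if status == 402:
        # Payment Required: the body describes what the server accepts.
        # Signing an actual payment is delegated to make_payment (assumption).
        paid_headers = {**headers, "X-PAYMENT": make_payment(payload)}
        status, payload = post(url, body, paid_headers)
    if status >= 400:
        # Covers a rejected payment (a second 402) as well as ordinary errors.
        raise RuntimeError(f"request failed with HTTP {status}: {payload!r}")
    return payload
```

Injecting the transport keeps the sketch independent of any particular HTTP library. It also makes the payment step easy to test against a fake server.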
ElevenLabs Music
AI music generation using ElevenLabs Music (via Replicate). Dynamic pricing: $1 per 120 seconds of generated music, with a maximum length of 3 minutes ($1.50).
POST /audio/elevenlabs-music
Price: $1/120s (max $1.50)
Network: Solana
x402: v2
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Description of the music to generate |
| output_format | string | No | Output format (e.g. mp3_standard) |
| music_length_ms | number | No | Length of the generated music in milliseconds (max 180000, i.e. 3 minutes). Price is $1 per 120 seconds. |
| force_instrumental | boolean | No | Force instrumental output |
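The price scales with `music_length_ms`. A 3-minute maximum priced at $1.50 implies linear pro-rating rather than rounding up to whole 120-second blocks. The following is a minimal sketch of that arithmetic under that assumption. `music_price_usd` is a hypothetical helper. It rejects out-of-range lengths, although how the server actually treats them is not documented here.

```python
from decimal import Decimal

PRICE_PER_120_S = Decimal("1.00")  # USD per 120 seconds, from the pricing above
MAX_LENGTH_MS = 180_000            # 3 minutes


def music_price_usd(music_length_ms: int) -> Decimal:
    """Estimate the charge for a given music_length_ms, assuming linear pro-rating."""
    if not 0 < music_length_ms <= MAX_LENGTH_MS:
        raise ValueError("music_length_ms must be between 1 and 180000")
    # Decimal avoids binary floating-point error in money arithmetic.
    return PRICE_PER_120_S * Decimal(music_length_ms) / Decimal(120_000)
```

For the 30000 ms request in the example, this gives $0.25. A full 180000 ms track gives the $1.50 maximum.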
Response
Returns music_url (CDN URL), duration_seconds, and metadata
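This is a minimal sketch of reading that response on the client side. It assumes that `music_url`, `duration_seconds`, and `metadata` are top-level keys of a JSON object. The contents of `metadata` are not documented here, so they are kept as an opaque dictionary. `MusicResult` and `parse_music_response` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MusicResult:
    music_url: str                 # CDN URL of the generated track
    duration_seconds: float
    metadata: Dict[str, Any] = field(default_factory=dict)  # structure not specified


def parse_music_response(raw: str) -> MusicResult:
    """Parse the JSON body of a successful /audio/elevenlabs-music response."""
    data = json.loads(raw)
    return MusicResult(
        music_url=data["music_url"],
        duration_seconds=float(data["duration_seconds"]),
        metadata=data.get("metadata") or {},  # tolerate a missing or null metadata field
    )
```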
Example
```bash
curl -X POST "https://api.xona-agent.com/audio/elevenlabs-music" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
    "prompt": "Upbeat electronic dance music",
    "output_format": "mp3_standard",
    "music_length_ms": 30000,
    "force_instrumental": false
  }'
```
Speech-to-Text
Transcribe audio from an HTTPS URL using OpenAI GPT-4o Transcribe (via Replicate).
POST /audio/speech-to-text
Price: $0.02
Network: Solana
x402: v2
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio_file | string | Yes | HTTPS URL of the audio file to transcribe (e.g. MP3) |
| language | string | No | Language code (default: en) |
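Since `audio_file` must be an HTTPS URL, checking it locally saves a round trip. Depending on how the server settles failed requests, it may also save a payment. The following is a minimal sketch of building the request body with that check. `build_transcription_request` is a hypothetical helper. When `language` is omitted, the endpoint's documented default (`en`) applies.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def build_transcription_request(audio_file: str,
                                language: Optional[str] = None) -> Dict[str, str]:
    """Build the JSON body for POST /audio/speech-to-text, rejecting non-HTTPS URLs locally."""
    parsed = urlparse(audio_file)
    # Require an absolute https:// URL with a host component.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("audio_file must be an absolute HTTPS URL")
    body = {"audio_file": audio_file}
    if language is not None:
        body["language"] = language  # when omitted, the endpoint defaults to "en"
    return body
```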
Response
Returns text (transcript) and metadata (model, language)
Example
```bash
curl -X POST "https://api.xona-agent.com/audio/speech-to-text" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
    "audio_file": "https://example.com/recording.mp3",
    "language": "en"
  }'
```
X Text-to-Speech
X.AI text-to-speech: convert text to speech (MP3).
POST /audio/x-text-to-speech
Price: $0.0001
Network: Solana
x402: v2
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to convert to speech |
| voice_id | string | No | Voice for speech synthesis. One of: Eve, Ara, Leo, Rex, Sal. |
| output_format | string | No | Optional encoding settings (codec, sample_rate, bit_rate) |
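Because `voice_id` accepts only the five listed voices, a client can reject a typo before paying for the call. The following is a minimal sketch of building the request body. It assumes that voice names are matched exactly as written above, including case. `build_tts_request` and `VOICES` are hypothetical names. `output_format` is left out because the format of its codec, sample_rate, and bit_rate settings is not specified here.

```python
from typing import Dict, Optional

# Voice names as listed in the parameter table; exact-case matching is an assumption.
VOICES = ("Eve", "Ara", "Leo", "Rex", "Sal")


def build_tts_request(text: str, voice_id: Optional[str] = None) -> Dict[str, str]:
    """Build the JSON body for POST /audio/x-text-to-speech."""
    if not text.strip():
        raise ValueError("text must not be empty")
    body = {"text": text}
    if voice_id is not None:
        if voice_id not in VOICES:
            raise ValueError(f"voice_id must be one of: {', '.join(VOICES)}")
        body["voice_id"] = voice_id
    return body
```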
Response
Returns audio_url (CDN URL), duration_seconds, and metadata
Example
```bash
curl -X POST "https://api.xona-agent.com/audio/x-text-to-speech" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
    "text": "Hello! This is a text-to-speech demo.",
    "voice_id": "Eve"
  }'
```