Xona Docs

Audio

Music generation, text-to-speech, and speech-to-text. The three endpoints are each gated by the x402 payment protocol and settled in USDC. The same endpoints are also available through the MCP server for agent-to-agent use.

ElevenLabs Music

AI music generation using ElevenLabs Music (via Replicate). Dynamic pricing: $1 per 120 seconds, max 3 minutes.

POST /audio/elevenlabs-music

Price: $1/120s (max $1.50)
Network: Solana
x402: v2

Request parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | textarea | Yes | Description of the music to generate |
| output_format | string | No | Output format (e.g. mp3_standard) |
| music_length_ms | number | No | Length of the generated music in milliseconds. Price is $1 per 120 seconds (max 3 min). |
| force_instrumental | boolean | No | Force instrumental output |
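
The pricing rule above can be sketched as a small estimator. This is only an illustration: it assumes the charge is prorated linearly by length (which matches the stated $1.50 cap at 3 minutes) and that lengths beyond 3 minutes are rejected. The function name `estimate_music_price_usd` is hypothetical and not part of the API.

```python
# Pricing rule from the docs: $1 per 120 seconds, capped at 3 minutes.
PRICE_PER_120_S_USD = 1.0
MAX_LENGTH_MS = 180_000  # 3 minutes


def estimate_music_price_usd(music_length_ms: int) -> float:
    """Estimate the charge, ASSUMING linear proration by length."""
    if music_length_ms <= 0:
        raise ValueError("music_length_ms must be positive")
    if music_length_ms > MAX_LENGTH_MS:
        # Assumption: longer requests are rejected; the docs only state the cap.
        raise ValueError("music_length_ms exceeds the 3-minute maximum")
    return round(PRICE_PER_120_S_USD * music_length_ms / 120_000, 4)


print(estimate_music_price_usd(30_000))   # 0.25
print(estimate_music_price_usd(180_000))  # 1.5
```

The actual amount charged is whatever the endpoint's x402 payment requirements state, so treat this as a budgeting aid only.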

Response

Returns music_url (CDN URL), duration_seconds, and metadata
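
As a rough sketch of consuming this response, the snippet below reads the three documented fields into a small dataclass. It assumes the response is a JSON object with `music_url`, `duration_seconds`, and `metadata` as top-level keys; the sample body and the `MusicResult` type are invented for illustration and are not taken from a real response.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MusicResult:
    music_url: str
    duration_seconds: float
    metadata: dict = field(default_factory=dict)


def parse_music_response(raw: str) -> MusicResult:
    """Parse a response body, assuming the fields are top-level JSON keys."""
    data = json.loads(raw)
    return MusicResult(
        music_url=data["music_url"],
        duration_seconds=float(data["duration_seconds"]),
        metadata=data.get("metadata", {}),
    )


# Made-up sample body for illustration only; real responses may differ.
sample = (
    '{"music_url": "https://cdn.example.com/track.mp3", '
    '"duration_seconds": 30, "metadata": {}}'
)
result = parse_music_response(sample)
print(result.duration_seconds)  # 30.0
```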

Example

curl -X POST "https://api.xona-agent.com/audio/elevenlabs-music" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
  "prompt": "Upbeat electronic dance music",
  "output_format": "mp3_standard",
  "music_length_ms": 30000,
  "force_instrumental": false
}'
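
For callers working in Python, the same request might be assembled with the standard library as sketched below. The helper name `build_music_request` is hypothetical, and the payment payload is a placeholder: producing a valid x402 payload is outside the scope of this sketch, so the network call is left commented out.

```python
import json
import urllib.request

BASE_URL = "https://api.xona-agent.com"


def build_music_request(
    payment_payload: str, prompt: str, music_length_ms: int = 30_000
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = {
        "prompt": prompt,
        "output_format": "mp3_standard",
        "music_length_ms": music_length_ms,
        "force_instrumental": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/elevenlabs-music",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-PAYMENT": payment_payload,
        },
        method="POST",
    )


req = build_music_request("<x402-payment-payload>", "Upbeat electronic dance music")

# Sending requires a real x402 payment payload, so it is not executed here:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```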

Speech-to-Text

Transcribe audio from an HTTPS URL using OpenAI GPT-4o Transcribe (via Replicate).

POST /audio/speech-to-text

Price: $0.02
Network: Solana
x402: v2

Request parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audio_file | string | Yes | HTTPS URL of the audio file to transcribe (e.g. MP3) |
| language | string | No | Language code (optional, default: en) |

Response

Returns text (transcript) and metadata (model, language)

Example

curl -X POST "https://api.xona-agent.com/audio/speech-to-text" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
  "audio_file": "https://example.com/recording.mp3",
  "language": "en"
}'
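
Because `audio_file` must be an HTTPS URL, it can be worth checking the value on the client before paying for a request. The sketch below shows one way to do that; the helper `build_transcription_payload` is hypothetical, and it assumes the server applies the documented default (`en`) when `language` is omitted.

```python
from typing import Optional
from urllib.parse import urlparse


def build_transcription_payload(audio_url: str, language: Optional[str] = None) -> dict:
    """Build the JSON body, rejecting anything that is not an HTTPS URL."""
    parsed = urlparse(audio_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("audio_file must be an HTTPS URL")
    payload = {"audio_file": audio_url}
    if language is not None:
        # Omitted otherwise, on the assumption that the server defaults to "en".
        payload["language"] = language
    return payload


print(build_transcription_payload("https://example.com/recording.mp3", "en"))
```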

X Text-to-Speech

X.AI text-to-speech: convert text to speech (MP3).

POST /audio/x-text-to-speech

Price: $0.0001
Network: Solana
x402: v2

Request parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | textarea | Yes | Text to convert to speech |
| voice_id | select | No | Voice for speech synthesis. One of: Eve, Ara, Leo, Rex, Sal. |
| output_format | string | No | Optional: codec, sample_rate, bit_rate |

Response

Returns audio_url (CDN URL), duration_seconds, and metadata

Example

curl -X POST "https://api.xona-agent.com/audio/x-text-to-speech" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <x402-payment-payload>" \
  -d '{
  "text": "Hello! This is a text-to-speech demo.",
  "voice_id": "Eve",
  "output_format": "your_output_format_here"
}'
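
Since `voice_id` accepts only a fixed set of voices, a client can validate it before submitting. The following is a minimal sketch with a hypothetical helper, `build_tts_payload`; it assumes voice names are case-sensitive and leaves out `output_format`, whose exact value format is not specified here.

```python
from typing import Optional

# Voices listed in the parameter table above.
ALLOWED_VOICES = ("Eve", "Ara", "Leo", "Rex", "Sal")


def build_tts_payload(text: str, voice_id: Optional[str] = None) -> dict:
    """Build the JSON body, checking voice_id against the documented list."""
    if not text:
        raise ValueError("text is required")
    payload = {"text": text}
    if voice_id is not None:
        # Assumption: names must match the documented spelling exactly.
        if voice_id not in ALLOWED_VOICES:
            raise ValueError(f"voice_id must be one of {', '.join(ALLOWED_VOICES)}")
        payload["voice_id"] = voice_id
    return payload


print(build_tts_payload("Hello! This is a text-to-speech demo.", "Eve"))
```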
