TTS Arena: Benchmarking TTS Models in the Wild

Vote to help the community find the best available text-to-speech model!

Vote

  • Enter text (English only) to synthesize, or press 🎲 for random text.
  • Listen to the two audio clips, one after the other.
  • Vote on which audio sounds more natural to you.
  • Note: Model names are revealed after the vote is cast.
  • Note: It may take up to 30 seconds to synthesize audio.

If you use this data in your publication, please cite us!

Use the following BibTeX entry:

@misc{tts-arena,
        title        = {Text to Speech Arena},
        author       = {mrfakename and Srivastav, Vaibhav and Fourrier, Clémentine and Pouget, Lucain and Lacombe, Yoach and main and Gandhi, Sanchit},
        year         = 2024,
        publisher    = {Hugging Face},
        howpublished = "\url{https://huggingface.co/spaces/TTS-AGI/TTS-Arena}"
}
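
As a minimal sketch of how the entry above could be cited from a LaTeX document: the file name `references.bib` and the example sentence are assumptions for illustration, not part of the original page. Only the citation key `tts-arena` comes from the entry itself. Because the entry's `howpublished` field uses `\url`, the preamble loads a package that defines it.

```latex
\documentclass{article}
\usepackage{url} % defines \url, which the entry's howpublished field uses

\begin{document}
% Illustrative sentence only; replace with your own text.
Naturalness votes were collected with TTS Arena~\cite{tts-arena}.

\bibliographystyle{plain}
\bibliography{references} % assumes the entry is saved in references.bib
\end{document}
```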

Please note that all generated audio clips should be assumed unsuitable for redistribution or commercial use.

Keyboard Shortcuts

Global:

  • ? or Shift + / - Show this help dialog
  • Esc - Close this dialog

Vote & Battle Mode:

  • r - Generate random text
  • Ctrl/Cmd + Enter - Synthesize text
  • a - Vote for option A
  • b - Vote for option B