Introduction
qwen-tts is a cross-platform command-line interface for Qwen3-TTS, Alibaba's state-of-the-art text-to-speech model. It provides a simple, ergonomic way to generate natural-sounding speech from your terminal.
Key Features
- Text-to-speech -- Convert any text or file to spoken audio with a single command.
- Voice design -- Describe a voice in plain English (e.g., "a deep, calm British narrator") and the model will synthesize speech in that style.
- Voice cloning -- Provide a short reference audio clip and qwen-tts will reproduce that voice for new text.
- Cross-platform -- Runs on macOS (Apple Silicon and Intel), Linux, and Windows. Automatically selects the best backend for your hardware: MLX on Apple Silicon, CUDA on NVIDIA GPUs, or CPU fallback.
- Saved voices -- Enroll reference audio clips as named voices so you can reuse them without specifying file paths every time.
- Auto-play -- Generated audio plays immediately by default. Disable this with a single config toggle.
How It Works
qwen-tts is a Rust CLI that orchestrates a Python-based TTS pipeline under the hood. It manages model downloads from Hugging Face, handles configuration, and delegates the actual inference to either the mlx_audio package (on Apple Silicon) or a PyTorch-based generation script (on CUDA and CPU platforms).
Next Steps
Continue to Installation to set up the CLI, then follow the Quick Start to generate your first audio.
Installation
Prerequisites
Before installing qwen-tts, make sure you have the following:
- Python 3.10+ -- Required for the TTS inference backend.
- Rust / Cargo -- Required to compile the CLI. Install from rustup.rs.
- git -- Required for model downloads and install scripts.
Option 1: One-liner (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/andreisuslov/qwen-tts/main/scripts/install.sh | bash
This script will:
- Install the qwen-tts binary via cargo install.
- Create a Python virtual environment at ~/.qwen-tts/venv.
- Install the required Python dependencies into the venv.
- Run qwen-tts config init to generate a default configuration.
Option 2: One-liner (Windows)
Open PowerShell and run:
irm https://raw.githubusercontent.com/andreisuslov/qwen-tts/main/scripts/install.ps1 | iex
This performs the same steps as the macOS/Linux script, adapted for Windows paths and tooling.
Option 3: Manual Installation
If you prefer to install manually or need more control over the process:
1. Install the CLI
cargo install --git https://github.com/andreisuslov/qwen-tts
2. Create the Python virtual environment
python3 -m venv ~/.qwen-tts/venv
3. Install Python dependencies
On macOS Apple Silicon (MLX backend):
~/.qwen-tts/venv/bin/pip install mlx-audio huggingface-hub
On Linux / Windows with NVIDIA GPU (CUDA backend):
~/.qwen-tts/venv/bin/pip install torch transformers huggingface-hub
On CPU-only systems:
~/.qwen-tts/venv/bin/pip install torch transformers huggingface-hub --extra-index-url https://download.pytorch.org/whl/cpu
4. Initialize configuration
qwen-tts config init
This creates ~/.config/qwen-tts/config.toml with auto-detected platform settings, and sets up the directory structure at ~/.qwen-tts/.
Verifying the Installation
qwen-tts --version
qwen-tts config show
The config show command will print your current configuration, including the detected backend. If everything is correct, proceed to the Quick Start.
Quick Start
This page walks you through generating your first speech output in three commands.
1. Initialize configuration
If you used one of the install scripts, this step is already done. Otherwise:
qwen-tts config init
This auto-detects your platform and backend, then writes a config file to ~/.config/qwen-tts/config.toml.
2. Download a model
qwen-tts models download --variant pro
The pro variant downloads the full-precision model. Use --variant lite for a smaller quantized model (MLX only: 4-bit; recommended if disk space or memory is limited).
Model files are saved to ~/.qwen-tts/models/pro/ (or lite/).
3. Generate speech
qwen-tts speak "Hello, world!"
That's it. The audio is saved to ~/.qwen-tts/outputs/ and plays automatically.
What's next?
Try a few more things:
# Use a specific emotion
qwen-tts speak "I can't believe we did it!" --emotion "Excited"
# Read text from a file
qwen-tts speak --file article.txt --output narration.wav
# Design a voice from a description
qwen-tts design "A warm, friendly female narrator" --text "Welcome to the show."
# Clone a voice from a reference clip
qwen-tts clone --ref speaker.wav --ref-text "Hello, my name is Alex." --text "Now I can say anything."
For complete details on every command and flag, see the Commands section.
speak
Generate speech from text using the Qwen3-TTS model.
Usage
qwen-tts speak [TEXT] [OPTIONS]
Arguments
| Argument | Description |
|---|---|
TEXT | The text to speak. Optional if --file is provided. |
Options
| Option | Description |
|---|---|
--file <PATH> | Read the input text from a file instead of the command line. |
--voice <NAME> | Voice name for the speaker identity. Uses the default_voice config value if not specified (default: Vivian). |
--emotion <STYLE> | Emotion or style instruction, such as "Excited", "Calm", or "Whispered". When set, the model is prompted to speak with the given emotion. |
--speed <FLOAT> | Speech speed multiplier. 1.0 is normal speed. Values below 1.0 slow down; above 1.0 speed up. Uses the default_speed config value if not specified. |
-o, --output <PATH> | Output file path. If omitted, a timestamped .wav file is written to the configured output_dir (default: ~/.qwen-tts/outputs/). |
Examples
Basic text-to-speech:
qwen-tts speak "The quick brown fox jumps over the lazy dog."
With a specific voice and emotion:
qwen-tts speak "Breaking news from the capital." --voice "Ethan" --emotion "Serious"
Read from a file and save to a specific path:
qwen-tts speak --file chapter1.txt --output chapter1.wav
Slow down the speech:
qwen-tts speak "Take your time." --speed 0.8
Behavior
- Text is resolved from the positional argument or --file (positional takes priority).
- A voice instruction is built from the --voice and optional --emotion flags.
- The TTS backend generates a .wav file.
- If auto_play is enabled in the config, the audio plays immediately after generation.
design
Design a voice from a free-form text description and generate speech with it.
Instead of choosing from a fixed set of voice names, you describe the voice you want in natural language. The model interprets the description and synthesizes speech that matches it.
Usage
qwen-tts design <DESCRIPTION> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
DESCRIPTION | Required. A text description of the desired voice (e.g., "A deep calm British narrator", "An energetic young woman"). |
Options
| Option | Description |
|---|---|
--text <STRING> | The text to speak with the designed voice. |
--file <PATH> | Read the text to speak from a file. |
--speed <FLOAT> | Speech speed multiplier (default: config value, typically 1.0). |
-o, --output <PATH> | Output file path. If omitted, a timestamped .wav file is written to the configured output_dir. |
Note: You must provide either --text or --file. If neither is given, the command returns an error.
Examples
Design a voice and speak a sentence:
qwen-tts design "A warm, friendly male voice with a slight Southern accent" --text "Howdy, partner."
Read text from a file:
qwen-tts design "A crisp, professional female newsreader" --file headlines.txt -o news.wav
How It Works
The description string is passed directly as the instruction prompt to the TTS model. Qwen3-TTS uses this instruction to condition its output, producing speech that reflects the described characteristics. This does not use any reference audio -- the voice is synthesized entirely from the text description.
clone
Clone a voice from reference audio and use it to speak new text.
Provide a short audio sample of the target voice (and optionally its transcript), and qwen-tts will generate new speech that sounds like the same speaker.
Usage
qwen-tts clone [OPTIONS]
Options
| Option | Description |
|---|---|
--ref <PATH> | Path to a reference audio file (.wav). Required unless --voice is used. |
--ref-text <STRING> | Transcript of the reference audio. Providing this improves cloning accuracy. |
--voice <NAME> | Use a previously saved voice by name (see voices). Mutually exclusive with --ref. |
--text <STRING> | The text to speak with the cloned voice. |
--file <PATH> | Read the text to speak from a file. |
--speed <FLOAT> | Speech speed multiplier (default: config value, typically 1.0). |
-o, --output <PATH> | Output file path. If omitted, a timestamped .wav file is written to the configured output_dir. |
Note: You must provide either --ref or --voice to specify the reference voice. You must also provide either --text or --file for the content to speak.
Examples
Clone from a reference audio file:
qwen-tts clone --ref speaker.wav --ref-text "Hello, my name is Alex." --text "Now I can say anything in Alex's voice."
Clone using a saved voice:
qwen-tts clone --voice alex --text "This uses the saved reference for Alex."
Clone and save the output:
qwen-tts clone --ref narrator.wav --file script.txt -o narration.wav
Voice Resolution
When --voice is provided, qwen-tts looks up the corresponding .wav file in the voices directory (~/.qwen-tts/voices/<name>.wav). If a .txt transcript file exists alongside it, that transcript is used automatically. You can still override the transcript with --ref-text.
When --ref is provided, the audio file is used directly without copying it to the voices directory. To save it for future reuse, see the voices add command.
For a deeper guide on voice cloning, see Voice Cloning.
voices
Manage saved voices for voice cloning. Saved voices let you reuse reference audio clips by name instead of specifying file paths each time.
Subcommands
voices list
List all saved voices.
qwen-tts voices list
Displays each saved voice name along with a preview of its transcript (if available). Voice files are stored as .wav files in the voices directory (~/.qwen-tts/voices/ by default).
voices add
Enroll a new voice from a reference audio file.
qwen-tts voices add <NAME> --ref <PATH> [--transcript <TEXT>]
| Argument / Option | Description |
|---|---|
NAME | Required. A name for the voice (used to reference it later). |
--ref <PATH> | Required. Path to a reference audio file (.wav). The file is copied into the voices directory. |
--transcript <TEXT> | Optional transcript of the reference audio. Stored alongside the audio as <name>.txt. Providing a transcript improves cloning quality. |
Example:
qwen-tts voices add alex --ref ~/recordings/alex_sample.wav --transcript "Hi, my name is Alex and this is how I normally speak."
After enrollment, you can use --voice alex with the clone command:
qwen-tts clone --voice alex --text "Any new text in Alex's voice."
voices remove
Remove a saved voice.
qwen-tts voices remove <NAME>
| Argument | Description |
|---|---|
NAME | Required. The name of the voice to remove. |
Example:
qwen-tts voices remove alex
This deletes both the .wav file and the associated .txt transcript (if present) from the voices directory.
models
Manage TTS model downloads and installations.
Subcommands
models list
List all installed models.
qwen-tts models list
Shows each installed model variant along with its size on disk. Models are stored in the models directory (~/.qwen-tts/models/ by default).
models download
Download a model from Hugging Face.
qwen-tts models download [--variant <VARIANT>]
| Option | Description |
|---|---|
--variant <VARIANT> | Model variant to download: pro or lite. Defaults to pro. |
Example:
# Download the full-precision model
qwen-tts models download --variant pro
# Download the smaller quantized model
qwen-tts models download --variant lite
Model Variants
| Variant | Backend | Hugging Face Repository | Notes |
|---|---|---|---|
pro | MLX | mlx-community/Qwen3-TTS-bf16 | Full bf16 precision. Best quality on Apple Silicon. |
lite | MLX | mlx-community/Qwen3-TTS-4bit | 4-bit quantized. Lower memory usage, slightly reduced quality. |
pro | CUDA / CPU | Qwen/Qwen3-TTS | Official PyTorch checkpoint. |
lite | CUDA / CPU | Qwen/Qwen3-TTS | Same checkpoint (quantization handled at runtime). |
The download command uses the huggingface_hub Python library to fetch model files. The appropriate repository is selected automatically based on your configured backend.
Storage
Downloaded models are saved to ~/.qwen-tts/models/<variant>/. You can change the models directory with:
qwen-tts config set models_dir /path/to/models
config
View and modify qwen-tts configuration.
Configuration is stored in ~/.config/qwen-tts/config.toml.
Subcommands
config init
Initialize configuration with auto-detected platform settings.
qwen-tts config init
This command:
- Detects your operating system and hardware (Apple Silicon, NVIDIA GPU, or CPU-only).
- Selects the appropriate backend (mlx, cuda, or cpu).
- Creates the directory structure at ~/.qwen-tts/ (models, voices, outputs).
- Writes default values to ~/.config/qwen-tts/config.toml.
Run this once after installation, or again to reset to defaults.
config show
Display the current configuration.
qwen-tts config show
Prints the full contents of config.toml in TOML format.
config set
Set a single configuration value.
qwen-tts config set <KEY> <VALUE>
| Argument | Description |
|---|---|
KEY | The configuration key to set. |
VALUE | The new value. |
Examples:
qwen-tts config set default_voice "Ethan"
qwen-tts config set default_speed 1.2
qwen-tts config set auto_play false
qwen-tts config set backend cuda
qwen-tts config set model_variant lite
qwen-tts config set auto_cleanup false
qwen-tts config set cleanup_age_hours 48
Configuration Keys
| Key | Type | Default | Description |
|---|---|---|---|
python_path | string | ~/.qwen-tts/venv/bin/python | Path to the Python interpreter in the virtual environment. |
models_dir | string | ~/.qwen-tts/models | Directory where downloaded models are stored. |
voices_dir | string | ~/.qwen-tts/voices | Directory where saved voice references are stored. |
output_dir | string | ~/.qwen-tts/outputs | Default directory for generated audio files. |
backend | string | auto-detected | Inference backend: mlx, cuda, or cpu. |
default_voice | string | Vivian | Default voice name for the speak command. |
default_speed | float | 1.0 | Default speech speed multiplier. |
auto_play | bool | true | Automatically play audio after generation. |
model_variant | string | pro | Active model variant: pro or lite. |
auto_cleanup | bool | true | Automatically delete old output files on each run. |
cleanup_age_hours | integer | 24 | Minimum age in hours before an output file is cleaned up. |
For a detailed description of each key, see Configuration.
Configuration
qwen-tts stores its configuration in a TOML file at:
~/.config/qwen-tts/config.toml
On Windows, this is typically:
C:\Users\<you>\AppData\Roaming\qwen-tts\config.toml
Full Reference
Below is a complete example with default values:
python_path = "~/.qwen-tts/venv/bin/python"
models_dir = "~/.qwen-tts/models"
voices_dir = "~/.qwen-tts/voices"
output_dir = "~/.qwen-tts/outputs"
backend = "mlx"
default_voice = "Vivian"
default_speed = 1.0
auto_play = true
model_variant = "pro"
auto_cleanup = true
cleanup_age_hours = 24
Key Descriptions
python_path
Path to the Python interpreter used for TTS inference. This should point to the Python binary inside the virtual environment created during installation. On Windows, the default is ~/.qwen-tts/venv/Scripts/python.exe.
models_dir
Directory where model files are stored after downloading. Each variant (pro, lite) is stored in its own subdirectory.
voices_dir
Directory where saved voice references are stored. Each voice consists of a .wav audio file and an optional .txt transcript file.
output_dir
Default directory for generated audio output. When you run a generation command without specifying --output, the resulting .wav file is written here with a timestamp-based filename (e.g., tts_1706140800.wav).
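The example filename above (tts_1706140800.wav) suggests a Unix-epoch naming scheme, which can be sketched as below; the exact format the CLI uses may differ.

```python
import time
from pathlib import Path

def default_output_path(output_dir: Path) -> Path:
    # Epoch-seconds naming, matching the tts_1706140800.wav example.
    return output_dir / f"tts_{int(time.time())}.wav"
```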
backend
The inference backend. Auto-detected by config init, but can be overridden manually. Valid values:
| Value | Description |
|---|---|
mlx | Apple MLX framework. Best performance on Apple Silicon Macs. Uses mlx_audio for inference. |
cuda | NVIDIA CUDA. Requires an NVIDIA GPU with CUDA drivers. Uses PyTorch for inference. |
cpu | CPU-only fallback. Works everywhere but is significantly slower. Uses PyTorch for inference. |
default_voice
The voice name used by the speak command when --voice is not specified. This is a string identifier passed to the model's instruction prompt (e.g., "Vivian", "Ethan").
default_speed
The speech speed multiplier used when --speed is not specified. A value of 1.0 produces normal speed. Lower values slow down speech; higher values speed it up.
auto_play
When true, generated audio files are played immediately after creation. Playback uses platform-native tools:
- macOS: afplay
- Windows: PowerShell SoundPlayer
- Linux: aplay, paplay, or ffplay (tried in order)
Set to false to disable automatic playback.
model_variant
The active model variant: "pro" for full precision or "lite" for the quantized version. This determines which subdirectory under models_dir is used for inference. Must be either pro or lite.
auto_cleanup
When true, old output files in output_dir are automatically deleted at the start of each run. Only files older than cleanup_age_hours are removed. Set to false to keep all generated files indefinitely.
cleanup_age_hours
The minimum age (in hours) an output file must reach before it is eligible for automatic cleanup. Only takes effect when auto_cleanup is true. For example, the default value of 24 means files older than 24 hours are deleted on the next run.
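The cleanup rule is equivalent to deleting .wav files in output_dir whose modification time is older than cleanup_age_hours. A rough stdlib sketch of that policy (not the CLI's actual code):

```python
import time
from pathlib import Path

def cleanup_outputs(output_dir: Path, age_hours: int = 24) -> list[Path]:
    """Delete generated .wav files older than age_hours; return what was removed."""
    cutoff = time.time() - age_hours * 3600
    removed = []
    for f in output_dir.glob("*.wav"):
        if f.stat().st_mtime < cutoff:
            f.unlink()
            removed.append(f)
    return removed
```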
Editing the Config File Directly
You can edit ~/.config/qwen-tts/config.toml in any text editor. Changes take effect the next time you run a qwen-tts command. Alternatively, use qwen-tts config set to modify individual values from the command line.
Directory Structure
After initialization, the ~/.qwen-tts/ directory contains:
~/.qwen-tts/
venv/ # Python virtual environment
models/ # Downloaded model files
pro/ # Full-precision model
lite/ # Quantized model
voices/ # Saved voice references
outputs/ # Generated audio files
Voice Cloning
Voice cloning lets you reproduce a specific person's voice from a short audio sample. This page explains how it works, how to prepare good reference audio, and how to save voices for repeated use.
How It Works
Qwen3-TTS supports zero-shot voice cloning. You provide:
- Reference audio -- A short .wav clip of the target speaker.
- Reference transcript -- The text spoken in the reference audio (optional but recommended).
- Target text -- The new text you want spoken in the cloned voice.
The model analyzes the speaker characteristics in the reference audio (pitch, timbre, cadence) and applies them when generating the target text. No fine-tuning is required.
Preparing Reference Audio
For best results, follow these guidelines:
- Length: 5 to 15 seconds is ideal. Shorter clips may not capture enough speaker characteristics. Longer clips increase processing time without proportional quality gains.
- Format: WAV format is required. Convert other formats with ffmpeg: ffmpeg -i recording.mp3 -ar 16000 -ac 1 recording.wav
- Quality: Use clean audio with minimal background noise. Avoid clips with music, multiple speakers, or heavy compression artifacts.
- Content: The reference audio should contain natural, conversational speech. Avoid whispering, shouting, or singing unless you want those characteristics reproduced.
- Transcript accuracy: If you provide a transcript, make sure it matches the audio exactly. Mismatched transcripts degrade cloning quality.
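You can sanity-check a clip against the guidelines above with Python's built-in wave module (duration, channel count, sample rate). The helper name and the 5-15 second threshold come from this page's guidance, not from the CLI itself.

```python
import wave

def check_reference(path: str) -> dict:
    """Report basic properties of a .wav reference clip."""
    with wave.open(path, "rb") as w:
        frames, rate = w.getnframes(), w.getframerate()
        info = {
            "seconds": frames / rate,
            "channels": w.getnchannels(),
            "sample_rate": rate,
        }
    # The 5-15 second window is the guideline from this page.
    info["length_ok"] = 5.0 <= info["seconds"] <= 15.0
    return info
```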
Basic Cloning
Clone from a one-off reference file:
qwen-tts clone \
--ref ~/recordings/speaker.wav \
--ref-text "This is how I normally speak." \
--text "The cloned voice will say this sentence."
Saving Voices for Reuse
If you plan to use the same voice repeatedly, save it with voices add:
qwen-tts voices add sarah \
--ref ~/recordings/sarah_sample.wav \
--transcript "Hi, I'm Sarah and this is a sample of my voice."
This copies the audio and transcript into the voices directory. Now you can reference it by name:
qwen-tts clone --voice sarah --text "Any new text in Sarah's voice."
To see all saved voices:
qwen-tts voices list
To remove a saved voice:
qwen-tts voices remove sarah
Tips
- Provide transcripts. The model uses the transcript to align audio features with linguistic content. Cloning quality improves noticeably when transcripts are provided.
- Test with short text first. Before generating a long narration, test the cloned voice with a short sentence to verify quality.
- Multiple references. The current implementation supports a single reference clip per invocation. If you have multiple samples of the same speaker, choose the cleanest one.
- Combining with speed control. You can adjust the speed of cloned speech with --speed without affecting voice quality: qwen-tts clone --voice sarah --text "Slower speech." --speed 0.8
Platform Support
qwen-tts runs on macOS, Linux, and Windows. The CLI automatically detects your hardware and selects the best inference backend during config init.
Support Matrix
| Platform | Backend | Performance | Notes |
|---|---|---|---|
| macOS Apple Silicon (M1/M2/M3/M4) | mlx | Best | Native MLX acceleration. Recommended platform. Uses mlx_audio for inference with optimized MLX model weights. |
| macOS Intel | cpu | Slow | No GPU acceleration available. Falls back to PyTorch CPU inference. |
| Linux + NVIDIA GPU | cuda | Fast | Requires NVIDIA drivers and CUDA toolkit. Uses PyTorch with CUDA for inference. |
| Linux CPU-only | cpu | Slow | PyTorch CPU inference. Functional but not recommended for regular use. |
| Windows + NVIDIA GPU | cuda | Fast | Requires NVIDIA drivers and CUDA toolkit. Uses PyTorch with CUDA for inference. |
| Windows CPU-only | cpu | Slow | PyTorch CPU inference. Functional but not recommended for regular use. |
Backend Detection
When you run qwen-tts config init, the following logic determines your backend:
- If the OS is macOS and the architecture is aarch64 (Apple Silicon) -> mlx
- Otherwise, if nvidia-smi is found and returns success -> cuda
- Otherwise -> cpu
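The documented detection order can be expressed as a short sketch, with stdlib platform checks standing in for whatever the Rust CLI actually does (Python reports Apple Silicon as arm64, where Rust uses aarch64):

```python
import platform
import shutil

def detect_backend() -> str:
    """Mirror the documented detection order: mlx, then cuda, then cpu."""
    if platform.system() == "Darwin" and platform.machine() in ("arm64", "aarch64"):
        return "mlx"
    if shutil.which("nvidia-smi") is not None:
        return "cuda"
    return "cpu"
```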
You can override the auto-detected backend manually:
qwen-tts config set backend cuda
Python Dependencies by Backend
Each backend requires different Python packages in the virtual environment:
MLX (Apple Silicon)
pip install mlx-audio huggingface-hub
CUDA (NVIDIA GPU)
pip install torch transformers huggingface-hub
CPU
pip install torch transformers huggingface-hub --extra-index-url https://download.pytorch.org/whl/cpu
Audio Playback
Generated audio is played automatically when auto_play is enabled. The playback command depends on the platform:
| Platform | Command |
|---|---|
| macOS | afplay (built-in) |
| Windows | PowerShell SoundPlayer |
| Linux | aplay, paplay, or ffplay (tried in order) |
If no audio player is found, a warning is printed and the generated file is still saved to disk.
Model Variants by Backend
| Backend | Pro Variant | Lite Variant |
|---|---|---|
| MLX | mlx-community/Qwen3-TTS-bf16 | mlx-community/Qwen3-TTS-4bit |
| CUDA | Qwen/Qwen3-TTS | Qwen/Qwen3-TTS |
| CPU | Qwen/Qwen3-TTS | Qwen/Qwen3-TTS |
On non-MLX backends, both pro and lite use the same upstream PyTorch checkpoint from Qwen.
Examples
A collection of practical examples for common qwen-tts workflows.
Basic Speech
Generate speech from a string:
qwen-tts speak "Hello, world!"
Save to a specific file:
qwen-tts speak "Hello, world!" -o hello.wav
Reading Files
Narrate a text file:
qwen-tts speak --file article.txt
Narrate a file with a specific voice and save the result:
qwen-tts speak --file chapter1.txt --voice "Ethan" -o chapter1.wav
Voice and Emotion
Speak with emotion:
qwen-tts speak "We won the championship!" --emotion "Excited"
qwen-tts speak "I'm sorry for your loss." --emotion "Sad and gentle"
Change the default voice:
qwen-tts config set default_voice "Ethan"
qwen-tts speak "This now uses Ethan by default."
Voice Design
Create a voice from a description:
qwen-tts design "A deep, authoritative male narrator with a British accent" \
--text "In a world where technology reigns supreme..."
Design a voice and narrate a file:
qwen-tts design "A cheerful young woman with an upbeat tone" \
--file welcome_message.txt -o welcome.wav
Voice Cloning
Clone a voice from a one-off sample:
qwen-tts clone \
--ref ~/recordings/speaker.wav \
--ref-text "This is a sample of my natural speaking voice." \
--text "Now the model can generate new speech in this voice."
Save a voice for reuse, then use it:
# Enroll the voice
qwen-tts voices add narrator \
--ref ~/recordings/narrator_sample.wav \
--transcript "Welcome to the audiobook. My name is James."
# Use the saved voice
qwen-tts clone --voice narrator --text "Chapter one. It was a dark and stormy night."
qwen-tts clone --voice narrator --file chapter2.txt -o chapter2.wav
List and manage saved voices:
qwen-tts voices list
qwen-tts voices remove narrator
Speed Control
Slow down for clarity:
qwen-tts speak "Please listen carefully to the following instructions." --speed 0.75
Speed up for previewing:
qwen-tts speak --file draft.txt --speed 1.5
Set a permanent default speed:
qwen-tts config set default_speed 0.9
Batch Processing
Generate speech for multiple files using a shell loop:
for f in chapters/*.txt; do
name=$(basename "$f" .txt)
qwen-tts speak --file "$f" -o "output/${name}.wav"
done
Clone a voice across multiple files:
for f in scripts/*.txt; do
name=$(basename "$f" .txt)
qwen-tts clone --voice narrator --file "$f" -o "output/${name}.wav"
done
Disabling Auto-Play
If you are generating many files and do not want each one to play:
qwen-tts config set auto_play false
Re-enable later:
qwen-tts config set auto_play true
Model Management
Download models:
# Full-precision model (recommended)
qwen-tts models download --variant pro
# Quantized model (smaller, faster on Apple Silicon)
qwen-tts models download --variant lite
Switch between variants:
qwen-tts config set model_variant lite
List installed models:
qwen-tts models list