Introduction
qwen-tts is a cross-platform command-line interface for Qwen3-TTS, Alibaba's state-of-the-art text-to-speech model. It provides a simple, ergonomic way to generate natural-sounding speech from your terminal.
Key Features
- Text-to-speech -- Convert any text or file to spoken audio with a single command.
- Voice design -- Describe a voice in plain English (e.g., "a deep, calm British narrator") and the model will synthesize speech in that style.
- Voice cloning -- Provide a short reference audio clip and qwen-tts will reproduce that voice for new text.
- Cross-platform -- Runs on macOS (Apple Silicon and Intel), Linux, and Windows. Automatically selects the best backend for your hardware: MLX on Apple Silicon, CUDA on NVIDIA GPUs, or CPU fallback.
- Saved voices -- Enroll reference audio clips as named voices so you can reuse them without specifying file paths every time.
- Auto-play -- Generated audio plays immediately by default. Disable this with a single config toggle.
How It Works
qwen-tts is a Rust CLI that orchestrates a Python-based TTS pipeline under the hood. It manages model downloads from Hugging Face, handles configuration, and delegates the actual inference to either the mlx_audio package (on Apple Silicon) or a PyTorch-based generation script (on CUDA and CPU platforms).
Next Steps
Continue to Installation to set up the CLI, then follow the Quick Start to generate your first audio.
Installation
Prerequisites
Before installing qwen-tts, make sure you have the following:
- Python 3.10+ -- Required for the TTS inference backend.
- Rust / Cargo -- Required to compile the CLI. Install from rustup.rs.
- git -- Required for model downloads and install scripts.
Option 1: One-liner (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/andreisuslov/qwen-tts/main/scripts/install.sh | bash
This script will:
- Install the qwen-tts binary via cargo install.
- Create a Python virtual environment at ~/.qwen-tts/venv.
- Install the required Python dependencies into the venv.
- Run qwen-tts config init to generate a default configuration.
Option 2: One-liner (Windows)
Open PowerShell and run:
irm https://raw.githubusercontent.com/andreisuslov/qwen-tts/main/scripts/install.ps1 | iex
This performs the same steps as the macOS/Linux script, adapted for Windows paths and tooling.
Option 3: Manual Installation
If you prefer to install manually or need more control over the process:
1. Install the CLI
cargo install --git https://github.com/andreisuslov/qwen-tts
2. Create the Python virtual environment
python3 -m venv ~/.qwen-tts/venv
3. Install Python dependencies
On macOS Apple Silicon (MLX backend):
~/.qwen-tts/venv/bin/pip install mlx-audio huggingface-hub
On Linux / Windows with NVIDIA GPU (CUDA backend):
~/.qwen-tts/venv/bin/pip install torch transformers huggingface-hub
On CPU-only systems:
~/.qwen-tts/venv/bin/pip install torch transformers huggingface-hub --extra-index-url https://download.pytorch.org/whl/cpu
4. Initialize configuration
qwen-tts config init
This creates ~/.config/qwen-tts/config.toml with auto-detected platform settings, and sets up the directory structure at ~/.qwen-tts/.
Verifying the Installation
qwen-tts --version
qwen-tts config show
The config show command will print your current configuration, including the detected backend. If everything is correct, proceed to the Quick Start.
Quick Start
This page walks you through generating your first speech output in three commands.
1. Initialize configuration
If you used one of the install scripts, this step is already done. Otherwise:
qwen-tts config init
This auto-detects your platform and backend, then writes a config file to ~/.config/qwen-tts/config.toml.
2. Download a model
qwen-tts models download --variant pro
The pro variant downloads the full-precision model. Use --variant lite for a smaller quantized model (MLX only: 4-bit; recommended if disk space or memory is limited).
Model files are saved to ~/.qwen-tts/models/pro/ (or lite/).
3. Generate speech
qwen-tts speak "Hello, world!"
That's it. The audio is saved to ~/.qwen-tts/outputs/ and plays automatically.
What's next?
Try a few more things:
# Use a specific emotion
qwen-tts speak "I can't believe we did it!" --emotion "Excited"
# Read text from a file
qwen-tts speak --file article.txt --output narration.wav
# Design a voice from a description
qwen-tts design "A warm, friendly female narrator" --text "Welcome to the show."
# Clone a voice from a reference clip
qwen-tts clone --ref speaker.wav --ref-text "Hello, my name is Alex." --text "Now I can say anything."
For complete details on every command and flag, see the Commands section.
speak
Generate speech from text using the Qwen3-TTS model.
Usage
qwen-tts speak [TEXT] [OPTIONS]
Arguments
| Argument | Description |
|---|---|
TEXT | The text to speak. Optional if --file is provided. |
Options
| Option | Description |
|---|---|
--file <PATH> | Read the input text from a file instead of the command line. |
--voice <NAME> | Voice name for the speaker identity. Uses the default_voice config value if not specified (default: Vivian). |
--emotion <STYLE> | Emotion or style instruction, such as "Excited", "Calm", or "Whispered". When set, the model is prompted to speak with the given emotion. |
--speed <FLOAT> | Speech speed multiplier. 1.0 is normal speed. Values below 1.0 slow down; above 1.0 speed up. Uses the default_speed config value if not specified. |
-o, --output <PATH> | Output file path. If omitted, a timestamped .wav file is written to the configured output_dir (default: ~/.qwen-tts/outputs/). |
Examples
Basic text-to-speech:
qwen-tts speak "The quick brown fox jumps over the lazy dog."
With a specific voice and emotion:
qwen-tts speak "Breaking news from the capital." --voice "Ethan" --emotion "Serious"
Read from a file and save to a specific path:
qwen-tts speak --file chapter1.txt --output chapter1.wav
Slow down the speech:
qwen-tts speak "Take your time." --speed 0.8
Behavior
- Text is resolved from the positional argument or --file (positional takes priority).
- A voice instruction is built from the --voice and optional --emotion flags.
- The TTS backend generates a .wav file.
- If auto_play is enabled in the config, the audio plays immediately after generation.
design
Design a voice from a free-form text description and generate speech with it.
Instead of choosing from a fixed set of voice names, you describe the voice you want in natural language. The model interprets the description and synthesizes speech that matches it.
Usage
qwen-tts design <DESCRIPTION> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
DESCRIPTION | Required. A text description of the desired voice (e.g., "A deep calm British narrator", "An energetic young woman"). |
Options
| Option | Description |
|---|---|
--text <STRING> | The text to speak with the designed voice. |
--file <PATH> | Read the text to speak from a file. |
--speed <FLOAT> | Speech speed multiplier (default: config value, typically 1.0). |
-o, --output <PATH> | Output file path. If omitted, a timestamped .wav file is written to the configured output_dir. |
Note: You must provide either --text or --file. If neither is given, the command returns an error.
Examples
Design a voice and speak a sentence:
qwen-tts design "A warm, friendly male voice with a slight Southern accent" --text "Howdy, partner."
Read text from a file:
qwen-tts design "A crisp, professional female newsreader" --file headlines.txt -o news.wav
How It Works
The description string is passed directly as the instruction prompt to the TTS model. Qwen3-TTS uses this instruction to condition its output, producing speech that reflects the described characteristics. This does not use any reference audio -- the voice is synthesized entirely from the text description.
clone
Clone a voice from reference audio and use it to speak new text.
Provide a short audio sample of the target voice (and optionally its transcript), and qwen-tts will generate new speech that sounds like the same speaker.
Usage
qwen-tts clone [OPTIONS]
Options
| Option | Description |
|---|---|
--ref <PATH> | Path to a reference audio file (.wav). Required unless --voice is used. |
--ref-text <STRING> | Transcript of the reference audio. Providing this improves cloning accuracy. |
--voice <NAME> | Use a previously saved voice by name (see voices). Mutually exclusive with --ref. |
--text <STRING> | The text to speak with the cloned voice. |
--file <PATH> | Read the text to speak from a file. |
--speed <FLOAT> | Speech speed multiplier (default: config value, typically 1.0). |
-o, --output <PATH> | Output file path. If omitted, a timestamped .wav file is written to the configured output_dir. |
Note: You must provide either --ref or --voice to specify the reference voice. You must also provide either --text or --file for the content to speak.
Examples
Clone from a reference audio file:
qwen-tts clone --ref speaker.wav --ref-text "Hello, my name is Alex." --text "Now I can say anything in Alex's voice."
Clone using a saved voice:
qwen-tts clone --voice alex --text "This uses the saved reference for Alex."
Clone and save the output:
qwen-tts clone --ref narrator.wav --file script.txt -o narration.wav
Voice Resolution
When --voice is provided, qwen-tts looks up the corresponding .wav file in the voices directory (~/.qwen-tts/voices/<name>.wav). If a .txt transcript file exists alongside it, that transcript is used automatically. You can still override the transcript with --ref-text.
When --ref is provided, the audio file is used directly without copying it to the voices directory. To save it for future reuse, see the voices add command.
For a deeper guide on voice cloning, see Voice Cloning.
voices
Manage saved voices for voice cloning. Saved voices let you reuse reference audio clips by name instead of specifying file paths each time.
Subcommands
voices list
List all saved voices.
qwen-tts voices list
Displays each saved voice name along with a preview of its transcript (if available). Voice files are stored as .wav files in the voices directory (~/.qwen-tts/voices/ by default).
voices add
Enroll a new voice from a reference audio file.
qwen-tts voices add <NAME> --ref <PATH> [--transcript <TEXT>]
| Argument / Option | Description |
|---|---|
NAME | Required. A name for the voice (used to reference it later). |
--ref <PATH> | Required. Path to a reference audio file (.wav). The file is copied into the voices directory. |
--transcript <TEXT> | Optional transcript of the reference audio. Stored alongside the audio as <name>.txt. Providing a transcript improves cloning quality. |
Example:
qwen-tts voices add alex --ref ~/recordings/alex_sample.wav --transcript "Hi, my name is Alex and this is how I normally speak."
After enrollment, you can use --voice alex with the clone command:
qwen-tts clone --voice alex --text "Any new text in Alex's voice."
voices remove
Remove a saved voice.
qwen-tts voices remove <NAME>
| Argument | Description |
|---|---|
NAME | Required. The name of the voice to remove. |
Example:
qwen-tts voices remove alex
This deletes both the .wav file and the associated .txt transcript (if present) from the voices directory.
models
Manage TTS model downloads and installations.
Subcommands
models list
List all installed models.
qwen-tts models list
Shows each installed model variant along with its size on disk. Models are stored in the models directory (~/.qwen-tts/models/ by default).
models download
Download a model from Hugging Face.
qwen-tts models download [--variant <VARIANT>]
| Option | Description |
|---|---|
--variant <VARIANT> | Model variant to download: pro or lite. Defaults to pro. |
Example:
# Download the full-precision model
qwen-tts models download --variant pro
# Download the smaller quantized model
qwen-tts models download --variant lite
Model Variants
| Variant | Backend | Hugging Face Repository | Notes |
|---|---|---|---|
pro | MLX | mlx-community/Qwen3-TTS-bf16 | Full bf16 precision. Best quality on Apple Silicon. |
lite | MLX | mlx-community/Qwen3-TTS-4bit | 4-bit quantized. Lower memory usage, slightly reduced quality. |
pro | CUDA / CPU | Qwen/Qwen3-TTS | Official PyTorch checkpoint. |
lite | CUDA / CPU | Qwen/Qwen3-TTS | Same checkpoint (quantization handled at runtime). |
The download command uses the huggingface_hub Python library to fetch model files. The appropriate repository is selected automatically based on your configured backend.
Storage
Downloaded models are saved to ~/.qwen-tts/models/<variant>/. You can change the models directory with:
qwen-tts config set models_dir /path/to/models
config
View and modify qwen-tts configuration.
Configuration is stored in ~/.config/qwen-tts/config.toml.
Subcommands
config init
Initialize configuration with auto-detected platform settings.
qwen-tts config init
This command:
- Detects your operating system and hardware (Apple Silicon, NVIDIA GPU, or CPU-only).
- Selects the appropriate backend (mlx, cuda, or cpu).
- Creates the directory structure at ~/.qwen-tts/ (models, voices, outputs).
- Writes default values to ~/.config/qwen-tts/config.toml.
Run this once after installation, or again to reset to defaults.
config show
Display the current configuration.
qwen-tts config show
Prints the full contents of config.toml in TOML format.
config set
Set a single configuration value.
qwen-tts config set <KEY> <VALUE>
| Argument | Description |
|---|---|
KEY | The configuration key to set. |
VALUE | The new value. |
Examples:
qwen-tts config set default_voice "Ethan"
qwen-tts config set default_speed 1.2
qwen-tts config set auto_play false
qwen-tts config set backend cuda
qwen-tts config set model_variant lite
qwen-tts config set auto_cleanup false
qwen-tts config set cleanup_age_hours 48
Configuration Keys
| Key | Type | Default | Description |
|---|---|---|---|
python_path | string | ~/.qwen-tts/venv/bin/python | Path to the Python interpreter in the virtual environment. |
models_dir | string | ~/.qwen-tts/models | Directory where downloaded models are stored. |
voices_dir | string | ~/.qwen-tts/voices | Directory where saved voice references are stored. |
output_dir | string | ~/.qwen-tts/outputs | Default directory for generated audio files. |
backend | string | auto-detected | Inference backend: mlx, cuda, or cpu. |
default_voice | string | Vivian | Default voice name for the speak command. |
default_speed | float | 1.0 | Default speech speed multiplier. |
auto_play | bool | true | Automatically play audio after generation. |
model_variant | string | pro | Active model variant: pro or lite. |
auto_cleanup | bool | true | Automatically delete old output files on each run. |
cleanup_age_hours | integer | 24 | Minimum age in hours before an output file is cleaned up. |
For a detailed description of each key, see Configuration.
Configuration
qwen-tts stores its configuration in a TOML file at:
~/.config/qwen-tts/config.toml
On Windows, this is typically:
C:\Users\<you>\AppData\Roaming\qwen-tts\config.toml
Full Reference
Below is a complete example with default values:
python_path = "~/.qwen-tts/venv/bin/python"
models_dir = "~/.qwen-tts/models"
voices_dir = "~/.qwen-tts/voices"
output_dir = "~/.qwen-tts/outputs"
backend = "mlx"
default_voice = "Vivian"
default_speed = 1.0
auto_play = true
model_variant = "pro"
auto_cleanup = true
cleanup_age_hours = 24
Key Descriptions
python_path
Path to the Python interpreter used for TTS inference. This should point to the Python binary inside the virtual environment created during installation. On Windows, the default is ~/.qwen-tts/venv/Scripts/python.exe.
models_dir
Directory where model files are stored after downloading. Each variant (pro, lite) is stored in its own subdirectory.
voices_dir
Directory where saved voice references are stored. Each voice consists of a .wav audio file and an optional .txt transcript file.
output_dir
Default directory for generated audio output. When you run a generation command without specifying --output, the resulting .wav file is written here with a timestamp-based filename (e.g., tts_1706140800.wav).
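The example filename above (tts_1706140800.wav) suggests a Unix-epoch naming scheme, which can be sketched as below; the exact format the CLI uses may differ.

```python
import time
from pathlib import Path

def default_output_path(output_dir: Path) -> Path:
    # Epoch-seconds naming, matching the tts_1706140800.wav example.
    return output_dir / f"tts_{int(time.time())}.wav"
```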
backend
The inference backend. Auto-detected by config init, but can be overridden manually. Valid values:
| Value | Description |
|---|---|
mlx | Apple MLX framework. Best performance on Apple Silicon Macs. Uses mlx_audio for inference. |
cuda | NVIDIA CUDA. Requires an NVIDIA GPU with CUDA drivers. Uses PyTorch for inference. |
cpu | CPU-only fallback. Works everywhere but is significantly slower. Uses PyTorch for inference. |
default_voice
The voice name used by the speak command when --voice is not specified. This is a string identifier passed to the model's instruction prompt (e.g., "Vivian", "Ethan").
default_speed
The speech speed multiplier used when --speed is not specified. A value of 1.0 produces normal speed. Lower values slow down speech; higher values speed it up.
auto_play
When true, generated audio files are played immediately after creation. Playback uses platform-native tools:
- macOS: afplay
- Windows: PowerShell SoundPlayer
- Linux: aplay, paplay, or ffplay (tried in order)
Set to false to disable automatic playback.
model_variant
The active model variant: "pro" for full precision or "lite" for the quantized version. This determines which subdirectory under models_dir is used for inference. Must be either pro or lite.
auto_cleanup
When true, old output files in output_dir are automatically deleted at the start of each run. Only files older than cleanup_age_hours are removed. Set to false to keep all generated files indefinitely.
cleanup_age_hours
The minimum age (in hours) an output file must reach before it is eligible for automatic cleanup. Only takes effect when auto_cleanup is true. For example, the default value of 24 means files older than 24 hours are deleted on the next run.
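The cleanup rule is equivalent to deleting .wav files in output_dir whose modification time is older than cleanup_age_hours. A rough stdlib sketch of that policy (not the CLI's actual code):

```python
import time
from pathlib import Path

def cleanup_outputs(output_dir: Path, age_hours: int = 24) -> list[Path]:
    """Delete generated .wav files older than age_hours; return what was removed."""
    cutoff = time.time() - age_hours * 3600
    removed = []
    for f in output_dir.glob("*.wav"):
        if f.stat().st_mtime < cutoff:
            f.unlink()
            removed.append(f)
    return removed
```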
Editing the Config File Directly
You can edit ~/.config/qwen-tts/config.toml in any text editor. Changes take effect the next time you run a qwen-tts command. Alternatively, use qwen-tts config set to modify individual values from the command line.
Directory Structure
After initialization, the ~/.qwen-tts/ directory contains:
~/.qwen-tts/
venv/ # Python virtual environment
models/ # Downloaded model files
pro/ # Full-precision model
lite/ # Quantized model
voices/ # Saved voice references
outputs/ # Generated audio files
Voice Cloning
Voice cloning lets you reproduce a specific person's voice from a short audio sample. This page explains how it works, how to prepare good reference audio, and how to save voices for repeated use.
How It Works
Qwen3-TTS supports zero-shot voice cloning. You provide:
- Reference audio -- A short .wav clip of the target speaker.
- Reference transcript -- The text spoken in the reference audio (optional but recommended).
- Target text -- The new text you want spoken in the cloned voice.
The model analyzes the speaker characteristics in the reference audio (pitch, timbre, cadence) and applies them when generating the target text. No fine-tuning is required.
Preparing Reference Audio
For best results, follow these guidelines:
- Length: 5 to 15 seconds is ideal. Shorter clips may not capture enough speaker characteristics. Longer clips increase processing time without proportional quality gains.
- Format: WAV format is required. Convert other formats with ffmpeg: ffmpeg -i recording.mp3 -ar 16000 -ac 1 recording.wav
- Quality: Use clean audio with minimal background noise. Avoid clips with music, multiple speakers, or heavy compression artifacts.
- Content: The reference audio should contain natural, conversational speech. Avoid whispering, shouting, or singing unless you want those characteristics reproduced.
- Transcript accuracy: If you provide a transcript, make sure it matches the audio exactly. Mismatched transcripts degrade cloning quality.
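You can sanity-check a clip against the guidelines above with Python's built-in wave module (duration, channel count, sample rate). The helper name and the 5-15 second threshold come from this page's guidance, not from the CLI itself.

```python
import wave

def check_reference(path: str) -> dict:
    """Report basic properties of a .wav reference clip."""
    with wave.open(path, "rb") as w:
        frames, rate = w.getnframes(), w.getframerate()
        info = {
            "seconds": frames / rate,
            "channels": w.getnchannels(),
            "sample_rate": rate,
        }
    # The 5-15 second window is the guideline from this page.
    info["length_ok"] = 5.0 <= info["seconds"] <= 15.0
    return info
```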
Basic Cloning
Clone from a one-off reference file:
qwen-tts clone \
--ref ~/recordings/speaker.wav \
--ref-text "This is how I normally speak." \
--text "The cloned voice will say this sentence."
Saving Voices for Reuse
If you plan to use the same voice repeatedly, save it with voices add:
qwen-tts voices add sarah \
--ref ~/recordings/sarah_sample.wav \
--transcript "Hi, I'm Sarah and this is a sample of my voice."
This copies the audio and transcript into the voices directory. Now you can reference it by name:
qwen-tts clone --voice sarah --text "Any new text in Sarah's voice."
To see all saved voices:
qwen-tts voices list
To remove a saved voice:
qwen-tts voices remove sarah
Tips
- Provide transcripts. The model uses the transcript to align audio features with linguistic content. Cloning quality improves noticeably when transcripts are provided.
- Test with short text first. Before generating a long narration, test the cloned voice with a short sentence to verify quality.
- Multiple references. The current implementation supports a single reference clip per invocation. If you have multiple samples of the same speaker, choose the cleanest one.
- Combining with speed control. You can adjust the speed of cloned speech with --speed without affecting voice quality: qwen-tts clone --voice sarah --text "Slower speech." --speed 0.8
Platform Support
qwen-tts runs on macOS, Linux, and Windows. The CLI automatically detects your hardware and selects the best inference backend during config init.
Support Matrix
| Platform | Backend | Performance | Notes |
|---|---|---|---|
| macOS Apple Silicon (M1/M2/M3/M4) | mlx | Best | Native MLX acceleration. Recommended platform. Uses mlx_audio for inference with optimized MLX model weights. |
| macOS Intel | cpu | Slow | No GPU acceleration available. Falls back to PyTorch CPU inference. |
| Linux + NVIDIA GPU | cuda | Fast | Requires NVIDIA drivers and CUDA toolkit. Uses PyTorch with CUDA for inference. |
| Linux CPU-only | cpu | Slow | PyTorch CPU inference. Functional but not recommended for regular use. |
| Windows + NVIDIA GPU | cuda | Fast | Requires NVIDIA drivers and CUDA toolkit. Uses PyTorch with CUDA for inference. |
| Windows CPU-only | cpu | Slow | PyTorch CPU inference. Functional but not recommended for regular use. |
Backend Detection
When you run qwen-tts config init, the following logic determines your backend:
- If the OS is macOS and the architecture is aarch64 (Apple Silicon) -> mlx
- Otherwise, if nvidia-smi is found and returns success -> cuda
- Otherwise -> cpu
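The documented detection order can be expressed as a short sketch, with stdlib platform checks standing in for whatever the Rust CLI actually does (Python reports Apple Silicon as arm64, where Rust uses aarch64):

```python
import platform
import shutil

def detect_backend() -> str:
    """Mirror the documented detection order: mlx, then cuda, then cpu."""
    if platform.system() == "Darwin" and platform.machine() in ("arm64", "aarch64"):
        return "mlx"
    if shutil.which("nvidia-smi") is not None:
        return "cuda"
    return "cpu"
```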
You can override the auto-detected backend manually:
qwen-tts config set backend cuda
Python Dependencies by Backend
Each backend requires different Python packages in the virtual environment:
MLX (Apple Silicon)
pip install mlx-audio huggingface-hub
CUDA (NVIDIA GPU)
pip install torch transformers huggingface-hub
CPU
pip install torch transformers huggingface-hub --extra-index-url https://download.pytorch.org/whl/cpu
Audio Playback
Generated audio is played automatically when auto_play is enabled. The playback command depends on the platform:
| Platform | Command |
|---|---|
| macOS | afplay (built-in) |
| Windows | PowerShell SoundPlayer |
| Linux | aplay, paplay, or ffplay (tried in order) |
If no audio player is found, a warning is printed and the generated file is still saved to disk.
Model Variants by Backend
| Backend | Pro Variant | Lite Variant |
|---|---|---|
| MLX | mlx-community/Qwen3-TTS-bf16 | mlx-community/Qwen3-TTS-4bit |
| CUDA | Qwen/Qwen3-TTS | Qwen/Qwen3-TTS |
| CPU | Qwen/Qwen3-TTS | Qwen/Qwen3-TTS |
On non-MLX backends, both pro and lite use the same upstream PyTorch checkpoint from Qwen.
Examples
A collection of practical examples for common qwen-tts workflows.
Basic Speech
Generate speech from a string:
qwen-tts speak "Hello, world!"
Save to a specific file:
qwen-tts speak "Hello, world!" -o hello.wav
Reading Files
Narrate a text file:
qwen-tts speak --file article.txt
Narrate a file with a specific voice and save the result:
qwen-tts speak --file chapter1.txt --voice "Ethan" -o chapter1.wav
Voice and Emotion
Speak with emotion:
qwen-tts speak "We won the championship!" --emotion "Excited"
qwen-tts speak "I'm sorry for your loss." --emotion "Sad and gentle"
Change the default voice:
qwen-tts config set default_voice "Ethan"
qwen-tts speak "This now uses Ethan by default."
Voice Design
Create a voice from a description:
qwen-tts design "A deep, authoritative male narrator with a British accent" \
--text "In a world where technology reigns supreme..."
Design a voice and narrate a file:
qwen-tts design "A cheerful young woman with an upbeat tone" \
--file welcome_message.txt -o welcome.wav
Voice Cloning
Clone a voice from a one-off sample:
qwen-tts clone \
--ref ~/recordings/speaker.wav \
--ref-text "This is a sample of my natural speaking voice." \
--text "Now the model can generate new speech in this voice."
Save a voice for reuse, then use it:
# Enroll the voice
qwen-tts voices add narrator \
--ref ~/recordings/narrator_sample.wav \
--transcript "Welcome to the audiobook. My name is James."
# Use the saved voice
qwen-tts clone --voice narrator --text "Chapter one. It was a dark and stormy night."
qwen-tts clone --voice narrator --file chapter2.txt -o chapter2.wav
List and manage saved voices:
qwen-tts voices list
qwen-tts voices remove narrator
Speed Control
Slow down for clarity:
qwen-tts speak "Please listen carefully to the following instructions." --speed 0.75
Speed up for previewing:
qwen-tts speak --file draft.txt --speed 1.5
Set a permanent default speed:
qwen-tts config set default_speed 0.9
Batch Processing
Generate speech for multiple files using a shell loop:
for f in chapters/*.txt; do
name=$(basename "$f" .txt)
qwen-tts speak --file "$f" -o "output/${name}.wav"
done
Clone a voice across multiple files:
for f in scripts/*.txt; do
name=$(basename "$f" .txt)
qwen-tts clone --voice narrator --file "$f" -o "output/${name}.wav"
done
Disabling Auto-Play
If you are generating many files and do not want each one to play:
qwen-tts config set auto_play false
Re-enable later:
qwen-tts config set auto_play true
Model Management
Download models:
# Full-precision model (recommended)
qwen-tts models download --variant pro
# Quantized model (smaller, faster on Apple Silicon)
qwen-tts models download --variant lite
Switch between variants:
qwen-tts config set model_variant lite
List installed models:
qwen-tts models list