WT3000TX Text-to-Speech (TTS) Voice Synthesis Chip

Product Overview

The WT3000TX series is a family of highly integrated voice synthesis chips built around a high-performance 32-bit processor running at up to 240 MHz. The chips synthesize speech from Chinese text, English letters, or a mix of the two, and they also include audio encoding and decoding, so they can play stored audio as well as synthesized speech. They are designed for low cost, low power consumption, high reliability, and broad applicability. Currently available packages are the WT3000T8-32N (QFN32, 4x4 mm) and the WT3000T3-32N (QFN32, 4x4 mm). Playback features include playback by address, insert playback, single-track loop, all-tracks loop, and random playback. Volume is adjustable over 31 levels, and up to 128 Mbit of external Flash memory is supported.

Product features

  • Control method: UART, default baud rate 9600;
  • Does not play on power-on by default; a BUSY status pin is low during playback and high when idle by default (the default configuration can be changed in code);
  • Audio output: samples default to DAC output;
  • Plays .MP3 and .WAV audio at bit rates from 8 kbps to 320 kbps for high-quality voice;
  • Supports random playback, seamless loop playback, and similar playback commands;
  • Supports up to 128 Mbit of Flash;
  • Volume adjustable over 31 levels;
  • High-power IO drive capability, up to 32 mA direct drive;
  • WT3000T8 (A version) synthesizes arbitrary Chinese text and English letters, including mixed Chinese and English letters; for English letters, markers for variable speed and pitch are not currently supported;
  • WT3000T3 (D version) synthesizes and plays arbitrary mixed Chinese and English; English letters can use markers to vary speed and pitch;
  • Text for synthesis can be supplied in GB 2312 encoding, with up to 2K bytes of text per synthesis request;
  • The chip analyzes the text and, using built-in text matching rules, correctly identifies and processes common formats such as numbers, phone numbers, times, dates, and measurement symbols;
  • Supports multiple control commands, including text synthesis, stop synthesis, pause synthesis, resume synthesis, status query, enter sleep mode, and wake up. The controller sends these commands to the chip over the communication interface. The commands are simple to use: by following the corresponding command descriptions, the controller can play prompt sounds and synthesize Chinese and English text, and it can set parameters of the synthesized speech through markers embedded in the text;
  • Supports multiple ways to query the chip's working status: reading the level of the status pin, reading the work status word that the chip returns automatically, or sending a query command and reading the status data the chip returns;
  • When the chip is used alone (internal storage only), the built-in voice data must be written before the chip leaves the factory;
  • In deep sleep mode, power consumption is less than 6 μA.
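
Of the status query methods listed above, reading the BUSY pin is the simplest to sketch on the host side. The example below is a minimal sketch, not vendor code: `wait_until_idle` and its `read_busy` callback are hypothetical names, the callback stands in for whatever GPIO read the host platform provides, and the idle level of 1 assumes the default configuration described above (low while playing, high when idle).

```python
import time
from typing import Callable


def wait_until_idle(read_busy: Callable[[], int],
                    timeout_s: float = 5.0,
                    poll_interval_s: float = 0.01,
                    idle_level: int = 1) -> bool:
    """Poll the BUSY pin until it shows the idle level or the timeout expires.

    read_busy is a hypothetical callback that returns the current pin level
    (0 or 1). idle_level=1 assumes the default configuration (high when idle).
    Returns True if the chip became idle, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_busy() == idle_level:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated pin: busy (0) for two reads, then idle (1).
    levels = iter([0, 0, 1])
    print(wait_until_idle(lambda: next(levels), poll_interval_s=0.0))
```

Passing the pin read in as a callback keeps the sketch independent of any particular GPIO library. If the default BUSY polarity has been changed in code, pass `idle_level=0` instead.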

Introduction to program and module selection

Module Introduction

| Series | Functional code | Communication method | Module selection | Audio output | Functional description |
|---|---|---|---|---|---|
| WT3000T | T001 | UART (9600) | M01 | DAC | This module uses the WT3000T8-32N chip by default, supporting playback of synthesized Chinese and English letters, and only supports playback from internal Flash memory, capable of storing 30 seconds of fixed voice. |

Program Introduction

| Chip | Functional code | Communication method | Version | Audio output | Functional description |
|---|---|---|---|---|---|
| WT3000T8-32N | T001 | UART (9600) | A | DAC | Chinese/English character synthesis playback, only supports built-in Flash playback, can store fixed voice for 30 seconds. |
| WT3000T3-32N | T001 | UART (9600) | D | DAC | Chinese-English audio synthesis playback, only supports built-in Flash playback, can store fixed voice for up to 500 seconds. |
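
At the default UART rate of 9600 baud, sending a long text takes a noticeable amount of time, which matters when budgeting latency on the host. The sketch below estimates that time. It assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte), which this page does not state, and it ignores any command header or checksum bytes. `uart_transfer_seconds` is a hypothetical helper name.

```python
def uart_transfer_seconds(payload_bytes: int,
                          baud: int = 9600,
                          bits_per_byte: int = 10) -> float:
    """Estimate the time needed to send payload_bytes over a UART link.

    bits_per_byte=10 assumes 8N1 framing (an assumption, not taken from
    the chip documentation). Protocol overhead is not included.
    """
    if payload_bytes < 0 or baud <= 0 or bits_per_byte <= 0:
        raise ValueError("arguments must be positive")
    return payload_bytes * bits_per_byte / baud


if __name__ == "__main__":
    # A text of 2048 bytes at 9600 baud
    print(f"{uart_transfer_seconds(2048):.2f} s")
```

Under these assumptions, a 2048-byte text takes a little over 2.1 seconds to transmit.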

Circuit Design Reference

Six Major Advantages

The chip supports the synthesis of any Chinese or English text and can use GB 2312 encoding to synthesize up to 2 kilobytes of text at a time.
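
On the host side, the text has to be encoded and kept within the per-request size limit before it is sent. The sketch below shows one way to do that in Python. It assumes "2 kilobytes" means 2048 bytes of encoded text and that the limit applies to the text payload only; the command framing around the text is not described here and is not modeled. `split_gb2312` is a hypothetical helper name.

```python
from typing import List

MAX_TEXT_BYTES = 2048  # assumption: "2 kilobytes" taken as 2048 bytes


def split_gb2312(text: str, limit: int = MAX_TEXT_BYTES) -> List[bytes]:
    """Encode text as GB 2312 and split it into chunks of at most limit bytes.

    Chunks are cut only between characters, so a two-byte Chinese character
    is never split. Raises UnicodeEncodeError for characters that GB 2312
    cannot represent.
    """
    if limit < 2:
        raise ValueError("limit must hold at least one two-byte character")
    chunks: List[bytes] = []
    current = bytearray()
    for ch in text:
        encoded = ch.encode("gb2312")  # 1 byte for ASCII, 2 bytes for Chinese
        if len(current) + len(encoded) > limit:
            chunks.append(bytes(current))
            current = bytearray()
        current += encoded
    if current:
        chunks.append(bytes(current))
    return chunks


if __name__ == "__main__":
    parts = split_gb2312("你好" * 600 + "abc")
    print([len(p) for p in parts])
```

Each returned chunk can then be sent as one synthesis request. Splitting at sentence boundaries instead of at the byte limit would likely sound more natural, but that refinement is left out of this sketch.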

WT3000T-32N multi-language TTS chip

  • Supports Chinese and English
  • UTF-8 encoding
  • 2K text bytes
  • Text-to-speech synthesis

Technical Parameters

  • Communication Method: UART, SPI
  • External Flash: 4-128 Mbit
  • Operating Voltage: 2.6-5.5 V
  • Sleep Power Consumption: less than 6 μA
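
The external Flash size and the audio bit rate together bound how much recorded audio can be stored. The sketch below works through that arithmetic. It assumes 1 Mbit = 2^20 bits and 1 kbps = 1000 bit/s, and it ignores any space taken by file headers, indexes, or the synthesis data itself, so the result is an upper bound only. `playback_seconds` is a hypothetical helper name.

```python
def playback_seconds(flash_mbit: float, bitrate_kbps: float) -> float:
    """Upper bound on the audio duration that fits in Flash at a given bit rate.

    Assumes 1 Mbit = 2**20 bits and 1 kbps = 1000 bit/s, with no storage
    overhead (both are assumptions for illustration).
    """
    if flash_mbit <= 0 or bitrate_kbps <= 0:
        raise ValueError("arguments must be positive")
    flash_bits = flash_mbit * 1024 * 1024
    return flash_bits / (bitrate_kbps * 1000)


if __name__ == "__main__":
    for kbps in (8, 128, 320):
        print(f"128 Mbit at {kbps} kbps: {playback_seconds(128, kbps):.0f} s")
```

For example, under these assumptions a 128 Mbit Flash holds roughly 1049 seconds of 128 kbps audio, and less than half of that at 320 kbps.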

Voice Algorithm

Intelligent text analysis, accurately processing numbers, symbols, and homophones.

The chip analyzes the input text and, using built-in text matching rules, recognizes and correctly reads common formats such as numbers, times, dates, measurement units, and other symbols. It also determines the pronunciation of homophones from their context, and it supports reading mixed Chinese and English text.

Integrated voice decoding lets the chip play audio files directly.
