TABLE OF CONTENTS

1. Introduction

1.1 Project Description

1.2 Introduction to Speech Synthesis

2. History and Development of Speech Synthesis

2.1 From Mechanical to Electrical Synthesis

2.2 Development of Electrical Synthesizers

2.3 History of Finnish Speech Synthesis

3. Phonetics and Theory of Speech Production

3.1 Representation and Analysis of Speech Signals

3.2 Speech Production

3.3 Phonetics

3.3.1 English Articulatory Phonetics

3.3.2 Finnish Articulatory Phonetics

4. Problems in Speech Synthesis

4.1 Text-to-Phonetic Conversion

4.1.1 Text Preprocessing

4.1.2 Pronunciation

4.1.3 Prosody

4.2 Problems in Low Level Synthesis

4.3 Language Specific Problems and Features

5. Methods, Techniques, and Algorithms

5.1 Articulatory Synthesis

5.2 Formant Synthesis

5.3 Concatenative Synthesis

5.3.1 PSOLA Methods

5.3.2 Microphonemic Method

5.4 Linear Prediction Based Methods

5.5 Sinusoidal Models

5.6 High-Level Synthesis

5.6.1 Text Preprocessing

5.6.2 Pronunciation

5.6.3 Prosody

5.7 Other Methods and Techniques

6. Applications of Synthetic Speech

6.1 Applications for the Blind

6.2 Applications for the Deafened and Vocally Handicapped

6.3 Educational Applications

6.4 Applications for Telecommunications and Multimedia

6.5 Other Applications and Future Directions

7. Application Frameworks

7.1 Speech Application Programming Interface

7.1.1 Control Tags

7.2 Internet Speech Markup Languages

7.3 MPEG-4 TTS

7.3.1 MPEG-4 TTS Bitstream

7.3.2 Structure of MPEG-4 TTS Decoder

7.3.3 Applications of MPEG-4 TTS

8. Audiovisual Speech Synthesis

8.1 Introduction and History

8.2 Techniques and Models

9. Products

9.1 Infovox

9.2 DECtalk

9.3 Bell Labs Text-to-Speech

9.4 Laureate

9.5 SoftVoice

9.6 CNET PSOLA

9.7 ORATOR

9.8 Eurovocs

9.9 Lernout & Hauspie

9.10 Apple PlainTalk

9.11 AcuVoice

9.12 CyberTalk

9.13 ETI Eloquence

9.14 Festival TTS System

9.15 ModelTalker

9.16 MBROLA

9.17 Whistler

9.18 NeuroTalker

9.19 Listen2

9.20 SPRUCE

9.21 HADIFIX

9.22 SVOX

9.23 SYNTE2 and SYNTE3

9.24 Timehouse Mikropuhe

9.25 Sanosse

9.26 Summary

10. Speech Quality and Evaluation

10.1 Segmental Evaluation Methods

10.1.1 Diagnostic Rhyme Test (DRT)

10.1.2 Modified Rhyme Test (MRT)

10.1.3 Diagnostic Medial Consonant Test (DMCT)

10.1.4 Standard Segmental Test

10.1.5 Cluster Identification Test (CLID)

10.1.6 Phonetically Balanced Word Lists (PB)

10.1.7 Nonsense Words and Vowel-Consonant Transitions

10.2 Sentence Level Tests

10.2.1 Harvard Psychoacoustic Sentences

10.2.2 Haskins Sentences

10.2.3 Semantically Unpredictable Sentences (SUS)

10.3 Comprehension Tests

10.4 Prosody Evaluation

10.5 Intelligibility of Proper Names

10.6 Overall Quality Evaluation

10.6.1 Mean Opinion Score (MOS)

10.6.2 Categorical Estimation (CE)

10.6.3 Pair Comparison (PC)

10.6.4 Magnitude and Ratio Estimation

10.7 Field Tests

10.8 Audiovisual Assessment

10.9 Summary

11. Conclusions and Future Strategies

References and Literature

Appendix A: Speech Synthesis Demonstrations

Appendix B: Summary of Speech Synthesis Products