Speech synthesis

4.4 SPEECH SYNTHESIS

INTRODUCTION

In speech synthesis, a human voice is generated by the computer. A text-to-speech system (TTS) converts written text into a speech output. The automatic generation of human language is complicated, but it has made a lot of progress in recent years. Compared to the playback of pre-made voice recordings, TTS has the advantage of being very flexible and able to speak any text. Speech synthesis is a part of computational linguistics. Therefore, a close collaboration between linguists and computer scientists is necessary in the development of a TTS.

The speech synthesis software used in TigerJython is called MaryTTS and was developed at the Department of Computational Linguistics and Phonetics of Saarland University in Germany.

The system uses large library files that you must download separately and unzip. In the same directory as tigerjython2.jar, create the subdirectory Lib (only if it does not already exist) and copy the unzipped files into it.

PROGRAMMING CONCEPTS: Speech synthesis, artificial speech, text-to-speech system

SPEAKING A TEXT IN 4 LANGUAGES

In this release, MaryTTS provides different voices that speak German, English, French and Italian. You choose the voice with selectVoice(). Then you call the function generateVoice() and pass it the text to be spoken. It returns a list of the generated sound samples, which you can play back with a sound player.

from soundsystem import *

initTTS()

selectVoice("german-man")
#selectVoice("german-woman")
#selectVoice("english-man")
#selectVoice("english-woman")
#selectVoice("french-woman")
#selectVoice("french-man")
#selectVoice("italian-woman")

text = "Danke dass du mir eine Sprache gibst. Viel Spass beim Programmieren" 
#text = "Thank you for giving me a voice. Enjoy programming" 
#text = "Merci pour me donner une voix. Profitez de la programmation"
#text = "Grazie che tu mi dia una lingua. Godere della programmazione"
voice = generateVoice(text)
openSoundPlayer(voice)
play()

MEMO

You can change which lines are commented out to let the program speak the text with a different voice. You must always call initTTS() first in order to prepare the speech synthesis software.

You can also pass initTTS() the path of the directory that contains the MaryTTS data files as a parameter. By default, this is the subdirectory Lib.
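
Instead of commenting lines in and out, the voice name can also be looked up by language and gender. The following is only a sketch and not part of the soundsystem module: the table VOICES and the helper pickVoice() are hypothetical names, and the table lists only the voice names that appear in the program above.

```python
# Hypothetical lookup table; it contains only the voice names used above
VOICES = {
    ("german", "man"): "german-man",
    ("german", "woman"): "german-woman",
    ("english", "man"): "english-man",
    ("english", "woman"): "english-woman",
    ("french", "man"): "french-man",
    ("french", "woman"): "french-woman",
    ("italian", "woman"): "italian-woman",
}

def pickVoice(language, gender):
    # Return the voice name for this pair, or None if no such voice is listed
    return VOICES.get((language, gender))

name = pickVoice("french", "woman")
print(name)
# In TigerJython you would now pass name to selectVoice()
```

Because pickVoice() returns None for a combination that is not in the table, the program can check the result before it calls selectVoice().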

ANNOUNCING TODAY'S DATE AND THE CURRENT TIME

Speech synthesis has numerous applications. People with visual impairments can have texts read aloud to them, and navigation systems as well as announcements in trains and train stations often use synthetically generated voices. Many interactive computer games also use artificially generated voices.

Your program determines the current date and time from the computer system and reads them out loud with a German, English or French voice.

from soundsystem import *
import datetime

language = "german"
#language = "english"
#language = "french"

initTTS()
if language == "german":
    selectVoice("german-woman")
    month = ["Januar", "Februar", "März", "April", "Mai", 
        "Juni", "Juli", "August", "September", "Oktober", 
        "November", "Dezember"]
if language == "english":
    selectVoice("english-man")
    month = ["January", "February", "March", "April", "May", 
        "June", "July", "August", "September", "October", 
        "November", "December"]
        
if language == "french":
    selectVoice("french-man")
    month = ["Janvier", "Février", "Mars", "Avril", "Mai", 
        "Juin", "Juillet", "Août", "Septembre", "Octobre", 
        "Novembre", "Décembre"]      

now = datetime.datetime.now()

if language == "german":
    text = "Heute ist der " + str(now.day) + ". " \
        + month[now.month - 1] + " " + str(now.year) + ".\n" \
        + "Die genaue Zeit ist " + str(now.hour) + " Uhr " + str(now.minute)
if language == "english":
    text = "Today is " + month[now.month - 1] + " "  \
        + str(now.day) + ", "+ str(now.year) + ".\n" \
        + "The time is " + str(now.hour) + " hours " + str(now.minute) \
        + " minutes."
if language == "french":
    text = "Nous sommes le " + str(now.day) + " " \
        + month[now.month - 1] + " " + str(now.year) + ".\n" \
        + "Il est exactement " + str(now.hour) + " heures " \
        + str(now.minute) + " minutes."
print(text)
voice = generateVoice(text)
openSoundPlayer(voice)
play()

MEMO

By changing which of the language lines is commented out, you can choose between the German, English and French speaker. The call datetime.datetime.now() returns an object that holds the current date and time in its attributes year, month, day, hour, minute, second and microsecond.

As you can see, a backslash at the end of a line continues the statement on the next line, which is useful when you assemble a long string.
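
The text can also be assembled in a function that receives the date object as a parameter, which makes it easy to try out with a fixed date. This is only a sketch: buildAnnouncement() and the list MONTHS are hypothetical names, only the English case is covered, and the expression is wrapped in parentheses as an alternative to backslashes for continuing it across several lines.

```python
import datetime

MONTHS = ["January", "February", "March", "April", "May",
          "June", "July", "August", "September", "October",
          "November", "December"]

def buildAnnouncement(now):
    # Inside parentheses an expression may span several lines
    # without a backslash at the end of each line
    text = ("Today is " + MONTHS[now.month - 1] + " "
            + str(now.day) + ", " + str(now.year) + ".\n"
            + "The time is " + str(now.hour) + " hours "
            + str(now.minute) + " minutes.")
    return text

# A fixed date instead of datetime.datetime.now() gives a reproducible result
fixed = datetime.datetime(2024, 3, 5, 14, 7)
print(buildAnnouncement(fixed))
```

To speak the current date, you would call buildAnnouncement(datetime.datetime.now()) and pass the result to generateVoice() as in the program above.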

CREATING YOUR OWN GRAPHICAL USER INTERFACE

As you have already learned in chapter 3.13, it is quite easy to create a simple dialog window based on TigerJython's EntryDialog class. As in many programming environments, the classic controls such as text fields, push buttons, check boxes, radio buttons and sliders are modeled by software objects. These objects appear in a surrounding rectangular pane, and the dialog remains open while the program continues (such a dialog is called a modeless dialog). For comprehensive information, please consult the APLU documentation.

Your program opens a modeless dialog where you select the speaker using radio buttons. When clicking the confirmation button, the text in the text field is read by a synthetic voice.

from soundsystem import *
from entrydialog import *

speaker1 = RadioEntry("Mann (Deutsch)")
speaker1.setValue(True)
speaker2 = RadioEntry("Man (English)")
speaker3 = RadioEntry("Homme (Français)")
speaker4 = RadioEntry("Donna (Italiano)")
pane1 = EntryPane("Speaker Selection", 
                   speaker1, speaker2, speaker3, speaker4)
textEntry = StringEntry("Message:", "Viel Spass am Programmieren")
pane2 = EntryPane(textEntry)
okButton = ButtonEntry("Speak")
pane3 = EntryPane(okButton)
dlg = EntryDialog(pane1, pane2, pane3)
dlg.setTitle("Synthetic Voice")

initTTS()

while not dlg.isDisposed():
    if speaker1.isTouched():
        textEntry.setValue("Viel Spass am Programmieren")
    elif speaker2.isTouched():
        textEntry.setValue("Enjoy programming")
    elif speaker3.isTouched():
        textEntry.setValue("Profitez de la programmation")
    elif speaker4.isTouched():
        textEntry.setValue("Godere della programmazione")
                
    if okButton.isTouched():
        # Select the voice that matches the chosen radio button
        if speaker1.getValue():
            selectVoice("german-man")
        elif speaker2.getValue():
            selectVoice("english-man")
        elif speaker3.getValue():
            selectVoice("french-man")
        elif speaker4.getValue():
            selectVoice("italian-woman")
        # The text comes from the same field for every speaker
        text = textEntry.getValue()
        if text != "":
            voice = generateVoice(text)
            openSoundPlayer(voice)
            play()

MEMO

The while loop runs until the dialog is closed with the close button in the title bar. In every cycle you use isTouched() to check whether the confirmation button has been clicked since the last call of this function. If so, you get the current values of the GUI elements by calling getValue() and convert the text in the text field to a voice as in the preceding examples.

Running through such a "tight" loop is somewhat wasteful, because a lot of processing time is spent on nothing but checking whether the button was pressed. However, each call of isTouched() automatically pauses the program for a short time (1 ms), so the loop is slowed down slightly.
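
The effect of such a short pause can be reproduced in plain Python. The sketch below is not the implementation of EntryDialog: waitFor() is a hypothetical helper that polls an arbitrary condition function and sleeps 1 ms between two checks, so the loop does not occupy the processor continuously.

```python
import time

def waitFor(condition, timeout, pause=0.001):
    # Poll condition() until it returns True or until timeout seconds
    # have passed. Returns True if the condition was met, otherwise False.
    start = time.time()
    while time.time() - start < timeout:
        if condition():
            return True
        time.sleep(pause)  # short pause instead of checking without a break
    return False

# Example condition that becomes True about 50 ms after the start
t0 = time.time()
ok = waitFor(lambda: time.time() - t0 > 0.05, 1.0)
print(ok)
```

In the dialog program the condition would be a button check, and the pause is already built into isTouched(), so no additional sleep is needed there.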

EXERCISES

Find or write a short poem as a text file, for example:

Advice To A Son by Ernest Hemingway.

Never trust a white man,
Never kill a Jew,
Never sign a contract,
Never rent a pew.
Don't enlist in armies;
Nor marry many wives;
Never write for magazines;
Never scratch your hives.
Always put paper on the seat,
Don't believe in wars,
Keep yourself both clean and neat,
Never marry whores.
Never pay a blackmailer,
Never go to law,
Never trust a publisher,
Or you'll sleep on straw.
All your friends will leave you
All your friends will die
So lead a clean and wholesome life
And join them in the sky.

Ernest Hemingway

With the line text = open("poem.txt", "r").read() you can read the text from the file poem.txt, located in the same directory as your program, as a string. Have the text read aloud by the English voice.

Define the function fac(n) either iteratively or recursively, which returns the factorial

n! = 1 * 2 * ... * n

Your program should ask you for a number between 0 and 10 using readInt() and also speak the question out loud. It then calculates the factorial n! of the entered number and outputs the result as spoken text.