PhatVoice User's Guide

4.1.4 Obtaining additional Natural Voices

In addition to the Crystal and Mike voices that ship with the Natural Voices engine, additional voices are available. As of the publication of this manual, the following voices were available:

Julia - American English female voice
Lauren - American English female voice
Claire - American English female voice
Mel - American English male voice
Ray - American English male voice
Rich - American English male voice
Audrey - UK English female voice
Anjali - UK English with Indian accent female voice
Charles - UK English male voice
Rosa - Latin American Spanish female voice
Alberto - Latin American Spanish male voice
Juliette - French female voice
Alain - French male voice
Klara - German female voice
Reiner - German male voice

Note

The non-English voices cannot be used to produce English speech with a foreign accent without a great deal of additional pronunciation hinting. However, they are fine for pronouncing words in their native language. Also, the UK English voices use different phonetic symbols for their speech hints.

The Natural Voices engine and additional voices are available from a number of vendors. During the development of the PhatVoice software, we purchased voices from NextUp Technologies, LLC. Their order page is located at:

https://www.regsoft.net/regsoft/vieworderpage.php3?productid=56116

Note

In order to use the additional Natural Voices, you must be running Version 1.4 or later of the Natural Voices engine. If you purchase the Version 1.4 engine, you should first uninstall the 1.2 engine using the Windows Start / Settings / Control Panel / Add/Remove Programs menu.

4.2 Testing your pronunciations

Depending on which additional software packages you have installed on your computer, you will be able to test your pronunciations with one or more utilities. If you have only installed the PhatNoise Music Manager and PhatVoice, you will need to test using PhatVoice. If you have installed the Microsoft Speech SDK 5.1, you may also use the TTSApp.exe program included in that SDK. If you purchased and installed Natural Voices version 1.3 or newer, you may use the WinDictEdit.exe program. In each of the examples below, we will show the process used to refine the pronunciation of the word Apocalyptica.

Note

As noted in the previous chapter, the Microsoft Speech API does not do a good job of converting the sample rate of speech generated by the Natural Voices engine. If you are using either PhatVoice or the Microsoft Speech TTSApp to test your hints, you should make sure that the speech format in the application is set to the native mode for the Natural Voices engine you are using (8KHz 16 Bit Mono for voices whose names do not end in 16, and 16KHz 16 Bit Mono for voices whose names end in 16).
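The naming rule in this note can be written down as a short Python sketch. The function name native_speech_format is hypothetical and is not part of PhatVoice or the Speech API; the returned strings simply mirror the format labels quoted in this manual.

```python
def native_speech_format(voice_name: str) -> str:
    """Return the native speech format label for a Natural Voices voice name."""
    # Voices whose names end in "16" (for example "Klara16") are treated as
    # 16 kHz voices; every other name is assumed to be an 8 kHz voice.
    if voice_name.strip().endswith("16"):
        return "16KHz 16 Bit Mono"
    return "8KHz 16 Bit Mono"


for name in ("Mike", "Klara16"):
    print(name, "->", native_speech_format(name))
```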

If you have included a phonetic pronunciation or other SAPI markup tags in your text, you may receive the message "Speak error". This means that an invalid tag or phoneme was detected in your text. It is not always obvious what the incorrect tag is, and you may have to resort to shortening your text to isolate the offending syntax.
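Before resorting to shortening the text by hand, it may help to check that the markup is at least well-formed XML. The sketch below uses Python's standard xml.etree.ElementTree parser; the function name check_markup and the wrapper element are hypothetical. It is only a rough pre-check: it cannot detect an invalid phoneme or an unknown tag name, and the Speech API's own parser may accept or reject some input differently (for example, text containing a bare "&" is rejected here).

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_markup(text: str) -> Optional[str]:
    """Return None if text is well-formed XML markup, else the parser's message."""
    try:
        # Wrap the text in a dummy root element so that plain words mixed with
        # tags still form a single XML document.
        ET.fromstring("<root>" + text + "</root>")
    except ET.ParseError as err:
        return str(err)
    return None


# A self-closing pron tag is well-formed; an unclosed one is not.
print(check_markup('<pron sym="l ay v 1"/> at the Knitting Factory'))
print(check_markup('<pron sym="l ay v 1"> at the Knitting Factory'))
```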

4.2.1 Testing hints using PhatVoice

To test hints using PhatVoice, start the PhatVoice program (normally via Start / Programs / PhatVoice / PhatVoice) and enter the text you want spoken in the Text to Speech box, then click the Sample button to generate the speech. If the Regexps box is checked, the sample text will be processed through the currently-loaded hints file, displayed in the Post Regexp Processing box, and spoken. To refine your hint, enter new text in the Text to Speech box and repeat. Once you have the speech sounding the way you want it, you may enter it in your hints file using a text editor as described in the previous chapter.

4.2.2 Testing hints using the Microsoft Speech engine

The Microsoft Speech TTSApp program operates in a manner similar to the PhatVoice program, except that it does not process your input through your hints file. Start this program via Start / Programs / Microsoft Speech SDK 5.1 / Tools / TTSApp. You need to have the Process XML box checked if you plan on including phonetic pronunciation or other SAPI hints in your text. Once you have perfected the speech, you may enter it in your hints file using a text editor as described in the previous chapter.

4.2.3 Testing hints using the AT&T Natural Voices engine

Natural Voices 1.3 and later provides a Dictionary Editor utility for entering pronunciations into a custom dictionary. While neither the PhatNoise Music Manager nor PhatVoice uses custom dictionaries, this utility shows you the phonetic representation of words you type in, and its Sounds Like box lets you explore the phonetic spelling of similar words. Consult the Natural Voices documentation for additional information on using this program. Again, once you have perfected the speech, you may enter it in your hints file using a text editor as described in the previous chapter.

Note

As of this writing, the PhatNoise Music Manager ships with Version 1.2.1 of Natural Voices, which does not include the Dictionary Editor utility. You would need to purchase and install a newer version of Natural Voices as described above in order to obtain this utility.

Note

The Dictionary Editor generates phonetic data based on the DARPA phonetic alphabet, which is similar, but not identical, to the SAPI phonetic alphabet used by the Microsoft Speech and PhatVoice software. When copying phonetic pronunciations from the Dictionary Editor, you will need to remove all instances of the digit zero ("0") and convert all instances of "hh" to "h".
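The two conversion rules in this note are mechanical, so they can be applied with a small script instead of by hand. The following Python sketch is one way to do it; the function name darpa_to_sapi and the sample phoneme string are hypothetical, and the exact layout of the Dictionary Editor's output is an assumption (space-separated symbols).

```python
def darpa_to_sapi(phonemes: str) -> str:
    """Apply the two conversion rules from this manual to a phoneme string."""
    # Rule 1: remove every instance of the digit zero.
    converted = phonemes.replace("0", "")
    # Rule 2: convert every "hh" to "h".
    converted = converted.replace("hh", "h")
    # Collapse any doubled spaces left behind by the removed zeros.
    return " ".join(converted.split())


# Hypothetical input, not taken from the Dictionary Editor.
print(darpa_to_sapi("hh ax 0 l ow 1"))
```

The other stress digits are left untouched, since the note only calls for removing zeros.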

4.2.4 Generating debugging output to verify pronunciations

Note

This reference to debugging output refers to debugging your pronunciation hints, not debugging the PhatVoice program itself.

One problem with using regular-expression-based substitutions is that a substitution may act on text that you did not expect it to affect. For example, if you wanted to change all instances of "Vol" to "Volume", you might try a substitution of the form:

s{Vol}{Volume}

However, if you had a disc with info that said "Volume" already, you would end up with "Volumeume" instead. In order to assist you in locating these unintended consequences without having to listen to all of the generated speech, debugging output is available.
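The effect is easy to reproduce outside PhatVoice. The sketch below uses Python's re module as a stand-in for the Perl-style substitution engine, so it illustrates the pitfall rather than PhatVoice's actual behavior. It also shows one common way to avoid the problem, a word-boundary anchor (\b); whether the equivalent hint s{\bVol\b}{Volume} behaves the same way in a PhatVoice hints file has not been verified here. The title "Ohm Lounge Vol 5" is a hypothetical abbreviated input.

```python
import re

title = "Best of Trance 2000 - Volume 1"

# Naive pattern: matches "Vol" anywhere, including inside "Volume".
naive = re.sub(r"Vol", "Volume", title)

# Anchored pattern: matches "Vol" only when it stands alone as a word.
anchored = re.sub(r"\bVol\b", "Volume", title)
abbreviated = re.sub(r"\bVol\b", "Volume", "Ohm Lounge Vol 5")

print(naive)        # Best of Trance 2000 - Volumeume 1
print(anchored)     # Best of Trance 2000 - Volume 1
print(abbreviated)  # Ohm Lounge Volume 5
```

The anchored pattern also leaves names such as "Volodya" alone, because the "Vol" there is followed by another letter.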

To enable this output, ensure that the Debug Output box is checked and that the Always Replace Output Files box is not checked. You will probably want to have all of the Playlist, Artist, Album, Track, and Genre boxes checked.

Before loading your new substitutions file, enter a filename such as old.txt in the Debug Output filename chooser box and click the large Go button. This will create a debug output file with the current substitutions (if any).

Now, load your new substitutions file using the Regexps filename chooser box and change your Debug Output filename to something like new.txt and then click the Go button.

You now have two debug files named old.txt and new.txt which you can compare to locate any differences. You can do this under Windows with the DOS fc command. To open a DOS prompt, choose Start / Run and type command in the selection box. On Windows 2000 and newer, you may use cmd instead of command; it has a number of additional features. Here is a sample session:

    C:\Temp> cd "C:\Program Files\PhatVoice"
    C:\Program Files\PhatVoice>fc old.txt new.txt
    Comparing files old.txt and NEW.TXT
    ***** old.txt
    Various Artists
    Best of Trance 2000 - Volumeume 1
    Unknown Genre
    ***** NEW.TXT
    Various Artists
    Best of Trance 2000 - Volume 1
    Unknown Genre
    *****

    ***** old.txt
    Various Artists
    Best of Trance 2000 - Volumeume 1
    Unknown Genre
    ***** NEW.TXT
    Various Artists
    Best of Trance 2000 - Volume 1
    Unknown Genre
    *****
    . . .
    C:\Program Files\PhatVoice>

Note

For advanced users, if you have access to a Unix system or have Unix-like tools installed on your PC, you can perform some steps to reduce the bulk of this output while preserving the ability to easily find the differences:

    (0:1) host:~terry/test> sort old.txt | uniq > old.sort
    (0:2) host:~terry/test> sort new.txt | uniq > new.sort
    (0:3) host:~terry/test> diff old.sort new.sort
    89,90c89,90
    < <pron sym="f ah ng k 1 k ah ng f y uw 1 zh ax n"/> - Ninja Cuts Volumeume 3 , disc 1
    < <pron sym="f ah ng k 1 k ah ng f y uw 1 zh ax n"/> - Ninja Cuts Volumeume 3 , disc 2
    ---
    > <pron sym="f ah ng k 1 k ah ng f y uw 1 zh ax n"/> - Ninja Cuts Volume 3 , disc 1
    > <pron sym="f ah ng k 1 k ah ng f y uw 1 zh ax n"/> - Ninja Cuts Volume 3 , disc 2
    109c109
    < <pron sym="l ay v 1"/> at the Knitting Factory Volumeume 1
    ---
    > <pron sym="l ay v 1"/> at the Knitting Factory Volume 1
    428c428
    < Best of Trance 2000 - Volumeume 1
    ---
    > Best of Trance 2000 - Volume 1
    2158c2158
    < New Wave Hits of the 80's - Volumeume 9
    ---
    > New Wave Hits of the 80's - Volume 9
    2237c2237
    < Ohm Lounge Volumeume 5
    ---
    > Ohm Lounge Volume 5
    3319,3320c3319,3320
    < Trans Volumeume 2 - A State of Altered Consciousness , disc 1
    < Trans Volumeume 2 - A State of Altered Consciousness , disc 2
    ---
    > Trans Volume 2 - A State of Altered Consciousness , disc 1
    > Trans Volume 2 - A State of Altered Consciousness , disc 2
    3576c3576
    < Yevgeniy Osin - Tanya + Volumeodya
    ---
    > Yevgeniy Osin - Tanya + Volodya
    (0:3) host:~terry/test>
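If neither fc nor the Unix tools are convenient, the same de-duplicated comparison can be approximated with a short Python script. This is a minimal sketch, not part of PhatVoice: the function names are hypothetical, and the debug files are assumed to be plain text with one item per line, read with the system default encoding.

```python
from pathlib import Path


def changed_lines(old_lines, new_lines):
    """Return (only_in_old, only_in_new) as sorted lists of unique lines."""
    old_set, new_set = set(old_lines), set(new_lines)
    return sorted(old_set - new_set), sorted(new_set - old_set)


def compare_files(old_path, new_path):
    """Read two debug output files and report the lines unique to each."""
    # errors="replace" keeps the script running if a tag contains bytes
    # that do not decode in the default encoding.
    old_lines = Path(old_path).read_text(errors="replace").splitlines()
    new_lines = Path(new_path).read_text(errors="replace").splitlines()
    return changed_lines(old_lines, new_lines)


def print_report(only_old, only_new):
    """Print the differences in a diff-like style."""
    for line in only_old:
        print("<", line)
    for line in only_new:
        print(">", line)
```

For example, print_report(*compare_files("old.txt", "new.txt")) lists each differing item once, however many tracks share it.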

After you have checked for unwanted substitutions, you can either edit your pronunciation hints file (if you found any problems) or proceed to generate speech from your hints file as shown in the previous chapter.

Note

Debug output is an alternative to generating the actual speech files. If you have checked the Debug Output box, speech files will not be generated. Once you have verified the debug output, re-run the speech generation with Debug Output un-checked to generate the actual speech files. Be sure to check the Always Replace Output Files box to replace existing speech with the new files.

Chapter 5
Uninstalling PhatVoice

While we hope that PhatVoice has been useful to you, we can understand that you might need to uninstall it for some reason. This chapter documents the removal process.

5.1 Uninstalling using the Uninstall PhatVoice Start Menu item

If you installed PhatVoice using the Setup Wizard, your Start Menu will contain the item Start / Programs / PhatVoice / Uninstall PhatVoice, which performs an automated uninstall of PhatVoice. Note that if you have created any additional files in the installation directory (normally C:\Program Files\PhatVoice), those additional files (and the PhatVoice directory itself) will not be removed during the uninstall. If you don't want to keep those files, you may delete them and remove the directory yourself.

5.2 Manually uninstalling PhatVoice

If you manually installed PhatVoice from the zipfile, you can uninstall it by deleting the program and data files along with the installation directory, and then removing any Start Menu items and Desktop shortcuts you created.

Chapter 6
Changes from previous versions, corrected problems, and known problems

This chapter lists changes made from previous versions, as well as providing lists of corrected problems and known problems.

6.1 Changes from previous versions

The main changes from PhatVoice V1.0 are:

Support for pronunciation hints.
Support for speech generation of playlist names.
Various bugfixes.

Refer to the rest of this manual for more details on these changes.

6.2 Corrected problems

The following problems have been corrected for the release of PhatVoice V2.0.

Defect/Enhancement ID: D/E 0003
Affected Version(s): 1.0 through 2.0-beta
Fixed-in Version: 2.0-beta2
Status: resolved
Summary: Incorrect output file names generated for items containing "!".
Description: If a pronunciation item contains an exclamation point ("!"), the generated filename is incorrect. It is missing the "!" character. Depending on whether or not a file of the same name already exists on the DMS, users will either hear silence or the previous file's contents.
------
Defect/Enhancement ID: D/E 0004
Affected Version(s): 2.0-alpha, 2.0-beta
Fixed-in Version: 2.0-beta2
Status: resolved
Summary: "|" character in MP3 tag generates COM error.
Description: If an MP3 tag (artist, title, track, etc.) contains a vertical bar ("|"), when that track is processed by PhatVoice, the user will receive the error "COM Error -2147287038 %1 could not be found."
------
Defect/Enhancement ID: D/E 0006
Affected Version(s): 1.0 through 2.0-alpha
Fixed-in Version: 2.0-beta and subsequent
Status: resolved
Summary: Enter key doesn't work in "Sample" text box.
Description: PhatVoice should speak the "Sample" text if the Enter key is pressed while focus is in the Sample text box.
------
Defect/Enhancement ID: D/E 0007
Affected Version(s): 1.0 through 2.0-alpha
Fixed-in Version: 2.0-beta and subsequent
Status: resolved
Summary: Enter and Escape keys cause undesired PhatVoice exit.
Description: If the user presses the Enter or Escape key, PhatVoice will exit. This is surprising to users and these keypresses should be ignored except for Enter within the Sample text box.
------
Defect/Enhancement ID: D/E 0008
Affected Version(s): 1.0 through 2.0-alpha
Fixed-in Version: 2.0-beta and subsequent
Status: resolved
Summary: PhatVoice lacks the Windows "minimize window" widget.
Description: The PhatVoice application can't be minimized. The only item on the top right of the PhatVoice window is the close ("X") widget.
------
Defect/Enhancement ID: D/E 0009
Affected Version(s): 1.0 through 2.0-alpha
Fixed-in Version: 2.0-beta and subsequent
Status: resolved
Summary: PhatVoice lacks the ability to save user preferences.
Description: There is no option to save the user's preferences. There should be a "Save Settings" button.
------
Defect/Enhancement ID: D/E 0010
Affected Version(s): All
Fixed-in Version: 2.0
Status: resolved
Summary: Sample text box doesn't support mouse-based cut/copy/paste operations.
Description: The "Sample text" input box doesn't support right-click mouse cut, copy, and paste operations. As a workaround, use the keyboard accelerator keys Control-X for cut, Control-C for copy, and Control-V for paste.
------
Defect/Enhancement ID: D/E 0011
Affected Version(s): 1.0 through 2.0-alpha
Fixed-in Version: 2.0-beta and subsequent
Status: resolved
Summary: PhatVoice doesn't generate speech for playlists.
Description: The "Playlist" checkbox is grayed out. This isn't a problem for V1.0 as it doesn't do pronunciation hints, but it is vital for V2.0 as the major purpose of the new version is to generate hinted pronunciations.
------
Defect/Enhancement ID: D/E 0012
Affected Version(s): 1.0
Fixed-in Version: 2.0-alpha and subsequent
Status: resolved
Summary: PhatVoice lacks support for pronunciation hints.
Description: The AT&T Natural Voices product does an amazing job with most text, but it needs assistance with some pronunciations. PhatVoice should support a regular-expression-based substitution engine which also allows the MS SAPI 5 tags for phonetic hints.
------
Defect/Enhancement ID: D/E 0013
Affected Version(s): 1.0 and subsequent
Fixed-in Version: 2.0-beta
Status: resolved
Summary: PhatVoice needs more info in its "About PhatVoice" dialog box.
Description: PhatVoice should have more complete version and contact info in its "About" box.
------
Defect/Enhancement ID: D/E 0015
Affected Version(s): 2.0-alpha and subsequent
Fixed-in Version: 2.0
Status: resolved
Summary: "Speak error" alert box should say what text was being spoken.
Description: When the post-regular-expression-processing speech is being generated, it is possible for the Natural Voices engine to report a failure to generate speech for a number of reasons, including the voice not being installed, an incorrect phoneme string for the selected voice, and so forth. It would help if PhatVoice would report what was being spoken in order to help the user track down the incorrect SAPI syntax.

6.3 Known problems

The following problems are still present in the PhatVoice 2.0 release. They may be corrected in a subsequent release.

Defect/Enhancement ID: D/E 0001
Affected Version(s): 2.0-alpha and subsequent
Fixed-in Version:
Status: open
Summary: Some syntax errors are ignored.
Description: Some syntax errors are not detected when the regexp file is loaded by PhatVoice. In particular, errors of the form {}(} are not flagged as errors. User feedback of any other cases would be appreciated.
------
Defect/Enhancement ID: D/E 0002
Affected Version(s): All
Fixed-in Version:
Status: open
Summary: Generated speech sounds "scratchy" or is missing.
Description: Choosing an incorrect text-to-speech bitrate or width can generate substandard speech or speech that does not play or makes screeching noises on the PhatBox. There are actually two separate issues here:
1) The MS Speech API introduces resampling errors at bitrates other than the native Natural Voices rates (16KHz for voices ending in "16", otherwise 8KHz).
2) The PhatBox can only play 16-bit files at 8KHz, 11KHz, 22KHz, and 44KHz (newer versions can also play 8-bit files).
So, if you have an 8KHz Natural Voice engine, select "8KHz 16 Bit Mono". If you have a 16KHz engine, things are more complicated. Your best bet is to select "16KHz 16 Bit Mono" and use an outboard utility to re-encode to 22KHz 16 bit mono. If you don't have such a utility or it is too much trouble, you can select "44KHz 16 Bit Mono" with some slight loss of speech quality and a substantial disk space penalty. Pending input from PhatNoise regarding plans to add 16KHz support to the PhatBox, we may either add resampling to PhatVoice or provide an outboard conversion utility.
------
Defect/Enhancement ID: D/E 0005
Affected Version(s): 2.0-alpha and subsequent
Fixed-in Version:
Status: open
Summary: Uninstalled Natural Voices generate "Speak Error" messages.
Description: While the PhatVoice "Voice" selection box will only allow the user to select from the installed Natural Voices, it is possible to request a different voice in a substitution regexp. For example:
    <voice required="name=Klara16">$1</voice>
If the selected voice is not installed on the system where PhatVoice is running, any attempt to use that voice will generate a "Speak Error" message.
------
Defect/Enhancement ID: D/E 0014
Affected Version(s): 2.0-alpha and subsequent
Fixed-in Version:
Status: open
Summary: PhatVoice should be statically linked with the Perl library.
Description: PhatVoice 2.0-alpha and subsequent requires a Perl library in order to perform regular expression substitutions. Having an outboard Perl DLL means that users a) have one more file to worry about and b) might have an incompatible DLL on their system.
------
Defect/Enhancement ID: D/E 0016
Affected Version(s): 2.0-alpha and subsequent
Fixed-in Version:
Status: open
Summary: Need additional option for only generating hinted speech.
Description: Instead of the current checkbox for "Always Replace Output Files", have 3 radio buttons: "Always Replace Output Files", "Only Replace Hinted Files", and "Never Replace Output Files". This will let users rapidly re-generate only the hinted speech if desired.
