PIONEERS Exploring Speech to Text

lzaf

Guru
Joined
Jan 12, 2012
Messages
13
Reaction score
2
Has anyone tried out lzaf's googletts script that uses google's text to speech? I am curious to know if the quality of speech is any better than flite?

There is a CLI application (under the folder cli ;) ) that you can use to get an idea of the produced speech quality without involving Asterisk.
Having worked a lot with speech synthesis and developed many Asterisk TTS modules that work with different engines, I have to admit that Google's engine gives by far the best results. The sound is very natural, as close to a human voice as possible, and it supports a wide variety of languages. To my ears it even beats most of the expensive commercial engines available for telephony applications.
Possible drawbacks are that the terms of use are not yet clearly defined and that you have to contact a remote server to get the voice data.
I would not suggest using it in a production environment yet, at least until Google defines the exact terms of use for that service, but for home/hobbyist/hackish use I think it's the best free option available.
 

KUMARULLAL

Guru
Joined
Feb 20, 2008
Messages
243
Reaction score
28
I was using an older PBIAF, so I installed some of these things manually.
To install mpg123 you need to download the rpm package
Code:
wget ftp://195.220.108.108/linux/dag/redhat/el5/en/i386/dag/RPMS/mpg123-1.9.1-1.el5.rf.i386.rpm


Then install the rpm with rpm -ivh mpg123*.rpm

You also need libesd:
Code:
yum install libesd.so.0
sox must be installed; if it is not:
Code:
yum install sox
To install perl-libwww
Code:
yum install perl-libwww-perl
You also need
Code:
yum install perl-XML-Simple

You need to download googletts from lzaf's website.
Hope this helps
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,168
Reaction score
5,199
Nooooooooo. I'm enjoying every minute. Can't wait to try it. This is what the Developer's Corner is for!

FYI: All of the dependencies outlined above already are included in PIAF2 with the exception of perl-XML-Simple. It was added to new installs a few days ago.

Here is the download syntax to get the current release of Googletts:

Code:
wget --no-check-certificate https://github.com/downloads/zaf/asterisk-googletts/asterisk-googletts-0.5.tar.gz


And here is an excerpt from the documentation showing dialplan usage:


-----
Usage
-----
agi(googletts.agi,text,[language],[intkey]): This will invoke the Google TTS engine,
render the text string to speech and play it back to the user. If 'intkey' is set
the script will wait for user input. Any given interrupt keys will cause the playback
to immediately terminate and the dialplan to proceed to the matching extension (for use in IVR).

The script contacts google's TTS service in order to get the voice data
which it then stores in a local cache (by default /tmp/) for future use.
Parameters like default language, enabling or disabling caching and cache dir
can be set up by editing the script.

--------
Examples
--------
sample dialplan code for your extensions.conf

;GoogleTTS Demo
;Playback messages to user

exten => 1234,1,Answer()
;;Play message in English:
exten => 1234,n,agi(googletts.agi,"This is a simple google text to speech test in english.",en)
;;Play message in Spanish
exten => 1234,n,agi(googletts.agi,"Esta es una simple prueba en español.",es)
;;Play message in Greek
exten => 1234,n,agi(googletts.agi,"Αυτό είναι ένα απλό τέστ στα ελληνικά.",el)

;A simple dynamic IVR using GoogleTTS

[my_ivr]
exten => s,1,Answer()
exten => s,n,Set(TIMEOUT(digit)=5)
exten => s,n,agi(googletts.agi,"Welcome to my small interactive voice response menu.",en)
;;Wait for digit:
exten => s,n(start),agi(googletts.agi,"Please dial a digit.",en,any)
exten => s,n,WaitExten()

;;Playback the name of the digit and wait for another one:
exten => _X,1,agi(googletts.agi,"You just pressed ${EXTEN}. Try another one please.",en,any)
exten => _X,n,WaitExten()

exten => i,1,agi(googletts.agi,"Invalid extension.",en)
exten => i,n,goto(s,start)

exten => t,1,agi(googletts.agi,"Request timed out.",en)
exten => t,n,goto(s,start)

exten => h,1,Hangup()

-------------------
Supported Languages
-------------------
"af" Afrikaans, "sq" Albanian, "am" Amharic, "ar" Arabic, "hy" Armenian, "az" Azerbaijani,
"eu" Basque, "be" Belarusian, "bn" Bengali, "bh" Bihari, "bs" Bosnian, "br" Breton, "bg" Bulgarian,
"km" Cambodian, "ca" Catalan, "zh-CN" Chinese (Simplified), "zh-TW" Chinese (Traditional),
"co" Corsican, "hr" Croatian, "cs" Czech, "da" Danish, "nl" Dutch, "en" English, "eo" Esperanto,
"et" Estonian, "fo" Faroese, "tl" Filipino, "fi" Finnish, "fr" French, "fy" Frisian, "gl" Galician,
"ka" Georgian, "de" German, "el" Greek, "gn" Guarani, "gu" Gujarati, "xx-hacker" Hacker, "ha" Hausa,
"iw" Hebrew, "hi" Hindi, "hu" Hungarian, "is" Icelandic, "id" Indonesian, "ia" Interlingua, "ga" Irish,
"it" Italian, "ja" Japanese, "jw" Javanese, "kn" Kannada, "kk" Kazakh, "rw" Kinyarwanda,
"rn" Kirundi, "xx-klingon" Klingon, "ko" Korean, "ku" Kurdish, "ky" Kyrgyz, "lo" Laothian,
"la" Latin, "lv" Latvian, "ln" Lingala, "lt" Lithuanian, "mk" Macedonian, "mg" Malagasy,
"ms" Malay, "ml" Malayalam, "mt" Maltese, "mi" Maori, "mr" Marathi, "mo" Moldavian, "mn" Mongolian,
"sr-ME" Montenegrin, "ne" Nepali, "no" Norwegian, "nn" Norwegian (Nynorsk), "oc" Occitan, "or" Oriya,
"om" Oromo, "ps" Pashto, "fa" Persian, "xx-pirate" Pirate, "pl" Polish, "pt-BR" Portuguese (Brazil),
"pt-PT" Portuguese (Portugal), "pa" Punjabi, "qu" Quechua, "ro" Romanian, "rm" Romansh, "ru" Russian,
"gd" Scots Gaelic, "sr" Serbian, "sh" Serbo-Croatian, "st" Sesotho, "sn" Shona, "sd" Sindhi,
"si" Sinhalese, "sk" Slovak, "sl" Slovenian, "so" Somali, "es" Spanish, "su" Sundanese, "sw" Swahili,
"sv" Swedish, "tg" Tajik, "ta" Tamil, "tt" Tatar, "te" Telugu, "th" Thai, "ti" Tigrinya, "to" Tonga,
"tr" Turkish, "tk" Turkmen, "tw" Twi, "ug" Uighur, "uk" Ukrainian, "ur" Urdu, "uz" Uzbek,
"vi" Vietnamese, "cy" Welsh, "xh" Xhosa, "yi" Yiddish, "yo" Yoruba, "zu" Zulu.

-------
License
-------
The GoogleTTS script for asterisk is distributed under the GNU General Public
License v2. See COPYING for details.
 

mvoip

New Member
Joined
Dec 8, 2010
Messages
15
Reaction score
2
This is my first post here. This forum and the PIAF application itself are great. Thank you all for your time.
I am trying to use lzaf's googletts as the TTS engine for iRiss (instead of flite). Everything works fine until the foo variable is passed to the googletts AGI. For some reason, the googletts AGI cannot process the text in the foo variable (read from /tmp/results.txt). I can see in the Asterisk log file that the value has been passed, and I even see the *.sln file in the temp folder, but it does not play back. Any suggestions?

Thank you guys for your great work.
 

lzaf

Guru
Joined
Jan 12, 2012
Messages
13
Reaction score
2
Passing the contents of a text file as input to an AGI script has some shortcomings. Newlines, quotes and some special characters might confuse the AGI script and the way it handles stdin to get its arguments. As a first step I would suggest stripping newlines and quotes from your text (results.txt) before passing it to googletts.agi.
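
A minimal sketch of that clean-up step in shell, offered as an illustration rather than as part of googletts. The function name sanitize_tts_text is hypothetical, and it assumes that "quotes" means double quotes only, so apostrophes in words like "you're" are left alone.

```shell
#!/bin/sh
# sanitize_tts_text: hypothetical helper, not part of googletts.
# Reads text on stdin and prints it as one line:
#   1. carriage returns and newlines become spaces
#   2. double quotes are deleted (assumption: apostrophes are harmless)
#   3. runs of spaces are squeezed to a single space
#   4. one leading and one trailing space, if present, are trimmed
sanitize_tts_text() {
    tr '\r\n' '  ' | tr -d '"' | tr -s ' ' | sed 's/^ //; s/ $//'
}
```

One way to use it would be to run the results file through the function before the dialplan reads it, for example sanitize_tts_text < /tmp/results.txt > /tmp/results.clean.txt, where results.clean.txt is a made-up file name for this example.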
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,168
Reaction score
5,199
Code:
exten => 444,1,Answer()
exten => 444,n,Wait(1)
exten => 444,n,agi(googletts.agi,"As Rick Perry would say, Ooops! Something went haywire. Please try your call again later.",en)
exten => 444,n,hangup

If this code doesn't work, then you're missing one or more of the dependencies. sox or mpg123 are likely culprits. HINT: There are no problems on default PIAF2 systems.
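
A quick way to check for the dependencies mentioned in this thread is sketched below. The helper names missing_commands and missing_perl_modules are hypothetical. The Perl module names are the ones shipped by the perl-libwww-perl and perl-XML-Simple packages; whether a given googletts release loads exactly these modules is an assumption.

```shell
#!/bin/sh
# missing_commands: hypothetical helper that prints each command name
# from its arguments that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# missing_perl_modules: hypothetical helper that prints each Perl module
# that fails to load (also prints them all if perl itself is absent).
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Report on the tools and modules named in this thread.
missing="$(missing_commands mpg123 sox perl)
$(missing_perl_modules LWP::UserAgent XML::Simple)"
missing=$(printf '%s\n' "$missing" | sed '/^$/d')
if [ -n "$missing" ]; then
    echo "Missing:"
    printf '%s\n' "$missing"
else
    echo "All checked dependencies were found."
fi
```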
 

mvoip

New Member
Joined
Dec 8, 2010
Messages
15
Reaction score
2
Thanks for your feedback. I have, however, tried putting a plain sentence without any punctuation in the results.txt file and then passing its contents as a variable, but I am still not getting anywhere. Here is the piece of code from the extensions_custom.conf file:

;code for wolfram alpha answerer
;added 2012-01-15
;begin
exten => 4747,1,Answer()
exten => 4747,2,Wait(1)
exten => 4747,3,agi(googletts.agi,"How can Eye Riss help you? Press the pound key when you're finished.",en)
exten => 4747,4(record),agi(speech-recog.agi,en-US)
exten => 4747,5,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
exten => 4747,6,agi(googletts.agi,"${utterance}",en)
exten => 4747,7,Background(vm-star-cancel)
exten => 4747,8,Background(continue-english-press)
exten => 4747,9,Background(digits/1)
exten => 4747,10,Read(PROCEED,beep,1)
exten => 4747,11,GotoIf($["foo${PROCEED}" = "foo1"]?12:14)
exten => 4747,12,Set(FILE(/tmp/query.txt)=${utterance})
exten => 4747,13,Background(one-moment-please)
exten => 4747,14,System(/var/lib/asterisk/agi-bin/iriss)
exten => 4747,15,Set(foo=${FILE(/tmp/results.txt)})
;exten => 4747,16,flite("${foo}")
exten => 4747,16,agi(googletts.agi,"${foo}",en)

;exten => 4747,17,flite("Have a nice day! Good bye.")
exten => 4747,17,agi(googletts.agi,"Have a nice day! Good bye.",en)
exten => 4747,18,hangup
;end

The line at priority 16, agi(googletts.agi,"${foo}",en) (shown in red in the original post), does not work with googletts. It works fine with flite (the flite line is commented out for testing). All the rest works fine.

Any help is appreciated. Thanks.
 

mvoip

New Member
Joined
Dec 8, 2010
Messages
15
Reaction score
2
Thanks Ward. The text enclosed in quotes works fine. I was trying to pass it as a variable holding the content of the results.txt file.
 

ghurty

Senior Member
Joined
Jan 13, 2009
Messages
852
Reaction score
4
This googletts sounds better than Allison!

I have been fooling around with it. However, when I try to pass in a number, instead of reading it out as a whole number (one thousand five hundred and forty five), it reads out the individual digits.

Any suggestions?

Thanks
 

lzaf

Guru
Joined
Jan 12, 2012
Messages
13
Reaction score
2
For numbers up to 9 digits the engine will read it as a whole number (e.g. 284956286 will be read as "two hundred eighty-four million nine hundred fifty-six thousand two hundred eighty-six").
For more than 9 digits it will read each digit individually
(e.g. 1284956286 will be read as "one two eight four nine five ..." etc.).

This is something that cannot be tuned as far as I know.
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,168
Reaction score
5,199
Try this approach. Add a colon or space between digits for normal cadence, or add a colon and a comma for a pause after a digit is spoken. In a future version of googletts.agi, perhaps a syntax could be added to handle this automagically, e.g. "[843-123-4567]" or "[8431234567]" would actually send Google the string as shown below. This gets a little more complex with international dialing obviously. If you're grabbing a CallerID number and passing it to this AGI script, then you obviously want to pass the CallerID number in the way it was received (which is typically all digits with no punctuation).

Code:
exten => 444,n,agi(googletts.agi,"8 4 3:,1:2:3:,4:5:6:7",en)
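
The string in that example could be generated rather than typed by hand. The sketch below is one possible way to do it in shell. The function name speakable_digits is hypothetical, it uses colons throughout (the example above uses spaces inside the first group), and the 3-3-4 grouping assumes a 10-digit North American number.

```shell
#!/bin/sh
# speakable_digits: hypothetical helper, not part of googletts.
# Strips everything except digits, then puts a colon between digits.
# For 10-digit numbers it also inserts ":," (colon plus comma, i.e. a pause)
# after the 3rd and 6th digits.
speakable_digits() {
    digits=$(printf '%s' "$1" | tr -cd '0-9')
    # 8431234567 -> 8:4:3:1:2:3:4:5:6:7
    spaced=$(printf '%s' "$digits" | sed 's/./&:/g; s/:$//')
    if [ "${#digits}" -eq 10 ]; then
        # 8:4:3:1:2:3:4:5:6:7 -> 8:4:3:,1:2:3:,4:5:6:7
        printf '%s\n' "$spaced" | sed 's/^\(.:.:.\):\(.:.:.\):/\1:,\2:,/'
    else
        printf '%s\n' "$spaced"
    fi
}
```

Numbers of any other length are simply colon-separated, so international formats would need their own grouping rules.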
 

sukasem

Guru
Joined
Sep 13, 2008
Messages
142
Reaction score
26
Hi,
Is there any way the asterisk-speech-recog script could take both keyed-in digits and voice input?

And maybe some magic words that make the script process right away, like when you say Yes, No, or Stop...

Cheers,
 

lzaf

Guru
Joined
Jan 12, 2012
Messages
13
Reaction score
2
Speech recognition does not happen in real time. The voice data is first recorded and then sent over to Google for processing. This makes a voice-controlled mechanism in the application practically impossible.
 

lzaf

Guru
Joined
Jan 12, 2012
Messages
13
Reaction score
2
The only thing that I can think of is a silence timeout. Is that feasible? Is silence in a phone audio stream difficult to define or detect?

That's actually a good idea, and yes, it is possible. I've just tweaked the script to add silence detection. Now, after 3 seconds of silence, the recording stops and the script proceeds to send the voice data to Google and get back the results. Keep in mind that silence detection is not always perfect and might not work very well on some old analog or low-quality phones that add static noise, or if there is a lot of background noise.
The latest code can be found here. I'm not sure if the 3-second timeout is practical; I'm always open to suggestions.
Have fun testing it
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,168
Reaction score
5,199
3 seconds actually works pretty well. I've cleaned out all the previous calls so you can try the demo link for yourself: 1-405-FOR-WOLF. Everything can be triggered by doing nothing after the prompts. Here's the actual dialplan code for those that are curious:


Code:
; Wolfram Alpha Dialplan Interface for PIAF2 servers
exten => 4748,1,Answer()
exten => 4748,2,Wait(1)
exten => 4748,3,Set(calledbefore=${DB_EXISTS(blacklist/${CALLERID(num)})})
exten => 4748,4,Noop(${CALLERID(num)})
exten => 4748,5,Noop(${calledbefore})
exten => 4748,6,GotoIf($["foo${calledbefore}" = "foo1"]?11:51)
exten => 4748,7,Goto(90)
exten => 4748,10,Set(removed=${DB_DELETE(blacklist/${CALLERID(num)})})
exten => 4748,11,Flite("Hi. Thanks for calling. We're very sorry. In order to give everyone an opportunity to try this service, we've had to limit calls to one call per person: You still can beat the system. Just call back from a different phone number. Have a great day. Good bye.")
exten => 4748,12,Goto(91)
exten => 4748,50,Set(DB(blacklist/${CALLERID(num)})=${CALLERID(num)})
exten => 4748,51,swift("Seriously,, After the beep, Say your question, then Press the pound key, or remain quiet.")
exten => 4748,52(record),agi(speech-recog.agi,en-US)
exten => 4748,53,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
exten => 4748,54,swift("${utterance}")
exten => 4748,55,Background(vm-star-cancel)
exten => 4748,56,Background(continue-english-press)
exten => 4748,57,Background(digits/1)
exten => 4748,58,Read(PROCEED,beep,1,,1,3)
exten => 4748,59,GotoIf($["foo${PROCEED}" = "foo1"]?70)
exten => 4748,60,GotoIf($["foo${PROCEED}" = "foo"]?70:90)
exten => 4748,70,Set(DB(blacklist/${CALLERID(num)})=${CALLERID(num)})
exten => 4748,71,Set(FILE(/tmp/query.txt)=${utterance})
exten => 4748,72,Background(one-moment-please)
exten => 4748,73,System(/var/lib/asterisk/agi-bin/4747)
exten => 4748,74,Set(foo=${FILE(/tmp/results.txt)})
exten => 4748,75,swift("${foo}")
exten => 4748,76,Goto(90)
exten => 4748,90,swift("Have a nice day! Good bye.")
exten => 4748,91,hangup
 

Aaron D. Vail

New Member
Joined
May 27, 2013
Messages
6
Reaction score
0
I had posted in this thread about 6 months ago, and unfortunately the post disappeared. Even worse, so did the response. I've changed things up a bit since then: 6 months ago I was running PIAF Purple, and now I am on PIAF Green. I hoped that my post (and the reply that fixed it) was still here, because in moving to Green I lost my VM script that combined this thread with an MP3 script. The plus side is that I can get the test below to work now, which I couldn't on my abused install of Purple.
Code:
flac --best --sample-rate=8000 msg0000.wav -o msg0000.flac
speech-recog-cli.pl msg0000.flac | head -2 | tail -1 | cut -f 2 -d ":"
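
For reuse from another script, the two commands above could be wrapped in shell functions along these lines. This is only a sketch: the names extract_utterance and transcribe_wav are hypothetical, the flac options are copied unchanged from the commands above, and it assumes speech-recog-cli.pl is on PATH and keeps printing the text of interest as the second colon-separated field of its second output line.

```shell
#!/bin/sh
# extract_utterance: hypothetical helper. Same pipeline as above:
# take line 2 of stdin and print its second ':'-separated field.
extract_utterance() {
    head -n 2 | tail -n 1 | cut -f 2 -d ":"
}

# transcribe_wav: hypothetical helper. Converts FILE.wav to FILE.flac with
# the flac options used above, then prints the recognized text.
transcribe_wav() {
    wav=$1
    flac_file="${wav%.wav}.flac"
    flac --best --sample-rate=8000 "$wav" -o "$flac_file" || return 1
    speech-recog-cli.pl "$flac_file" | extract_utterance
}
```

A call such as transcribe_wav msg0000.wav would then print the transcription on stdout, which a calling script can capture and place into the message body.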

My original post stated that I don't know Perl at all, and while I can sort of debug it, I still get lost in the script below. What I would like to do is somehow execute the above commands from the script below and have their output parsed into the temp email file. The end result would be an email with the transcription and an attached MP3 instead of a WAV file. Now the MP3 code:
Code:
#!/usr/bin/perl
open(VOICEMAIL,"|/usr/sbin/sendmail -t");
open(LAMEDEC,"|/usr/bin/dos2unix|/usr/bin/base64 -di|/usr/local/bin/lame --quiet --preset voice - /var/spool/asterisk/tmp/vmout.$$.mp3");
open(VM,">/var/spool/asterisk/tmp/vmout.debug.txt");
my $inaudio = 0;
loop: while(<>){
  if(/^\.$/){
    last loop;
  }
  if(/^Content-Type: audio\/x-wav/i){
    $inaudio = 1;
  }
  if($inaudio){
    while(s/^(Content-.*)wav(.*)$/$1mp3$2/gi){}
    if(/^\n$/){
      iloop: while(<>){
        print LAMEDEC $_;
        if(/^\n$/){
          last iloop;
        }
      }
      close(LAMEDEC);
      print VOICEMAIL "\n";
      print VM "\n";
      open(B64,"/usr/bin/base64 /var/spool/asterisk/tmp/vmout.$$.mp3|");
      while(<B64>){
        print VOICEMAIL $_;
        print VM $_;
      }
      close(B64);
      print VOICEMAIL "\n";
      print VM "\n";
      $inaudio = 0;
    }
  }
  print VOICEMAIL $_;
  print VM $_;
}
print VOICEMAIL "\.";
print VM "\.";
close(VOICEMAIL);
close(VM);
 
#CLEAN UP THE TEMP FILES CREATED
#This has to be done in a separate cron type job
#because unlinking at the end of this script is too fast,
#the message has not even gotten piped to send mail yet
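
The separate clean-up job mentioned in those comments might look like the following sketch. The function name cleanup_vmout and the 60-minute default are assumptions; the file pattern matches the vmout.$$.mp3 names the Perl script writes. Note that -maxdepth and -mmin are extensions found in GNU and BSD find, not part of POSIX.

```shell
#!/bin/sh
# cleanup_vmout: hypothetical helper for a cron job.
# Deletes vmout.*.mp3 files in DIR that were last modified more than
# MINUTES minutes ago (default 60), leaving newer files and other names alone.
cleanup_vmout() {
    dir=$1
    minutes=${2:-60}
    find "$dir" -maxdepth 1 -type f -name 'vmout.*.mp3' \
        -mmin "+$minutes" -exec rm -f {} +
}
```

Run from cron as cleanup_vmout /var/spool/asterisk/tmp 60, this leaves recent files in place long enough for sendmail to finish with them.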

So any help, or a possible restore of the "missing" or "removed" posts, would be GREATLY appreciated. Until then I'll just get MP3s (less space on my phone when I receive them).

Aaron
 
