
PIONEERS Exploring Speech to Text

Discussion in 'Developers' Corner' started by wardmundy, Jan 12, 2012.

  1. wardmundy Nerd Uno


    I don't use F-A-N-T-A-S-T-I-C quite as often as Steve Jobs, but Lefteris Zafiris has really outdone himself this time around. His new Asterisk AGI script gives you near perfect speech-to-text recognition using Google's speech recognition service. And it's FREE!


    1. Install the AGI script by logging into your server and issuing the following commands:

    Code:
    cd /root
    wget --no-check-certificate https://github.com/downloads/zaf/asterisk-speech-recog/asterisk-speech-recog-0.4.tar.gz
    tar zxvf asterisk-speech*
    cd asterisk-speech-recog-0.4
    cp speech-recog.agi /var/lib/asterisk/agi-bin/.
    cd /etc/asterisk
    nano -w extensions_custom.conf
    

    2. Now add the following sample code to /etc/asterisk/extensions_custom.conf at the top of the [from-internal-custom] context:

    Code:
    exten => 77325,1,Answer()
    exten => 77325,n,flite("Say something in English, when done press the pound key.")
    exten => 77325,n(record),agi(speech-recog.agi,en-US)
    exten => 77325,n,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
    ;exten => 77325,n,GotoIf($["${status}" = "0"]?success:fail)
    exten => 77325,n,flite("${utterance}")
    exten => 77325,n,flite("Have a nice day! Good bye.")
    exten => 77325,n,hangup
    
    exten => 2255,1,Answer()
    exten => 2255,2,Wait(1)
    exten => 2255,3,flite("Say the number you wish to call. Then press the pound key.")
    exten => 2255,4(record),agi(speech-recog.agi,en-US)
    exten => 2255,5,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
    exten => 2255,6,Set(NUM2CALL=${utterance})
    exten => 2255,7,SayDigits("${NUM2CALL}")
    exten => 2255,8,Background(vm-star-cancel)
    exten => 2255,9,Background(vm-tocallnum)
    exten => 2255,10,Read(PROCEED,beep,1)
    exten => 2255,11,GotoIf($["foo${PROCEED}" = "foo1"]?12:13)
    exten => 2255,12,Goto(outbound-allroutes,${NUM2CALL},1)
    exten => 2255,13,hangup
    

    3. Reload your dialplan: asterisk -rx "dialplan reload"

    4. Now pick up a phone and dial S-P-E-A-K (77325). When prompted, say a few words and press #. The speech-to-text script will pass your memorable words to Google, have them converted to text, and then say them back to you using Flite's Egor.

    5. Next, pick up a phone and dial C-A-L-L (2255). When prompted, say a phone number to dial and press #. Listen to the playback of the number. If it is correct, press 1 to place the call. Stay tuned for loads of apps!
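
    The spelled-out feature codes are just the keypad digits for each letter. A minimal sketch that maps a word to the extension you would dial, assuming a standard phone keypad layout (word_to_digits is a made-up helper name, not part of the AGI package):

```shell
#!/bin/sh
# Hypothetical helper: map letters to the digits on a standard phone keypad.
# ABC=2 DEF=3 GHI=4 JKL=5 MNO=6 PQRS=7 TUV=8 WXYZ=9
word_to_digits() {
    printf '%s\n' "$1" |
        tr '[:lower:]' '[:upper:]' |
        tr 'A-Z' '22233344455566677778889999'
}

word_to_digits SPEAK   # prints 77325
word_to_digits CALL    # prints 2255
```

    The results match the two extensions defined in the sample dialplan in step 2.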

    :party::party::party::party::party:
  2. tbrummell Guru

    Ohhh, can't wait for someone to make it transcribe a voicemail that is left and then email it with the email notification. Let the waiting begin!
  3. randy7376 Guru

    That's exactly what I was thinking when I read Ward's portion of this thread!

    I wish I had more time to play... :)
  4. wardmundy Nerd Uno

    Added another sample above. Email transcription shouldn't be that hard. There already are articles on how to do it. http://nerd.bz/zEUqfu
  5. rossiv Guru

    I just tried it on my PIAF2 box and it works! Voice transcriptions were almost perfect on my tries as were the numbers for Speak2Dial. YAY!
  6. lgaetz Pundit

    This is a game changer, or it could be. Is this a legitimate use of Google's S2T API or are we in for the same experience that has plagued successful GV integration?
  7. wardmundy Nerd Uno

    Good question. There's not much info from Google on this, mostly from third parties. The AGI script essentially masquerades as the Chrome web browser to use Google's freely available public service. Google could certainly add a layer of encryption if they wanted to keep the public out. There now are patent trolls to deal with as well. My guess is you probably can expect the same sort of Wild West ride that everyone came to know and love with Google Voice. :cowboyb:
  8. wardmundy Nerd Uno

    Speech to Text for Voicemails

    For those that want to experiment, here's a very rough cut at what would be needed to transcribe voicemails.

    1. Install the perl script that's included in the open source tarball above:

    Code:
    cd /root/asterisk-speech-recog-0.4/samples
    cp speech-recog-cli.pl /usr/local/sbin/.
    

    2. Copy any Asterisk voicemail message to a temporary folder. You'll find the messages in the directory tree for a particular extension (702 in this example), and they look like this:

    Code:
    /var/spool/asterisk/voicemail/default/702/INBOX/msg0000.wav
    

    3. Run the following two commands. The first converts the voicemail to FLAC, a sound format that Google supports. The second passes the sound file to Google via the open source perl script, and Google does the heavy lifting of transcribing the message:

    Code:
    flac --best --sample-rate=8000 msg0000.wav -o msg0000.flac
    speech-recog-cli.pl msg0000.flac | head -2 | tail -1 | cut -f 2 -d ":"
    

    4a. The raw output from using speech-recog-cli.pl will look like this:

    Code:
    Openning msg0000.flac
    utterance  : here is a sample voicemail message that I'm going to leave after the tone have a nice day
    status     : 0
    confidence : 0.9633785  <-- The likelihood that the transcription is accurate, 96% in this case
    id         : ac9869b4a460bae157a793245bdc0f36
    

    4b. After massaging with | head -2 | tail -1 | cut -f 2 -d ":", you get this:

    Code:
     here is a sample voicemail message that I'm going to leave after the tone have a nice day
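
    The head/tail/cut pipeline depends on the utterance being on line 2, and cut -f 2 -d ":" drops anything after a second colon in the text. A minimal sketch of a sturdier variation that looks fields up by name, assuming the "name : value" layout shown in 4a (get_field, is_confident and the 0.8 threshold are made up for illustration, not part of the tarball):

```shell
#!/bin/sh
# Sketch only: get_field and is_confident are hypothetical helper names.

# Print the value of one named field from speech-recog-cli.pl output on stdin.
# Assumes the "name : value" layout shown in step 4a.
get_field() {
    sed -n "s/^$1[[:space:]]*:[[:space:]]*//p" | head -n 1
}

# Succeed if the confidence ($1) is at least the threshold ($2, default 0.8).
# The 0.8 default is an arbitrary illustration, not a recommended value.
is_confident() {
    awk -v c="$1" -v t="${2:-0.8}" 'BEGIN { exit !(c + 0 >= t + 0) }'
}

# Sample text copied from step 4a. On a live system this would instead be:
#   raw=$(speech-recog-cli.pl msg0000.flac)
raw=$(cat <<'EOF'
Openning msg0000.flac
utterance  : here is a sample voicemail message that I'm going to leave after the tone have a nice day
status     : 0
confidence : 0.9633785
id         : ac9869b4a460bae157a793245bdc0f36
EOF
)

text=$(printf '%s\n' "$raw" | get_field utterance)
conf=$(printf '%s\n' "$raw" | get_field confidence)

if is_confident "$conf"; then
    printf '%s\n' "$text"
else
    printf 'low confidence (%s): %s\n' "$conf" "$text"
fi
```

    Looking fields up by name also makes the confidence value available, so a doubtful transcription can be flagged rather than passed along as-is.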
    

    Enjoy! :hat:
  9. Very nice! Another possible use of this might be to allow a user to dial a feature code, record some speech, get it transcribed, and have it e-mailed to the address at which they normally receive voicemail notifications (as defined in the extension's settings).

    I haven't played with this yet (much too early in the morning) but the possibilities are quite interesting.
  10. lgaetz Pundit

    Dictation! I never even thought of that and it would be the most useful application for me. I have been doing some thinking about how to integrate this with an IVR, but my skills are not up to that. If Google doesn't pull the rug out on this one, IVRs are sure to be the most requested feature.
  11. darmock PIAF Developer

    Yep and one of the most litigious ones also. Just imagine the patent trolls crawling out of the woodwork suing everyone in sight. All in the name of the almighty dollar. I keep wondering how google has gotten away with it..... Of course they have more lawyers on tap than our freedom loving government.... (sarcasm intentional)

    I can see them going after all the ip addresses that use this service and suing the lot. Amazing what blanket search warrants can do and they sure are easy to get unless you are the government and you don't need one any more..... Course we could use the tor network......

    Tom

    Sorry in a ranting mood this morning
  12. lgaetz Pundit

    Tried an install to PIAF ver. 1.7.5.6 PURPLE using the directions above. The first problem was the wget wouldn't work until I changed it to:
    Code:
    wget --no-check-certificate https://github.com/downloads/zaf/asterisk-speech-recog/asterisk-speech-recog-0.4.tar.gz
    Install seems to proceed just fine from there, but when trying out the sample feature code, S-P-E-A-K, I get "Say something in English, when done press the pound key." immediately followed by "Have a nice day! Good bye." It doesn't wait for me to say anything. Thinking it might be a permissions problem, I changed the script's ownership to asterisk:asterisk and its mode to 0777. The full Asterisk log shows the script exits with code 0, indicating no errors, but I can't get it to work on this system. Anyone have any ideas? Is it possible that this PIAF version doesn't have the necessary dependencies, e.g. flac?

    *edit* Posts that follow indicate I am missing flac. This command:
    Code:
    yum install flac
    fixed it up for me.
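
    Since the script exited with code 0 even though flac was missing, it may be worth checking for the external commands before testing. A minimal sketch (missing_deps is a made-up helper name; flac is the dependency confirmed in this thread, and perl is assumed because the CLI script is a .pl file):

```shell
#!/bin/sh
# Hypothetical helper: print the name of each listed command not found on PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# flac is confirmed by this thread; perl is an assumption based on the .pl script.
missing=$(missing_deps flac perl)
if [ -n "$missing" ]; then
    echo "Missing commands:" $missing
else
    echo "All listed commands found"
fi
```

    Anything reported as missing can then be installed with yum, as with flac above.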
  13. darmock PIAF Developer

    The short answer is yes it does not have all the dependencies. The long answer is about 20 pages long. Can 1757 get all the new dependencies installed? Yes.

    Keep a list if you get it working. It may be something simple or complex.

    Tom
  14. lgaetz Pundit

    So for the benefit of others, what version of PIAF is required for this to work unmodified? PIAF 2+? Ward can you amend your install directions in post 1 to include this?

    Tom, this is a production system so I will be stopping there. My sandbox at home is due for a 2.x install so I will be going that route, provided CentOS 6 will install on my coal-fired Athlon.
  15. darmock PIAF Developer

    Unmodified, it will work with PIAF 2062X. Unknown whether it will work with 2060X or 2061X. A definite maybe.

    It may not work at all with any prior version of PIAF without modifications, up to and including modification of dialplans, installation of new dependencies, and recompilation of multiple programs. I just can't predict it.

    What in the world are you doing installing this on a production system anyway? That is not a good thing ever. This code does not even qualify as alpha.... It is just something to play around with until it is more formalized.

    Unfortunately we don't have the resources to test with anything other than the current version of PIAF. So for clarity, we are only testing on the current version of PIAF, which today is 2.0.6.2.x. We no longer develop new stuff for anything other than the 2.0.6.2.x and above tree.

    The 1.7.5.7.X tree is the last stable version of the 1.7 tree and it is in security fix only mode. We no longer actively develop new products for it.

    Any new programs that we have released in the last couple of weeks or so only work on 2.0.6.2.X or above. So if you have a 2060x or 2061x box it might be time to upgrade it to CentOS 6.2 using update-source. You would need to update the kernels, then let it do a yum update. Then let update-source continue, as dahdi gets broken when you update the kernel and requires a recompile. Of course you can just do it all by hand if that is what you want. It is your box and you have a right to do anything to it.


    Tom
  16. lgaetz Pundit

    This is completely consistent behavior for my recent appointment as court jester. If things had gone pear-shaped, my post would have started as "Help my boss is really mad at me now..." Luckily God has a soft spot for idiots.
  17. rg00dman Guru

    Well, this will be another weekend I am not doing what the wife wants me to :) Just wondering: in the first example given, could it be configured so that instead of reading back the number, you say someone's name and it checks the phone book and calls them? I can't imagine it will be too hard (I hope). I will give it a go later, but if anyone has any ideas how, that would be great.

    Had my PBX for over 3 years now and it just keeps getting better and better.
  18. For those who want to experiment with a speech to e-mail (text) application for your own personal use, you might try dropping something like this into extensions_custom.conf. I am only posting this as an example of what might work, and because of darmock's comment I'm specifically NOT saying anyone should actually try this. If you do try it then it's at your own risk (including any legal risks):

    Code:
    exten => 788,1,Answer
    exten => 788,n,Macro(user-callerid,)
    exten => 788,n,Noop(CallerID is ${AMPUSER})
    exten => 788,n,Set(DICTEMAIL=${DB(AMPUSER/${AMPUSER}/dictate/email)})
    exten => 788,n,Set(NAME=${DB(AMPUSER/${AMPUSER}/cidname)})
    ; exten => 788,n,Playback(silence/1&after-the-tone&custom/say-msg-prs-pound)
    exten => 788,n,agi(speech-recog.agi,en-US)
    exten => 788,n,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
    exten => 788,n,System(echo "${utterance}" | mail -s "Dictation from ${NAME} converted to text with ${confidence} confidence" ${DICTEMAIL})
    exten => 788,n,Playback(goodbye)
    exten => 788,n,hangup
    (EDIT: Uncomment the commented out line if you use my suggestion from post #24 in this thread)

    For this to work there must be a valid e-mail address in the "Dictation Services"/"Email Address" setting of the calling extension. Since this is simply a suggestion of what might work, there is no error checking and no announcements of any kind (I personally hate the quality of "Flite" synthesized speech, so you won't find it in any of my examples). You dial STT ("Speech To Text") and if the moon and the planets are all in proper alignment AND you don't hang up prematurely (after you stop talking you must wait until you hear "goodbye") your speech converted to text just might be e-mailed to the Dictation Services e-mail address for your extension.
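
    For anyone who wants the error checking that the example above leaves out, here is a minimal sketch of a shell helper that the System() call could hand off to instead of piping straight into mail. Everything here is an assumption for illustration: send_dictation is a made-up name, and MAILER is an invented override hook that defaults to the system mail command.

```shell
#!/bin/sh
# Hypothetical helper: send_dictation EMAIL NAME CONFIDENCE UTTERANCE
# MAILER is an assumed override hook; it defaults to the system "mail" command.
send_dictation() {
    if [ -z "$1" ]; then
        echo "no Dictation Services e-mail address set" >&2
        return 1
    fi
    if [ -z "$4" ]; then
        echo "nothing was transcribed" >&2
        return 2
    fi
    # Same subject line as the dialplan example above.
    printf '%s\n' "$4" |
        "${MAILER:-mail}" -s "Dictation from $2 converted to text with $3 confidence" "$1"
}

# Example call (commented out so that nothing is mailed by accident):
#   send_dictation "user@example.com" "Alice" "0.96" "text of the dictation"
```

    The two early returns cover the cases the dialplan silently ignores: an extension with no Dictation Services address, and a call where nothing came back from the recognizer.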

    Please check with YOUR lawyer before you assume it's okay to use this, especially in any application that's even remotely commercial. I personally don't care if someone else builds on this, but someone else might. I also happen to think that our so-called "intellectual property" laws need to be seriously overhauled or even abolished altogether (ideas are not the same thing as property and never will be, no matter how many lawyers are willing to stand up in court and tell that lie under oath), but I'm too old to be trying to start any reform movements. The fact that we are in this state is just one symptom of the real cancer on our society, which is the amount of influence big corporations have over our elected officials, but that's another rant for another forum (and this thought would not have even occurred to me in relation to this topic had it not been for darmock's comment).
  19. darmock PIAF Developer


    Ouch butt hurt when I fell off the chair laughing too hard! You owe me 400 quatloos for medical expenses.....

    Not to mention idiot developers...... sigh sometimes it is easier to just slam my head in the door......

    I understand however.

    Tom :crazy:
  20. wardmundy Nerd Uno

    I had reworked the download to make it simpler from GitHub. :crazy: But I substituted the production code link for the experimental one. I tested the latter and it worked fine. So it may be that this was a bug in the current production code. You might try downloading the latest and greatest to see if that fixes the problem. There's nothing that needs to be done to a base PIAF2 install to get it working.
