
Exploring Speech to Text

Discussion in 'Developers' Corner' started by wardmundy, Jan 12, 2012.

  1. ghurty Senior Member

    This googletts sounds better than Allison!

    I have been fooling around with it. However, when I try to pass it a number, instead of reading it out as a whole number (one thousand five hundred and forty five), it reads out the individual digits.

    Any suggestions?

    Thanks
  2. wardmundy Nerd Uno

  3. lzaf Guru

    For numbers of up to 9 digits, the engine will read the value as a whole number (e.g. 284956286 will be read as "two hundred eighty-four million nine hundred fifty-six thousand two hundred eighty-six").
    For more than 9 digits, it will read each digit individually
    (e.g. 1284956286 will be read as "one two eight four nine five ..." etc.).

    This is something that cannot be tuned, as far as I know.
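
One possible workaround for the 9-digit limit described above is to spell the number out before handing the text to the TTS script, so the engine never sees a long run of digits. The following is a minimal Python sketch of that idea only. The function name `number_to_words` is hypothetical and is not part of googletts.agi, and whether the engine reads the spelled-out text with good cadence is an assumption that would need testing.

```python
# Hypothetical helper: spell out a non-negative integer in English words
# so that the TTS engine receives words instead of a long digit string.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion", "trillion"]


def _below_thousand(n):
    """Spell out an integer in the range 1..999."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        parts.append(word)
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)


def number_to_words(n):
    """Spell out a non-negative integer below 10**15."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return "zero"
    groups = []
    scale = 0
    while n > 0:
        n, chunk = divmod(n, 1000)  # peel off three digits at a time
        if chunk:
            if scale >= len(SCALES):
                raise ValueError("number too large for this sketch")
            label = SCALES[scale]
            words = _below_thousand(chunk)
            groups.append(words + (" " + label if label else ""))
        scale += 1
    return " ".join(reversed(groups))


print(number_to_words(1545))
```

The resulting string could then be passed to the AGI script in place of the raw digits.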
  4. wardmundy Nerd Uno

    Try this approach. Add a colon or space between digits for normal cadence, or add a colon and a comma for a pause after a digit is spoken. In a future version of googletts.agi, perhaps a syntax could be added to handle this automagically, e.g. "[843-123-4567]" or "[8431234567]" would actually send Google the string as shown below. This gets a little more complex with international dialing obviously. If you're grabbing a CallerID number and passing it to this AGI script, then you obviously want to pass the CallerID number in the way it was received (which is typically all digits with no punctuation).

    Code:
    exten => 444,n,agi(googletts.agi,"8 4 3:,1:2:3:,4:5:6:7",en)
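
The "automagic" formatting suggested in this post could be prototyped outside the AGI script first. Below is a rough Python sketch that strips punctuation from a phone number and joins the digits with colons, adding a colon and comma between groups for a pause. The function name `spaced_digits` and the default 3-3-4 grouping (a North American number layout) are assumptions made for illustration; nothing like this exists in googletts.agi as far as this thread shows, and it uses colons throughout rather than mixing in spaces.

```python
def spaced_digits(number, group_sizes=(3, 3, 4)):
    """Format a phone number so each digit is spoken separately.

    Digits inside a group are joined with ":" (normal cadence) and
    groups are joined with ":," (a pause after the group).
    group_sizes is an assumed North American 3-3-4 layout; any digits
    left over after the listed groups form one final group.
    """
    digits = [c for c in number if c.isdigit()]  # drop dashes, brackets, spaces
    groups = []
    pos = 0
    for size in group_sizes:
        if pos >= len(digits):
            break
        groups.append(digits[pos:pos + size])
        pos += size
    if pos < len(digits):
        groups.append(digits[pos:])
    return ":,".join(":".join(group) for group in groups)


print(spaced_digits("[843-123-4567]"))
```

International numbers would need a different `group_sizes` tuple, which is the complication the post mentions.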
  5. sukasem Guru

    Hi,
    Is there any way the asterisk-speech-recog script could take both keyed-in digits and voice input?

    And maybe some magic words that make the script process the input right away, like when you say Yes, No, or Stop...

    Cheers,
  6. lzaf Guru

    Speech recognition does not happen in real time. The voice data is first recorded and then sent over to Google for processing. This makes a voice-controlled mechanism for the application impractical.
  7. lgaetz Pundit

    In the back of my mind I have been thinking that S2T should be useful for the rotary phone enthusiast crowd. The devices are still usable, but it is getting harder and harder to find TDM/ATA devices that will accept pulse dialing. Since a rotary phone has no pound key to end the recording, the only thing I can think of is a silence timeout. Is that feasible? Is silence in a phone audio stream difficult to define or detect?
  8. lzaf Guru

    That's actually a good idea, and yes, it is possible. I've just tweaked the script to add silence detection. Now, after 3 seconds of silence, the recording will stop and the script will proceed to send the voice data to Google and get back the results. Keep in mind that silence detection is not always perfect and might not work very well on some old analog or low-quality phones that add static noise, or if there's a lot of background noise.
    The latest code can be found here. I'm not sure if the 3-second timeout is practical; I'm always open to suggestions.
    Have fun testing it!
  9. lgaetz Pundit

    Perhaps a user-selectable number of seconds, with a default of zero to disable it.
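
To make the silence timeout discussed above concrete, here is a small Python sketch of the general idea: count consecutive quiet samples and stop once the run reaches the configured number of seconds, with zero disabling the check. This is not the code in speech-recog.agi, which presumably relies on Asterisk's own recording facilities. The function name, the 8000 Hz sample rate, and the amplitude threshold of 500 are all assumptions chosen for illustration. A noisy line, as warned above, would need a higher threshold.

```python
def silence_cutoff(samples, timeout_s=3.0, sample_rate=8000, threshold=500):
    """Return the sample index at which recording should stop, or None.

    samples     -- iterable of signed 16-bit PCM amplitudes (assumed format)
    timeout_s   -- seconds of continuous silence required; 0 disables detection
    sample_rate -- samples per second (8000 is typical for telephony audio)
    threshold   -- amplitudes below this absolute value count as silence
    """
    if timeout_s <= 0:
        return None  # detection disabled, caller must end the recording
    needed = int(sample_rate * timeout_s)
    quiet_run = 0
    for index, sample in enumerate(samples):
        if abs(sample) < threshold:
            quiet_run += 1
            if quiet_run >= needed:
                return index + 1  # stop right after the silent stretch
        else:
            quiet_run = 0  # any loud sample resets the counter
    return None


# One second of "speech" followed by four seconds of silence.
audio = [1000] * 8000 + [0] * 32000
print(silence_cutoff(audio))
```

With the defaults, the example stops one second of speech plus three seconds of silence into the stream.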
  10. wardmundy Nerd Uno

    3 seconds actually works pretty well. I've cleaned out all the previous calls so you can try the demo link for yourself: 1-405-FOR-WOLF. Everything can be triggered by doing nothing after the prompts. Here's the actual dialplan code for those that are curious:


    Code:
    ; Wolfram Alpha Dialplan Interface for PIAF2 servers
    exten => 4748,1,Answer()
    exten => 4748,2,Wait(1)
    exten => 4748,3,Set(calledbefore=${DB_EXISTS(blacklist/${CALLERID(num)})})
    exten => 4748,4,Noop(${CALLERID(num)})
    exten => 4748,5,Noop(${calledbefore})
    exten => 4748,6,GotoIf($["foo${calledbefore}" = "foo1"]?11:51)
    exten => 4748,7,Goto(90)
    exten => 4748,10,Set(removed=${DB_DELETE(blacklist/${CALLERID(num)})})
    exten => 4748,11,Flite("Hi. Thanks for calling. We're very sorry. In order to give everyone an opportunity to try this service, we've had to limit calls to one call per person: You still can beat the system. Just call back from a different phone number. Have a great day. Good bye.")
    exten => 4748,12,Goto(91)
    exten => 4748,50,Set(DB(blacklist/${CALLERID(num)})=${CALLERID(num)})
    exten => 4748,51,swift("Seriously,, After the beep, Say your question, then Press the pound key, or remain quiet.")
    exten => 4748,52(record),agi(speech-recog.agi,en-US)
    exten => 4748,53,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
    exten => 4748,54,swift("${utterance}")
    exten => 4748,55,Background(vm-star-cancel)
    exten => 4748,56,Background(continue-english-press)
    exten => 4748,57,Background(digits/1)
    exten => 4748,58,Read(PROCEED,beep,1,,1,3)                                        
    exten => 4748,59,GotoIf($["foo${PROCEED}" = "foo1"]?70)
    exten => 4748,60,GotoIf($["foo${PROCEED}" = "foo"]?70:90)
    exten => 4748,70,Set(DB(blacklist/${CALLERID(num)})=${CALLERID(num)})
    exten => 4748,71,Set(FILE(/tmp/query.txt)=${utterance})
    exten => 4748,72,Background(one-moment-please)
    exten => 4748,73,System(/var/lib/asterisk/agi-bin/4747)
    exten => 4748,74,Set(foo=${FILE(/tmp/results.txt)})
    exten => 4748,75,swift("${foo}")
    exten => 4748,76,Goto(90)
    exten => 4748,90,swift("Have a nice day! Good bye.")
    exten => 4748,91,hangup
    
  11. lgaetz Pundit
