PIONEERS Exploring Speech to Text

KUMARULLAL · Jan 13, 2012

wardmundy said:
In the IVR category, looking up a name from a directory and dialing their number would be a no-brainer, too.

So how would one try to use this in an IVR for dialing names (Internal extensions) as an example?
The context should be local or from-internal, right?
Secondly, we need to install flac. Flac is not installed by default.
"yum install flac"

wardmundy · Jan 13, 2012

Flac IS installed on all new PIAF2 systems.

If your older system doesn't have it: yum install flac

Appears to work fine on systems back as far as Asterisk 1.4 once the yum command above is run.
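A quick way to see whether flac is already present before running yum is sketched below. The helper names have_tool and check_tool are made up for this example; they are not part of PIAF or Asterisk.

```shell
#!/bin/sh
# have_tool: succeed if the named command is on the PATH.
# (Hypothetical helper name, used only for this sketch.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# check_tool: print a one-line status; return non-zero when the tool is missing.
check_tool() {
    if have_tool "$1"; then
        echo "$1: installed"
    else
        echo "$1: missing (try: yum install $1)"
        return 1
    fi
}

# Example: check_tool flac
```

Running check_tool flac on an older system should make it obvious whether the yum step is still needed.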

Dialplan examples were just examples. :wink5:

KUMARULLAL · Jan 13, 2012

My apologies.
I was using older PIAF.

phinphan · Jan 13, 2012

I tried out the sample code and it works. Now if we could get the email along with the voice file as an attachment, it could have some real good use. Sort of like how Google Voice does the translation and also gives you the recording.

MichiganTelephone · Jan 13, 2012

phinphan said:
I tried out the sample code and it works. Now if we could get the email along with the voice file as an attachment, it could have some real good use. Sort of like how Google Voice does the translation and also gives you the recording.

I may be entirely wrong about this (wouldn't be the first time) but I believe that whatever sends out the voicemail notifications is an internal function of Asterisk, not a part of PiaF or FPBX (though they let you configure it more easily). I think that in order to do what you're suggesting, you'd have to disable Asterisk's VM notification and write your own code to perform the same function, but also add the transcription. Unless, of course, you can figure out how to hack Asterisk's code, and that would probably get broken every time Asterisk was upgraded. And I doubt Digium would support such a thing, precisely because of the potential legal pitfalls discussed earlier in this thread.

This is not to say that what you want cannot be done, it's just that I think we're talking more than a few lines of added code here. This could potentially be a very non-trivial project, that no one could ever make a dime on. Again, I may not have the foggiest clue what I am talking about here, so perhaps those who are more into coding would care to comment.

phinphan · Jan 13, 2012

I think it is right here at this point where the file would need to be included:

exten => 788,n,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
exten => 788,n,System(echo "${utterance}" | mail -s "Dictation from ${NAME} converted to text with ${confidence} confidence" ${DICTEMAIL});

The speech-recog.agi script would need to return the variable tmpname to the dialplan which would attach the file to the email generated above. Then the dialplan would need to delete the file (if it has permission to do so or call a new agi script to delete the file once it has been emailed). In addition the following language in the speech-recog.agi would probably need to be commented out:
if ($tmpname) {
print STDERR "$name Cleaning temp files.\n" if ($debug);
unlink glob "$tmpname*";

That is the way it appears to this interested non-programmer. I think I will look at how the normal voice dictation dialplan works and see if that provides any clues on how to make this happen. A neat project for a long weekend.
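One possible shape for the "attach the recording" step, as a rough sketch only: build a MIME message by hand and pipe it to sendmail. The function name build_mime_message, the boundary string, and the example file path are all assumptions for illustration; nothing here comes from speech-recog.agi itself, and it assumes base64 and sendmail are available on the server.

```shell
#!/bin/sh
# build_mime_message TO SUBJECT BODY FILE
# Writes a multipart/mixed email to stdout: BODY as plain text,
# FILE as a base64-encoded attachment. (Hypothetical helper.)
build_mime_message() {
    to=$1
    subject=$2
    body=$3
    file=$4
    boundary="dictation-$$"          # simple boundary built from the shell PID
    name=$(basename "$file")
    cat <<EOF
To: $to
Subject: $subject
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="$boundary"

--$boundary
Content-Type: text/plain; charset=utf-8

$body

--$boundary
Content-Type: application/octet-stream; name="$name"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="$name"

$(base64 < "$file")

--$boundary--
EOF
}

# Example (path is a placeholder, not the script's real temp file name):
# build_mime_message "user@example.com" "Dictation" "hello world" /tmp/recording.flac | sendmail -t
```

The dialplan could call something like this through System() in place of the plain mail command, then delete the recording afterwards.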

MichiganTelephone · Jan 14, 2012

phinphan said:
I think it is right here at this point where the file would need to be included:

exten => 788,n,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
exten => 788,n,System(echo "${utterance}" | mail -s "Dictation from ${NAME} converted to text with ${confidence} confidence" ${DICTEMAIL});

The speech-recog.agi script would need to return the variable tmpname to the dialplan which would attach the file to the email generated above. Then the dialplan would need to delete the file (if it has permission to do so or call a new agi script to delete the file once it has been emailed). In addition the following language in the speech-recog.agi would probably need to be commented out:
if ($tmpname) {
print STDERR "$name Cleaning temp files.\n" if ($debug);
unlink glob "$tmpname*";

That is the way it appears to this interested non-programmer. I think I will look at how the normal voice dictation dialplan works and see if that provides any clues on how to make this happen. A neat project for a long weekend.

Sorry, for some reason I thought you were talking about getting a transcription of a voicemail message along with the voicemail audio file itself (and yes that was my fault for not reading more closely — guess I saw the reference to Google Voice and thought you wanted to duplicate that behavior, but upon re-reading your earlier post I see that's not the case). And even with regard to what I thought you wanted, Ward pretty much covered it in post #8. I have to stop posting when I am sleep-deprived!

What you actually want to do should be a whole lot easier. Good luck!

wardmundy · Jan 14, 2012

Meet iRiss: The Poor Man's Ass-Backwards SIRI Alternative

If you've enjoyed reading about the Magic of Siri, you might want to create a little magic of your own. Here's what's needed to take advantage of Wolfram Alpha using speech-to-text on your Asterisk server.

1. Get some background info on the free Wolfram Alpha API.

2. Sign up for a free Wolfram Alpha API account.

3. Create a free Wolfram Alpha app (Click on Get An App ID and make up a name). This will give you an APP-ID. You get 2,000 free queries a month, or you can pay for more.

4. Add dialplan code to the [from-internal-custom] context in /etc/asterisk/extensions_custom.conf:

Code:

exten => 4747,1,Answer()
exten => 4747,2,Wait(1)
exten => 4747,3,flite("How can Eye Riss help you? Press the pound key when you're finished.")
exten => 4747,4(record),agi(speech-recog.agi,en-US)
exten => 4747,5,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
exten => 4747,6,flite("${utterance}")
exten => 4747,7,Background(vm-star-cancel)
exten => 4747,8,Background(continue-english-press)
exten => 4747,9,Background(digits/1)
exten => 4747,10,Read(PROCEED,beep,1)
exten => 4747,11,GotoIf($["foo${PROCEED}" = "foo1"]?12:3)
exten => 4747,12,Set(FILE(/tmp/query.txt)=${utterance})
exten => 4747,13,Background(one-moment-please)
exten => 4747,14,System(/var/lib/asterisk/agi-bin/iriss)
exten => 4747,15,Set(foo=${FILE(/tmp/results.txt)})
exten => 4747,16,flite("${foo}")
exten => 4747,17,flite("Have a nice day! Good bye.")
exten => 4747,18,hangup

5. Add a file called iriss in /var/lib/asterisk/agi-bin. Be sure to replace APP-ID with your actual APP-ID obtained from Wolfram Alpha:

Code:

#!/bin/bash
# Read the spoken query that the dialplan wrote out
QUERY=`cat /tmp/query.txt`
# Remove any previous answer (-f: no error if the file is missing)
rm -f /tmp/theanswer.txt
wget -U "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)" -O "/tmp/theanswer.txt" "http://api.wolframalpha.com/v2/query?input='$QUERY'&appid=APP-ID&format=plaintext&scantimesout=35"
# Keep the plaintext from the first subpod that has an empty title
RESULTS=`awk '/plaintext/ {p=1}; p==1 {print}' /tmp/theanswer.txt | awk "/<subpod title=''>/ {p=1;next}; p==1" | awk '{if (match($0,"</subpod>")) exit; print}' | sed 's/<plaintext>/ /' | sed 's/<\/plaintext>/ /'`
# Unquoted echo collapses the answer onto a single line
echo $RESULTS > /tmp/results.txt
# Replace each | with : and add a colon after every "up"
sed -i "s/|/:/g" /tmp/results.txt
sed -i "s/up/up:/g" /tmp/results.txt

6. Change the permissions on the iriss file as follows:

Code:

chmod +x /var/lib/asterisk/agi-bin/iriss
chown asterisk:asterisk /var/lib/asterisk/agi-bin/iriss

7. Reload your Asterisk dialplan: asterisk -rx "dialplan reload"

8. Pick up a phone and dial I-R-I-S. When prompted, say: "What planes are overhead" or "Weather in Charleston South Carolina" and then press the pound key.

9. If Egor reads back your message correctly, press 1. Otherwise, press * and try again.

10. Your results will look something like this:

Southwest Airlines flight 1297 | 38000 feet | 26 degrees up
Southwest Airlines flight 489 | 38000 feet | 17 degrees up
Allegiant Air flight 644 | 34500 feet | 15 degrees up
Air Canada flight 945 | 35000 feet | 14 degrees up
Air Wisconsin flight 3947 | 30000 feet | 14 degrees up

11. Read up on the Wolfram Alpha API and What's Available Using Wolfram Alpha.
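The iriss script in step 5 drops the spoken query into the URL as-is, spaces and all. A possible refinement, sketched here under the assumption that queries are plain ASCII, is to percent-encode the text first. The function name urlencode is made up for this example.

```shell
#!/bin/sh
# urlencode STRING
# Percent-encodes everything except unreserved URL characters.
# Sketch only: assumes ASCII input (hypothetical helper, not part of iriss).
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        c=${s%"$rest"}         # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. space becomes %20
        esac
    done
    printf '%s\n' "$out"
}

# Example use inside iriss (APP-ID is still your own Wolfram Alpha APP-ID):
# wget -O /tmp/theanswer.txt "http://api.wolframalpha.com/v2/query?input=$(urlencode "$QUERY")&appid=APP-ID&format=plaintext"
```

With this in place the single quotes around $QUERY in the URL would no longer be needed.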

MisterQ · Jan 14, 2012

Shouldn't step 6 be a chown asterisk:asterisk?

Any other easy Wolfram test questions?

wardmundy · Jan 15, 2012

yep. thanks.

wardmundy · Jan 15, 2012

I want to thank everyone for shaking out the kinks in the Wolfram Alpha demo above. We've rewritten a good bit of it to make it more versatile with a wide range of Wolfram Alpha content. We'll publish the article on Nerd Vittles tomorrow.

lzaf · Jan 16, 2012

Hello fellow asterisk geeks, it feels great to finally be able to post. :biggrin5:

As I already said to Ward, I had been working on something similar to his Wolfram script. It is an AGI script that contacts the Wolfram engine and returns the answer as a dialplan variable that can be played back to the user. In other words, it does the same as Ward's script, but in a slightly different way.
My approach differs a bit in the way it parses the data it gets from Wolfram in order to locate where the answers are and return them to the user. Unfortunately the format that Wolfram uses in its replies is a huge mess and totally inconsistent, so the script cannot be considered perfect yet, but I believe it returns answers in a correct form for the majority of questions and filters out lots of junk info and useless data.
I post it here so people can try both scripts and developers get ideas from each other.
The script and a readme file with dialplan examples can be seen here. You can download it together with the speech recognition script from this link. In order to get it running in PIAF you might have to install perl-XML-Simple if it is not installed already.
Keep up the good work and the flow of new ideas. :thumbsup:
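To give a feel for the parsing problem described above, here is a very rough shell sketch that pulls the text out of every plaintext element in a Wolfram Alpha XML reply. It is not the approach used in the AGI script from this post (which relies on perl-XML-Simple); the function name extract_plaintext is invented for the example, and it does not decode XML entities.

```shell
#!/bin/sh
# extract_plaintext
# Reads Wolfram Alpha XML on stdin and prints the contents of each
# non-empty <plaintext> element, one per line. (Hypothetical helper.)
extract_plaintext() {
    tr '\n' ' ' |                                   # join multi-line answers
        grep -o '<plaintext>[^<]*</plaintext>' |    # one element per output line
        sed -e 's/<plaintext>//' -e 's/<\/plaintext>//' \
            -e 's/^ *//' -e 's/ *$//' |             # strip tags and edge spaces
        grep -v '^$'                                # drop empty answers
}

# Example: extract_plaintext < /tmp/theanswer.txt
```

Deciding which of the returned lines is the real answer is the hard part, and a sketch like this does nothing to solve that.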

wardmundy · Jan 16, 2012

FYI: lzaf is Lefteris Zaferis, the author of the really incredible AGI script that lets Asterisk servers interface with Google's new speech transcription engine... as well as this new script, of course. Can't wait!!! And welcome, Lefteris.

Required update with PIAF2: yum install perl-XML-Simple

:party:

tm1000 · Jan 17, 2012

I've written a php version of this script if anyone's interested.

wardmundy · Jan 17, 2012

Always interested. Where can we find it??

tm1000 · Jan 17, 2012

I'm still cleaning it up. Basically it's just a 'script -i test.wav -o string' type of file, meaning you send it any audio file and flac will convert it, send it to Google, and get the result back un-JSONed...

Also, it kinda requires a JSON wrapper on PHP less than 5.3 (which I'd also include).
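For anyone curious what "un-JSONed" might look like without PHP, below is a crude shell sketch that pulls a single field out of a one-line JSON reply. The field names (status, id, utterance, confidence) match the dialplan variables used earlier in this thread, but the exact JSON layout shown in the example comment is an assumption, and json_field is a made-up helper name. It will truncate values that contain commas or escaped quotes, so a real JSON parser is the better choice.

```shell
#!/bin/sh
# json_field NAME
# Reads one line of JSON on stdin and prints the value of the named field.
# Crude sed-based sketch; not a real JSON parser. (Hypothetical helper.)
json_field() {
    sed -n 's/.*"'"$1"'":"\{0,1\}\([^",}]*\).*/\1/p'
}

# Example, with an assumed reply layout:
# echo '{"status":0,"id":"x","hypotheses":[{"utterance":"hello world","confidence":0.92}]}' | json_field utterance
```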

wardmundy said:
Always interested. Where can we find it??

tm1000 · Jan 17, 2012

Here it is: http://www.the159.com/googlespeech/gtr.phps

I still want it to use sox, just haven't gotten there yet.

KUMARULLAL · Jan 18, 2012

Fantastic job, lzaf.
Works perfectly. However, I am getting this error message

"Use of uninitialized value in length at /var/lib/asterisk/agi-bin/googletts.agi line 117, <STDIN> line 19."

Any ideas?

lzaf · Jan 18, 2012

KUMARULLAL said:
Works perfectly. However, I am getting this error message

"Use of uninitialized value in length at /var/lib/asterisk/agi-bin/googletts.agi line 117, <STDIN> line 19."

Any ideas?

It's just a non-fatal Perl warning. Off the top of my head, I think it's the part of the code that checks for interrupt digits or some other user-specified options. It's not a real problem and it doesn't mean that there is some misbehavior.
If you really want it to go away, edit the script and comment out the line 'use warnings;'
The warnings are enabled by default just to help users and developers, since the code is still young and problems might appear.
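If you would rather not open an editor, the same change can be sketched as a small shell helper that keeps a backup first. The function name disable_perl_warnings is invented for this example; the path in the usage comment is the one from the error message above.

```shell
#!/bin/sh
# disable_perl_warnings SCRIPT
# Saves SCRIPT as SCRIPT.bak, then comments out its 'use warnings;' line.
# (Hypothetical helper, shown only as a sketch.)
disable_perl_warnings() {
    script=$1
    cp -p "$script" "$script.bak" &&
        sed 's/^\([[:space:]]*\)use warnings;/\1# use warnings;/' \
            "$script.bak" > "$script"
}

# Example: disable_perl_warnings /var/lib/asterisk/agi-bin/googletts.agi
```

Writing back into the existing file keeps its permissions and ownership, so the AGI script stays executable.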

lzaf · Jan 18, 2012

tm1000 said:
Here it is: http://www.the159.com/googlespeech/gtr.phps

I still want it to use sox, just haven't gotten there yet.

Good job, tm. I would advise you not to bother with sox. I have already removed all sox-related code from my script. Sox was used in order to see if sound normalising (and some other tricks like low/highpass filtering etc.) would improve detection rates. In all my tests this didn't happen. I think Google's engine is already highly optimised for this kind of input (telephone-recorded voice data), and trying to edit the voice data before sending it doesn't really help.
Another note: in its current form your script accepts only raw sound data. I think it would be more practical if it could work with wav files.
