PIONEERS Exploring Speech to Text

KUMARULLAL

Guru
Joined
Feb 20, 2008
Messages
243
Reaction score
28
In the IVR category, looking up a name from a directory and dialing their number would be a no-brainer, too.

So how would one try to use this in an IVR for dialing names (Internal extensions) as an example?
The context should be local or from-internal, right?
Secondly, we need to install flac. Flac is not installed by default.
"yum install flac"
 
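One way to picture the name-dialing idea: take the text returned by the speech recognizer and fuzzy-match it against a directory of names. The sketch below is only an illustration under stated assumptions. The `DIRECTORY` contents, the `find_extension` helper, and the 0.6 cutoff are all hypothetical and not part of any script in this thread.

```python
import difflib

# Hypothetical directory: spoken name -> internal extension.
DIRECTORY = {
    "john smith": "201",
    "mary jones": "202",
    "ward mundy": "203",
}


def find_extension(utterance, directory=DIRECTORY, cutoff=0.6):
    """Return the extension whose name best matches the utterance, or None."""
    spoken = utterance.strip().lower()
    # get_close_matches ranks candidates by similarity and drops weak ones.
    matches = difflib.get_close_matches(spoken, list(directory), n=1, cutoff=cutoff)
    return directory[matches[0]] if matches else None


if __name__ == "__main__":
    print(find_extension("Jon Smith"))
    print(find_extension("pizza delivery"))
```

The fuzzy match is there because transcriptions of names are rarely exact. An AGI script could hand the matched extension back to the dialplan as a channel variable, and the dialplan could then dial it in whichever context is appropriate.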

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,201
Reaction score
5,220
Flac IS installed on all new PIAF2 systems.

If your older system doesn't have it: yum install flac

Appears to work fine on systems back as far as Asterisk 1.4 once the yum command above is run.

Dialplan examples were just examples. :wink5:
 

phinphan

Active Member
Joined
Oct 19, 2007
Messages
641
Reaction score
130
I tried out the sample code and it works. Now if we could get the email along with the voice file as an attachment, it could have some real good use. Sort of like how Google Voice does the translation and also gives you the recording.
 
Joined
Jun 29, 2009
Messages
258
Reaction score
0
I may be entirely wrong about this (wouldn't be the first time), but I believe that whatever sends out the voicemail notifications is an internal function of Asterisk, not a part of PiaF or FPBX (though they let you configure it more easily). I think that in order to do what you're suggesting, you'd have to disable Asterisk's VM notification and write your own code to perform the same function, but also add the transcription. Unless, of course, you can figure out how to hack Asterisk's code, and that would probably get broken every time Asterisk was upgraded. And I doubt Digium would support such a thing, precisely because of the potential legal pitfalls discussed earlier in this thread.

This is not to say that what you want cannot be done, it's just that I think we're talking more than a few lines of added code here. This could potentially be a very non-trivial project, that no one could ever make a dime on. Again, I may not have the foggiest clue what I am talking about here, so perhaps those who are more into coding would care to comment.
 

phinphan

Active Member
Joined
Oct 19, 2007
Messages
641
Reaction score
130
I think it is right here at this point where the file would need to be included:

exten => 788,n,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
exten => 788,n,System(echo "${utterance}" | mail -s "Dictation from ${NAME} converted to text with ${confidence} confidence" ${DICTEMAIL});

The speech-recog.agi script would need to return the variable tmpname to the dialplan which would attach the file to the email generated above. Then the dialplan would need to delete the file (if it has permission to do so or call a new agi script to delete the file once it has been emailed). In addition the following language in the speech-recog.agi would probably need to be commented out:
if ($tmpname) {
    print STDERR "$name Cleaning temp files.\n" if ($debug);
    unlink glob "$tmpname*";
}

That is the way it appears to this interested non-programmer. I think I will look at how the normal voice dictation dialplan works and see if that provides any clues on how to make this happen. A neat project for a long weekend.
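As a rough illustration of the attachment idea above, here is a minimal sketch that builds one email carrying both the transcription and the recording. It assumes the recording is available as bytes. The helper names, the `dictation.flac` filename, and the local SMTP relay on `localhost` are hypothetical, and this is not how speech-recog.agi or the sample dialplan currently work.

```python
import smtplib
from email.message import EmailMessage


def build_dictation_email(utterance, confidence, sender, recipient,
                          audio_bytes, filename="dictation.flac"):
    """Build an email with the transcription as body and the audio attached."""
    msg = EmailMessage()
    msg["Subject"] = f"Dictation converted to text with {confidence} confidence"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(utterance)
    # Attaching bytes requires an explicit MIME type; audio/flac is assumed here.
    msg.add_attachment(audio_bytes, maintype="audio", subtype="flac",
                       filename=filename)
    return msg


def send_dictation_email(msg, host="localhost"):
    """Hand the message to an SMTP relay (assumed to be listening on host)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

With something like this in place, the temporary file would only need to be deleted after the message has been handed off, which matches the point about not cleaning up too early.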
 
Joined
Jun 29, 2009
Messages
258
Reaction score
0
Sorry, for some reason I thought you were talking about getting a transcription of a voicemail message along with the voicemail audio file itself (and yes that was my fault for not reading more closely — guess I saw the reference to Google Voice and thought you wanted to duplicate that behavior, but upon re-reading your earlier post I see that's not the case). And even with regard to what I thought you wanted, Ward pretty much covered it in post #8. I have to stop posting when I am sleep-deprived!

What you actually want to do should be a whole lot easier. Good luck!
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,201
Reaction score
5,220
Meet iRiss: The Poor Man's Ass-Backwards SIRI Alternative

If you've enjoyed reading about the Magic of Siri, you might want to create a little magic of your own. Here's what's needed to take advantage of Wolfram Alpha using speech-to-text on your Asterisk server.

1. Get some background info on the free Wolfram Alpha API.

2. Sign up for a free Wolfram Alpha API account.

3. Create a free Wolfram Alpha app (Click on Get An App ID and make up a name). This will give you an APP-ID. You get 2,000 free queries a month, or you can pay for more.

4. Add dialplan code to the [from-internal-custom] context in /etc/asterisk/extensions_custom.conf:

Code:
exten => 4747,1,Answer()
exten => 4747,2,Wait(1)
exten => 4747,3,flite("How can Eye Riss help you? Press the pound key when you're finished.")
exten => 4747,4(record),agi(speech-recog.agi,en-US)
exten => 4747,5,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
exten => 4747,6,flite("${utterance}")
exten => 4747,7,Background(vm-star-cancel)
exten => 4747,8,Background(continue-english-press)
exten => 4747,9,Background(digits/1)
exten => 4747,10,Read(PROCEED,beep,1)
exten => 4747,11,GotoIf($["foo${PROCEED}" = "foo1"]?12:3)
exten => 4747,12,Set(FILE(/tmp/query.txt)=${utterance})
exten => 4747,13,Background(one-moment-please)
exten => 4747,14,System(/var/lib/asterisk/agi-bin/iriss)
exten => 4747,15,Set(foo=${FILE(/tmp/results.txt)})
exten => 4747,16,flite("${foo}")
exten => 4747,17,flite("Have a nice day! Good bye.")
exten => 4747,18,hangup


5. Add a file called iriss in /var/lib/asterisk/agi-bin. Be sure to replace APP-ID with your actual APP-ID obtained from Wolfram Alpha:

Code:
#!/bin/bash
# Read the query that the dialplan saved, then clear any stale answer file
QUERY=`cat /tmp/query.txt`
rm -f /tmp/theanswer.txt
wget -U "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)" -O "/tmp/theanswer.txt" "http://api.wolframalpha.com/v2/query?input='$QUERY'&appid=APP-ID&format=plaintext&scantimesout=35"
RESULTS=`awk '/plaintext/ {p=1}; p==1 {print}' /tmp/theanswer.txt | awk "/<subpod title=''>/ {p=1;next}; p==1" | awk '{if (match($0,"</subpod>")) exit; print}' | sed 's/<plaintext>/ /' | sed 's/<\/plaintext>/ /'`
echo $RESULTS > /tmp/results.txt
sed -i "s/|/:/g" /tmp/results.txt
sed -i "s/up/up:/g" /tmp/results.txt


6. Change the permissions on the iriss file as follows:

Code:
chmod +x /var/lib/asterisk/agi-bin/iriss
chown asterisk:asterisk /var/lib/asterisk/agi-bin/iriss


7. Reload your Asterisk dialplan: asterisk -rx "dialplan reload"

8. Pick up a phone and dial I-R-I-S. When prompted, say: "What planes are overhead" or "Weather in Charleston South Carolina" and then press the pound key.

9. If Egor reads back your message correctly, press 1. Otherwise, press * and try again.

10. Your results will look something like this:

Southwest Airlines flight 1297 | 38000 feet | 26 degrees up
Southwest Airlines flight 489 | 38000 feet | 17 degrees up
Allegiant Air flight 644 | 34500 feet | 15 degrees up
Air Canada flight 945 | 35000 feet | 14 degrees up
Air Wisconsin flight 3947 | 30000 feet | 14 degrees up


11. Read up on the Wolfram Alpha API and What's Available Using Wolfram Alpha.
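For readers who want to see the query-and-parse step outside of wget, awk, and sed, the following is a simplified sketch of the same idea. It builds the query URL with proper URL encoding and pulls every non-empty `plaintext` element out of the XML reply. Assumptions: the `SAMPLE` reply is a hand-written, hypothetical example shaped like the pods and subpods the script above parses, and the helper names are made up. Unlike the shell script, it collects all plaintext entries rather than only the first untitled subpod.

```python
import urllib.parse
import xml.etree.ElementTree as ET

API_URL = "http://api.wolframalpha.com/v2/query"

# Hypothetical reply, shaped like the pod/subpod/plaintext XML parsed above.
SAMPLE = """<queryresult success="true">
  <pod title="Input interpretation">
    <subpod title=""><plaintext>flights overhead</plaintext></subpod>
  </pod>
  <pod title="Result">
    <subpod title=""><plaintext>Air Canada flight 945 | 35000 feet | 14 degrees up</plaintext></subpod>
  </pod>
</queryresult>"""


def build_query_url(query, app_id):
    """URL-encode the spoken query and the APP-ID into a request URL."""
    params = {"input": query, "appid": app_id, "format": "plaintext"}
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_plaintext(xml_text):
    """Return every non-empty plaintext line found in the reply."""
    root = ET.fromstring(xml_text)
    lines = []
    for node in root.iter("plaintext"):
        if node.text and node.text.strip():
            lines.extend(node.text.strip().splitlines())
    return lines


def speakable(lines):
    """Swap the | separators for colons so a TTS engine pauses between fields."""
    return ". ".join(line.replace("|", ":") for line in lines)


if __name__ == "__main__":
    print(build_query_url("what planes are overhead", "APP-ID"))
    print(speakable(extract_plaintext(SAMPLE)))
```

Using an XML parser instead of line-oriented text matching makes the extraction less sensitive to how the reply happens to be wrapped across lines.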
 

MisterQ

Member
Joined
Dec 11, 2007
Messages
188
Reaction score
5
Shouldn't step 6 be a chown asterisk:asterisk?

Any other easy Wolfram test questions?
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,201
Reaction score
5,220
I want to thank everyone for shaking out the kinks in the Wolfram Alpha demo above. We've rewritten a good bit of it to make it more versatile with a wide range of Wolfram Alpha content. We'll publish the article on Nerd Vittles tomorrow.
 

lzaf

Guru
Joined
Jan 12, 2012
Messages
13
Reaction score
2
Hello fellow asterisk geeks, it feels great to finally be able to post. :biggrin5:

As I already said to Ward, I had been working on something similar to his Wolfram script. It is an AGI script that contacts the Wolfram engine and returns the answer as a dialplan variable that can be played back to the user. In other words, it does the same as Ward's script, but in a slightly different way.
My approach differs a bit in the way it parses the data it gets from Wolfram in order to locate where the answers are and return them to the user. Unfortunately, the format that Wolfram uses in its replies is a huge mess and totally inconsistent, so the script cannot be considered perfect yet, but I believe it returns answers in a correct form for the majority of questions and filters out lots of junk info and useless data.
I post it here so people can try both scripts and developers get ideas from each other.
The script and a readme file with dialplan examples can be seen here. You can download it together with the speech recognition script from this link. In order to get it running in PIAF you might have to install perl-XML-Simple if it is not installed already.
Keep up the good work and the flow of new ideas. :thumbsup:
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,201
Reaction score
5,220
FYI: lzaf is Lefteris Zaferis, the author of the really incredible AGI script that lets Asterisk servers interface with Google's new speech transcription engine... as well as this new script, of course. Can't wait!!! And welcome, Lefteris.

Required update with PIAF2: yum install perl-XML-Simple

:party::party::party::party::party:
 

Attachments

  • wolfram.tar.gz
    2.6 KB · Views: 7

tm1000

Schmoozecom INC/FreePBX
Joined
Dec 1, 2009
Messages
1,360
Reaction score
78
I've written a PHP version of this script, if anyone's interested.
 

tm1000

Schmoozecom INC/FreePBX
Joined
Dec 1, 2009
Messages
1,360
Reaction score
78
I'm still cleaning it up. Basically it's just a 'script -i test.wav -o string' type of file, meaning you send it any audio file, and flac will convert it, send it to Google, and get the result back un-JSONed...

Also, it kind of requires a JSON wrapper on PHP versions below 5.3 (which I'd also include).
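To make the "un-JSONed" step concrete, here is a small sketch of decoding a recognition reply into the four values the sample dialplan prints (status, id, confidence, utterance). The JSON layout is an assumption inferred from those variable names, with a `hypotheses` list holding the candidates. It is not taken from the PHP script or from any documentation, and `parse_recognition` is a hypothetical helper.

```python
import json


def parse_recognition(raw_json):
    """Decode a reply and keep only the top hypothesis (assumed layout)."""
    data = json.loads(raw_json)
    hypotheses = data.get("hypotheses") or []
    best = hypotheses[0] if hypotheses else {}
    return {
        "status": data.get("status"),
        "id": data.get("id"),
        "utterance": best.get("utterance", ""),
        "confidence": best.get("confidence", 0.0),
    }


if __name__ == "__main__":
    sample = ('{"status": 0, "id": "abc123", '
              '"hypotheses": [{"utterance": "weather in charleston", '
              '"confidence": 0.91}]}')
    print(parse_recognition(sample))
```

Falling back to an empty utterance and zero confidence keeps the caller simple when nothing was recognized.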

Always interested. Where can we find it??
 

KUMARULLAL

Guru
Joined
Feb 20, 2008
Messages
243
Reaction score
28
Fantastic job, lzaf.
Works perfectly. However, I am getting this error message

"Use of uninitialized value in length at /var/lib/asterisk/agi-bin/googletts.agi line 117, <STDIN> line 19."

Any ideas?
 

lzaf

Guru
Joined
Jan 12, 2012
Messages
13
Reaction score
2
It's just a non-fatal Perl warning. Off the top of my head, I think it's the part of the code that checks for interrupt digits or some other user-specified options. It's not a real problem, and it doesn't mean that there is some misbehavior.
If you really want it to go away, edit the script and comment out the line 'use warnings;'
The warnings are enabled by default just to help users and developers, since the code is still young and problems might appear.
 

lzaf

Guru
Joined
Jan 12, 2012
Messages
13
Reaction score
2
Here it is: http://www.the159.com/googlespeech/gtr.phps

I still want it to use sox, just haven't gotten there yet.

Good job, tm. I would advise you not to bother with sox. I have already removed all sox-related code from my script. Sox was used to see whether sound normalising (and some other tricks like low/highpass filtering) would improve detection rates. In all my tests it didn't. I think Google's engine is already highly optimised for this kind of input (telephone-recorded voice data), and trying to edit the voice data before sending it doesn't really help.
One more note: in its current form your script accepts only raw sound data. I think it would be more practical if it could work with wav files.
 
