GOOD NEWS Peepl, it's time to stop whispering to the clouds! with speech to text (STT) transcription

Joined
Jul 6, 2013
Messages
82
Reaction score
28
Look: as of December, DeepSpeech (speech-to-text) software has a mature API, and it even runs on a RasPi.

No more need for IncrediblePBX to punt our private words out to the gOOgle sponge and similar for recognition.
This opens up opportunities to use speech-to-text in PBX systems wherever privacy and confidentiality legislation lurks!

Nudge, nudge, wink wink! And it goes without say'n that this can be a selling point for Incredible2020 appliances!
Disclosure of conflict of interest: None, I'm interested and not at all conflicted about this feature request.
 

jerrm

Guru
Joined
Sep 23, 2015
Messages
838
Reaction score
405
General accuracy, while not as good as Google/IBM/etc., isn't too bad, but it doesn't format numbers/times/dates/etc.

Deepspeech:
hello this is the service manager at a boyle's kea i am calling to confirm your appointment monday the sixth at eight twenty a m if you will not be able to make it please call our service of whitman coordinator at one eighty date four seven one six zero four three again the number is one eighty day four seven one six zero four three thank you if you know longer wished to receive calls like this please call one eight seven seven nine five nine seven four nine nine

Google Cloud:
Hello, this is the service manager at Ed Voyles Kia. I am calling to confirm your appointment Monday the 6th at 8:20 a.m. If you will not be able to make it. Please call our service appointment coordinator at 108-847-1643. Again, the number is +81-884-716-0043. Thank you. If you no longer wish to receive calls like this, please call one. 877-959-7499.

IBM:
hello this is the service manager at ed Voyles kia I am calling to confirm your appointment Monday the sixth at 8:20 AM if you will not be able to make it please call our service appointment coordinator at 1-888-471-6043 again the number is 1-888-471-6043 thank you if you no longer wish to receive calls like this please call 1-877-959-7499

I have my personal voicemail (VM) going through multiple engines. IBM won the above test, but I would probably give the overall edge to Google. IBM and Google are pretty much on par for overall accuracy, but the Google output is generally more readable thanks to punctuation, formatting, etc. Maybe I'm missing some options for IBM; I haven't looked at the APIs recently.
 
Joined
Jul 6, 2013
Messages
82
Reaction score
28
Thanks for the comparison, jerrm! There is a dirty little secret in speech recognition that I wish to illuminate here: services like IBM and Google pass the raw output from the speech recognition engine through a contextual "semantics processing" engine, which analyzes the grammar/context and then CHANGES the text to correct "errors" and give better formatting/accuracy. I'll tell you right now that the output does indeed look better, but it can also introduce some very serious errors/omissions/additions which were not present in the raw text stream.

My day job is at a hospital group where we use speech recognition from a very famous company to transcribe 750,000 dictated radiology reports yearly, and often there's nothing we'd like more than to be able to turn OFF the semantics post-processor when and as needed, from both an accuracy and a liability point of view.

With deepspeech there is always the opportunity to post-process (or not) through a separate, configurable semantics engine specifically tuned for telephony. Yes, you heard right! I just said that our results with deepspeech AND a tuned-for-telephony semantics engine could potentially outperform the generic cloud services!
<snip>please call one eight seven seven nine five nine seven four nine nine
One of our points of contention at the hospitals is the formatting of times, dates and measurements ...and this in an environment where we can both train our users AND actually enforce specific regional formats for such things. Having said this, it is better now than it was 10 years ago.
 

jerrm

Guru
Joined
Sep 23, 2015
Messages
838
Reaction score
405
There is a dirty little secret in speech recognition that I wish to illuminate here:
Not really a secret. Semantics and context have been at the core of speech recognition from the beginning. Modern recognition might be passable with the "raw stream" (or more likely a "rare stream"), but earlier iterations would have been pure junk without the context/semantics processing. I'd be shocked if there isn't some semantic logic in DeepSpeech.

For something like VM transcription, the formatting makes a big difference for quickly skimming through your inbox.

Overall I'm impressed with the DeepSpeech results (especially considering the project's age). A better telephony model is no doubt doable (and the tools are there for anyone interested), but I don't have that kind of time. Hopefully some organization does, and will have the goodwill to feed it back into the OSS ecosystem. Even as is, DeepSpeech is a great free and/or private option and should only get better.
 
Joined
Jul 6, 2013
Messages
82
Reaction score
28
One thing about the deepspeech output is that it certainly is consistent (up above anyhow). If you can always count on it outputting the written forms of numerals then that's something people can work with programmatically (for IVR use and so on). Would be worse if sometimes it spit "seven" and other times "7". Then you'd need a simple preprocessor to make it consistent for IVR use.

I should say also, that I already don't like that the semantics processor in Google and IBM is stuffing dashes into the phone numbers and so on. That stuff was not there in the speech and for PBX control purposes it's trash a lot of the time and needs to be weeded out for the most part. A post-processor designed for readability does not behave like one designed for telephony control IMHO.
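As a rough illustration of the "work with it programmatically" idea, here is a minimal Python sketch that collapses runs of spelled-out digits into a dialable string. The function name `digit_runs`, the `min_len` threshold, and treating "oh" as zero are all assumptions made for the example; none of this is part of DeepSpeech itself.

```python
import re

# Spoken-digit vocabulary. Mapping "oh" to zero is an assumption for this sketch.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def digit_runs(transcript, min_len=7):
    """Return one digit string per run of at least min_len consecutive spoken digits."""
    runs, current = [], []
    for word in re.findall(r"[a-z']+", transcript.lower()):
        if word in DIGIT_WORDS:
            current.append(DIGIT_WORDS[word])
            continue
        # A non-digit word ends the current run; keep it only if long enough.
        if len(current) >= min_len:
            runs.append("".join(current))
        current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


print(digit_runs("please call one eight seven seven nine five nine seven four nine nine"))
# ['18779597499']
```

Words such as "eighty" are deliberately not handled, so a misrecognized number like the "one eighty date four seven..." case above comes out truncated rather than silently "corrected".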
 

hawk#1

Well-Known Member
Joined
Nov 3, 2015
Messages
716
Reaction score
309
I appreciate the great comparison, and DeepSpeech is something that I may look at in the future. I think in time it will improve and be more accurate.

Tom
 

jerrm

Guru
Joined
Sep 23, 2015
Messages
838
Reaction score
405
I should say also, that I already don't like that the semantics processor in Google and IBM is stuffing dashes into the phone numbers and so on. That stuff was not there in the speech and for PBX control purposes it's trash a lot of the time and needs to be weeded out for the most part. A post-processor designed for readability does not behave like one designed for telephony control IMHO.
I'm using the phone_call model with punctuation enabled. Likely another set of options can provide less "magic."
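If the formatted output is all that is available, the readability "magic" can be stripped again before the text is used for PBX control. The following is only a sketch in Python: the regular expression, the `dial_strings` name, and the seven-digit minimum are assumptions chosen for illustration, not options of any cloud API.

```python
import re

# A token that starts and ends with a digit and contains only digits and dashes,
# optionally preceded by "+". This pattern is an assumption for the sketch.
PHONE_LIKE = re.compile(r"\+?\d[\d-]*\d")


def dial_strings(text, min_digits=7):
    """Return digit-only strings for every phone-like token in text."""
    found = []
    for token in PHONE_LIKE.findall(text):
        digits = re.sub(r"\D", "", token)  # drop dashes and the leading "+"
        if len(digits) >= min_digits:      # skip short numbers such as times
            found.append(digits)
    return found


print(dial_strings("please call 1-877-959-7499 before 8:20 AM"))
# ['18779597499']
```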
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,168
Reaction score
5,199
Any tips on a successful DeepSpeech install on CentOS 7 would be most appreciated.
 
Joined
Jul 6, 2013
Messages
82
Reaction score
28
DeepSpeech install on CentOS 7
I found this (somewhat dated) tutorial about installation + test-run on Ubuntu and RasPi which might make a good initial read/overview. Unfortunately I'm totally overrun till mid February but will have a look at some point at bringing it up in CentOS. (I don't know much about this sort of stuff but I'd love to see this grow legs and I do have a sense of adventure ;o)
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,168
Reaction score
5,199
I found this (somewhat dated) tutorial about installation + test-run on Ubuntu and RasPi which might make a good initial read/overview. Unfortunately I'm totally overrun till mid February but will have a look at some point at bringing it up in CentOS. (I don't know much about this sort of stuff but I'd love to see this grow legs and I do have a sense of adventure ;o)

Thanks. At your convenience, that would be helpful. We didn't have much luck on the CentOS platform.
 

jerrm

Guru
Joined
Sep 23, 2015
Messages
838
Reaction score
405
Installation was simple on Debian buster amd64 (arm64/aarch64 is problematic).

I fired up my CentOS 7 install and tried essentially the same steps, and it appears to work. This was booting from a fresh, minimal install snapshot. Some of the stuff online goes overboard and tries to set up a development environment; I don't plan to train my own models.

In the real world, you might want to install as a user, in a venv, use better paths, etc., but those tweaks are up to the user:

Code:
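# Install pip for Python 3, plus wget to fetch the models
yum -y install python3-pip wget
# Install the DeepSpeech Python package and its command-line client
pip3 install deepspeech
# Download the pre-trained 0.6.1 models and unpack them into the current directory
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-models.tar.gz | tar xzv --no-same-owner
# Point the last parameter below to an existing WAV file
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio msgtest.wav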

If you want to use python2 for some reason (assuming EPEL or another repo with python2 is already set up):
Code:
yum -y install python-pip wget
pip install --upgrade pip
pip install deepspeech
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-models.tar.gz | tar xzv --no-same-owner
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio msgtest.wav
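For anyone who would rather call the client from a script than type the long command, a thin Python wrapper could look roughly like this. It only shells out to the same `deepspeech` command with the same four flags shown above; the helper names `build_command` and `transcribe` are made up for the example, and it assumes the transcript is what arrives on stdout.

```python
import subprocess
from pathlib import Path


def build_command(model_dir, audio_path, binary="deepspeech"):
    """Build the deepspeech 0.6.1 command line used in the steps above."""
    model_dir = Path(model_dir)
    return [
        binary,
        "--model", str(model_dir / "output_graph.pbmm"),
        "--lm", str(model_dir / "lm.binary"),
        "--trie", str(model_dir / "trie"),
        "--audio", str(audio_path),
    ]


def transcribe(model_dir, audio_path):
    """Run the client and return whatever it printed on stdout (assumed to be the transcript)."""
    result = subprocess.run(
        build_command(model_dir, audio_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on the Python 3.6 that CentOS 7 ships
        check=True,               # raise if the client exits with an error
    )
    return result.stdout.strip()


# Example (needs deepspeech installed and the models unpacked):
# print(transcribe("deepspeech-0.6.1-models", "msgtest.wav"))
```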
 

jerrm

Guru
Joined
Sep 23, 2015
Messages
838
Reaction score
405
I went back and looked at getting deepspeech working on Debian arm64 (aarch64). It was ultimately easier than expected. A pip-based install works if you point it at a specific wheel file, but it turns out the project also provides tarballs of the native client (basically the Python components linked as an executable/library).

An alternative to the pip-based install would be:
Code:
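mkdir test
cd test
# Fetch and unpack the prebuilt native client (this tarball is the amd64 CPU build)
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/native_client.amd64.cpu.linux.tar.xz
tar xf native_client.amd64.cpu.linux.tar.xz --no-same-owner
# Extract only the files needed for inference; GNU tar needs --wildcards and quoted patterns to match member names
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-models.tar.gz | tar xzv --no-same-owner --wildcards '*/lm.binary' '*/*.pbmm' '*/trie'
# msgtest.wav must already exist in this directory
./deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio msgtest.wav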

The extracted tar file provides an executable (deepspeech) and a library (libdeepspeech.so) that would ultimately need to be manually installed into appropriate directories.
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,168
Reaction score
5,199
Sneak Peek:

Return of Free Voicemail Transcription & Voice Dialing
 

hawk#1

Well-Known Member
Joined
Nov 3, 2015
Messages
716
Reaction score
309
Thanks to all who contributed to this project and to the instructions for installing it on Incredible PBX 2020.
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,168
Reaction score
5,199
Here's a good way to test it once installed using the Nerd Vittles tutorial:
Code:
cd /usr/local/sbin
deepspeech --model /usr/local/sbin/deepspeech-0.6.1-models/output_graph.pbmm --lm /usr/local/sbin/deepspeech-0.6.1-models/lm.binary --trie /usr/local/sbin/deepspeech-0.6.1-models/trie --audio /var/lib/asterisk/sounds/en/no-valid-responce-pls-try-again.wav
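One thing worth checking before judging the results: the released 0.6.1 English model was trained on 16 kHz, mono, 16-bit audio, while Asterisk prompts and voicemail recordings are often 8 kHz. The Python sketch below simply reports a WAV file's format using the standard `wave` module; the helper names are hypothetical, and whether a given file needs resampling first is left to the reader.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def matches_model_input(path, rate=16000, channels=1, bits=16):
    """True if the file already has the format the 0.6.1 model is assumed to expect."""
    return wav_format(path) == (rate, channels, bits)


# Example:
# print(wav_format("/var/lib/asterisk/sounds/en/no-valid-responce-pls-try-again.wav"))
```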
 
Joined
Jul 6, 2013
Messages
82
Reaction score
28
Nice work ...GENTLEMEN!
This needed to happen and it needed to happen oh so badly!
Totally A_mazing, big hearty THANK YOU, and I hope it gets seriously noticed by the entire FreePBX community.
 

tbrummell

Guru
Joined
Jan 8, 2011
Messages
1,275
Reaction score
339
Here's a good way to test it once installed using the Nerd Vittles tutorial:
Code:
cd /usr/local/sbin
deepspeech --model /usr/local/sbin/deepspeech-0.6.1-models/output_graph.pbmm --lm /usr/local/sbin/deepspeech-0.6.1-models/lm.binary --trie /usr/local/sbin/deepspeech-0.6.1-models/trie --audio /var/lib/asterisk/sounds/en/no-valid-responce-pls-try-again.wav
Didn't work for me:
Code:
root@pbx:/usr/local/sbin $ deepspeech --model /usr/local/sbin/deepspeech-0.6.1-models/output_graph.pbmm --lm /usr/local/sbin/deepspeech-0.6.1-models/lm.binary --trie /usr/local/sbin/deepspeech-0.6.1-models/trie --audio /var/lib/asterisk/sounds/en/no-valid-responce-pls-try-again.wav
Illegal instruction
WARNING: Always run Incredible PBX behind a secure firewall.
root@pbx:/usr/local/sbin $
 

tbrummell

Guru
Joined
Jan 8, 2011
Messages
1,275
Reaction score
339
Code:
root@pbx:/var/log $ cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
Free Range Cloud single core instance, 1G RAM, Incredible 2020.
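A frequent cause of "Illegal instruction" with prebuilt TensorFlow-based binaries is a (virtual) CPU that does not expose AVX. Whether that is what happened on this instance is an assumption; nothing in the thread confirms it. A small Python sketch for checking the flags line of `/proc/cpuinfo` might look like this (the function names and the default `required` tuple are illustrative):

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=("avx",)):
    """List the required flags that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in flags]


# Example on a Linux host:
# with open("/proc/cpuinfo") as fh:
#     print(missing_flags(fh.read()))
```

An empty list means the flag is present and the cause lies elsewhere.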
 
