Julius speech recognition software, part 1: it is free software, but setup is a storm of errors!












Git appears to be installed properly; git --version printed a version number.

pi@raspberrypi:~ $ git --version
git version 2.1.4


Get the Julius source code from GitHub.

pi@raspberrypi:~ $ git clone https://github.com/julius-speech/julius.git
Cloning into 'julius'...
remote: Counting objects: 2329, done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 2329 (delta 39), reused 50 (delta 25), pack-reused 2248
Receiving objects: 100% (2329/2329), 8.64 MiB | 55.00 KiB/s, done.
Resolving deltas: 100% (1029/1029), done.
Checking connectivity... done.



pi@raspberrypi:~ $ cd julius
pi@raspberrypi:~/julius $ ./configure --enable-words-int
checking build system type... armv7l-unknown-linux-gnueabi
checking host system type... armv7l-unknown-linux-gnueabi
checking host specific optimization flag... no
Julius/Julian libsent library rev.
- Audio I/O
primary mic device API : oss (Open Sound System compatible)
available mic device API : oss
supported audio format : RAW and WAV only
NetAudio support : no
- Language Modeling
class N-gram support : yes
- Libraries
file decompression by : zlib library
- Process management
fork on adinnet input : no
Note: compilation time flags are now stored in "libsent-config".
If you link this library, please add output of
"libsent-config --cflags" to CFLAGS and
"libsent-config --libs" to LIBS.



pi@raspberrypi:~/julius $ make
make[1]: Entering directory '/home/pi/julius/libsent'
gcc -g -O2 -fopenmp -Iinclude -DHAVE_CONFIG_H -o src/adin/adin_file.o -c src/adin/adin_file.c
src/phmm/calc_dnn.c:743:5: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
for (int i = 0; i < wrk->statenum; i++) {
Makefile:12: recipe for target 'src/phmm/calc_dnn.o' failed
make[1]: *** [src/phmm/calc_dnn.o] Error 1
make[1]: Leaving directory '/home/pi/julius/libsent'
Makefile:56: recipe for target 'libsent' failed
make: *** [libsent] Error 2
make[1]: Entering directory '/home/pi/julius/libsent'
gcc -g -O2 -fopenmp -Iinclude -DHAVE_CONFIG_H -o src/phmm/calc_dnn.o -c src/phmm/calc_dnn.c
src/phmm/calc_dnn.c: In function ‘dnn_layer_load’:
src/phmm/calc_dnn.c:420:3: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
for (int i = 0; i < thread_num; i++) {
src/phmm/calc_dnn.c:420:3: note: use option -std=c99, -std=gnu99, -std=c11 or -std=gnu11 to compile your code
src/phmm/calc_dnn.c: In function ‘dnn_calc_outprob’:
src/phmm/calc_dnn.c:743:5: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
for (int i = 0; i < wrk->statenum; i++) {
Makefile:12: recipe for target 'src/phmm/calc_dnn.o' failed
make[1]: *** [src/phmm/calc_dnn.o] Error 1
make[1]: Leaving directory '/home/pi/julius/libsent'
Makefile:56: recipe for target 'libsent' failed
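
The compiler note in the log above names the missing option (-std=c99, -std=gnu99 and so on). This article sidesteps the problem by falling back to 4.3.1, but one possible workaround is to re-run configure with a C99-capable compiler setting. The sketch below only inspects a saved build log and prints a suggested command. The helper name suggest_c99_retry and the log file name are hypothetical, and it is an untested assumption that this source tree's configure script passes a CC variable through to every subdirectory.

```shell
#!/bin/sh
# Hypothetical helper: look for the C99 'for' loop error in a saved make log
# and, if it is present, print a configure command that selects gnu99 mode.
# Assumption: configure accepts CC=... on its command line (usual for
# autoconf-generated scripts, not verified for this source tree).
suggest_c99_retry() {
    log_file=$1
    if grep -q "only allowed in C99 or C11 mode" "$log_file"; then
        echo './configure --enable-words-int CC="gcc -std=gnu99"'
        return 0
    fi
    echo "no C99 error found in $log_file"
    return 1
}
```

To use it, save the build output first (for example with make 2>&1 | tee build.log) and then run suggest_c99_retry build.log.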



Download 4.3.1


pi@raspberrypi:~ $ wget -O julius-4.3.1.tar.gz 'https://sourceforge.jp/frs/redir.php?m=osdn&f=%2Fjulius%2F60273%2Fjulius-4.3.1.tar.gz'


pi@raspberrypi:~ $ ls
Desktop Downloads Pictures Raspi Videos julius julius-4.4.2.tar.gz python_games sqlite3
Documents Music Public Templates graph_data_db.sqlite julius-4.3.1.tar.gz oldconffiles python_pg test.txt
pi@raspberrypi:~ $ tar zxvf julius-4.3.1.tar.gz

configure 4.3.1


pi@raspberrypi:~ $ cd julius-4.3.1/
pi@raspberrypi:~/julius-4.3.1 $ ./configure
creating cache ./config.cache
checking host system type... armv7l-unknown-linux-gnueabi
checking host specific optimization flag... no
Julius/Julian libsent library rev.4.3.1:
- Audio I/O
primary mic device API : oss (Open Sound System compatible)
available mic device API : oss
supported audio format : RAW and WAV only
NetAudio support : no
- Language Modeling
class N-gram support : yes
- Libraries
file decompression by : zlib library
- Process management
fork on adinnet input : no
Note: compilation time flags are now stored in "libsent-config".
If you link this library, please add output of
"libsent-config --cflags" to CFLAGS and
"libsent-config --libs" to LIBS.


make 4.3.1


pi@raspberrypi:~/julius-4.3.1 $ make
for d in libsent libjulius julius mkbingram mkbinhmm adinrec adintool mkgshmm mkss jcontrol gramtools generate-ngram jclient-perl man; do \
(cd $d; make); \
make[1]: Entering directory '/home/pi/julius-4.3.1/libsent'
gcc -g -O2 -Iinclude -DHAVE_CONFIG_H -o src/adin/adin_file.o -c src/adin/adin_file.c
make[1]: Entering directory '/home/pi/julius-4.3.1/man'
Nothing to build
make[1]: Leaving directory '/home/pi/julius-4.3.1/man'


make install 4.3.1

pi@raspberrypi:~/julius-4.3.1 $ sudo make install

That completes compiling and installing Julius.


pi@raspberrypi:~/julius-4.3.1 $ julius -version
JuliusLib rev.4.3.1 (fast)
Engine specification:
- Base setup : fast
- Supported LM : DFA, N-gram, Word
- Extension :
- Compiled by : gcc -g -O2
Library configuration: version 4.3.1
- Audio input
primary A/D-in driver : oss (Open Sound System compatible)
available drivers : oss
wavefile formats : RAW and WAV only
max. length of an input : 320000 samples, 150 words
- Language Model
class N-gram support : yes
MBR weight support : yes
word id unit : short (2 bytes)
- Acoustic Model
multi-path treatment : autodetect
- External library
file decompression by : zlib library
- Process hangling
fork on adinnet input : no
Try `-help' for more information.




pi@raspberrypi:~ $ wget -O dictation-kit-v4.3.1-linux.tgz 'https://sourceforge.jp/frs/redir.php?m=jaist&f=%2Fjulius%2F60416%2Fdictation-kit-v4.3.1-linux.tgz'
pi@raspberrypi:~ $ wget -O grammar-kit-v4.1.tar.gz 'https://sourceforge.jp/frs/redir.php?m=osdn&f=%2Fjulius%2F51159%2Fgrammar-kit-v4.1.tar.gz'

Extract them and collect them in a julius-kits directory.

pi@raspberrypi:~ $ tar zxvf dictation-kit-v4.3.1-linux.tgz
pi@raspberrypi:~ $ tar zxvf grammar-kit-v4.1.tar.gz
pi@raspberrypi:~ $ mkdir julius-kits
pi@raspberrypi:~ $ mv dictation-kit-v4.3.1-linux julius-kits/
pi@raspberrypi:~ $ mv grammar-kit-v4.1 julius-kits/
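
The extract-then-move steps above could also be done in one pass by letting tar unpack directly into the target directory. This is a minimal sketch, assuming the archives are gzip-compressed tarballs as in this article. The function name collect_kits is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract every given .tar.gz/.tgz archive into one
# destination directory, creating the directory if needed.
collect_kits() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for archive in "$@"; do
        # -C makes tar change into $dest before extracting.
        tar zxf "$archive" -C "$dest" || return 1
    done
}
```

For the two kits in this article the call would be collect_kits julius-kits dictation-kit-v4.3.1-linux.tgz grammar-kit-v4.1.tar.gz.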



pi@raspberrypi:~ $ lsusb
Bus 001 Device 004: ID 05e3:0608 Genesys Logic, Inc. USB-2.0 4-Port HUB
Bus 001 Device 005: ID 056e:700d Elecom Co., Ltd
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub




pi@raspberrypi:~ $ sudo cat /proc/asound/modules
0 snd_bcm2835
1 snd_usb_audio


The USB audio module is listed as card 1. Apparently this is because the index of snd-usb-audio in /etc/modprobe.d/alsa-base.conf is set to -2. I want to change it to 0. But...

pi@raspberrypi:~ $ cat /etc/modprobe.d/alsa-base.conf
cat: /etc/modprobe.d/alsa-base.conf: No such file or directory



Reference: ただいまシステムの中身① Raspberry Piで音声認識 (Inside the "Tadaima" system, part 1: speech recognition on Raspberry Pi)

Create alsa-base.conf from scratch. Let's put it together with vi.

pi@raspberrypi:~ $ sudo vi /etc/modprobe.d/alsa-base.conf
options snd slots=snd_usb_audio,snd_bcm2835
options snd_usb_audio index=0
options snd_bcm2835 index=1
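
For anyone who prefers not to use an editor, the same three lines can be written with a here-document. This is only a sketch: the function name write_alsa_base_conf is hypothetical, and it takes the target path as an argument so that it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: write the three option lines from this article to the
# given path. For the real file, run it as root with
# /etc/modprobe.d/alsa-base.conf as the argument.
write_alsa_base_conf() {
    target=$1
    cat > "$target" <<'EOF'
options snd slots=snd_usb_audio,snd_bcm2835
options snd_usb_audio index=0
options snd_bcm2835 index=1
EOF
}
```

Options under /etc/modprobe.d are read when a module is loaded, so the new order shows up only after the sound modules are reloaded, for example after a reboot.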



pi@raspberrypi:~ $ sudo cat /proc/asound/modules
0 snd_usb_audio
1 snd_bcm2835
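
The check above can be scripted so that a setup script fails early when the USB audio module is not card 0. This is a minimal sketch. The function name card_index_of is hypothetical, and it assumes the two-column "index module" layout shown in the listing.

```shell
#!/bin/sh
# Hypothetical helper: print the card index that a sound module occupies,
# given a listing in the format of /proc/asound/modules ("<index> <module>").
# Returns non-zero when the module is not listed.
card_index_of() {
    module=$1
    listing=${2:-/proc/asound/modules}
    awk -v m="$module" '$2 == m { print $1; found = 1 } END { exit !found }' "$listing"
}
```

With the listing above, card_index_of snd_usb_audio prints 0.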




pi@raspberrypi:~ $ amixer sset Mic 50
amixer: Unable to find simple control 'Mic',0
pi@raspberrypi:~$ sudo cat /home/pi/.asoundrc
pcm.!default {
type hw
card 1
}
ctl.!default {
type hw
card 1
}
pi@raspberrypi:~ $ amixer sset Mic 40
Simple mixer control 'Mic',0
Capabilities: cvolume cswitch
Capture channels: Front Left - Front Right
Limits: Capture 0 - 44
Front Left: Capture 40 [91%] [14.00dB] [on]
Front Right: Capture 40 [91%] [14.00dB] [on]





pi@raspberrypi:~ $ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
Subdevices: 8/8
Subdevice #0: subdevice #0
Subdevice #1: subdevice #1
Subdevice #2: subdevice #2
Subdevice #3: subdevice #3
Subdevice #4: subdevice #4
Subdevice #5: subdevice #5
Subdevice #6: subdevice #6
Subdevice #7: subdevice #7
card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 ALSA [bcm2835 IEC958/HDMI]
Subdevices: 1/1
Subdevice #0: subdevice #0


pi@raspberrypi:~ $ arecord -l
**** List of CAPTURE Hardware Devices ****
card 0: series [UCAM-DLN130T series], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0

Now let's try recording 「おはようございます」 ("good morning").

When you have finished recording, press Ctrl + C to stop.

pi@raspberrypi:~ $ arecord -D plughw:0,0 -f cd test.wav
Recording WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
^CAborted by signal Interrupt...


pi@raspberrypi:~ $ aplay -D plughw:1,0 test.wav
Playing WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo









pi@raspberrypi:~ $ cd julius-kits/dictation-kit-v4.3.1-linux
pi@raspberrypi:~/julius-kits/dictation-kit-v4.3.1-linux $ julius -C main.jconf -C am-gmm.jconf -demo
Notice for feature extraction (01),
* Cepstral mean normalization for real-time decoding: *
* NOTICE: The first input may not be recognized, since *
* no initial mean is available on startup. *
Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
<<< please speak >>>
Warning: strip: sample 0-1037 is invalid, stripped
Warning: strip: sample 0-1023 is invalid, stripped 
Warning: strip: sample 0-1022 is invalid, stripped 
Warning: strip: sample 0-1023 is invalid, stripped
Warning: strip: sample 0-1022 is invalid, stripped
Warning: strip: sample 0-1023 is invalid, stripped
Warning: strip: sample 0-1022 is invalid, stripped
Warning: strip: sample 0-1022 is invalid, stripped
Warning: strip: sample 0-1023 is invalid, stripped
Warning: strip: sample 0-1022 is invalid, stripped
Warning: strip: sample 0-577 has zero value, stripped 




pi@raspberrypi:~/julius-kits/dictation-kit-v4.3.1-linux $ julius -nostrip -C main.jconf -C am-gmm.jconf -demo 
Notice for feature extraction (01),
* Cepstral mean normalization for real-time decoding: *
* NOTICE: The first input may not be recognized, since *
* no initial mean is available on startup. *
Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
ERROR: get_back_trellis_proceed: 00 _default: frame 3: no nodes left in beam, now terminates search
<input rejected by short input>
STAT: skip CMN parameter update since last input was invalid
pass1_best:
<<< please speak >>>
WARNING: adin_thread_process: too long input (> 320000 samples), segmented now
Warning: input buffer overflow: some input may be dropped, so disgard the input
### read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Error: adin_oss: failed to open /dev/dsp
failed to begin input stream


Reference: Raspberry Piで音声認識 (Speech recognition on Raspberry Pi)


sudo sh -c "echo snd-pcm-oss >> /etc/modules"
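
The command above appends snd-pcm-oss to /etc/modules so that the ALSA OSS emulation module, which provides /dev/dsp, is loaded at boot. Running it twice leaves a duplicate line in the file. A slightly more careful variant appends the name only when it is missing. This is a sketch, and the function name add_module_once is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a module name to a modules file only if no
# line in the file already matches it exactly.
add_module_once() {
    module=$1
    modules_file=$2
    if ! grep -qxF -- "$module" "$modules_file" 2>/dev/null; then
        echo "$module" >> "$modules_file"
    fi
}
```

For the real file it has to run as root with /etc/modules as the second argument. To load the module right away without rebooting, sudo modprobe snd-pcm-oss should also work.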



read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
ERROR: get_back_trellis_proceed: 00 _default: frame 3: no nodes left in beam, now terminates search
<input rejected by short input>
STAT: skip CMN parameter update since last input was invalid
<<< please speak >>>
WARNING: adin_thread_process: too long input (> 320000 samples), segmented now
Warning: input buffer overflow: some input may be dropped, so disgard the input
STAT: skip CMN parameter update since last input was invalid



WARNING: adin_thread_process: too long input (> 320000 samples), segmented now
You should try to change the sensibility level (-lv) for a high value (default is 2000). In my board it worked fine with 10000 :
# julius -input mic -lv 10000 -C julian.jconf
(In other words: try raising the input level threshold (-lv) above its default of 2000. On that poster's board, 10000 worked fine.)

Let's adopt the options from the example above (-input mic -lv 10000) and try it.


pi@raspberrypi:~/julius-kits/dictation-kit-v4.3.1-linux $ julius -input mic -lv 10000 -C main.jconf -C am-gmm.jconf -nostrip
read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
<<< please speak >>>

Oh, the errors are gone and it is now asking me to speak. Nice. So I tried saying 「こんちわ」 ("hello"), but, hmm, the recognition accuracy is very poor.



-lv value: 10,000


pass1_best: 本 で は 、 。
pass1_best_wordseq: <s> 本+名詞 で+助詞 は+助詞 、+補助記号 </s>
pass1_best_phonemeseq: silB | h o N | d e | w a | sp | silE
pass1_best_score: -3297.711670
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 33085 generated, 3104 pushed, 388 nodes popped in 136
sentence1:ほんと 、 。
wseq1: <s> ほんと+名詞 、+補助記号 </s>
phseq1: silB | h o N t o | sp | silE
cmscore1: 0.616 0.024 0.056 1.000
score1: -3319.762451 


pass1_best: こんにちは 。
pass1_best_wordseq: <s> こんにちは+感動詞 </s>
pass1_best_phonemeseq: silB | k o N n i ch i w a | silE
pass1_best_score: -2810.670166
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 22323 generated, 1662 pushed, 250 nodes popped in 122
sentence1:本気 だ 。
wseq1: <s> 本気+名詞 だ+助動詞 </s>
phseq1: silB | h o N k i | d a | silE
cmscore1: 0.611 0.072 0.028 1.000
score1: -2831.078369


pass1_best: 受け た 。
pass1_best_wordseq: <s> 受け+動詞 た+助動詞 </s>
pass1_best_phonemeseq: silB | u k e | t a | silE
pass1_best_score: -2881.974121
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 13539 generated, 1731 pushed, 261 nodes popped in 122
sentence1:受け た 。
wseq1: <s> 受け+動詞 た+助動詞 </s>
phseq1: silB | u k e | t a | silE
cmscore1: 0.068 0.238 0.110 1.000
score1: -2906.262207 


pass1_best: うん 。
pass1_best_wordseq: <s> うん+感動詞 </s>
pass1_best_phonemeseq: silB | u N | silE
pass1_best_score: -2881.351562
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 22663 generated, 2328 pushed, 345 nodes popped in 122
sentence1: L 。
wseq1: <s> L+記号 </s>
phseq1: silB | e r u | silE
cmscore1: 0.409 0.022 1.000
score1: -2899.006348 

-lv value: 5,000

Changing the -lv value from 10,000 to 5,000 improved the recognition accuracy more than I expected.


pass1_best: 今季 も 。
pass1_best_wordseq: <s> 今季+名詞 も+助詞 </s>
pass1_best_phonemeseq: silB | k o N k i | m o | silE
pass1_best_score: -4239.636230
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 16295 generated, 1909 pushed, 243 nodes popped in 185
sentence1: 今季 も 。
wseq1: <s> 今季+名詞 も+助詞 </s>
phseq1: silB | k o N k i | m o | silE
cmscore1: 0.635 0.023 0.140 1.000
score1: -4251.720703 


pass1_best: こんにちは 。
pass1_best_wordseq: <s> こんにちは+感動詞 </s>
pass1_best_phonemeseq: silB | k o N n i ch i w a | silE
pass1_best_score: -4716.737793
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 46105 generated, 3806 pushed, 514 nodes popped in 210
sentence1: こんにちは 。
wseq1: <s> こんにちは+感動詞 </s>
phseq1: silB | k o N n i ch i w a | silE
cmscore1: 0.795 0.036 1.000
score1: -4738.603027


pass1_best: 今日 ね 。
pass1_best_wordseq: <s> 今日+名詞 ね+助詞 </s>
pass1_best_phonemeseq: silB | k o N n i ch i | n e | silE
pass1_best_score: -3644.272705
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 51851 generated, 2900 pushed, 440 nodes popped in 153
sentence1: 今日 ね 。
wseq1: <s> 今日+名詞 ね+助詞 </s>
phseq1: silB | k o N n i ch i | n e | silE
cmscore1: 0.298 0.033 0.160 1.000
score1: -3667.632080
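
Comparing runs at different -lv values is easier when only the final results are shown side by side. The sketch below assumes the Julius output of each run has been saved to a file and pulls out the sentence1 lines. The function name final_sentences and the log file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the final recognized sentence of each utterance
# from saved Julius output, one per line, with surrounding spaces trimmed.
# Reads the files given as arguments, or standard input if there are none.
final_sentences() {
    sed -n 's/^sentence1:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$@"
}
```

For example, after saving two runs with tee to lv10000.log and lv5000.log, running final_sentences lv10000.log and final_sentences lv5000.log lists what each setting recognized.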












