概述
混合高斯模型+语言模型
今天事情比较多,就花了点时间看了一下HTKbook的高斯混合模型和data driven,然后使用HVite进行解码,时间比较长,出去吃了个饭,打几局台球回来刚好运行完。
1、初始proto 的hmm模型:
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto1"
<BeginHMM>
<VecSize> 39 <MFCC_0_D_A>
<NumStates> 5
<State> 2 <NumMixes> 5
<Mixture> 1 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<Mixture> 2 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<Mixture> 3 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<Mixture> 4 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<Mixture> 5 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
。。。。。。。
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
2、绑定三音素
由于使用了GMM,此时就不能使用决策树了,而应该使用data driven,在生成tree.hed文件时所用命令为:
perl scripts/mkclscript.prl TC 100.0 lists/monophones1>tree.hed
详见HTKbook p170-171
在生成的tree.hed中,
开头加入:RO 100 "stats"
末尾加入:CO lists/tiedlist
形如:
RO 100 "stats"
TC 100.0 "ST_ax_2_" {("ax","*-ax+*","ax+*","*-ax").state[2]}
TC 100.0 "ST_sp_2_" {("sp","*-sp+*","sp+*","*-sp").state[2]}
TC 100.0 "ST_b_2_" {("b","*-b+*","b+*","*-b").state[2]}
TC 100.0 "ST_r_2_" {("r","*-r+*","r+*","*-r").state[2]}
…………………
TC 100.0 "ST_em_4_" {("em","*-em+*","em+*","*-em").state[4]}
TC 100.0 "ST_zh_4_" {("zh","*-zh+*","zh+*","*-zh").state[4]}
TC 100.0 "ST_sil_4_" {("sil","*-sil+*","sil+*","*-sil").state[4]}
CO lists/tiedlist
然后使用命令:
HHEd -H hmms/hmm12/macros -H hmms/hmm12/hmmdefs -M hmms/hmm13 tree.hed lists/triphones1>log
重估两次即可
3、解码评测
在dict6中的开头加入:
!!UNK [] sil
!NULL [] sil
</s> [] sil
<s> [] sil
使用语言模型的解码方法进行再次评估:
HVite -T 1 -H hmms1/hmm15/macros -H hmms1/hmm15/hmmdefs -s 10.0 -S test/test.scp -i results/recout_GMM_lm.mlf -w dict/bigram.net -C config/config2 -t 250.0 -n 4 20 -q Atal -z lat dict/dict6 lists1/tiedlist
由于高斯混合度为5,运行时间比单一混合度长,大约需要2小时左右。
然后使用HResults命令进行评测:
HResults -I test/testwords1.mlf lists1/tiedlist recults/recout_GMM_lm.mlf
结果如下:
句子识别率已经有了良好的改善。
4、总结:增加高斯混合度可以提高识别率,同时在进行解码的时候需要更多的时间,加上语言模型的训练就可以得到较理想的结果。下一步重点放在语言模型的训练上。
最后
以上就是鲤鱼煎蛋为你收集整理的HTK搭建大词汇量连续语音识别系统( 五)的全部内容,希望文章能够帮你解决HTK搭建大词汇量连续语音识别系统( 五)所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
发表评论 取消回复