By 昏睡乐曲, a blogger at 靠谱客: a derivation of the EM estimation formulas for the i-Vector speech feature, collected during recent development work and shared here for reference.

Overview

“Useful Derivations for i-Vector Based Approach to Data Clustering in Speech Recognition” Yu Zhang
This paper derives the i-Vector formulation in fair detail and resolved many of my confusions. Salute!

Main Text

Assume $\boldsymbol{Y}^{i}=\left(\boldsymbol{y}_{1}^{i}, \boldsymbol{y}_{2}^{i}, \ldots, \boldsymbol{y}_{T_{i}}^{i}\right)$ is a sequence of $D$-dimensional feature vectors of length $T_i$ from a target speaker. In the i-Vector approach, we describe this speaker's data with a mean supervector $\boldsymbol{M}(i)$:

$$\boldsymbol{M}(i)=\boldsymbol{M}_{0}+\boldsymbol{T} \boldsymbol{w}(i)$$

Here $\boldsymbol{M}(i)$ is the target speaker's supervector; $\boldsymbol{M}_{0}$ is the mean supervector of the GMM-UBM trained on background data that excludes the target speaker (the mean vectors of all Gaussian components concatenated together); $\boldsymbol{T}$ is the total variability matrix; and $\boldsymbol{w}(i)$ is the i-Vector, whose dimension is far smaller than that of $\boldsymbol{M}_{0}$.
A vivid way to picture this: suppose many people write raps over the same beat. Then $\boldsymbol{M}_{0}$ is the beat's melody, which is fixed. Each person picks a few words such as "weather" and "time" to form $\boldsymbol{w}(i)$, the i-Vector; those words are then expanded into high-dimensional sentences, the lyrics, which is $\boldsymbol{T}\boldsymbol{w}(i)$; the result is a different track for each person, namely $\boldsymbol{M}(i)$.

Estimate $\boldsymbol{w}(i)$ via MAP:
$$\begin{aligned} \hat{\boldsymbol{w}}(i) &=\underset{\boldsymbol{w}(i)}{\operatorname{argmax}}\; p\left(\boldsymbol{w}(i) \mid \boldsymbol{Y}^{i}\right) \\ &=\underset{\boldsymbol{w}(i)}{\operatorname{argmax}}\; p\left(\boldsymbol{Y}^{i} \mid \boldsymbol{w}(i)\right) p(\boldsymbol{w}(i)) \end{aligned}$$
Replacing each GMM-UBM mean with the corresponding mean sub-vector of $\boldsymbol{M}(i)$, the paper writes this further as:
$$p\left(\boldsymbol{Y}^{i} \mid \boldsymbol{w}(i)\right) \simeq \prod_{t=1}^{T_{i}} \prod_{k=1}^{K} \mathcal{N}\left(\boldsymbol{y}_{t}^{i} ; \boldsymbol{M}_{k}(i), \boldsymbol{R}_{k}\right)^{p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)}$$
where
$$p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)=\frac{c_{k}\, \mathcal{N}\left(\boldsymbol{y}_{t}^{i} ; \boldsymbol{m}_{k}, \boldsymbol{R}_{k}^{(0)}\right)}{\sum_{l=1}^{K} c_{l}\, \mathcal{N}\left(\boldsymbol{y}_{t}^{i} ; \boldsymbol{m}_{l}, \boldsymbol{R}_{l}^{(0)}\right)}$$
This is the E-step estimate of the latent variable in EM. Taking the log of the above yields what should be the EM Q-function (copied directly from the paper):
$$\begin{aligned} \log p\left(\boldsymbol{Y}^{i} \mid \boldsymbol{w}(i)\right) &= \sum_{t=1}^{T_{i}} \sum_{k=1}^{K} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right) \log \mathcal{N}\left(\boldsymbol{y}_{t}^{i} ; \boldsymbol{M}_{k}(i), \boldsymbol{R}_{k}\right) \\ &= \sum_{t=1}^{T_{i}} \sum_{k=1}^{K} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)\left[\log \frac{1}{(2 \pi)^{D / 2}\left|\boldsymbol{R}_{k}\right|^{1 / 2}} -\frac{1}{2}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}-\boldsymbol{T}_{k} \boldsymbol{w}(i)\right)^{\top} \boldsymbol{R}_{k}^{-1}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}-\boldsymbol{T}_{k} \boldsymbol{w}(i)\right)\right] \\ &= \sum_{t=1}^{T_{i}} \sum_{k=1}^{K} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)\left[\log \frac{1}{(2 \pi)^{D / 2}\left|\boldsymbol{R}_{k}\right|^{1 / 2}}-\frac{1}{2}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)^{\top} \boldsymbol{R}_{k}^{-1}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right) +\boldsymbol{w}^{\top}(i) \boldsymbol{T}_{k}^{\top} \boldsymbol{R}_{k}^{-1}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{T}_{k}^{\top} \boldsymbol{R}_{k}^{-1} \boldsymbol{T}_{k} \boldsymbol{w}(i)\right] \end{aligned}$$
Only the last two terms depend on $\boldsymbol{w}(i)$; collect them and define:
$$\begin{aligned} \mathcal{H}(i) &=\sum_{t=1}^{T_{i}} \sum_{k=1}^{K} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)\left[\boldsymbol{w}^{\top}(i) \boldsymbol{T}_{k}^{\top} \boldsymbol{R}_{k}^{-1}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{T}_{k}^{\top} \boldsymbol{R}_{k}^{-1} \boldsymbol{T}_{k} \boldsymbol{w}(i)\right] \\ &= \boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{\Gamma}(i) \boldsymbol{R}^{-1} \boldsymbol{T} \boldsymbol{w}(i) \end{aligned}$$
Here $\boldsymbol{\Gamma}(i)$ is a $(D \cdot K) \times (D \cdot K)$ block-diagonal matrix whose $k$-th diagonal block equals $\gamma_{k}(i)\, \boldsymbol{I}_{D \times D}$, and $\boldsymbol{\Gamma}_{y}(i)$ is a $(D \cdot K)$-dimensional supervector whose $k$-th sub-vector is $\boldsymbol{\Gamma}_{y, k}(i)$:

$$\gamma_{k}(i)=\sum_{t=1}^{T_{i}} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)$$
$$\boldsymbol{\Gamma}_{y, k}(i)=\sum_{t=1}^{T_{i}} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)$$
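As a concrete illustration, the zeroth- and first-order statistics above reduce to a couple of matrix products. The following is a minimal NumPy sketch; the helper name `collect_stats` and the array shapes are my own choices, not from the paper:

```python
import numpy as np

def collect_stats(Y, post, means):
    """Baum-Welch statistics for one utterance.

    Y     : (T, D) frame features y_t^i
    post  : (T, K) frame posteriors p(k | y_t^i, Omega^(0)) from the UBM
    means : (K, D) UBM component means m_k
    Returns gamma_k(i) with shape (K,) and the centered first-order
    statistics Gamma_{y,k}(i) with shape (K, D).
    """
    gamma = post.sum(axis=0)                  # gamma_k(i) = sum_t p(k | y_t)
    first = post.T @ Y                        # sum_t p(k | y_t) y_t
    Gamma_y = first - gamma[:, None] * means  # subtract gamma_k(i) * m_k
    return gamma, Gamma_y
```

Since the posteriors of each frame sum to one, the `gamma` entries always sum to the number of frames, which is a handy sanity check.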
Then:
$$\begin{aligned} p\left(\boldsymbol{w}(i) \mid \boldsymbol{Y}^{i}\right) &\propto p\left(\boldsymbol{Y}^{i} \mid \boldsymbol{w}(i)\right) p(\boldsymbol{w}(i)) \\ &\propto \exp \left(\boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{\Gamma}(i) \boldsymbol{R}^{-1} \boldsymbol{T} \boldsymbol{w}(i)\right) \cdot \exp \left(-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{w}(i)\right) \\ &= \exp \left(\boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)-\frac{1}{2} \boldsymbol{w}^{\top}(i)\left[\boldsymbol{T}^{\top} \boldsymbol{\Gamma}(i) \boldsymbol{R}^{-1} \boldsymbol{T}+\boldsymbol{I}\right] \boldsymbol{w}(i)\right) \\ &= \exp \left(\boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)-\frac{1}{2} \boldsymbol{w}^{\top}(i)\, l(i)\, \boldsymbol{w}(i)\right) \\ &\propto \exp \left(-\frac{1}{2}\left(\boldsymbol{w}(i)-l^{-1}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)\right)^{\top} l(i)\left(\boldsymbol{w}(i)-l^{-1}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)\right)\right) \end{aligned}$$
The posterior of $\boldsymbol{w}(i)$ is therefore a Gaussian with mean
$$l^{-1}(i)\, \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)$$
and covariance
$$l^{-1}(i), \quad \text{where} \quad l(i)=\boldsymbol{I}+\boldsymbol{T}^{\top} \boldsymbol{\Gamma}(i) \boldsymbol{R}^{-1} \boldsymbol{T}$$
so the posterior mean can serve as the estimate of $\boldsymbol{w}(i)$ (much like sparse Bayesian learning):
$$\hat{\boldsymbol{w}}(i)=l^{-1}(i)\, \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)$$
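This posterior computation can be sketched in NumPy. The sketch below assumes diagonal covariances $\boldsymbol{R}_k$ (common in practice) and stores $\boldsymbol{T}$ as per-component sub-matrices $\boldsymbol{T}_k$; the function name and shapes are my own, not the paper's:

```python
import numpy as np

def ivector_posterior(T_mat, R_inv, gamma, Gamma_y):
    """Posterior mean and precision of w(i) from per-component statistics.

    T_mat   : (K, D, F) sub-matrices T_k of the total variability matrix
    R_inv   : (K, D) diagonals of R_k^-1 (diagonal-covariance assumption)
    gamma   : (K,) zeroth-order statistics gamma_k(i)
    Gamma_y : (K, D) centered first-order statistics Gamma_{y,k}(i)
    Returns (w_hat, l) with
        l(i)  = I + sum_k gamma_k(i) T_k^T R_k^-1 T_k
        w_hat = l^-1(i) sum_k T_k^T R_k^-1 Gamma_{y,k}(i)
    """
    K, D, F = T_mat.shape
    l = np.eye(F)
    b = np.zeros(F)
    for k in range(K):
        TR = T_mat[k].T * R_inv[k]      # T_k^T R_k^-1, shape (F, D)
        l += gamma[k] * (TR @ T_mat[k])
        b += TR @ Gamma_y[k]
    w_hat = np.linalg.solve(l, b)       # solve l(i) w = b instead of inverting
    return w_hat, l
```

Solving the $F \times F$ system rather than inverting $l(i)$ is both cheaper and numerically safer; $l^{-1}(i)$ itself is only needed when the posterior covariance is required, as in the E-step below.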
The M-step then estimates the parameters $\boldsymbol{T}$ and $\boldsymbol{R}$ by setting derivatives to zero. (Derivation omitted here; see the paper.)

Summary:
E-Step:
$$\begin{aligned} E[\boldsymbol{w}(i)] &=l^{-1}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i) \\ E\left[\boldsymbol{w}(i) \boldsymbol{w}^{\top}(i)\right] &=E[\boldsymbol{w}(i)] E\left[\boldsymbol{w}^{\top}(i)\right]+l^{-1}(i) \end{aligned}$$
M-Step:
$$\boldsymbol{T}^{m} \sum_{i} \boldsymbol{\Gamma}^{m}(i) E\left[\boldsymbol{w}(i) \boldsymbol{w}^{\top}(i)\right]=\sum_{i} \boldsymbol{\Gamma}_{y}^{m}(i) E\left[\boldsymbol{w}^{\top}(i)\right]$$
$$\boldsymbol{R}_{k}=\frac{1}{\sum_{i} \gamma_{k}(i)}\left(\sum_{i} \boldsymbol{\Gamma}_{y y^{\top}, k}(i)-M_{k}\right)$$
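The $\boldsymbol{T}$ update above amounts to solving one small $F \times F$ linear system per component (all $D$ rows belonging to one component share the same coefficient matrix). A hedged NumPy sketch of just this update, with names and shapes of my own choosing and the $\boldsymbol{R}$ update omitted:

```python
import numpy as np

def m_step_T(stats, F):
    """Re-estimate the total variability matrix component by component.

    stats : list of per-utterance tuples (gamma, Gamma_y, Ew, Eww) with
            gamma (K,), Gamma_y (K, D), Ew (F,), Eww (F, F).
    For each component k, the new T_k (D, F) solves
        T_k  sum_i gamma_k(i) E[w w^T]  =  sum_i Gamma_{y,k}(i) E[w^T(i)].
    Returns the updated T with shape (K, D, F).
    """
    K, D = stats[0][1].shape
    A = np.zeros((K, F, F))   # per-component accumulators sum_i gamma_k E[ww^T]
    C = np.zeros((K, D, F))   # per-component accumulators sum_i Gamma_{y,k} E[w^T]
    for gamma, Gamma_y, Ew, Eww in stats:
        A += gamma[:, None, None] * Eww[None]
        C += Gamma_y[:, :, None] * Ew[None, None, :]
    # T_k = C_k A_k^-1, computed via a solve (A_k is symmetric positive definite)
    return np.stack([np.linalg.solve(A[k], C[k].T).T for k in range(K)])
```

Accumulating `A` and `C` across utterances first, then solving once per component, matches the row-block structure of the update equation.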
Algorithm Steps (the essence)

  1. Train a GMM-UBM on the background data.
  2. Initialize $\boldsymbol{R}$ with the UBM covariance matrices, and initialize $\boldsymbol{T}$ as:
    $$\boldsymbol{T}^{m, f} \in\left[-\alpha \boldsymbol{R}^{m, m}, \alpha \boldsymbol{R}^{m, m}\right], \quad \forall m=1, \ldots, D K ; \; f=1, \ldots, F$$
    where $D$ is the input feature dimension, $K$ the number of Gaussian mixtures, $F$ the i-Vector dimension, and $\boldsymbol{T}^{m, f}$ denotes the entry in row $m$, column $f$ of $\boldsymbol{T}$.
  3. For the $i$-th speaker training sequence, compute $\gamma_{k}(i)$, $\boldsymbol{\Gamma}_{y, k}(i)$ and $\boldsymbol{\Gamma}_{y y^{\top}, k}(i)$.
  4. E-step: compute $E[\boldsymbol{w}(i)]$ and $E\left[\boldsymbol{w}(i) \boldsymbol{w}^{\top}(i)\right]$.
  5. Repeat steps 3 and 4 for all speaker training sequences.
  6. M-step: update $\boldsymbol{T}$ and $\boldsymbol{R}$.
  7. If converged, stop; otherwise return to step 4.
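Putting these steps together, here is a toy end-to-end sketch under heavy simplifying assumptions: a random stand-in "UBM" with equal weights and unit diagonal covariances, $\boldsymbol{R}$ kept fixed at its UBM value, and only $\boldsymbol{T}$ updated. Everything here is illustrative, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, F = 3, 4, 2                 # feature dim, mixtures, i-Vector dim
n_utt, T_frames = 20, 60

# Step 1 (stand-in): a "UBM" with random means and unit diagonal covariances.
means = rng.standard_normal((K, D))
R_diag = np.ones((K, D))          # diagonal of R_k; kept fixed in this toy
R_inv = 1.0 / R_diag
utts = [rng.standard_normal((T_frames, D)) + means[rng.integers(K)]
        for _ in range(n_utt)]

# Step 2: initialize T uniformly in [-alpha * R, alpha * R].
alpha = 0.1
T_mat = rng.uniform(-alpha, alpha, size=(K, D, F)) * R_diag[:, :, None]

def posteriors(Y):
    """p(k | y_t) under the toy UBM (equal weights, diagonal covariances)."""
    diff = Y[:, None, :] - means[None]                    # (T, K, D)
    logp = -0.5 * np.sum(diff**2 * R_inv[None], axis=2)   # (T, K)
    logp -= logp.max(axis=1, keepdims=True)               # stabilize exp
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

for it in range(5):               # EM iterations (steps 3-7)
    A = np.zeros((K, F, F))
    C = np.zeros((K, D, F))
    for Y in utts:
        # Step 3: statistics gamma_k(i) and Gamma_{y,k}(i).
        post = posteriors(Y)
        gamma = post.sum(axis=0)
        Gamma_y = post.T @ Y - gamma[:, None] * means
        # Step 4, E-step: l(i), E[w(i)], E[w(i) w(i)^T].
        l = np.eye(F)
        b = np.zeros(F)
        for k in range(K):
            TR = T_mat[k].T * R_inv[k]
            l += gamma[k] * (TR @ T_mat[k])
            b += TR @ Gamma_y[k]
        l_inv = np.linalg.inv(l)
        Ew = l_inv @ b
        Eww = np.outer(Ew, Ew) + l_inv
        # Step 5: accumulate over utterances.
        A += gamma[:, None, None] * Eww[None]
        C += Gamma_y[:, :, None] * Ew[None, None, :]
    # Step 6, M-step: update T component by component (R left unchanged).
    T_mat = np.stack([np.linalg.solve(A[k], C[k].T).T for k in range(K)])

print(T_mat.shape)                # (K, D, F)
```

A fixed small iteration count stands in for the convergence test of step 7; a real implementation would monitor the change in $\boldsymbol{T}$ or the auxiliary function instead.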

Note: for any formulas that remain unclear, refer directly to the paper.
