Overview
“Useful Derivations for i-Vector Based Approach to Data Clustering in Speech Recognition” Yu Zhang
This paper derives the origin of the i-Vector in considerable detail and cleared up many of my confusions. Salute!
Main Text
Assume $\boldsymbol{Y}^{i}=\left(\boldsymbol{y}_{1}^{i}, \boldsymbol{y}_{2}^{i}, \ldots, \boldsymbol{y}_{T_{i}}^{i}\right)$ is a sequence of $D$-dimensional feature vectors of length $T_i$ from a target speaker. In the i-Vector approach, we want to characterize this speaker's data with a single feature supervector $\boldsymbol{M}(i)$:

$$\boldsymbol{M}(i)=\boldsymbol{M}_{0}+\boldsymbol{T} \boldsymbol{w}(i)$$

Here $\boldsymbol{M}(i)$ is the target-speaker supervector; $\boldsymbol{M}_{0}$ is the mean supervector of the GMM-UBM trained on background data that does not include the target speaker (the mean vectors of all Gaussian components concatenated together); $\boldsymbol{T}$ is the total-variability matrix; and $\boldsymbol{w}(i)$ is the i-Vector, whose dimension is far smaller than that of $\boldsymbol{M}_{0}$.
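As a small numerical sketch of this decomposition (all dimensions are toy values chosen purely for illustration: $K=4$ components, $D=3$ features, $F=2$ i-Vector dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, F = 4, 3, 2          # components, feature dim, i-Vector dim (toy sizes)

# UBM mean supervector M0: the K component means stacked into one (K*D,) vector
m = rng.normal(size=(K, D))
M0 = m.reshape(K * D)

T = rng.normal(size=(K * D, F))   # total-variability matrix
w = rng.normal(size=F)            # the i-Vector: a low-dimensional latent factor

M_i = M0 + T @ w                  # speaker-dependent supervector M(i)
assert M_i.shape == (K * D,)      # latent dim F is far smaller than K*D
```

The point of the model is visible in the shapes: the speaker is summarized by the $F$-dimensional `w`, not the $K \cdot D$-dimensional supervector.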
An intuitive way to picture this: suppose many people write raps over the same beat. Then $\boldsymbol{M}_{0}$ is the beat's melody, which is fixed. Each person picks a few words such as "weather" and "time" to form $\boldsymbol{w}(i)$, the i-Vector; those words are then expanded into high-dimensional sentences that become the lyrics, $\boldsymbol{T}\boldsymbol{w}(i)$; and in the end each person gets a different piece of music, $\boldsymbol{M}(i)$.
We estimate $\boldsymbol{w}(i)$ by MAP:

$$\begin{aligned} \hat{\boldsymbol{w}}(i) &=\underset{\boldsymbol{w}(i)}{\operatorname{argmax}}\ p\left(\boldsymbol{w}(i) \mid \boldsymbol{Y}^{i}\right) \\ &=\underset{\boldsymbol{w}(i)}{\operatorname{argmax}}\ p\left(\boldsymbol{Y}^{i} \mid \boldsymbol{w}(i)\right) p(\boldsymbol{w}(i)) \end{aligned}$$
Writing the GMM-UBM means as the corresponding mean subvectors of $\boldsymbol{M}(i)$, the paper further approximates the likelihood as:

$$p\left(\boldsymbol{Y}^{i} \mid \boldsymbol{w}(i)\right) \simeq \prod_{t=1}^{T_{i}} \prod_{k=1}^{K} \mathcal{N}\left(\boldsymbol{y}_{t}^{i} ; \boldsymbol{M}_{k}(i), \boldsymbol{R}_{k}\right)^{p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)}$$
where

$$p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)=\frac{c_{k}\, \mathcal{N}\left(\boldsymbol{y}_{t}^{i} ; \boldsymbol{m}_{k}, \boldsymbol{R}_{k}^{(0)}\right)}{\sum_{l=1}^{K} c_{l}\, \mathcal{N}\left(\boldsymbol{y}_{t}^{i} ; \boldsymbol{m}_{l}, \boldsymbol{R}_{l}^{(0)}\right)}$$
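This alignment posterior is just the standard GMM responsibility computed under the UBM parameters $\boldsymbol{\Omega}^{(0)}$. A minimal numpy sketch, assuming diagonal covariances for simplicity (all variable names here are my own, not from the paper):

```python
import numpy as np

def log_gauss_diag(Y, m, r):
    """Log N(y; m, diag(r)) for each row of Y, shape (T, D)."""
    D = Y.shape[1]
    diff = Y - m
    return -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(r))
                   + np.sum(diff * diff / r, axis=1))

def responsibilities(Y, c, means, covs):
    """p(k | y_t, UBM) for all frames: returns (T, K), rows sum to 1."""
    K = len(c)
    log_p = np.stack([np.log(c[k]) + log_gauss_diag(Y, means[k], covs[k])
                      for k in range(K)], axis=1)
    log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# toy check: two well-separated 1-D components
Y = np.array([[0.0], [10.0]])
gamma = responsibilities(Y, c=[0.5, 0.5],
                         means=[np.array([0.0]), np.array([10.0])],
                         covs=[np.array([1.0]), np.array([1.0])])
```

Each frame near a component mean is assigned almost entirely to that component, which is exactly the soft alignment used as the exponent in the likelihood above.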
This should be the E-step estimate of the latent (component-assignment) variable in EM. Taking the log of the expression above should give the Q-function of EM (copied directly from the paper):
$$\begin{aligned} \log p\left(\boldsymbol{Y}^{i} \mid \boldsymbol{w}(i)\right) &=\sum_{t=1}^{T_{i}} \sum_{k=1}^{K} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right) \log \mathcal{N}\left(\boldsymbol{y}_{t}^{i} ; \boldsymbol{M}_{k}(i), \boldsymbol{R}_{k}\right) \\ &=\sum_{t=1}^{T_{i}} \sum_{k=1}^{K} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)\left[\log \frac{1}{(2 \pi)^{D / 2}\left|\boldsymbol{R}_{k}\right|^{1 / 2}}-\frac{1}{2}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}-\boldsymbol{T}_{k} \boldsymbol{w}(i)\right)^{\top} \boldsymbol{R}_{k}^{-1}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}-\boldsymbol{T}_{k} \boldsymbol{w}(i)\right)\right] \\ &=\sum_{t=1}^{T_{i}} \sum_{k=1}^{K} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)\left[\log \frac{1}{(2 \pi)^{D / 2}\left|\boldsymbol{R}_{k}\right|^{1 / 2}}-\frac{1}{2}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)^{\top} \boldsymbol{R}_{k}^{-1}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)+\boldsymbol{w}^{\top}(i) \boldsymbol{T}_{k}^{\top} \boldsymbol{R}_{k}^{-1}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{T}_{k}^{\top} \boldsymbol{R}_{k}^{-1} \boldsymbol{T}_{k} \boldsymbol{w}(i)\right] \end{aligned}$$
Only the last two terms depend on $\boldsymbol{w}(i)$; define them as:

$$\begin{aligned} \mathcal{H}(i) &=\sum_{t=1}^{T_{i}} \sum_{k=1}^{K} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)\left[\boldsymbol{w}^{\top}(i) \boldsymbol{T}_{k}^{\top} \boldsymbol{R}_{k}^{-1}\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{T}_{k}^{\top} \boldsymbol{R}_{k}^{-1} \boldsymbol{T}_{k} \boldsymbol{w}(i)\right] \\ &=\boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{\Gamma}(i) \boldsymbol{R}^{-1} \boldsymbol{T} \boldsymbol{w}(i) \end{aligned}$$
where $\boldsymbol{\Gamma}(i)$ is a $(D \cdot K) \times (D \cdot K)$ diagonal matrix whose $k$-th diagonal block equals $\gamma_{k}(i) \cdot \boldsymbol{I}_{D \times D}$, and $\boldsymbol{\Gamma}_{y}(i)$ is a $(D \cdot K)$-dimensional supervector with $\boldsymbol{\Gamma}_{y, k}(i)$ as its $k$-th subvector:

$$\gamma_{k}(i)=\sum_{t=1}^{T_{i}} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)$$

$$\boldsymbol{\Gamma}_{y, k}(i)=\sum_{t=1}^{T_{i}} p\left(k \mid \boldsymbol{y}_{t}^{i}, \boldsymbol{\Omega}^{(0)}\right)\left(\boldsymbol{y}_{t}^{i}-\boldsymbol{m}_{k}\right)$$
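These are the familiar zeroth-order and centered first-order (Baum-Welch) statistics. A sketch of their computation, reusing a responsibility matrix `gamma` of shape `(T_i, K)` (function and variable names are my own for illustration):

```python
import numpy as np

def collect_stats(Y, gamma, means):
    """Compute gamma_k(i) and the centered first-order stats Gamma_{y,k}(i).

    Y:     (T_i, D) frames of utterance i
    gamma: (T_i, K) responsibilities p(k | y_t, UBM)
    means: (K, D)   UBM component means m_k
    """
    # zeroth order: gamma_k(i) = sum_t p(k | y_t)
    n = gamma.sum(axis=0)                          # (K,)
    # first order, centered at UBM means: sum_t p(k | y_t) (y_t - m_k)
    K, D = means.shape
    f = np.stack([(gamma[:, k:k + 1] * (Y - means[k])).sum(axis=0)
                  for k in range(K)])              # (K, D)
    return n, f.reshape(K * D)                     # Gamma_y(i) as a supervector

# toy check with hard assignments so the result is easy to verify by hand
Y = np.array([[1.0], [3.0]])
gamma = np.array([[1.0, 0.0], [0.0, 1.0]])
means = np.array([[0.0], [2.0]])
n, f = collect_stats(Y, gamma, means)
```

With the hard assignments above, component 0 sees the frame at 1.0 (centered: 1.0) and component 1 sees the frame at 3.0 (centered: 1.0).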
Then:

$$\begin{aligned} p\left(\boldsymbol{w}(i) \mid \boldsymbol{Y}^{i}\right) &\propto p\left(\boldsymbol{Y}^{i} \mid \boldsymbol{w}(i)\right) p(\boldsymbol{w}(i)) \\ &\propto \exp \left(\boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{\Gamma}(i) \boldsymbol{R}^{-1} \boldsymbol{T} \boldsymbol{w}(i)\right) \cdot \exp \left(-\frac{1}{2} \boldsymbol{w}^{\top}(i) \boldsymbol{w}(i)\right) \\ &=\exp \left(\boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)-\frac{1}{2} \boldsymbol{w}^{\top}(i)\left[\boldsymbol{T}^{\top} \boldsymbol{\Gamma}(i) \boldsymbol{R}^{-1} \boldsymbol{T}+\boldsymbol{I}\right] \boldsymbol{w}(i)\right) \\ &=\exp \left(\boldsymbol{w}^{\top}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)-\frac{1}{2} \boldsymbol{w}^{\top}(i)\, l(i)\, \boldsymbol{w}(i)\right) \\ &\propto \exp \left(-\frac{1}{2}\left(\boldsymbol{w}(i)-l^{-1}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)\right)^{\top} l(i)\left(\boldsymbol{w}(i)-l^{-1}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)\right)\right) \end{aligned}$$
So the posterior of $\boldsymbol{w}(i)$ is a Gaussian with mean $l^{-1}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)$ and covariance $l^{-1}(i)$, where

$$l(i)=\boldsymbol{I}+\boldsymbol{T}^{\top} \boldsymbol{\Gamma}(i) \boldsymbol{R}^{-1} \boldsymbol{T}$$

and the mean can therefore be used as the estimate of $\boldsymbol{w}(i)$ (much like sparse Bayesian learning):

$$\hat{\boldsymbol{w}}(i)=l^{-1}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i)$$
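Putting the pieces together, extracting an i-Vector is a single $F \times F$ linear solve per utterance. A sketch, assuming $\boldsymbol{R}$ is diagonal and stored as a `(K*D,)` vector (names are illustrative, not from the paper):

```python
import numpy as np

def extract_ivector(T, n, f, r):
    """Posterior mean w_hat and precision l(i) of the i-Vector.

    T: (K*D, F) total-variability matrix
    n: (K,)     zeroth-order stats gamma_k(i)
    f: (K*D,)   centered first-order stats Gamma_y(i)
    r: (K*D,)   diagonal of the covariance supermatrix R
    """
    KD, F = T.shape
    D = KD // len(n)
    n_expanded = np.repeat(n, D)                   # diagonal of Gamma(i)
    # l(i) = I + T^T Gamma(i) R^{-1} T
    L = np.eye(F) + T.T @ (T * (n_expanded / r)[:, None])
    # w_hat = l^{-1}(i) T^T R^{-1} Gamma_y(i)
    w_hat = np.linalg.solve(L, T.T @ (f / r))
    return w_hat, L

# sanity check: with no observed statistics the posterior is the N(0, I) prior
rng = np.random.default_rng(1)
T = rng.normal(size=(6, 2))
w_hat, L = extract_ivector(T, n=np.zeros(3), f=np.zeros(6), r=np.ones(6))
```

The zero-statistics case illustrates the prior's role: with no data, $l(i)=\boldsymbol{I}$ and $\hat{\boldsymbol{w}}(i)=\boldsymbol{0}$, and more frames pull the estimate away from the prior mean.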
Then, in the M-step, the parameters $\boldsymbol{T}$ and $\boldsymbol{R}$ are estimated by setting the derivatives to zero. (Derivation omitted; see the paper.)
Summary:
E-Step:

$$\begin{aligned} E[\boldsymbol{w}(i)] &=l^{-1}(i) \boldsymbol{T}^{\top} \boldsymbol{R}^{-1} \boldsymbol{\Gamma}_{y}(i) \\ E\left[\boldsymbol{w}(i) \boldsymbol{w}^{\top}(i)\right] &=E[\boldsymbol{w}(i)]\, E\left[\boldsymbol{w}^{\top}(i)\right]+l^{-1}(i) \end{aligned}$$
M-Step:

$$\begin{array}{c} \boldsymbol{T}^{m} \sum_{i} \boldsymbol{\Gamma}^{m}(i)\, E\left[\boldsymbol{w}(i) \boldsymbol{w}^{\top}(i)\right]=\sum_{i} \boldsymbol{\Gamma}_{y}^{m}(i)\, E\left[\boldsymbol{w}^{\top}(i)\right] \\ \boldsymbol{R}_{k}=\frac{1}{\sum_{i} \gamma_{k}(i)}\left(\sum_{i} \boldsymbol{\Gamma}_{y y^{\top}, k}(i)-\boldsymbol{M}_{k}\right) \end{array}$$
Algorithm steps (the essence)

1. Train the GMM-UBM on background data.
2. Initialize $\boldsymbol{R}$ with the UBM covariance matrices, and initialize $\boldsymbol{T}$ randomly within $\boldsymbol{T}^{m, f} \in\left[-\alpha \boldsymbol{R}^{m, m}, \alpha \boldsymbol{R}^{m, m}\right], \ \forall m=1, \ldots, D K ;\ f=1, \ldots, F$, where $D$ is the input feature dimension, $K$ is the number of Gaussian components, $F$ is the i-Vector dimension, and $\boldsymbol{T}^{m, f}$ denotes the entry in row $m$, column $f$ of $\boldsymbol{T}$.
3. For the $i$-th speaker training sequence, compute the statistics $\gamma_{k}(i)$, $\boldsymbol{\Gamma}_{y, k}(i)$, and $\boldsymbol{\Gamma}_{y y^{\top}, k}(i)$.
4. E-step: compute $E[\boldsymbol{w}(i)]$ and $E\left[\boldsymbol{w}(i) \boldsymbol{w}^{\top}(i)\right]$.
5. Repeat steps 3 and 4 for all speaker training sequences.
6. M-step: update $\boldsymbol{T}$ and $\boldsymbol{R}$.
7. If converged, stop; otherwise go back to step 4.
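The steps above can be combined into a minimal training loop. This is a toy sketch under simplifying assumptions of my own: $\boldsymbol{R}$ is kept fixed at the identity rather than updated by the paper's $\boldsymbol{R}_k$ formula, and statistics are precomputed per utterance:

```python
import numpy as np

def train_T(stats, K, D, F, n_iter=5, seed=0):
    """EM for the total-variability matrix T (R held fixed for brevity).

    stats: list of (n_i, f_i) = (gamma_k(i), Gamma_y(i)) per utterance.
    """
    rng = np.random.default_rng(seed)
    KD = K * D
    T = rng.normal(scale=0.1, size=(KD, F))        # random init (cf. step 2)
    r = np.ones(KD)                                # diagonal R, fixed here
    for _ in range(n_iter):
        A = np.zeros((K, F, F))                    # sum_i gamma_k(i) E[w w^T]
        C = np.zeros((KD, F))                      # sum_i Gamma_y(i) E[w^T]
        for n, f in stats:
            ne = np.repeat(n, D)
            L = np.eye(F) + T.T @ (T * (ne / r)[:, None])
            Linv = np.linalg.inv(L)
            Ew = Linv @ (T.T @ (f / r))            # E-step: E[w(i)]
            Eww = np.outer(Ew, Ew) + Linv          # E[w w^T] = E[w]E[w]^T + l^{-1}
            A += n[:, None, None] * Eww
            C += np.outer(f, Ew)
        for k in range(K):                         # M-step: per-block solve for T
            T[k * D:(k + 1) * D] = C[k * D:(k + 1) * D] @ np.linalg.inv(A[k])
    return T

# toy run: one utterance, K=2 components, D=2 features, F=1
stats = [(np.array([2.0, 1.0]), np.array([1.0, -1.0, 0.5, 0.0]))]
T_hat = train_T(stats, K=2, D=2, F=1)
```

Each M-step row-block solve mirrors the linear system $\boldsymbol{T}^{m} \sum_{i} \boldsymbol{\Gamma}^{m}(i) E[\boldsymbol{w}\boldsymbol{w}^{\top}] = \sum_{i} \boldsymbol{\Gamma}_{y}^{m}(i) E[\boldsymbol{w}^{\top}]$ above.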
Note: for any formula that is still unclear, refer directly to the paper.