Gaussian Mixture Model (GMM)
- 1. Model Introduction
- 2. Maximum Likelihood Estimation (MLE)
- 3. Solving with EM (Expectation-Maximization Algorithm)
- (1) The EM update formula and its convergence
- (2) E-step
- (3) M-step
- Summary
1. Model Introduction
"Gaussian" refers to the Gaussian distribution. From the perspective of probability density estimation, or geometrically, a GMM is a weighted (with weights $\alpha_k$) average, i.e. a superposition of several Gaussian distributions:

$$p(x)=\sum_{k=1}^{K} \alpha_{k} N\left(\mu_{k}, \Sigma_{k}\right), \quad \sum_{k=1}^{K} \alpha_{k}=1\tag{1}$$

where $\alpha_k$ is the weight of the $k$-th component.
From the perspective of a mixture (generative) model:
x: observed variable
z: latent variable
z indicates which Gaussian component the sample x comes from; it is a discrete random variable.
The generative process and its probabilistic graph (a directed graph) are as follows (observed variables are shaded):
The joint density factorizes as a product, so marginalizing over z gives:
$$\begin{aligned} p(x) &=\sum_{z} p(x, z)=\sum_{k=1}^{K} p\left(x, z=c_{k}\right) \\ &=\sum_{k=1}^{K} p\left(z=c_{k}\right) \cdot p\left(x \mid z=c_{k}\right)=\sum_{k=1}^{K} p_{k} \cdot N\left(x \mid \mu_{k}, \Sigma_{k}\right) \end{aligned}\tag{2}$$
Here $p_k = p(z=c_k)$ is a probability, so this has the same form as Eq. (1).
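The generative process behind Eq. (2) — first draw the latent component $z$, then draw $x$ from that component's Gaussian — can be sketched in a few lines. This is a minimal 1-D illustration with made-up weights and parameters, not anything fixed by the text:

```python
import numpy as np

# Hypothetical 1-D mixture parameters (illustration only)
p = np.array([0.3, 0.5, 0.2])        # component weights p_k, sum to 1
mu = np.array([-2.0, 0.0, 3.0])      # component means mu_k
sigma = np.array([0.5, 1.0, 0.8])    # component standard deviations

rng = np.random.default_rng(0)

def sample_gmm(n):
    """Ancestral sampling per Eq. (2): z ~ Categorical(p), then x | z ~ N(mu_z, sigma_z^2)."""
    z = rng.choice(len(p), size=n, p=p)   # latent component labels
    x = rng.normal(mu[z], sigma[z])       # observed samples
    return x, z

x, z = sample_gmm(10000)
```

Each observed $x_i$ carries a hidden label $z_i$; marginalizing those labels out is exactly what the sum over $k$ in Eq. (2) does.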
2. Maximum Likelihood Estimation (MLE)
X: observed data, $X=(x_1,x_2,\dots,x_N)$
(X,Z):complete data
$\theta$: parameters, $\theta=\{p_1,p_2,\dots,p_K,\ \mu_1,\mu_2,\dots,\mu_K,\ \Sigma_1,\Sigma_2,\dots,\Sigma_K\}$
$$\hat{\theta}_{MLE}=\arg \max _{\theta} \log P(X)$$
Since the samples are mutually independent, this can be written as a product:
$$\hat{\theta}_{MLE}=\arg \max _{\theta} \log \prod_{i=1}^{N} P\left(x_{i}\right)=\arg \max _{\theta} \sum_{i=1}^{N} \log P\left(x_{i}\right)$$
Substituting Eq. (2):
$$\hat{\theta}_{MLE}=\arg \max _{\theta} \sum_{i=1}^{N} \log \sum_{k=1}^{K} p_{k} \cdot N\left(x_{i} \mid \mu_{k}, \Sigma_{k}\right)$$
Because a sum sits inside the log, maximizing this directly yields no closed-form solution for a GMM; EM is generally used to find $\theta$.
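Even without a closed-form maximizer, the objective above is easy to evaluate numerically. A minimal sketch, assuming univariate components (the function names are my own):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density N(x | mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def gmm_log_likelihood(x, p, mu, sigma):
    """sum_i log sum_k p_k * N(x_i | mu_k, sigma_k^2) -- the MLE objective above."""
    dens = gaussian_pdf(x[:, None], mu[None, :], sigma[None, :])  # shape (N, K)
    return float(np.sum(np.log(dens @ p)))                        # inner sum over k, outer sum over i

# illustrative data and parameters (assumed, not from the text)
x = np.array([-1.0, 0.5, 2.0])
p = np.array([0.4, 0.6])
mu = np.array([0.0, 2.0])
sigma = np.array([1.0, 1.0])
ll = gmm_log_likelihood(x, p, mu, sigma)
```

The `log` of the weighted sum `dens @ p` is precisely the "sum inside the log" that blocks an analytic solution.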
3. Solving with EM (Expectation-Maximization Algorithm)
(1) The EM update formula and its convergence
EM is mainly used for parameter estimation in models that contain latent variables.
MLE maximizes the log-likelihood of $P(x \mid \theta)$:

$$\theta_{MLE}=\underset{\theta}{\operatorname{argmax}} \log P(x \mid \theta)$$
The EM update formula is:
$$\theta^{(t+1)}=\arg \max _{\theta} \int_{z} \log p(x, z \mid \theta) \cdot p\left(z \mid x, \theta^{(t)}\right) d z\tag{3}$$
The first factor is the log of the joint density and the second is the posterior, so the integral is the expectation

$$E_{z \mid x, \theta^{(t)}}[\log P(x, z \mid \theta)]$$
For convergence, we need to show:
$$\log P\left(x \mid \theta^{(t)}\right) \leqslant \log P\left(x \mid \theta^{(t+1)}\right)$$
Start from the identity

$$\log P(x \mid \theta)=\log P(x, z \mid \theta)-\log P(z \mid x, \theta)$$
Integrate both sides against the posterior $p(z \mid x, \theta^{(t)})$:
$$\begin{aligned} \text{LHS}&=\int_{z} p\left(z \mid x, \theta^{(t)}\right) \cdot \log P(x \mid \theta)\, d z\\ &=\log P(x \mid \theta) \underbrace{\int_{z} p\left(z \mid x, \theta^{(t)}\right) d z}_{1}=\log P(x \mid \theta) \end{aligned}$$
$$\text{RHS}= \underbrace{\int_{z} p\left(z \mid x, \theta^{(t)}\right) \cdot \log P(x, z \mid \theta)\, d z}_{Q\left(\theta, \theta^{(t)}\right)}- \underbrace{\int_{z} p\left(z \mid x, \theta^{(t)}\right) \cdot \log P(z \mid x, \theta)\, d z}_{H\left(\theta, \theta^{(t)}\right)}$$
The first term satisfies, because $\theta^{(t+1)}$ is by definition the maximizer of $Q(\cdot, \theta^{(t)})$:
$$Q\left(\theta^{(t+1)}, \theta^{(t)}\right) \geqslant Q\left(\theta^{(t)}, \theta^{(t)}\right)$$
It remains to show:
$$H\left(\theta^{(t+1)}, \theta^{(t)}\right) \leqslant H\left(\theta^{(t)}, \theta^{(t)}\right)$$
Subtracting directly:
$$\begin{aligned} & H\left(\theta^{(t+1)}, \theta^{(t)}\right)-H\left(\theta^{(t)}, \theta^{(t)}\right) \\ =& \int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P\left(z \mid x, \theta^{(t+1)}\right) d z - \int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log P\left(z \mid x, \theta^{(t)}\right) d z \\ =& \int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \log \frac{P\left(z \mid x, \theta^{(t+1)}\right)}{P\left(z \mid x, \theta^{(t)}\right)} d z \\ \leqslant & \log \int_{z} P\left(z \mid x, \theta^{(t)}\right) \cdot \frac{P\left(z \mid x, \theta^{(t+1)}\right)}{P\left(z \mid x, \theta^{(t)}\right)} d z \quad (\text{Jensen: } E[\log X] \leqslant \log E[X]) \\ =& \log \underbrace{\int_{z} P\left(z \mid x, \theta^{(t+1)}\right) d z}_{1}=\log 1=0 \end{aligned}$$
The last step is Jensen's inequality $E[\log X] \leqslant \log E[X]$ applied under $P(z \mid x, \theta^{(t)})$: inside the resulting log the $\theta^{(t)}$-densities cancel, leaving $\int_z P(z \mid x, \theta^{(t+1)}) dz = 1$. Equivalently, the difference is $-\mathrm{KL}\left(P(z \mid x, \theta^{(t)}) \,\|\, P(z \mid x, \theta^{(t+1)})\right) \leqslant 0$.
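The inequality $\log P(x \mid \theta^{(t)}) \leqslant \log P(x \mid \theta^{(t+1)})$ can also be checked numerically. Below is a self-contained 1-D EM sketch on synthetic data (my own variable names and initialization, purely illustrative); the recorded log-likelihood should never decrease across iterations:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic 1-D data: a mixture of two well-separated Gaussians (assumed setup)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

def pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# deliberately poor initialization theta^(0)
p = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

lls = []
for _ in range(30):
    dens = p[None, :] * pdf(x[:, None], mu[None, :], sigma[None, :])  # (N, K)
    lls.append(np.sum(np.log(dens.sum(axis=1))))                      # log P(x | theta^(t))
    gamma = dens / dens.sum(axis=1, keepdims=True)                    # E-step: posterior of z
    Nk = gamma.sum(axis=0)                                            # M-step: standard updates
    p = Nk / len(x)
    mu = (gamma * x[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((gamma * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / Nk)

# monotonicity: log P(x | theta^(t)) <= log P(x | theta^(t+1)), up to float tolerance
monotone = all(b >= a - 1e-8 for a, b in zip(lls, lls[1:]))
```

The sequence `lls` climbs and then plateaus, which is exactly the convergence behaviour the proof above guarantees.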
(2) E-step
$$\mathrm{EM}: \quad \theta^{(t+1)}=\arg \max _{\theta} E_{z \mid x, \theta^{(t)}}[\log P(x, z \mid \theta)]=\arg \max _{\theta} Q\left(\theta, \theta^{(t)}\right)$$
This is an iterative scheme that step by step approaches the maximum of the likelihood. From Eq. (3):
$$\begin{aligned} Q\left(\theta, \theta^{(t)}\right)&=\int_{z} \log P(X, Z \mid \theta) \cdot P\left(Z \mid X, \theta^{(t)}\right) d z\\ &=\sum_{z_1, \dots, z_{N}} \log \prod_{i=1}^{N} P\left(x_{i}, z_{i} \mid \theta\right) \cdot \prod_{i=1}^{N} P\left(z_{i} \mid x_{i}, \theta^{(t)}\right)\\ &=\sum_{z_1, \dots, z_{N}} \sum_{i=1}^{N} \log P\left(x_{i}, z_{i} \mid \theta\right) \cdot \prod_{i=1}^{N} P\left(z_{i} \mid x_{i}, \theta^{(t)}\right)\\ &=\sum_{z_1, \dots, z_{N}}\left[\log P\left(x_{1}, z_{1} \mid \theta\right)+\log P\left(x_{2}, z_{2} \mid \theta\right)+\cdots+\log P\left(x_{N}, z_{N} \mid \theta\right)\right] \prod_{i=1}^{N} P\left(z_{i} \mid x_{i}, \theta^{(t)}\right) \end{aligned}$$
Since:
$$P(x, z)=P(z) \cdot p(x \mid z)=p_{z} \cdot N\left(x \mid \mu_{z}, \Sigma_{z}\right)$$
$$P(z \mid x)=\frac{p(x, z)}{p(x)}=\frac{p_{z} \cdot N\left(x \mid \mu_{z}, \Sigma_{z}\right)}{\sum_{k=1}^{K} p_{k} \cdot N\left(x \mid \mu_{k}, \Sigma_{k}\right)}$$
Consider the first bracketed term: summing out $z_2, \dots, z_N$ makes each remaining posterior factor sum to 1, so

$$\sum_{z_{1}, \dots, z_N} \log P\left(x_{1}, z_{1} \mid \theta\right) \cdot \prod_{i=1}^{N} P\left(z_{i} \mid x_{i}, \theta^{(t)}\right)=\sum_{z_{1}} \log P\left(x_{1}, z_{1} \mid \theta\right) \cdot P\left(z_{1} \mid x_{1}, \theta^{(t)}\right)$$
Applying this to every term, the expression above becomes:
$$\begin{aligned} Q\left(\theta, \theta^{(t)}\right)&=\sum_{z_{1}} \log P\left(x_{1}, z_1 \mid \theta\right) \cdot P\left(z_{1} \mid x_{1}, \theta^{(t)}\right)+\cdots+\sum_{z_N} \log P\left(x_{N}, z_{N} \mid \theta\right) \cdot P\left(z_{N} \mid x_{N}, \theta^{(t)}\right)\\ &=\sum_{i=1}^{N} \sum_{z_i} \log P\left(x_{i}, z_{i} \mid \theta\right) \cdot P\left(z_{i} \mid x_{i}, \theta^{(t)}\right)\\ &=\sum_{i=1}^{N} \sum_{z_{i}} \log \left[p_{z_{i}} \cdot N\left(x_{i} \mid \mu_{z_i}, \Sigma_{z_i}\right)\right] \cdot \frac{p_{z_{i}}^{(t)} \cdot N\left(x_{i} \mid \mu_{z_i}^{(t)}, \Sigma_{z_i}^{(t)}\right)}{\sum_{k=1}^{K} p_{k}^{(t)} \cdot N\left(x_{i} \mid \mu_{k}^{(t)}, \Sigma_{k}^{(t)}\right)} \end{aligned}\tag{4}$$
Note that in the second factor the parameters $p_{z_i}^{(t)}, \mu_{z_i}^{(t)}, \Sigma_{z_i}^{(t)}$ come from step $t$ and are therefore known constants.
Eq. (4) can be rewritten further:
$$\begin{aligned} Q\left(\theta, \theta^{(t)}\right)&=\sum_{i=1}^{N} \sum_{z_{i}} \log \left[p_{z_{i}} \cdot N\left(x_{i} \mid \mu_{z_{i}}, \Sigma_{z_{i}}\right)\right] \cdot P\left(z_{i} \mid x_{i}, \theta^{(t)}\right)\\ &=\sum_{k=1}^{K} \sum_{i=1}^{N} \log \left[p_{k} \cdot N\left(x_{i} \mid \mu_{k}, \Sigma_{k}\right)\right] \cdot P\left(z_{i}=c_{k} \mid x_{i}, \theta^{(t)}\right)\\ &=\sum_{k=1}^{K} \sum_{i=1}^{N}\left[\log p_{k}+\log N\left(x_{i} \mid \mu_{k}, \Sigma_{k}\right)\right] \cdot P\left(z_{i}=c_{k} \mid x_{i}, \theta^{(t)}\right) \end{aligned}$$
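Computationally, the E-step just evaluates the posterior $P(z_i=c_k \mid x_i, \theta^{(t)})$ (the "responsibility" of component $k$ for sample $i$) for every sample-component pair, using the $P(z \mid x)$ formula above. A minimal univariate sketch with illustrative parameters of my own choosing:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density N(x | mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def e_step(x, p, mu, sigma):
    """Posterior P(z_i = c_k | x_i, theta^(t)): numerator p_k * N(x_i | mu_k, sigma_k^2),
    normalized over k exactly as in the formula for P(z | x)."""
    num = p[None, :] * gaussian_pdf(x[:, None], mu[None, :], sigma[None, :])  # (N, K)
    return num / num.sum(axis=1, keepdims=True)

# assumed current parameters theta^(t) and a few samples
x = np.array([-2.0, 0.1, 3.0])
p = np.array([0.5, 0.5])
mu = np.array([-2.0, 3.0])
sigma = np.array([1.0, 1.0])
gamma = e_step(x, p, mu, sigma)   # each row sums to 1
```

Samples near a component's mean get responsibility close to 1 for that component, which is what weights the log terms in $Q$.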
(3) M-step
$$\theta^{(t+1)}=\underset{\theta}{\operatorname{argmax}}\, Q\left(\theta, \theta^{(t)}\right)$$
As an example, solve for $p_k^{(t+1)}$:
$$p_{k}^{(t+1)}=\underset{p_{k}}{\operatorname{argmax}} \sum_{k=1}^{K} \sum_{i=1}^{N} \log p_{k} \cdot P\left(z_{i}=c_{k} \mid x_{i}, \theta^{(t)}\right), \quad \text{s.t. } \sum_{k=1}^{K} p_{k}=1$$
This is a constrained optimization problem; use the method of Lagrange multipliers. First define the Lagrangian:
$$\mathcal{L}(p, \lambda)=\sum_{k=1}^{K} \sum_{i=1}^{N} \log p_{k} \cdot P\left(z_{i}=c_{k} \mid x_{i}, \theta^{(t)}\right)+\lambda\left(\sum_{k=1}^{K} p_{k}-1\right)$$
Take the partial derivative and set it to zero:
$$\frac{\partial \mathcal{L}}{\partial p_{k}}=\sum_{i=1}^{N} \frac{1}{p_{k}} \cdot P\left(z_{i}=c_{k} \mid x_{i}, \theta^{(t)}\right)+\lambda \triangleq 0$$
Multiply through by $p_k$, then sum over $k$:

$$\begin{aligned} &\Rightarrow \sum_{i=1}^{N} P\left(z_{i}=c_{k} \mid x_{i}, \theta^{(t)}\right)+p_{k} \lambda=0, \quad k=1, \cdots, K\\ &\Rightarrow \sum_{i=1}^{N} \underbrace{\sum_{k=1}^{K} P\left(z_{i}=c_{k} \mid x_{i}, \theta^{(t)}\right)}_{1}+\lambda \underbrace{\sum_{k=1}^{K} p_{k}}_{1}=0\\ &\Rightarrow N+\lambda=0 \Rightarrow \lambda=-N \end{aligned}$$
Substituting $\lambda=-N$ back:

$$\begin{aligned} &p_{k}^{(t+1)}=\frac{1}{N} \sum_{i=1}^{N} P\left(z_{i}=c_{k} \mid x_{i}, \theta^{(t)}\right)\\ &p^{(t+1)}=\left(p_{1}^{(t+1)}, \cdots, p_{K}^{(t+1)}\right) \end{aligned}$$
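Given the responsibilities computed in the E-step, this weight update is just a column average. A small sketch with made-up responsibility values (not from the text):

```python
import numpy as np

def m_step_weights(gamma):
    """p_k^{(t+1)} = (1/N) * sum_i P(z_i = c_k | x_i, theta^(t)) -- a column mean."""
    return gamma.mean(axis=0)

# hypothetical responsibilities for N=4 samples, K=2 components (rows sum to 1)
gamma = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.2, 0.8]])
p_new = m_step_weights(gamma)   # -> array([0.55, 0.45]); still sums to 1
```

Because each row of the responsibilities sums to 1, the updated weights automatically satisfy the constraint $\sum_k p_k = 1$, as the Lagrange derivation requires.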
Summary
We now have a basic picture of what the GMM is and how to fit it with the EM algorithm. Deriving the formulas by hand feels quite unforgiving. Next up: getting started with HMM.