Overview
Any matrix $A \in \mathbb{C}^{m \times n}$ admits a singular value decomposition

$$A = U \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} V^H$$
where $U$ is an $m \times m$ unitary matrix, $V$ is an $n \times n$ unitary matrix, and $\Sigma_r$ is an $r \times r$ diagonal matrix whose diagonal entries are the singular values of $A$ arranged in decreasing order; $r$ is the rank of $A$.
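A minimal NumPy sketch (the random test matrix is our own illustration, not from the text) verifying the block form of the SVD and that $r$ equals the rank:

```python
# Hypothetical test matrix; not from the original text.
import numpy as np

m, n = 5, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Full SVD: U is m x m, Vh is n x n, s holds singular values in decreasing order.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n block matrix [Sigma_r 0; 0 0] from the singular values.
r = int(np.sum(s > 1e-12))              # numerical rank of A
Sigma = np.zeros((m, n), dtype=complex)
Sigma[:r, :r] = np.diag(s[:r])

print(np.allclose(A, U @ Sigma @ Vh))   # True: A = U [Sigma_r 0; 0 0] V^H
print(r == np.linalg.matrix_rank(A))    # True: r is the rank of A
```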
3. The linear least-squares problem
Consider the linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$ and seek its least-squares solution. If $\mathbf{A}$ has rank $n$, the least-squares solution is unique and equals $\mathbf{A}^{+}\mathbf{b}$; if the rank is less than $n$, there are infinitely many least-squares solutions, and the minimum-norm solution among them is still $\mathbf{A}^{+}\mathbf{b}$. This minimum-norm solution is usually the one of interest.
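A small sketch, assuming toy example data, showing that the pseudoinverse solution $\mathbf{A}^{+}\mathbf{b}$ matches NumPy's least-squares solver:

```python
# Toy system; assumed example data.
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])
b = np.array([1., 2., 3.])

x_pinv = np.linalg.pinv(A) @ b                    # A^+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # NumPy's least-squares solver

print(np.allclose(x_pinv, x_lstsq))               # True: both give the (minimum-norm) LS solution
```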
A well-posed problem is one that satisfies the following three requirements: (1) a solution exists (existence); (2) the solution is unique (uniqueness); (3) the solution depends continuously on the initial conditions (stability). If any one of these requirements fails, the problem is called ill-posed.
A convex optimization problem is an optimization problem whose objective function is convex and whose feasible region, defined by the constraints, is a convex set.
A convex function is a real-valued function $f$ defined on a convex subset $C$ (an interval) of some vector space such that, for any two vectors $x_1$ and $x_2$ in $C$,

$$f\!\left(\frac{x_1 + x_2}{2}\right) \le \frac{f(x_1) + f(x_2)}{2}.$$
Convex set: in Euclidean space, a convex set is a region such that for every pair of points in the region, the line segment joining them also lies entirely within the region.
The spectral norm (2-norm) $\|A\|_2$ is the non-negative square root of the largest eigenvalue of $A^H A$ (where $A^H$ is the conjugate transpose of $A$), i.e. $\|A\|_2 = (\text{maximum eigenvalue of } A^H A)^{1/2}$.
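A quick numerical check (the 2×2 matrix is an arbitrary example) that the spectral norm equals the square root of the largest eigenvalue of $A^H A$:

```python
# Arbitrary example matrix.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
eigmax = np.max(np.linalg.eigvalsh(A.conj().T @ A))   # largest eigenvalue of A^H A
print(np.sqrt(eigmax), np.linalg.norm(A, 2))          # both values agree
```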
The proximal operator of the $L_1$-norm function:
Reference link: The Proximal Operator of the L1 Norm Function
The proximal operator is defined by the following optimization problem:

$$\operatorname{Prox}_{\lambda\|\cdot\|_{1}}(x) = \arg\min_{u}\left\{ \frac{1}{2}\|u - x\|^{2} + \lambda\|u\|_{1} \right\}$$
The problem is separable in the components of $u$ and $x$, so it reduces to the scalar problems

$$\arg\min_{u_{i}}\left\{ \frac{1}{2}\left(u_{i} - x_{i}\right)^{2} + \lambda\left|u_{i}\right| \right\}$$
Now, you can proceed using the first-order optimality condition and the subgradient of the $\mathrm{abs}(\cdot)$ function, or you can employ a simple trick.
Note that $u_i$ can be negative, zero, or positive. Define

$$F = \frac{1}{2}\left(u_{i} - x_{i}\right)^{2} + \lambda\left|u_{i}\right|$$
(1) If $u_i > 0$, the partial derivative of $F$ with respect to $u_i$ is $\partial F / \partial u_i = u_i - x_i + \lambda$. Setting $\partial F / \partial u_i = 0$ gives $u_i = x_i - \lambda$, which is consistent with $u_i > 0$ only when $x_i > \lambda$.
(2) The same procedure for the case $u_i < 0$ yields $u_i = x_i + \lambda$ for $x_i < -\lambda$.
For values of $x_i$ in between ($|x_i| \le \lambda$), the subgradient of $|u_i|$ at $u_i = 0$ can be chosen freely in the range $[-1, 1]$, so $u_i = 0$ satisfies the optimality condition.
$$\operatorname{Prox}_{\lambda\|\cdot\|_{1}}(x)_{i} = \operatorname{sign}\left(x_{i}\right)\max\left(\left|x_{i}\right| - \lambda,\ 0\right)$$
This operation is known as the soft-thresholding function.
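A minimal sketch of the soft-thresholding operator derived above; the helper name `soft_threshold` is our own:

```python
import numpy as np

def soft_threshold(x, lam):
    # Prox of lam*||.||_1, applied elementwise: sign(x_i) * max(|x_i| - lam, 0).
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-2.0, -0.3, 0.0, 0.5, 1.5])
print(soft_threshold(x, 1.0))   # -> [-1., 0., 0., 0., 0.5] (up to signed zeros)
```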
The incremental proximal method
Incremental proximal methods are a class of algorithms that, at each iteration, use only a single component $f_k$ of the objective. Given the previous iterate $\theta_{k-1}$, the incremental proximal method computes the next iterate by solving the subproblem

$$\theta_{k} = \operatorname{prox}_{\lambda, f_{k}}\left(\theta_{k-1}\right) = \underset{\theta \in \mathbb{R}^{d}}{\operatorname{argmin}}\ f_{k}(\theta) + \lambda\left\|\theta - \theta_{k-1}\right\|_{2, V^{-1}}^{2}$$
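A rough sketch of one incremental pass, assuming $V = I$ and toy scalar least-squares components $f_k(\theta) = \tfrac{1}{2}(a_k^T\theta - b_k)^2$ (both assumptions are ours); each subproblem is solved with SciPy's generic minimizer:

```python
# Toy setup: scalar least-squares components f_k; V is taken as the identity.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
d, K = 3, 50
A = rng.standard_normal((K, d))
theta_true = np.array([1.0, -2.0, 0.5])
b = A @ theta_true
lam = 1.0                                     # proximal weight lambda

theta = np.zeros(d)
for k in range(K):
    f_k = lambda t, k=k: 0.5 * (A[k] @ t - b[k]) ** 2
    sub = lambda t: f_k(t) + lam * np.sum((t - theta) ** 2)
    theta = minimize(sub, theta).x            # theta_k = prox_{lambda, f_k}(theta_{k-1})

print(theta)                                  # drifts toward theta_true over the pass
```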
The basis pursuit algorithm
Basis pursuit is an algorithm for solving equality-constrained problems that minimize the $L_1$ norm of the unknown parameters. It is commonly used in signal processing to obtain a sparse representation of a signal with respect to a known dictionary. The basic idea of basis pursuit is to replace the $L_0$-norm objective in the original optimization problem with an $L_1$-norm objective.
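A hedged sketch (toy data of our own) that solves basis pursuit, $\min \|x\|_1$ s.t. $Ax = b$, by recasting it as a linear program with the split $x = u - v$, $u, v \ge 0$:

```python
# Toy sparse-recovery instance of our own.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 10, 30
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17]] = [1.0, -2.0]                 # sparse ground truth
b = A @ x_true

c = np.ones(2 * n)                            # minimize sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([A, -A])                     # A u - A v = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]

print(np.allclose(x_hat, x_true, atol=1e-6))  # True when L1 recovery succeeds
```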
Blind source separation
Blind source separation (BSS), also called blind signal separation, refers to the process of recovering the individual source signals from mixed (observed) signals when neither the theoretical signal model nor the source signals are precisely known. Blind source separation and blind identification are the two main branches of blind signal processing: blind source separation aims at the best estimate of the source signals, while blind identification aims at recovering the mixing matrix of the transmission channel.
The total variation (TV) minimization method
Rudin et al. (Rudin 1990) observed that the total variation of a noise-contaminated image is significantly larger than that of the noise-free image. The total variation is defined as the integral of the gradient magnitude:

$$J_{TV}(u) = \int_{D_u} \left|\nabla u\right|\, dx\, dy = \int_{D_u} \sqrt{u_x^2 + u_y^2}\, dx\, dy$$
where $u_x = \frac{\partial u}{\partial x}$, $u_y = \frac{\partial u}{\partial y}$, and $D_u$ is the support domain of the image. Constraining the total variation therefore constrains the noise.
For a real-valued function $f$ defined on an interval $[a, b] \subset \mathbb{R}$, its total variation on that interval is a measure of the one-dimensional arc length of the curve with parametric equation $x \mapsto f(x)$, for $x \in [a, b]$.
For a 2D signal $y$, such as an image, Rudin et al. proposed the total variation norm [1]:

$$V(y) = \sum_{i, j} \sqrt{\left|y_{i+1, j} - y_{i, j}\right|^{2} + \left|y_{i, j+1} - y_{i, j}\right|^{2}}$$
This total variation norm is isotropic but not differentiable. A variant that is sometimes easier to minimize is the anisotropic version:

$$V_{\mathrm{aniso}}(y) = \sum_{i, j} \sqrt{\left|y_{i+1, j} - y_{i, j}\right|^{2}} + \sqrt{\left|y_{i, j+1} - y_{i, j}\right|^{2}} = \sum_{i, j} \left|y_{i+1, j} - y_{i, j}\right| + \left|y_{i, j+1} - y_{i, j}\right|$$
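A small sketch computing both discrete TV norms for a 2D array; the test image and the boundary handling (dropping the last row/column of differences) are our own choices:

```python
# Tiny synthetic "image"; boundary differences are simply dropped.
import numpy as np

def tv_isotropic(y):
    dx = np.diff(y, axis=0)[:, :-1]   # y[i+1, j] - y[i, j]
    dy = np.diff(y, axis=1)[:-1, :]   # y[i, j+1] - y[i, j]
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

def tv_anisotropic(y):
    return np.sum(np.abs(np.diff(y, axis=0))) + np.sum(np.abs(np.diff(y, axis=1)))

y = np.zeros((8, 8))
y[2:6, 2:6] = 1.0                     # a small bright square
print(tv_isotropic(y), tv_anisotropic(y))
```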
The standard total variation denoising problem then takes the form

$$\min_{y}\left[\mathrm{E}(x, y) + \lambda V(y)\right]$$

where $\mathrm{E}(x, y)$ is the 2D $L_2$ data-fidelity term (e.g., the squared Euclidean distance between the noisy image $x$ and the estimate $y$).
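A very rough sketch of TV denoising under our own simplifications: a smoothed isotropic TV, SciPy's generic L-BFGS-B minimizer, and a tiny 8×8 test image. This is not the Rudin-Osher-Fatemi algorithm itself, only an illustration of the objective above:

```python
# Smoothed isotropic TV and a generic minimizer; purely illustrative.
import numpy as np
from scipy.optimize import minimize

def tv_smooth(y, eps=1e-6):
    dx = np.diff(y, axis=0)[:, :-1]
    dy = np.diff(y, axis=1)[:-1, :]
    return np.sum(np.sqrt(dx ** 2 + dy ** 2 + eps))

rng = np.random.default_rng(3)
clean = np.zeros((8, 8))
clean[2:6, 2:6] = 1.0
x = clean + 0.2 * rng.standard_normal(clean.shape)   # noisy observation
lam = 0.3

def objective(y_flat):
    y = y_flat.reshape(x.shape)
    return np.sum((x - y) ** 2) + lam * tv_smooth(y)  # E(x, y) + lambda * V(y)

y_hat = minimize(objective, x.ravel(), method="L-BFGS-B").x.reshape(x.shape)
print(tv_smooth(y_hat) < tv_smooth(x))                # True: denoising lowers the TV
```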
Augmented Lagrange Multiplier Method
The ALM method is also called the method of multipliers (MOM) or a primal-dual method. Let us first consider the Lagrangian functional for equality constraints $h(x) = 0$ only:
$$L(x, \lambda) = f(x) + \lambda^T h(x)$$
Now, for a Lagrange multiplier vector $\lambda^*$, suppose that there is an optimum $x^*$ for the following unconstrained optimization problem:
$$\min_x L(x, \lambda^*)$$
If $x^*$ satisfies all the equality constraints $h(x^*) = 0$ of the original design problem, then $x^*$ is an optimum of the original optimization problem and $\lambda^*$ is an optimal Lagrange multiplier. Consequently, the original optimization problem can be transformed into the following problem, which has the same optimum $x^*$ and $\lambda^*$:
$$\min_x L(x, \lambda) \quad \text{s.t.}\ \ h_i(x) = 0,\quad i = 1, 2, \ldots, l$$
To avoid unboundedness of the Lagrangian, a penalty term is introduced; the resulting functional is called the augmented Lagrangian:
$$A(x, \lambda, r) = L(x, \lambda) + \frac{1}{2}\sum_{i=1}^{l} r_i \left(h_i(x)\right)^2$$
where $r_i$ is the penalty parameter for the $i$-th equality constraint. In the ALM method, an unconstrained optimization routine minimizes the augmented Lagrangian for the given values of $r_i$ and $\lambda_i$; these two parameters are then updated so as to satisfy the optimality conditions.
The update rule for Lagrange multipliers can be determined from the following relation.
$$\lim_{x \to x^*} \nabla A(x, \lambda, r) = \nabla L(x^*, \lambda^*)$$
This implies

$$\lim_{x \to x^*} \left(\lambda_i + r_i h_i(x)\right) = \lambda_i^*,\quad i = 1, 2, \ldots, l$$
Hence, the update rule for the Lagrange multipliers is

$$\lambda_i^{k+1} = \lambda_i^k + r_i^k h_i(x^k)$$
where the superscript $k$ denotes the iteration index of the ALM algorithm.
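A compact sketch of the ALM loop for a single equality constraint; the quadratic test problem (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, optimum at $(0.5, 0.5)$ with $\lambda^* = -1$) is our own, and the inner minimization uses SciPy's generic solver:

```python
# Toy problem: minimize x1^2 + x2^2 subject to x1 + x2 - 1 = 0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] ** 2 + x[1] ** 2
h = lambda x: x[0] + x[1] - 1.0                  # equality constraint h(x) = 0

lam, r = 0.0, 10.0
x = np.zeros(2)
for k in range(20):
    aug = lambda x: f(x) + lam * h(x) + 0.5 * r * h(x) ** 2
    x = minimize(aug, x).x                       # inner unconstrained minimization
    lam = lam + r * h(x)                         # lambda^{k+1} = lambda^k + r * h(x^k)

print(x, lam)                                    # approx [0.5, 0.5] and lambda* = -1
```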
Inequality constraints are transformed into equality constraints by adding slack variables ($\theta_j > 0$). The augmented Lagrangian then becomes
$$A(x, \theta, \mu, r) = f(x) + \sum_{j=1}^{m}\left[\mu_j\left(g_j(x) + \theta_j\right) + \frac{1}{2} r_j\left(g_j(x) + \theta_j\right)^2\right]$$
The new primal variables are then $\bar{x} = \{x, \theta\}$. The augmented Lagrangian must satisfy the optimality condition with respect to each slack variable $\theta_j$:
$$\nabla_{\theta_j} A(x, \theta, \mu, r) = \mu_j + r_j\left(g_j(x) + \theta_j\right) = 0$$
Hence, the optimal value of the slack variable $\theta_j$ is

$$\theta_j^* = \max\left\{0,\ -\frac{\mu_j}{r_j} - g_j(x)\right\}$$
Substituting this optimum back into the transformed constraint gives

$$g_j(x) + \theta_j^* = g_j(x) + \max\left\{0,\ -\frac{\mu_j}{r_j} - g_j(x)\right\} = \max\left\{g_j(x),\ -\frac{\mu_j}{r_j}\right\}$$
Hence, the augmented Lagrangian for inequality constraints reduces to the following simple functional:

$$A(x, \mu, r) = f(x) + \sum_{j=1}^{m}\left[\mu_j \Psi_j(x, \mu_j, r_j) + \frac{1}{2} r_j \Psi_j(x, \mu_j, r_j)^2\right]$$
where $\Psi_j(x, \mu_j, r_j) = \max\left\{g_j(x),\ -\frac{\mu_j}{r_j}\right\}$. The Lagrange multiplier update rule is then
$$\mu_j^{k+1} = \mu_j^k + r_j^k \Psi_j(x^k, \mu_j^k, r_j^k),\quad j = 1, 2, \ldots, m$$
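A similarly hedged sketch for a single inequality constraint $g(x) \le 0$ using the $\Psi_j$ form above; the toy problem (minimize $(x-2)^2$ subject to $x \le 1$, optimum at $x = 1$ with $\mu^* = 2$) is our own:

```python
# Toy problem: minimize (x - 2)^2 subject to g(x) = x - 1 <= 0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2.0) ** 2
g = lambda x: x[0] - 1.0                         # inequality constraint g(x) <= 0

mu, r = 0.0, 10.0
x = np.zeros(1)
for k in range(20):
    psi = lambda x: max(g(x), -mu / r)           # Psi(x, mu, r)
    aug = lambda x: f(x) + mu * psi(x) + 0.5 * r * psi(x) ** 2
    x = minimize(aug, x).x                       # inner unconstrained minimization
    mu = mu + r * max(g(x), -mu / r)             # mu^{k+1} = mu^k + r * Psi(x^k)

print(x, mu)                                     # approx x = 1 and mu* = 2
```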
Background on related operators
Consider a linear map between Hilbert spaces, $A: H_1 \to H_2$. Informally, the adjoint operator is the linear operator $A^{*}: H_2 \to H_1$ (in most cases uniquely defined) satisfying

$$\left\langle A h_1, h_2 \right\rangle_{H_2} = \left\langle h_1, A^{*} h_2 \right\rangle_{H_1}$$
where $\langle \cdot, \cdot \rangle_{H_i}$ denotes the inner product of the Hilbert space $H_i$.
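A quick finite-dimensional illustration (our own example) that the adjoint of a matrix operator is its conjugate transpose, i.e. $\langle A h_1, h_2 \rangle = \langle h_1, A^H h_2 \rangle$:

```python
# Random complex matrix and vectors as a stand-in for the abstract operator A.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
h1 = rng.standard_normal(3) + 1j * rng.standard_normal(3)
h2 = rng.standard_normal(4) + 1j * rng.standard_normal(4)

lhs = np.vdot(h2, A @ h1)            # <A h1, h2>_{H2} (vdot conjugates its first argument)
rhs = np.vdot(A.conj().T @ h2, h1)   # <h1, A* h2>_{H1} with A* = A^H
print(np.allclose(lhs, rhs))         # True
```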
References:
[1] Rudin, Leonid I., Stanley Osher, and Emad Fatemi. "Nonlinear total variation based noise removal algorithms." Physica D: Nonlinear Phenomena 60.1-4 (1992): 259-268.