Overview
Any matrix $A \in \mathbb{C}^{m \times n}$ admits a singular value decomposition

$$A = U \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} V^H$$
where $U$ is an $m \times m$ unitary matrix, $V$ is an $n \times n$ unitary matrix, and $\Sigma_r$ is an $r \times r$ diagonal matrix whose diagonal entries are the singular values of $A$ arranged in decreasing order; $r$ is the rank of $A$.
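A minimal NumPy sketch (the random test matrix is our own illustration, not from the text) verifying the block form of the SVD and that $r$ equals the rank:

```python
# Hypothetical test matrix; not from the original text.
import numpy as np

m, n = 5, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Full SVD: U is m x m, Vh is n x n, s holds singular values in decreasing order.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n block matrix [Sigma_r 0; 0 0] from the singular values.
r = int(np.sum(s > 1e-12))              # numerical rank of A
Sigma = np.zeros((m, n), dtype=complex)
Sigma[:r, :r] = np.diag(s[:r])

print(np.allclose(A, U @ Sigma @ Vh))   # True: A = U [Sigma_r 0; 0 0] V^H
print(r == np.linalg.matrix_rank(A))    # True: r is the rank of A
```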
3. The linear least-squares problem
Consider the linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$ and seek its least-squares solution. If $\mathbf{A}$ has rank $n$, the least-squares solution is unique and equals $\mathbf{A}^{+}\mathbf{b}$; if the rank is less than $n$, there are infinitely many least-squares solutions, and the minimum-norm solution among them is still $\mathbf{A}^{+}\mathbf{b}$. This minimum-norm solution is usually the one of interest.
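A small sketch, assuming toy example data, showing that the pseudoinverse solution $\mathbf{A}^{+}\mathbf{b}$ matches NumPy's least-squares solver:

```python
# Toy system; assumed example data.
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])
b = np.array([1., 2., 3.])

x_pinv = np.linalg.pinv(A) @ b                    # A^+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # NumPy's least-squares solver

print(np.allclose(x_pinv, x_lstsq))               # True: both give the (minimum-norm) LS solution
```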
A well-posed problem is one that satisfies the following three requirements: (1) a solution exists (existence); (2) the solution is unique (uniqueness); (3) the solution depends continuously on the initial conditions (stability). If any one of these requirements fails, the problem is called ill-posed.
A convex optimization problem is an optimization problem whose objective function is convex and whose feasible region, defined by the constraints, is a convex set.
A convex function is a real-valued function $f$ defined on a convex subset $C$ (an interval) of some vector space such that, for any two vectors $x_1$ and $x_2$ in $C$,

$$f\!\left(\frac{x_1 + x_2}{2}\right) \le \frac{f(x_1) + f(x_2)}{2}.$$
Convex set: in Euclidean space, a convex set is a region such that for every pair of points in the region, the line segment joining them also lies entirely within the region.
The spectral norm (2-norm) $\|A\|_2$ is the non-negative square root of the largest eigenvalue of $A^H A$ (where $A^H$ is the conjugate transpose of $A$), i.e. $\|A\|_2 = (\text{maximum eigenvalue of } A^H A)^{1/2}$.
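A quick numerical check (the 2×2 matrix is an arbitrary example) that the spectral norm equals the square root of the largest eigenvalue of $A^H A$:

```python
# Arbitrary example matrix.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
eigmax = np.max(np.linalg.eigvalsh(A.conj().T @ A))   # largest eigenvalue of A^H A
print(np.sqrt(eigmax), np.linalg.norm(A, 2))          # both values agree
```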
The proximal operator of the $L_1$-norm function:
Reference link: The Proximal Operator of the L1 Norm Function
The proximal operator is defined by the following optimization problem:

$$\operatorname{Prox}_{\lambda\|\cdot\|_{1}}(x) = \arg\min_{u}\left\{ \frac{1}{2}\|u - x\|^{2} + \lambda\|u\|_{1} \right\}$$
The problem is separable in the components of $u$ and $x$, so it reduces to the scalar problems

$$\arg\min_{u_{i}}\left\{ \frac{1}{2}\left(u_{i} - x_{i}\right)^{2} + \lambda\left|u_{i}\right| \right\}$$
Now, you can proceed using the first-order optimality condition and the subgradient of the $\mathrm{abs}(\cdot)$ function, or you can employ a simple trick.
Note that $u_i$ can be negative, zero, or positive. Define

$$F = \frac{1}{2}\left(u_{i} - x_{i}\right)^{2} + \lambda\left|u_{i}\right|$$
(1) If $u_i > 0$, the partial derivative of $F$ with respect to $u_i$ is $\partial F / \partial u_i = u_i - x_i + \lambda$. Setting $\partial F / \partial u_i = 0$ gives $u_i = x_i - \lambda$, which is consistent with $u_i > 0$ only when $x_i > \lambda$.
(2) The same procedure for the case $u_i < 0$ yields $u_i = x_i + \lambda$ for $x_i < -\lambda$.
For values of $x_i$ in between ($|x_i| \le \lambda$), the subgradient of $|u_i|$ at $u_i = 0$ can be chosen freely in the range $[-1, 1]$, so $u_i = 0$ satisfies the optimality condition.
$$\operatorname{Prox}_{\lambda\|\cdot\|_{1}}(x)_{i} = \operatorname{sign}\left(x_{i}\right)\max\left(\left|x_{i}\right| - \lambda,\ 0\right)$$
This operation is known as the soft-thresholding function.
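A minimal sketch of the soft-thresholding operator derived above; the helper name `soft_threshold` is our own:

```python
import numpy as np

def soft_threshold(x, lam):
    # Prox of lam*||.||_1, applied elementwise: sign(x_i) * max(|x_i| - lam, 0).
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-2.0, -0.3, 0.0, 0.5, 1.5])
print(soft_threshold(x, 1.0))   # -> [-1., 0., 0., 0., 0.5] (up to signed zeros)
```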
The incremental proximal method
Incremental proximal methods are a class of algorithms that, at each iteration, use only a single component $f_k$ of the objective. Given the previous iterate $\theta_{k-1}$, the incremental proximal method computes the next iterate by solving the subproblem

$$\theta_{k} = \operatorname{prox}_{\lambda, f_{k}}\left(\theta_{k-1}\right) = \underset{\theta \in \mathbb{R}^{d}}{\operatorname{argmin}}\ f_{k}(\theta) + \lambda\left\|\theta - \theta_{k-1}\right\|_{2, V^{-1}}^{2}$$
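A rough sketch of one incremental pass, assuming $V = I$ and toy scalar least-squares components $f_k(\theta) = \tfrac{1}{2}(a_k^T\theta - b_k)^2$ (both assumptions are ours); each subproblem is solved with SciPy's generic minimizer:

```python
# Toy setup: scalar least-squares components f_k; V is taken as the identity.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
d, K = 3, 50
A = rng.standard_normal((K, d))
theta_true = np.array([1.0, -2.0, 0.5])
b = A @ theta_true
lam = 1.0                                     # proximal weight lambda

theta = np.zeros(d)
for k in range(K):
    f_k = lambda t, k=k: 0.5 * (A[k] @ t - b[k]) ** 2
    sub = lambda t: f_k(t) + lam * np.sum((t - theta) ** 2)
    theta = minimize(sub, theta).x            # theta_k = prox_{lambda, f_k}(theta_{k-1})

print(theta)                                  # drifts toward theta_true over the pass
```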
The basis pursuit algorithm
Basis pursuit is an algorithm for solving equality-constrained problems that minimize the $L_1$ norm of the unknown parameters. It is commonly used in signal processing to obtain a sparse representation of a signal with respect to a known dictionary. The basic idea of basis pursuit is to replace the $L_0$-norm objective in the original optimization problem with an $L_1$-norm objective.
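A hedged sketch (toy data of our own) that solves basis pursuit, $\min \|x\|_1$ s.t. $Ax = b$, by recasting it as a linear program with the split $x = u - v$, $u, v \ge 0$:

```python
# Toy sparse-recovery instance of our own.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 10, 30
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17]] = [1.0, -2.0]                 # sparse ground truth
b = A @ x_true

c = np.ones(2 * n)                            # minimize sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([A, -A])                     # A u - A v = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]

print(np.allclose(x_hat, x_true, atol=1e-6))  # True when L1 recovery succeeds
```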
Blind source separation
Blind source separation (BSS), also called blind signal separation, refers to the process of recovering the individual source signals from mixed (observed) signals when neither the theoretical signal model nor the source signals are precisely known. Blind source separation and blind identification are the two main branches of blind signal processing: blind source separation aims at the best estimate of the source signals, while blind identification aims at recovering the mixing matrix of the transmission channel.
The total variation (TV) minimization method
Rudin et al. (Rudin 1990) observed that the total variation of a noise-contaminated image is significantly larger than that of the noise-free image. The total variation is defined as the integral of the gradient magnitude:

$$J_{TV}(u) = \int_{D_u} \left|\nabla u\right|\, dx\, dy = \int_{D_u} \sqrt{u_x^2 + u_y^2}\, dx\, dy$$
where $u_x = \frac{\partial u}{\partial x}$, $u_y = \frac{\partial u}{\partial y}$, and $D_u$ is the support domain of the image. Constraining the total variation therefore constrains the noise.
For a real-valued function $f$ defined on an interval $[a, b] \subset \mathbb{R}$, its total variation on that interval is a measure of the one-dimensional arc length of the curve with parametric equation $x \mapsto f(x)$, for $x \in [a, b]$.
For a 2D signal $y$, such as an image, Rudin et al. proposed the total variation norm [1]:

$$V(y) = \sum_{i, j} \sqrt{\left|y_{i+1, j} - y_{i, j}\right|^{2} + \left|y_{i, j+1} - y_{i, j}\right|^{2}}$$
This total variation norm is isotropic but not differentiable. A variant that is sometimes easier to minimize is the anisotropic version:

$$V_{\mathrm{aniso}}(y) = \sum_{i, j} \sqrt{\left|y_{i+1, j} - y_{i, j}\right|^{2}} + \sqrt{\left|y_{i, j+1} - y_{i, j}\right|^{2}} = \sum_{i, j} \left|y_{i+1, j} - y_{i, j}\right| + \left|y_{i, j+1} - y_{i, j}\right|$$
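A small sketch computing both discrete TV norms for a 2D array; the test image and the boundary handling (dropping the last row/column of differences) are our own choices:

```python
# Tiny synthetic "image"; boundary differences are simply dropped.
import numpy as np

def tv_isotropic(y):
    dx = np.diff(y, axis=0)[:, :-1]   # y[i+1, j] - y[i, j]
    dy = np.diff(y, axis=1)[:-1, :]   # y[i, j+1] - y[i, j]
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

def tv_anisotropic(y):
    return np.sum(np.abs(np.diff(y, axis=0))) + np.sum(np.abs(np.diff(y, axis=1)))

y = np.zeros((8, 8))
y[2:6, 2:6] = 1.0                     # a small bright square
print(tv_isotropic(y), tv_anisotropic(y))
```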
The standard total variation denoising problem then takes the form

$$\min_{y}\left[\mathrm{E}(x, y) + \lambda V(y)\right]$$

where $\mathrm{E}(x, y)$ is the 2D $L_2$ data-fidelity term (e.g., the squared Euclidean distance between the noisy image $x$ and the estimate $y$).
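A very rough sketch of TV denoising under our own simplifications: a smoothed isotropic TV, SciPy's generic L-BFGS-B minimizer, and a tiny 8×8 test image. This is not the Rudin-Osher-Fatemi algorithm itself, only an illustration of the objective above:

```python
# Smoothed isotropic TV and a generic minimizer; purely illustrative.
import numpy as np
from scipy.optimize import minimize

def tv_smooth(y, eps=1e-6):
    dx = np.diff(y, axis=0)[:, :-1]
    dy = np.diff(y, axis=1)[:-1, :]
    return np.sum(np.sqrt(dx ** 2 + dy ** 2 + eps))

rng = np.random.default_rng(3)
clean = np.zeros((8, 8))
clean[2:6, 2:6] = 1.0
x = clean + 0.2 * rng.standard_normal(clean.shape)   # noisy observation
lam = 0.3

def objective(y_flat):
    y = y_flat.reshape(x.shape)
    return np.sum((x - y) ** 2) + lam * tv_smooth(y)  # E(x, y) + lambda * V(y)

y_hat = minimize(objective, x.ravel(), method="L-BFGS-B").x.reshape(x.shape)
print(tv_smooth(y_hat) < tv_smooth(x))                # True: denoising lowers the TV
```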
Augmented Lagrange Multiplier Method
The ALM method is also called the method of multipliers (MOM) or a primal-dual method. Let us first consider the Lagrangian functional for equality constraints $h(x) = 0$ only:
$$L(x, \lambda) = f(x) + \lambda^T h(x)$$
Now, for a Lagrange multiplier vector $\lambda^*$, suppose that there is an optimum $x^*$ for the following unconstrained optimization problem:
$$\min_x L(x, \lambda^*)$$
If $x^*$ satisfies all the equality constraints $h(x^*) = 0$ of the original design problem, then $x^*$ is an optimum of the original optimization problem and $\lambda^*$ is an optimal Lagrange multiplier. Consequently, the original optimization problem can be transformed into the following problem, which has the same optimum $x^*$ and $\lambda^*$:
$$\min_x L(x, \lambda) \quad \text{s.t.}\ \ h_i(x) = 0,\quad i = 1, 2, \ldots, l$$
To avoid unboundedness of the Lagrangian, a penalty term is introduced; the resulting functional is called the augmented Lagrangian:
$$A(x, \lambda, r) = L(x, \lambda) + \frac{1}{2}\sum_{i=1}^{l} r_i \left(h_i(x)\right)^2$$
where $r_i$ is the penalty parameter for the $i$-th equality constraint. In the ALM method, an unconstrained optimization routine minimizes the augmented Lagrangian for the given values of $r_i$ and $\lambda_i$; these two parameters are then updated so as to satisfy the optimality conditions.
The update rule for Lagrange multipliers can be determined from the following relation.
$$\lim_{x \to x^*} \nabla A(x, \lambda, r) = \nabla L(x^*, \lambda^*)$$
This implies

$$\lim_{x \to x^*} \left(\lambda_i + r_i h_i(x)\right) = \lambda_i^*,\quad i = 1, 2, \ldots, l$$
Hence, the update rule for the Lagrange multipliers is

$$\lambda_i^{k+1} = \lambda_i^k + r_i^k h_i(x^k)$$
where the superscript $k$ denotes the iteration index of the ALM algorithm.
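A compact sketch of the ALM loop for a single equality constraint; the quadratic test problem (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, optimum at $(0.5, 0.5)$ with $\lambda^* = -1$) is our own, and the inner minimization uses SciPy's generic solver:

```python
# Toy problem: minimize x1^2 + x2^2 subject to x1 + x2 - 1 = 0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] ** 2 + x[1] ** 2
h = lambda x: x[0] + x[1] - 1.0                  # equality constraint h(x) = 0

lam, r = 0.0, 10.0
x = np.zeros(2)
for k in range(20):
    aug = lambda x: f(x) + lam * h(x) + 0.5 * r * h(x) ** 2
    x = minimize(aug, x).x                       # inner unconstrained minimization
    lam = lam + r * h(x)                         # lambda^{k+1} = lambda^k + r * h(x^k)

print(x, lam)                                    # approx [0.5, 0.5] and lambda* = -1
```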
Inequality constraints are transformed into equality constraints by adding slack variables ($\theta_j > 0$). The augmented Lagrangian then becomes
$$A(x, \theta, \mu, r) = f(x) + \sum_{j=1}^{m}\left[\mu_j\left(g_j(x) + \theta_j\right) + \frac{1}{2} r_j\left(g_j(x) + \theta_j\right)^2\right]$$
The new primal variables are then $\bar{x} = \{x, \theta\}$. The augmented Lagrangian must satisfy the optimality condition with respect to each slack variable $\theta_j$:
$$\nabla_{\theta_j} A(x, \theta, \mu, r) = \mu_j + r_j\left(g_j(x) + \theta_j\right) = 0$$
Hence, the optimal value of the slack variable $\theta_j$ is

$$\theta_j^* = \max\left\{0,\ -\frac{\mu_j}{r_j} - g_j(x)\right\}$$
Substituting this optimum back into the transformed constraint gives

$$g_j(x) + \theta_j^* = g_j(x) + \max\left\{0,\ -\frac{\mu_j}{r_j} - g_j(x)\right\} = \max\left\{g_j(x),\ -\frac{\mu_j}{r_j}\right\}$$
Hence, the augmented Lagrangian for inequality constraints reduces to the following simple functional:

$$A(x, \mu, r) = f(x) + \sum_{j=1}^{m}\left[\mu_j \Psi_j(x, \mu_j, r_j) + \frac{1}{2} r_j \Psi_j(x, \mu_j, r_j)^2\right]$$
where $\Psi_j(x, \mu_j, r_j) = \max\left\{g_j(x),\ -\frac{\mu_j}{r_j}\right\}$. The Lagrange multiplier update rule is then
$$\mu_j^{k+1} = \mu_j^k + r_j^k \Psi_j(x^k, \mu_j^k, r_j^k),\quad j = 1, 2, \ldots, m$$
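A similarly hedged sketch for a single inequality constraint $g(x) \le 0$ using the $\Psi_j$ form above; the toy problem (minimize $(x-2)^2$ subject to $x \le 1$, optimum at $x = 1$ with $\mu^* = 2$) is our own:

```python
# Toy problem: minimize (x - 2)^2 subject to g(x) = x - 1 <= 0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2.0) ** 2
g = lambda x: x[0] - 1.0                         # inequality constraint g(x) <= 0

mu, r = 0.0, 10.0
x = np.zeros(1)
for k in range(20):
    psi = lambda x: max(g(x), -mu / r)           # Psi(x, mu, r)
    aug = lambda x: f(x) + mu * psi(x) + 0.5 * r * psi(x) ** 2
    x = minimize(aug, x).x                       # inner unconstrained minimization
    mu = mu + r * max(g(x), -mu / r)             # mu^{k+1} = mu^k + r * Psi(x^k)

print(x, mu)                                     # approx x = 1 and mu* = 2
```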
Background on related operators
Consider a linear map between Hilbert spaces, $A: H_1 \to H_2$. Informally, the adjoint operator is the linear operator $A^{*}: H_2 \to H_1$ (in most cases uniquely defined) satisfying

$$\left\langle A h_1, h_2 \right\rangle_{H_2} = \left\langle h_1, A^{*} h_2 \right\rangle_{H_1}$$
where $\langle \cdot, \cdot \rangle_{H_i}$ denotes the inner product of the Hilbert space $H_i$.
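A quick finite-dimensional illustration (our own example) that the adjoint of a matrix operator is its conjugate transpose, i.e. $\langle A h_1, h_2 \rangle = \langle h_1, A^H h_2 \rangle$:

```python
# Random complex matrix and vectors as a stand-in for the abstract operator A.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
h1 = rng.standard_normal(3) + 1j * rng.standard_normal(3)
h2 = rng.standard_normal(4) + 1j * rng.standard_normal(4)

lhs = np.vdot(h2, A @ h1)            # <A h1, h2>_{H2} (vdot conjugates its first argument)
rhs = np.vdot(A.conj().T @ h2, h1)   # <h1, A* h2>_{H1} with A* = A^H
print(np.allclose(lhs, rhs))         # True
```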
References:
[1] Rudin, Leonid I., Stanley Osher, and Emad Fatemi. "Nonlinear total variation based noise removal algorithms." Physica D: Nonlinear Phenomena 60.1-4 (1992): 259-268.