Caffe Layer Walkthrough Series: hinge_loss

I am 听话鸡翅, a blogger on 靠谱客. This article walks through the hinge_loss layer in Caffe; I am sharing it here as a reference.

————— Definition of Hinge Loss —————

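Hinge loss is mainly used for classification problems that call for a "maximum-margin" objective, which makes it particularly well suited to SVM classification.

Hinge loss is defined as:

\(l(y) = \max(0, 1 - t \cdot y)\)

Here \(t = \pm 1\) is the target. Note that \(y\) is not the predicted class label; it is the raw output of the decision function. In a linear SVM, for example, \(y = wx + b\), where \(x\) is the point to classify and \(w\) and \(b\) define the separating hyperplane.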

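It follows directly from the definition that when \(t\) and \(y\) have the same sign (that is, \(y\) classifies the point correctly) and \(\vert y \vert \ge 1\), the hinge loss is \(l(y) = 0\). When the signs agree but \(\vert y \vert < 1\), the point lies inside the margin and the loss is \(1 - \vert y \vert\). When the signs differ, \(l(y) = 1 + \vert y \vert\), which grows linearly with \(\vert y \vert\).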
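The cases above can be checked with a minimal C++ sketch of the binary hinge loss. The function name `hinge_loss` is chosen here for illustration only and is not part of Caffe.

```cpp
#include <algorithm>

// Binary hinge loss l(y) = max(0, 1 - t * y).
// t is the target in {-1, +1}; y is the raw decision value, e.g. w*x + b.
// (Illustrative helper, not a Caffe function.)
double hinge_loss(double t, double y) {
  return std::max(0.0, 1.0 - t * y);
}
```

For instance, `hinge_loss(1, 2.0)` is 0 (correct and outside the margin), `hinge_loss(1, 0.5)` is 0.5 (correct but inside the margin), and `hinge_loss(-1, 0.5)` is 1.5 (wrong sign).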

———— How Caffe Defines Hinge Loss ————

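Caffe's definition uses the opposite sign convention from the one introduced above. The details of Caffe's implementation are as follows.

Caffe provides two variants of hinge loss, L1 and L2:

\(l(y) = \Vert H \Vert_1\) and \(l(y) = \Vert H \Vert_2^2\)

(the L2 variant is the squared L2 norm, that is, the sum of squares), where

\(H_i = \max(0, 1 + t_i \cdot y_i)\), with \(t_i = -1\) if \(i = \text{label}\) and \(t_i = 1\) otherwise.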

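The following example shows how Caffe computes the multi-class hinge loss.

Suppose there are 5 classes. The table below lists the classifier's 5 outputs, and the true label is 3 (using the table's 1-based IDs).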

ID	1	2	3	4	5
y	-1.73	-1.24	0.89	-0.99	0.05
t	1	1	-1	1	1

From this, H is easily obtained:

ID	1	2	3	4	5
H	0.00	0.00	0.11	0.01	1.05

Therefore

\(l(y) = \Vert H \Vert_1 = \sum_{i=1}^{5} H_i = 1.17\)

\(l(y) = \Vert H \Vert_2^2 = \sum_{i=1}^{5} H_i^2 = 1.1147\)

The implementation in the Caffe source is as follows:

    template <typename Dtype>
    void HingeLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
        const vector<Blob<Dtype>*>& top) {
      const Dtype* bottom_data = bottom[0]->cpu_data();
      // The diff buffer doubles as scratch space; it ends up holding H.
      Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
      const Dtype* label = bottom[1]->cpu_data();
      int num = bottom[0]->num();      // batch size
      int count = bottom[0]->count();  // total number of scores
      int dim = count / num;           // scores per sample
      caffe_copy(count, bottom_data, bottom_diff);
      // Flip the sign of the true-class score (t = -1 where i == label).
      for (int i = 0; i < num; ++i) {
        bottom_diff[i * dim + static_cast<int>(label[i])] *= -1;
      }
      // H = max(0, 1 + t * y), element-wise.
      for (int i = 0; i < num; ++i) {
        for (int j = 0; j < dim; ++j) {
          bottom_diff[i * dim + j] = std::max(
              Dtype(0), 1 + bottom_diff[i * dim + j]);
        }
      }
      Dtype* loss = top[0]->mutable_cpu_data();
      switch (this->layer_param_.hinge_loss_param().norm()) {
      case HingeLossParameter_Norm_L1:
        // Sum of H, averaged over the batch.
        loss[0] = caffe_cpu_asum(count, bottom_diff) / num;
        break;
      case HingeLossParameter_Norm_L2:
        // Sum of H^2, averaged over the batch.
        loss[0] = caffe_cpu_dot(count, bottom_diff, bottom_diff) / num;
        break;
      default:
        LOG(FATAL) << "Unknown Norm";
      }
    }

———— How Caffe Differentiates Hinge Loss ————

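Differentiating the hinge loss is straightforward.

Using the example from the previous section again, the derivative of each term under L1 is:

\(\frac{\partial H_i}{\partial y_i} = 0, \quad \text{if } H_i = 0\)

\(\frac{\partial H_i}{\partial y_i} = \frac{\partial (1 + t_i \cdot y_i)}{\partial y_i} = t_i, \quad \text{if } H_i \neq 0\)

Under L2 each term is \(H_i^2\), so the chain rule gives \(\frac{\partial H_i^2}{\partial y_i} = 2 H_i \cdot t_i\).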

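The actual values are listed in the table below. Only IDs 3 to 5 are shown; for IDs 1 and 2, \(H_i = 0\), so both derivatives are 0.

ID	3	4	5
\(H_i\)	0.11	0.01	1.05
L1: \(\partial H_i / \partial y_i\)	-1.00	1.00	1.00
L2: \(\partial H_i^2 / \partial y_i\)	-0.22	0.02	2.10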

The derivative implementation in the Caffe source is as follows:


    // Excerpt from the body of HingeLossLayer<Dtype>::Backward_cpu.
    if (propagate_down[0]) {
      // bottom_diff still holds the H values written by Forward_cpu.
      Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
      const Dtype* label = bottom[1]->cpu_data();
      int num = bottom[0]->num();
      int count = bottom[0]->count();
      int dim = count / num;
      // Flip the true-class entries so that bottom_diff holds t * H.
      for (int i = 0; i < num; ++i) {
        bottom_diff[i * dim + static_cast<int>(label[i])] *= -1;
      }
      const Dtype loss_weight = top[0]->cpu_diff()[0];
      switch (this->layer_param_.hinge_loss_param().norm()) {
      case HingeLossParameter_Norm_L1:
        // sign(t * H) is t where H > 0 and 0 where H == 0.
        caffe_cpu_sign(count, bottom_diff, bottom_diff);
        caffe_scal(count, loss_weight / num, bottom_diff);
        break;
      case HingeLossParameter_Norm_L2:
        // d(H^2)/dy = 2 * H * t.
        caffe_scal(count, loss_weight * 2 / num, bottom_diff);
        break;
      default:
        LOG(FATAL) << "Unknown Norm";
      }
    }