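CV Object Detection Task03: Channeling the Force (化劲儿) - Loss Function Design: Study Notes on the Preface, 3.5 Loss Functions, and a Walkthrough of the Loss Code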


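I am 清脆牛排, a blogger on 靠谱客. This post covers CV object detection Task03, loss function design: the preface, section 3.5 on loss functions, and a walkthrough of the loss code. I am sharing it here in the hope that it serves as a useful reference.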

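Task03: Channeling the Force (化劲儿) - Loss Function Design

Preface
3.5 Loss Functions
- 3.5.1 Matching strategy
- 3.5.2 Loss function
- 3.5.3 Hard negative mining
- 3.5.4 Summary
Loss function code analysis
- The MultiBoxLoss class explained piece by piece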

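Preface

Knowing the model's structure and what it finally outputs is not enough. You also need to "channel the force": with a well-chosen loss function and a few training techniques, steer the model to learn in the right direction so that it predicts the results we want.
Task 3 therefore covers loss function design, corresponding to section 3.5, Loss Functions, of 《动手学CV-Pytorch》.
Learning tasks:
Learn the strategy for matching anchors to GT (ground truth) boxes
Understand how the loss function is defined, and think about why it is designed this way
Understand online hard example mining as a training technique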

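3.5 Loss Functions

3.5.1 Matching strategy

We laid out a large number of prior bboxes and want each of them to predict a class and box offsets. To judge whether a prediction is accurate, and so to train at all, we first need to know which object each prior bbox corresponds to.

Different methods match ground truth boxes to prior bboxes in broadly similar ways, differing only in the details. Here we use the matching strategy from SSD:

Rule 1: start from the ground truth boxes. For each ground truth box, find the prior bbox with the largest jaccard overlap; this guarantees that every ground truth box is matched to a prior bbox (jaccard overlap is simply IoU, introduced earlier and shown in Figure 3-26). Conversely, a prior bbox that is not matched to any ground truth is matched to the background, i.e. it is a negative sample.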

[Figure 3-26: IoU]
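As a quick refresher on the jaccard overlap used in Rule 1, here is a minimal sketch in plain Python. The function name `iou` and the corner format (x_min, y_min, x_max, y_max) are my own assumptions for illustration; the tutorial's code computes the same quantity for whole tensors with find_jaccard_overlap.

```python
def iou(box_a, box_b):
    """Jaccard overlap (IoU) of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at 0 so disjoint boxes give zero intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that share a 1x1 corner: intersection 1, union 4 + 4 - 1 = 7
overlap_value = iou((0, 0, 2, 2), (1, 1, 3, 3))
```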

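An image contains very few ground truth boxes but a great many prior bboxes. Matching by Rule 1 alone would leave most prior bboxes as negatives, making positives and negatives extremely imbalanced, hence Rule 2.

Rule 2: start from the prior bboxes. For each still-unmatched prior bbox, try to pair it with any ground truth box: if their jaccard overlap exceeds a threshold (usually 0.5), the prior bbox is matched to that ground truth as well. So one ground truth may be matched to several prior bboxes, which is fine. The reverse is not allowed: a prior bbox can match only one ground truth, so if several ground truths have an IoU above the threshold with the same prior bbox, it is matched only to the one with the largest IoU.

Note: Rule 2 must be applied after Rule 1, so that Rule 1 takes priority. Consider this case: the prior bbox with the largest IoU for some ground truth has an IoU below the threshold, yet that same prior bbox has an IoU above the threshold with a different ground truth. Which one should it be matched to? The former, because we must first guarantee that every ground truth has a prior bbox matched to it.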
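The two rules and their priority can be sketched in plain Python. This is only an illustration under my own assumptions: `match_priors` is a hypothetical helper, the IoU table is made up, and -1 stands for "background". The example reproduces the corner case from the note: object 0's best prior (prior 0) has an IoU of only 0.4, while prior 0 overlaps object 1 with 0.6.

```python
def match_priors(overlaps, threshold=0.5):
    """overlaps[j][i] is the IoU of object j with prior i.
    Returns, for each prior, the index of its matched object or -1 for background."""
    n_objects = len(overlaps)
    n_priors = len(overlaps[0])

    # Rule 2 candidates: for every prior, the object it overlaps most
    best_obj, best_iou = [], []
    for i in range(n_priors):
        j = max(range(n_objects), key=lambda k: overlaps[k][i])
        best_obj.append(j)
        best_iou.append(overlaps[j][i])

    # Rule 1 has priority: every object claims its best prior, whatever the IoU
    for j in range(n_objects):
        i = max(range(n_priors), key=lambda k: overlaps[j][k])
        best_obj[i] = j
        best_iou[i] = 1.0  # make sure the forced match survives the threshold

    # Priors below the threshold become background
    return [obj if v >= threshold else -1 for obj, v in zip(best_obj, best_iou)]


example_overlaps = [
    [0.4, 0.1, 0.0, 0.0],  # object 0 vs priors 0..3
    [0.6, 0.7, 0.2, 0.0],  # object 1 vs priors 0..3
]
matches = match_priors(example_overlaps)
```

Prior 0 ends up with object 0 even though its IoU with object 1 is higher, which is exactly the priority described above.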

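An example illustrates the matching rules:
[Figure 3-27]

The image contains 7 red boxes, the prior boxes, and yellow boxes, the ground truths; there are three real objects in this image. Following the steps listed above yields the matches shown in Figure 3-28.

[Figure 3-28]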

3.5.2 Loss function

Next, how the loss function is designed.

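The overall objective is defined as a weighted sum of the localization loss (loc) and the confidence loss (conf):
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
where N is the number of prior bboxes matched to a GT (ground truth) box; if N=0, the loss is set to 0. The parameter α balances the confidence loss against the location loss and defaults to α=1.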

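The confidence loss is the softmax loss over the multi-class confidences (c):
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log(\hat{c}_{i}^{p}) - \sum_{i \in Neg} \log(\hat{c}_{i}^{0}), \qquad \hat{c}_{i}^{p} = \frac{\exp(c_{i}^{p})}{\sum_{p} \exp(c_{i}^{p})}$$
Here i indexes the prior boxes, j the ground truth boxes and p the classes, with p=0 denoting the background. $x_{ij}^{p} \in \{1, 0\}$ equals 1 when the i-th prior bbox is matched to the j-th GT box and that GT box has class p. $\hat{c}_{i}^{p}$ is the predicted probability of class p for the i-th prior box. One point deserves attention: the first half of the formula is the loss of the positive samples (Pos), i.e. the loss for being classified as some object class (background excluded), and the second half is the loss of the negative samples (Neg), i.e. the loss for the background class.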

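The location loss (box regression) is a typical smooth L1 loss:
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p}\, \mathrm{smooth}_{L1}\left(l_{i}^{m} - \hat{g}_{j}^{m}\right)$$
$$\hat{g}_{j}^{cx} = \frac{g_{j}^{cx} - d_{i}^{cx}}{d_{i}^{w}}, \quad \hat{g}_{j}^{cy} = \frac{g_{j}^{cy} - d_{i}^{cy}}{d_{i}^{h}}, \quad \hat{g}_{j}^{w} = \log\frac{g_{j}^{w}}{d_{i}^{w}}, \quad \hat{g}_{j}^{h} = \log\frac{g_{j}^{h}}{d_{i}^{h}}$$
where l is the predicted box and g the ground truth. We regress to offsets relative to the default box d: (cx, cy) is its center and (w, h) are its width and height.
[Figure: detailed illustration of the offsets]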
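To make "regress to offsets" concrete, here is a minimal sketch of the encoding described in the SSD paper, in plain Python. The names `encode_offsets` and `decode_offsets` are hypothetical; real implementations (possibly including the tutorial's cxcy_to_gcxgcy) often multiply the offsets by extra scaling constants, which this sketch deliberately leaves out.

```python
import math


def encode_offsets(gt_cxcy, prior_cxcy):
    """Offsets of a ground truth box relative to a prior (default) box.
    Both boxes are (cx, cy, w, h)."""
    g_cx, g_cy, g_w, g_h = gt_cxcy
    d_cx, d_cy, d_w, d_h = prior_cxcy
    return (
        (g_cx - d_cx) / d_w,   # center shift, in units of the prior's width
        (g_cy - d_cy) / d_h,   # center shift, in units of the prior's height
        math.log(g_w / d_w),   # log scale change of the width
        math.log(g_h / d_h),   # log scale change of the height
    )


def decode_offsets(offsets, prior_cxcy):
    """Inverse of encode_offsets: turn offsets back into a (cx, cy, w, h) box."""
    t_cx, t_cy, t_w, t_h = offsets
    d_cx, d_cy, d_w, d_h = prior_cxcy
    return (
        t_cx * d_w + d_cx,
        t_cy * d_h + d_cy,
        math.exp(t_w) * d_w,
        math.exp(t_h) * d_h,
    )


prior = (0.5, 0.5, 0.2, 0.2)
gt = (0.6, 0.4, 0.3, 0.1)
offsets = encode_offsets(gt, prior)
restored = decode_offsets(offsets, prior)  # should reproduce gt up to rounding
```

A ground truth box identical to its prior encodes to all zeros, which is why the network only has to learn small corrections.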

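To extend this a little, here is a comparison of L1 loss, L2 loss and Smooth L1 loss.
Keep in mind that the box regression here uses Smooth L1 loss.
Mean squared error (L2 loss) and mean absolute error (L1 loss) should already be familiar.

Assessing mean squared error (L2 loss):
MSE penalizes large errors (>1) heavily and small errors (<1) lightly. In other words, it is sensitive to outliers and strongly affected by them.
If the samples contain outliers, MSE gives them more weight, at the expense of the predictions for the other, normal points, and this ultimately lowers overall model performance.

Assessing mean absolute error (L1 loss):
The gradient has the same magnitude almost everywhere, so it stays large even when the loss is small. This hampers convergence and learning.
On the other hand, the gradient is stable for any input, so it does not cause exploding gradients, and the solution is fairly robust.

L1 loss and L2 loss both have drawbacks, which is why smooth L1 loss was introduced.

It behaves like L2 loss near zero and like L1 loss for large errors, which avoids the drawbacks of both.
Its advantages:
When the predicted box is far from the ground truth, the gradient does not become too large;
When the predicted box is very close to the ground truth, the gradient is small enough.
So we choose smooth L1 loss as the loss function for box regression.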
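The gradient behaviour is easy to check numerically. Below is a small sketch in plain Python using the common definition smooth_L1(x) = 0.5·x²/β for |x| < β and |x| − 0.5·β otherwise, with β = 1; the function names are my own, and x stands for the difference between one predicted coordinate and its target.

```python
def l1_grad(x):
    """Gradient of |x|: constant magnitude 1 (taken as 0 at x == 0)."""
    return 0.0 if x == 0 else (1.0 if x > 0 else -1.0)


def l2_grad(x):
    """Gradient of x**2: grows linearly with the error."""
    return 2.0 * x


def smooth_l1(x, beta=1.0):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def smooth_l1_grad(x, beta=1.0):
    """Gradient of smooth_l1: x / beta inside the quadratic zone, +-1 outside."""
    if abs(x) < beta:
        return x / beta
    return 1.0 if x > 0 else -1.0


# A large error (10.0) and a small one (0.1)
big, small = 10.0, 0.1
gradients = {
    "l1": (l1_grad(big), l1_grad(small)),                # same size for both
    "l2": (l2_grad(big), l2_grad(small)),                # explodes for the outlier
    "smooth_l1": (smooth_l1_grad(big), smooth_l1_grad(small)),  # capped, yet small near 0
}
```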

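3.5.3 Hard negative mining

Note that in general the number of negative prior bboxes >> the number of positive prior bboxes. Training on all of them directly makes the network focus too much on negatives, and the predictions suffer. To keep positives and negatives reasonably balanced, we use the online hard example mining strategy from SSD (hard negative mining): sort the negative prior bboxes by confidence loss and train only on those with the highest confidence loss, keeping the ratio at positive:negative = 1:3. The point is to train only on the hard negatives, the ones most easily misclassified, which keeps positives and negatives balanced and makes training effective.

For example: suppose that among these 441 prior bboxes, matching yields P positive priors and 441−P negative priors. Sort the negative prior bboxes by prediction loss in descending order and take the top M. M follows from the chosen ratio: with a positive:negative ratio of 1:3 we take M=3P. These M highest-loss hard negatives are the ones that actually enter the loss; the remaining negatives do not contribute to the classification loss.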
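The selection of the M hardest negatives for one image can be sketched in a few lines of plain Python. `select_hard_negatives` is a hypothetical helper and the loss values are invented; with one positive and a ratio of 1:3, M = 3.

```python
def select_hard_negatives(conf_losses, is_positive, neg_pos_ratio=3):
    """Return the indices of the hardest negatives for one image.

    conf_losses[i] is the classification loss of prior i,
    is_positive[i] says whether prior i was matched to an object."""
    n_positives = sum(is_positive)
    n_hard = neg_pos_ratio * n_positives  # M = 3P for a 1:3 ratio

    # Only negatives are candidates; rank them by decreasing loss
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[:n_hard]


# 8 priors, prior 6 is the only positive
losses = [0.2, 2.5, 0.1, 1.7, 0.05, 0.9, 3.0, 0.3]
positive = [False, False, False, False, False, False, True, False]
hard = select_hard_negatives(losses, positive)
```

Prior 6 has the largest loss of all but is never selected, because it is a positive; the three selected priors are the negatives the classifier currently gets most wrong.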

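3.5.4 Summary

This section is about how training is carried out, in three parts:

Matching prior boxes to GT boxes
Computing the loss
Hard example mining
These three parts need to be understood together, so let us walk through the loss computation once more.

1. Match prior boxes to GT boxes

Following the scheme described above, assign a class to every prior box, which determines whether it is a positive or a negative sample.

2. Compute the loss

Compute the classification loss and the box regression loss with the loss function we defined.

Negative samples do not contribute to the box regression loss.

3. Hard example mining

The classification part of the loss computed above is not yet the final loss.

Because there are too many negative prior boxes, we keep only the highest-loss negatives according to a preset ratio, usually 1:3, ignore the remaining negatives, and recompute the classification loss.

The code for the complete loss computation is in the MultiBoxLoss class in model.py.

A word of encouragement: this section is the hardest in the whole chapter to understand, and its code is the hardest to chew through. Persistence wins!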

Loss function code analysis

The MultiBoxLoss class

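import torch
from torch import nn

# Assumption: find_jaccard_overlap, cxcy_to_xy, xy_to_cxcy and cxcy_to_gcxgcy are helper
# functions from the tutorial's utility code, and `device` is defined in the usual way:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class MultiBoxLoss(nn.Module):
    """
    The loss function for object detection.
    The loss follows the SSD definition exactly, i.e. the MultiBox loss.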
    This is a combination of:
    (1) a localization loss for the predicted locations of the boxes.
    (2) a confidence loss for the predicted class scores.
    """

    def __init__(self, priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.):
        super(MultiBoxLoss, self).__init__()
        self.priors_cxcy = priors_cxcy  # return value of create_prior_boxes(): (441, 4), normalized, boxes as (cx, cy, w, h)
        self.priors_xy = cxcy_to_xy(priors_cxcy)  # converted to 441 boxes as [x1, y1, x2, y2]
        self.threshold = threshold
        self.neg_pos_ratio = neg_pos_ratio
        self.alpha = alpha

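        # NOTE: despite the attribute name, nn.L1Loss is plain L1; nn.SmoothL1Loss() would match the smooth L1 loss of section 3.5.2
        self.smooth_l1 = nn.L1Loss()
        # reduction='none' keeps one loss value per prior (it replaces the deprecated reduce=False)
        self.cross_entropy = nn.CrossEntropyLoss(reduction='none')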
      


    def forward(self, predicted_locs, predicted_scores, boxes, labels):
        """
        Forward propagation.
        :param predicted_locs: predicted locations/boxes w.r.t the 441 prior boxes, a tensor of dimensions (N, 441, 4)
        :param predicted_scores: class scores for each of the encoded locations/boxes, a tensor of dimensions (N, 441, n_classes)
        :param boxes: true  object bounding boxes in boundary coordinates, a list of N tensors
        :param labels: true object labels, a list of N tensors
        :return: multibox loss, a scalar
        """
        batch_size = predicted_locs.size(0)  # N
        n_priors = self.priors_cxcy.size(0)  # 441 (7*7*9)
        n_classes = predicted_scores.size(2)  # 21 (classes 0~20; 0 is the background, i.e. negative samples)

        assert n_priors == predicted_locs.size(1) == predicted_scores.size(1)

        true_locs = torch.zeros((batch_size, n_priors, 4), dtype=torch.float).to(device)  # (N, 441, 4) 
        true_classes = torch.zeros((batch_size, n_priors), dtype=torch.long).to(device)  # (N, 441)

        # For each image
        for i in range(batch_size):
            n_objects = boxes[i].size(0)

            overlap = find_jaccard_overlap(boxes[i], self.priors_xy)  
            # For each prior, find the object that has the maximum overlap
            
            overlap_for_each_prior, object_for_each_prior = overlap.max(dim=0)  # (441)  

            # We don't want a situation where an object is not represented in our positive (non-background) priors -
            # 1. An object might not be the best object for all priors, and is therefore not in object_for_each_prior.
            # 2. All priors with the object may be assigned as background based on the threshold (0.5).

            # To remedy this -
            # First, find the prior that has the maximum overlap for each object.
            _, prior_for_each_object = overlap.max(dim=1)  # (N_object) 

            # Then, assign each object to the corresponding maximum-overlap-prior. (This fixes 1.)
           
            object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device)
            # To ensure these priors qualify, artificially give them an overlap of greater than 0.5. (This fixes 2.)
            overlap_for_each_prior[prior_for_each_object] = 1. 
            # Labels for each prior
            label_for_each_prior = labels[i][object_for_each_prior]  # (441) 
            # Set priors whose overlaps with objects are less than the threshold to be background (no object)
            label_for_each_prior[overlap_for_each_prior < self.threshold] = 0  # (441)

            # Store
            true_classes[i] = label_for_each_prior

            # Encode center-size object coordinates into the form we regressed predicted boxes to
            true_locs[i] = cxcy_to_gcxgcy(xy_to_cxcy(boxes[i][object_for_each_prior]), self.priors_cxcy)  # (441, 4)  
            
        # Identify priors that are positive (object/non-background)
        positive_priors = true_classes != 0  # (N, 441), bool

        # LOCALIZATION LOSS

        # Localization loss is computed only over positive (non-background) priors
        # predicted_locs is (N, 441, 4) and the mask is 2-D, so the indexed result is 2-D, (m, 4); both inputs to the loss are 2-D
        loc_loss = self.smooth_l1(predicted_locs[positive_priors], true_locs[positive_priors])  # (), scalar

        # Note: indexing with a boolean mask (torch.uint8 in old PyTorch versions) flattens the tensor when indexing is across multiple dimensions (N & 441)
        # So, if predicted_locs has the shape (N, 441, 4), predicted_locs[positive_priors] will have (total positives, 4)

        # CONFIDENCE LOSS

        # Confidence loss is computed over positive priors and the most difficult (hardest) negative priors in each image
        # That is, FOR EACH IMAGE,
        # we will take the hardest (neg_pos_ratio * n_positives) negative priors, i.e where there is maximum loss
        # This is called Hard Negative Mining - it concentrates on hardest negatives in each image, and also minimizes pos/neg imbalance

        # Number of positive and hard-negative priors per image
        n_positives = positive_priors.sum(dim=1)  # (N)
        n_hard_negatives = self.neg_pos_ratio * n_positives  # (N)

        # First, find the loss for all priors: predicted_scores (N, 441, 21) is viewed as (N * 441, 21), true_classes (N, 441) as (N * 441)
        conf_loss_all = self.cross_entropy(predicted_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 441)
        conf_loss_all = conf_loss_all.view(batch_size, n_priors)  # (N, 441)


        # We already know which priors are positive
        conf_loss_pos = conf_loss_all[positive_priors]  # (sum(n_positives))

        
        # Next, find which priors are hard-negative
        # To do this, sort ONLY negative priors in each image in order of decreasing loss and take top n_hard_negatives
        conf_loss_neg = conf_loss_all.clone()  # (N, 441)
        conf_loss_neg[positive_priors] = 0.  # (N, 441), positive priors are ignored (never in top n_hard_negatives)
        conf_loss_neg, _ = conf_loss_neg.sort(dim=1, descending=True)  # (N, 441), sorted by decreasing hardness
        hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device)  # (N, 441), ranks 0..440 expanded from (441) to the shape of conf_loss_neg
        hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1)  # (N, 441), mask that selects the first n_hard_negatives ranks of each image
        conf_loss_hard_neg = conf_loss_neg[hard_negatives]  # (sum(n_hard_negatives)), indexing the sorted losses with the same-shape mask gives a 1-D tensor of hard-negative losses


        # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
        conf_loss = (conf_loss_hard_neg.sum() + conf_loss_pos.sum()) / n_positives.sum().float()  # (), scalar

        # return TOTAL LOSS
        return conf_loss + self.alpha * loc_loss

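The MultiBoxLoss class explained piece by piece

First, roughly what the program does

1. Get the class label of every prior box (441 in total) and the object's position relative to each prior box (the offsets)

Take the data in as a batch,
compute the IoU between every prior box and every object, which gives each prior box a label
find the prior box with the largest IoU for each object, update the labels of those prior boxes and set their IoU to 1 (so they are not filtered out later)
filter the prior boxes: any with IoU < 0.5 is classified as a negative sample
store the resulting class labels and offsets
this yields class labels and offsets for all N images of the batch (which is what we were after; step 1 is done)

2. Set up loc_loss (regression on positive samples only)

First get the positive samples
Set loc_loss to the smooth_l1 loss function

3. Set up conf_loss (classification on positives and hard negatives only)

Derive the number of hard negatives from the number of positives
Use cross entropy as the overall classification loss (split below into a positive and a negative part)
3.1 Index the overall loss with the positive mask to get the classification loss of the positives
3.2 Classification loss of the negatives
Mine the hard negatives:
Set the loss of the positives to 0 (we are about to sort by loss to pick the hard negatives)
Sort the classification losses
Take the specified number as hard negatives

4. Combine the losses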

At first it is confusing what exactly the inputs are.

def __init__(self, priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.):

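What is priors_cxcy?
It is the return value of create_prior_boxes(), of shape (441, 4), with boxes in (cx, cy, w, h) form, normalized
threshold is the IoU threshold below which a prior box counts as a negative sample
neg_pos_ratio is the negative:positive ratio used in hard negative mining
alpha is the weight of loc_loss relative to conf_loss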

def forward(self, predicted_locs, predicted_scores, boxes, labels):

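predicted_locs is the model's output of encoded box coordinates ($g_{cx}, g_{cy}, g_{w}, g_{h}$), of shape (N, 441, 4)
predicted_scores is the model's output of class scores for every prior box, of shape (N, 441, 21)
boxes holds the objects' boxes: a list of N tensors, each of shape (n_object, 4); being a list it has no .shape, and n_object can differ from image to image
labels holds the objects' labels: a list of N tensors, each of shape (n_object)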

Rule 1 of the matching strategy

            overlap_for_each_prior, object_for_each_prior = overlap.max(dim=0)  # (441)
            _, prior_for_each_object = overlap.max(dim=1)  # (N_object) 
            object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device)
            overlap_for_each_prior[prior_for_each_object] = 1

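First, tensor.max returns both the maximum IoU and its index.
Taking the max of the (n_object, 441) overlap over dim=0 gives shape (441): for each prior box, its largest IoU with any object.
Taking the max over dim=1 gives shape (n_object): for each object box, its largest IoU with any prior box.
Then object_for_each_prior is indexed with prior_for_each_object, i.e. each object's index is written into the slot of that object's best prior.
The two maxima work together. The max over dim=0 keeps a prior box from receiving several classes, since one box corresponds to only one object.
The max over dim=1 is Rule 1 of the matching strategy in 3.5.1: every object gets the prior box with which it has the largest IoU.
To keep those priors from being filtered out, their IoU is set to 1.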

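As for object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device):
[Figure: step-by-step illustration of the indexed assignment]
These are roughly the steps. In my illustration I assumed that two objects share the same best prior box (step 2), so in step 3 one of them gets overwritten.
Concretely, if several objects have their largest IoU with the same prior box, the object with the larger index (its position in range(n_objects), not its class number 0~20) overwrites the earlier one. The overwritten object is not reassigned to its second-best prior box; it stays matched only if some other prior box picked it as its best object with an IoU above the threshold. Since prior boxes far outnumber objects, this can happen but only with very small probability, and can be ignored.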
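The collision case can be replayed on a tiny, made-up IoU table in plain Python. The sketch below writes the indexed assignment out as a sequential loop; that ordering is an assumption on my part, since PyTorch does not promise a write order when the index contains duplicates. Both objects have prior 1 as their best prior.

```python
overlap = [
    [0.10, 0.55, 0.30],  # object 0 vs priors 0..2
    [0.20, 0.60, 0.05],  # object 1 vs priors 0..2
]
n_objects, n_priors = len(overlap), len(overlap[0])

# Equivalent of overlap.max(dim=0): best object (and its IoU) for every prior
object_for_each_prior = [
    max(range(n_objects), key=lambda j: overlap[j][i]) for i in range(n_priors)
]
overlap_for_each_prior = [
    overlap[object_for_each_prior[i]][i] for i in range(n_priors)
]

# Equivalent of overlap.max(dim=1): best prior for every object
prior_for_each_object = [
    max(range(n_priors), key=lambda i: overlap[j][i]) for j in range(n_objects)
]

# Indexed assignment, written out one object at a time (later writes win)
for j, i in enumerate(prior_for_each_object):
    object_for_each_prior[i] = j
    overlap_for_each_prior[i] = 1.0

# Thresholding at 0.5 decides which priors stay positive
positive = [v >= 0.5 for v in overlap_for_each_prior]
matched_objects = {
    object_for_each_prior[i] for i in range(n_priors) if positive[i]
}
```

In this toy case object 0 ends up with no positive prior at all: its forced match was overwritten by object 1, and its other overlaps (0.10 and 0.30) are below the threshold.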

Rule 2 of the matching strategy

            label_for_each_prior = labels[i][object_for_each_prior]  # (441)
            label_for_each_prior[overlap_for_each_prior < self.threshold] = 0

Prior boxes whose IoU is below the threshold (0.5) are filtered out and become negative samples

Defining loc_loss

loc_loss = self.smooth_l1(predicted_locs[positive_priors], true_locs[positive_priors])

Defining conf_loss

        # Number of hard negatives per image
        n_positives = positive_priors.sum(dim=1)  # (N)
        n_hard_negatives = self.neg_pos_ratio * n_positives  # (N)
        # Classification loss of every prior
        conf_loss_all = self.cross_entropy(predicted_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 441)
        conf_loss_all = conf_loss_all.view(batch_size, n_priors)  # (N, 441)
        # Classification loss of the positives
        conf_loss_pos = conf_loss_all[positive_priors]  # (sum(n_positives))
        # Classification loss of the hard negatives
        conf_loss_neg = conf_loss_all.clone()  # (N, 441)
        conf_loss_neg[positive_priors] = 0.  # (N, 441), positive priors are ignored (never in top n_hard_negatives)
        conf_loss_neg, _ = conf_loss_neg.sort(dim=1, descending=True)  # (N, 441), sorted by decreasing hardness
        hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device)
        hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1)  # (N, 441)
        conf_loss_hard_neg = conf_loss_neg[hard_negatives]  # (sum(n_hard_negatives))
        # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
        # Combine the losses of the positives and of the hard negatives
        conf_loss = (conf_loss_hard_neg.sum() + conf_loss_pos.sum()) / n_positives.sum().float()

When picking the hard negatives,

conf_loss_neg.sort(dim=1, descending=True) sorts the losses from high to low, and a boolean mask then indexes out the top entries

hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device) and hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1) create the mask
conf_loss_hard_neg = conf_loss_neg[hard_negatives] takes out the losses of the hard negatives
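The sort-then-mask trick is easier to follow on a toy batch. The sketch below mirrors the tensor code with nested Python lists; the loss values are invented, and neg_pos_ratio is set to 1 only to keep the example small (the class default is 3).

```python
conf_loss_all = [
    [0.25, 2.0, 0.5, 1.5, 0.125, 1.0],   # image 0
    [0.5, 0.75, 3.0, 0.25, 1.0, 0.125],  # image 1
]
positive_priors = [
    [False, True, False, False, False, False],  # 1 positive
    [False, True, False, False, True, False],   # 2 positives
]
neg_pos_ratio = 1

n_positives = [sum(row) for row in positive_priors]
n_hard_negatives = [neg_pos_ratio * n for n in n_positives]

# Losses of the positives, taken straight from the mask
pos_sum = sum(
    loss
    for loss_row, mask_row in zip(conf_loss_all, positive_priors)
    for loss, pos in zip(loss_row, mask_row)
    if pos
)

hard_neg_sum = 0.0
for loss_row, mask_row, n_hard in zip(conf_loss_all, positive_priors, n_hard_negatives):
    # Zero out the positives so they sink to the bottom of the ranking
    neg_only = [0.0 if pos else loss for loss, pos in zip(loss_row, mask_row)]
    neg_sorted = sorted(neg_only, reverse=True)
    # "rank < n_hard" is the per-image mask hardness_ranks < n_hard_negatives
    hard_neg_sum += sum(loss for rank, loss in enumerate(neg_sorted) if rank < n_hard)

# Averaged over the number of positives only
conf_loss = (hard_neg_sum + pos_sum) / sum(n_positives)
```

Because the number of hard negatives is computed per image, an image with more positives contributes more negatives, while the final division still uses only the total number of positives.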

Finally we get the loss we wanted: conf_loss + self.alpha * loc_loss

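Supplement 1: how the label lookup works

label_for_each_prior = labels[i][object_for_each_prior] has shape (441): indexing the image's labels with object_for_each_prior picks, for every prior box, the label of the object assigned to it
[Figure: illustration of the label lookup]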
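The same lookup can be imitated with a list comprehension. This is a plain-Python sketch with invented values (two objects with hypothetical class labels 12 and 7, five priors instead of 441), followed by the thresholding step of Rule 2.

```python
labels_i = [12, 7]                       # labels of object 0 and object 1 in one image
object_for_each_prior = [1, 1, 0, 0, 1]  # object assigned to each of 5 priors
overlap_for_each_prior = [0.2, 1.0, 0.3, 0.8, 0.1]
threshold = 0.5

# labels[i][object_for_each_prior]: one label per prior, gathered by object index
gathered = [labels_i[j] for j in object_for_each_prior]

# label_for_each_prior[overlap_for_each_prior < threshold] = 0
label_for_each_prior = [
    label if ov >= threshold else 0
    for label, ov in zip(gathered, overlap_for_each_prior)
]
```

The result has one entry per prior, no matter how many objects the image contains, which is why the shape is (441) in the real code.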
Supplement 2: how predicted_locs[positive_priors] works
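Indexing a (N, 441, 4) tensor with a (N, 441) boolean mask walks over the first two dimensions in row-major order and keeps the 4-vector wherever the mask is True, so the result has shape (total positives, 4). The sketch below imitates this with nested lists; the sizes (N=2, 3 priors) and values are made up for illustration.

```python
predicted_locs = [  # shape (2, 3, 4)
    [[0.1] * 4, [0.2] * 4, [0.3] * 4],  # image 0
    [[0.4] * 4, [0.5] * 4, [0.6] * 4],  # image 1
]
positive_priors = [  # shape (2, 3)
    [False, True, False],
    [True, False, True],
]

# Equivalent of predicted_locs[positive_priors]: the batch and prior
# dimensions are flattened, the last dimension (4) is kept
selected = [
    predicted_locs[n][p]
    for n in range(len(predicted_locs))
    for p in range(len(predicted_locs[n]))
    if positive_priors[n][p]
]

total_positives = sum(sum(row) for row in positive_priors)
```

Applying the same mask to true_locs produces rows in the same order, so predictions and targets line up one to one when they are passed to the localization loss.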