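Overview
This post is based on the PyTorch implementation of Faster R-CNN; see https://github.com/jwyang/faster-rcnn.pytorch
1. Understanding anchors
Faster R-CNN is an anchor-based, two-stage object detection algorithm, so we start with the anchor mechanism.
The region proposal network (RPN) was introduced in Faster R-CNN. It replaces the selective search used by Fast R-CNN and greatly speeds up the network.
Anchors start out as a fixed set of candidate boxes. In an experiment you configure their aspect ratios and their sizes (areas). Anchors of every shape are then generated on the feature map; the number of shapes is the number of aspect ratios multiplied by the number of sizes. For example, with the aspect ratios {1:1, 1:2, 2:1} and three sizes there are nine shapes (the original post illustrates this case with a figure). These initial anchors are applied to the feature map in a sliding-window fashion, so every feature-map location carries the full set of shapes. This amounts to a brute-force enumeration, and in most cases some anchor covers each ground-truth box.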
args.set_cfgs = ['ANCHOR_SCALES', '[8, 16, 32]', 'ANCHOR_RATIOS', '[0.5,1,2]', 'MAX_NUM_GT_BOXES', '20']
self.anchor_scales = cfg.ANCHOR_SCALES
self.anchor_ratios = cfg.ANCHOR_RATIOS
len(self.anchor_scales) * len(self.anchor_ratios) is the number of anchors per feature-map location, here 3 * 3 = 9.
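To make the count concrete, here is a minimal sketch in plain Python of how nine base anchors could be built from the scales and ratios above. It is not the repository's generate_anchors code (which also rounds intermediate values). The function names, the base size of 16, and the reading of each ratio as height / width are assumptions made for illustration.

```python
import math

def make_base_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2)):
    """Return one (x1, y1, x2, y2) box per (ratio, scale) pair, centred on the origin.

    Assumption: ratio means height / width, and every anchor of a given scale
    keeps the same area (base_size * scale) ** 2.
    """
    anchors = []
    for ratio in ratios:
        for scale in scales:
            side = base_size * scale           # side of the square anchor with this area
            w = side / math.sqrt(ratio)        # w * h == side ** 2
            h = side * math.sqrt(ratio)        # h / w == ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

def count_anchors(img_h, img_w, feat_stride=16, per_location=9):
    """Total anchors for an image, assuming a stride-16 feature map."""
    return (img_h // feat_stride) * (img_w // feat_stride) * per_location

base = make_base_anchors()
print(len(base))                 # 9
print(count_anchors(600, 800))   # 37 * 50 * 9 = 16650
```

For a 600×800 input this gives 16,650 anchors, the same order of magnitude as the "roughly 20,000" quoted for the RPN.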
2. Relevant functions
(1) The class _fasterRCNN(nn.Module) in faster_rcnn.py; only the relevant statements are listed.
rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(base_feat, im_info, gt_boxes, num_boxes)  # region proposal network (RPN)
roi_data = self.RCNN_proposal_target(rois, gt_boxes, num_boxes)  # third filtering stage
rois, rois_label, rois_target, rois_inside_ws, rois_outside_ws = roi_data
pooled_feat = self.RCNN_roi_pool(base_feat, rois.view(-1,5))
cls_score = self.RCNN_cls_score(pooled_feat)
(2) self.RCNN_rpn is the region proposal network, implemented by the class _RPN(nn.Module) in rpn.py.
# define proposal layer
self.RPN_proposal = _ProposalLayer(self.feat_stride, self.anchor_scales, self.anchor_ratios)
rois = self.RPN_proposal((rpn_cls_prob.data, rpn_bbox_pred.data,
im_info, cfg_key),target=target)  # first filtering stage
# define anchor target layer
self.RPN_anchor_target = _AnchorTargetLayer(self.feat_stride, self.anchor_scales, self.anchor_ratios)
rpn_data = self.RPN_anchor_target((rpn_cls_score.data, gt_boxes, im_info, num_boxes))  # second filtering stage
return rois, self.rpn_loss_cls, self.rpn_loss_box
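3. First filtering stage: self.RPN_proposal
The RPN supplies RoIs (regions of interest) to the RoI head of Faster R-CNN (the second stage) as training samples. The RPN generates RoIs (in _ProposalLayer) as follows:
- For each image, use its feature map to compute, for each of the (H/16) × (W/16) × 9 anchors (roughly 20,000), the probability that it is foreground, together with its regression offsets.
- Keep the 12,000 anchors with the highest foreground probability.
- Use the regressed offsets to refine the positions of these 12,000 anchors, which yields the RoIs.
- Apply non-maximum suppression (NMS) and keep the 2,000 RoIs with the highest foreground probability.
Note: at inference time, to speed things up, 12,000 and 2,000 become 6,000 and 300 respectively.
Note: these operations need no backpropagation, so they can be implemented with plain NumPy or tensor operations.
RPN output: RoIs (a tensor of shape 2000×4 or 300×4). In the code listed in this post each RoI row also carries a leading batch index, which is why rois is later reshaped with rois.view(-1, 5).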
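The steps above can be sketched in plain Python as follows. This is a simplified illustration, not the code of _ProposalLayer: the function names are made up, the NMS threshold of 0.7 is an assumed typical value, and clipping boxes to the image and batching are left out.

```python
import math

def decode(anchor, delta):
    """Apply (dx, dy, dw, dh) regression offsets to an (x1, y1, x2, y2) anchor."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = delta
    new_cx, new_cy = cx + dx * w, cy + dy * h        # shift the centre
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)  # rescale width and height
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def propose(anchors, deltas, fg_scores, pre_nms_top_n, post_nms_top_n, nms_thresh=0.7):
    # 1. keep the anchors with the highest foreground score
    order = sorted(range(len(anchors)), key=lambda i: fg_scores[i], reverse=True)
    order = order[:pre_nms_top_n]
    # 2. refine them with the regressed offsets
    boxes = [decode(anchors[i], deltas[i]) for i in order]
    # 3. greedy NMS; boxes are already sorted by descending score
    kept = []
    for box in boxes:
        if all(iou(box, k) <= nms_thresh for k in kept):
            kept.append(box)
        if len(kept) == post_nms_top_n:
            break
    return kept

anchors = [(0, 0, 10, 10), (0, 1, 10, 11), (50, 50, 60, 60)]
deltas = [(0.0, 0.0, 0.0, 0.0)] * 3
scores = [0.9, 0.8, 0.7]
rois = propose(anchors, deltas, scores, pre_nms_top_n=3, post_nms_top_n=2)
print(rois)  # [(0.0, 0.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0)]
```

The second anchor overlaps the first with an IoU of about 0.82, so NMS drops it and the lower-scoring but distinct third box survives. With pre_nms_top_n=12000 and post_nms_top_n=2000 the same procedure matches the training numbers quoted above.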
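4. Second filtering stage: self.RPN_anchor_target (training)
During training, the RPN uses _AnchorTargetLayer to pick 256 of the roughly 20,000 candidate anchors and trains the first-stage foreground/background classification and box regression on them. The selection works as follows:
- For each ground-truth bounding box (gt_bbox), the anchor with the highest overlap (IoU) with it is a positive sample.
- Among the remaining anchors, any anchor whose IoU with some gt_bbox exceeds 0.7 is also a positive sample. At most 128 positives are kept.
- Anchors whose IoU with every gt_bbox is below 0.3 are sampled at random as negatives. Negatives and positives total 256.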
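The three rules can be written down as a small labelling routine. The sketch below is plain Python for illustration only and is not the code of _AnchorTargetLayer. The function names are hypothetical, the label convention (1 positive, 0 negative, -1 ignored) is chosen to match the rpn_label.ne(-1) filter in the RPN code, and at least one ground-truth box is assumed.

```python
import random

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3,
                  batch_size=256, max_pos=128, rng=None):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored."""
    rng = rng or random.Random(0)
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    max_iou = [max(row) for row in overlaps]   # assumes gt_boxes is non-empty
    labels = [-1] * len(anchors)

    # candidates for negatives: low overlap with every ground-truth box
    for i, v in enumerate(max_iou):
        if v < neg_thresh:
            labels[i] = 0
    # rule 1: the best-matching anchor of each ground-truth box is positive
    for j in range(len(gt_boxes)):
        best = max(range(len(anchors)), key=lambda i: overlaps[i][j])
        labels[best] = 1
    # rule 2: any anchor with high overlap is positive
    for i, v in enumerate(max_iou):
        if v > pos_thresh:
            labels[i] = 1

    # cap positives at max_pos, then fill the rest of the batch with negatives
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    for i in rng.sample(pos, max(0, len(pos) - max_pos)):
        labels[i] = -1
    num_pos = min(len(pos), max_pos)
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    for i in rng.sample(neg, max(0, len(neg) - (batch_size - num_pos))):
        labels[i] = -1
    return labels

anchors = [(0, 0, 10, 10), (0, 0, 10, 12), (100, 100, 110, 110), (4, 0, 14, 10)]
labels = label_anchors(anchors, gt_boxes=[(0, 0, 10, 10)])
print(labels)  # [1, 1, 0, -1]
```

The last anchor has an IoU of about 0.43 with the ground-truth box, which is neither high enough to be positive nor low enough to be negative, so it is ignored and contributes nothing to the RPN loss.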
class _RPN(nn.Module):
    """ region proposal network """
    def __init__(self, din):
        super(_RPN, self).__init__()
        self.din = din  # get depth of input feature map, e.g., 512
        self.anchor_scales = cfg.ANCHOR_SCALES
        self.anchor_ratios = cfg.ANCHOR_RATIOS
        self.feat_stride = cfg.FEAT_STRIDE[0]
        # define the convrelu layers processing input feature map
        self.RPN_Conv = nn.Conv2d(self.din, 512, 3, 1, 1, bias=True)
        # define bg/fg classification score layer
        self.nc_score_out = len(self.anchor_scales) * len(self.anchor_ratios) * 2  # 2(bg/fg) * 9 (anchors)
        self.RPN_cls_score = nn.Conv2d(512, self.nc_score_out, 1, 1, 0)
        # define anchor box offset prediction layer
        self.nc_bbox_out = len(self.anchor_scales) * len(self.anchor_ratios) * 4  # 4(coords) * 9 (anchors)
        self.RPN_bbox_pred = nn.Conv2d(512, self.nc_bbox_out, 1, 1, 0)
        # define proposal layer
        self.RPN_proposal = _ProposalLayer(self.feat_stride, self.anchor_scales, self.anchor_ratios)
        # define anchor target layer
        self.RPN_anchor_target = _AnchorTargetLayer(self.feat_stride, self.anchor_scales, self.anchor_ratios)
        self.rpn_loss_cls = 0
        self.rpn_loss_box = 0

    @staticmethod
    def reshape(x, d):
        input_shape = x.size()
        x = x.view(
            input_shape[0],
            int(d),
            int(float(input_shape[1] * input_shape[2]) / float(d)),
            input_shape[3]
        )
        return x

    def forward(self, base_feat, im_info, gt_boxes, num_boxes):
        batch_size = base_feat.size(0)
        # return feature map after convrelu layer
        rpn_conv1 = F.relu(self.RPN_Conv(base_feat), inplace=True)  # the features first pass through the RPN conv layer
        # get rpn classification score
        rpn_cls_score = self.RPN_cls_score(rpn_conv1)
        rpn_cls_score_reshape = self.reshape(rpn_cls_score, 2)
        rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape, 1)
        rpn_cls_prob = self.reshape(rpn_cls_prob_reshape, self.nc_score_out)
        # get rpn offsets to the anchor boxes
        rpn_bbox_pred = self.RPN_bbox_pred(rpn_conv1)  # offsets used to refine the boxes
        # proposal layer
        cfg_key = 'TRAIN' if self.training else 'TEST'
        rois = self.RPN_proposal((rpn_cls_prob.data, rpn_bbox_pred.data,
                                  im_info, cfg_key))  # first filtering stage (section 3 above)
        self.rpn_loss_cls = 0
        self.rpn_loss_box = 0
        # generating training labels and build the rpn loss
        if self.training:
            assert gt_boxes is not None
            # during training, anchors are selected as in section 4 to supervise the RPN
            rpn_data = self.RPN_anchor_target((rpn_cls_score.data, gt_boxes, im_info, num_boxes))
            # compute classification loss
            rpn_cls_score = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous().view(batch_size, -1, 2)
            rpn_label = rpn_data[0].view(batch_size, -1)
            rpn_keep = Variable(rpn_label.view(-1).ne(-1).nonzero().view(-1))
            rpn_cls_score = torch.index_select(rpn_cls_score.view(-1,2), 0, rpn_keep)
            rpn_label = torch.index_select(rpn_label.view(-1), 0, rpn_keep.data)
            rpn_label = Variable(rpn_label.long())
            self.rpn_loss_cls = F.cross_entropy(rpn_cls_score, rpn_label)
            fg_cnt = torch.sum(rpn_label.data.ne(0))
            rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = rpn_data[1:]
            # compute bbox regression loss
            rpn_bbox_inside_weights = Variable(rpn_bbox_inside_weights)
            rpn_bbox_outside_weights = Variable(rpn_bbox_outside_weights)
            rpn_bbox_targets = Variable(rpn_bbox_targets)
            self.rpn_loss_box = _smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                                rpn_bbox_outside_weights, sigma=3, dim=[1,2,3])
        return rois, self.rpn_loss_cls, self.rpn_loss_box
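The listing calls _smooth_l1_loss with sigma=3 but does not show its body. As a rough guide, the per-element sketch below assumes the usual sigma-parameterised smooth L1 form with inside and outside weights. The function name smooth_l1 is hypothetical, and the summation over dim=[1,2,3] followed by averaging over the batch is not reproduced.

```python
def smooth_l1(pred, target, inside_w=1.0, outside_w=1.0, sigma=3.0):
    """Smooth L1 loss for one scalar coordinate (assumed form, see lead-in)."""
    sigma2 = sigma ** 2
    diff = inside_w * (pred - target)   # inside weight masks out non-positive anchors
    abs_diff = abs(diff)
    if abs_diff < 1.0 / sigma2:
        loss = 0.5 * sigma2 * diff ** 2   # quadratic near zero
    else:
        loss = abs_diff - 0.5 / sigma2    # linear for larger errors
    return outside_w * loss               # outside weight normalises the contribution

small = smooth_l1(0.05, 0.0)                  # quadratic branch: 0.5 * 9 * 0.05**2
large = smooth_l1(1.0, 0.0)                   # linear branch: 1 - 0.5 / 9
masked = smooth_l1(1.0, 0.0, inside_w=0.0)    # ignored anchor contributes nothing
print(small, large, masked)
```

A larger sigma narrows the quadratic region to |x| < 1/sigma², so with sigma=3 the loss behaves like an L1 loss for all but very small errors.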
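5. Third filtering stage: self.RCNN_proposal_target (training)
The RPN only proposes 2,000 candidate boxes, and not all of these RoIs are used for training. _ProposalTargetLayer picks 128 sample_rois, and RoI pooling then pools these differently sized regions to a common 7×7 size.
The selection rules are:
- From the RoIs whose IoU with gt_bboxes is above 0.5, pick some (for example 32) as positives.
- From the RoIs whose IoU with gt_bboxes is below 0.5, down to a lower bound of 0 (or 0.1), pick some (for example 128 - 32 = 96) as negatives.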
if fg_num_rois > 0 and bg_num_rois > 0:
    # sampling fg
    fg_rois_per_this_image = min(fg_rois_per_image, fg_num_rois)
    # torch.randperm seems has a bug on multi-gpu setting that cause the segfault.
    # See https://github.com/pytorch/pytorch/issues/1868 for more details.
    # use numpy instead.
    #rand_num = torch.randperm(fg_num_rois).long().cuda()
    rand_num = torch.from_numpy(np.random.permutation(fg_num_rois)).type_as(gt_boxes).long()
    fg_inds = fg_inds[rand_num[:fg_rois_per_this_image]]
    # sampling bg
    bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
    # Seems torch.rand has a bug, it will generate very large number and make an error.
    # We use numpy rand instead.
    #rand_num = (torch.rand(bg_rois_per_this_image) * bg_num_rois).long().cuda()
    rand_num = np.floor(np.random.rand(bg_rois_per_this_image) * bg_num_rois)
    rand_num = torch.from_numpy(rand_num).type_as(gt_boxes).long()
    bg_inds = bg_inds[rand_num]
elif fg_num_rois > 0 and bg_num_rois == 0:
    # sampling fg
    #rand_num = torch.floor(torch.rand(rois_per_image) * fg_num_rois).long().cuda()
    rand_num = np.floor(np.random.rand(rois_per_image) * fg_num_rois)
    rand_num = torch.from_numpy(rand_num).type_as(gt_boxes).long()
    fg_inds = fg_inds[rand_num]
    fg_rois_per_this_image = rois_per_image
    bg_rois_per_this_image = 0
elif bg_num_rois > 0 and fg_num_rois == 0:
    # sampling bg
    #rand_num = torch.floor(torch.rand(rois_per_image) * bg_num_rois).long().cuda()
    rand_num = np.floor(np.random.rand(rois_per_image) * bg_num_rois)
    rand_num = torch.from_numpy(rand_num).type_as(gt_boxes).long()
    bg_inds = bg_inds[rand_num]
    bg_rois_per_this_image = rois_per_image
    fg_rois_per_this_image = 0
else:
    raise ValueError("bg_num_rois = 0 and fg_num_rois = 0, this should not happen!")
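The branching above is easier to follow once the tensor bookkeeping is stripped away. The sketch below restates it in plain Python under a few assumptions: the function name sample_rois, the 25% foreground fraction (32 of 128), and the background IoU range [0, 0.5) are illustrative choices, not values read from the repository's configuration. As in the listing, foreground RoIs are drawn without replacement (a permutation), while background RoIs and the single-class fallbacks are drawn with replacement (random indices).

```python
import random

def sample_rois(max_overlaps, rois_per_image=128, fg_fraction=0.25,
                fg_thresh=0.5, bg_hi=0.5, bg_lo=0.0, rng=None):
    """Pick foreground and background RoI indices from each RoI's best IoU."""
    rng = rng or random.Random(0)
    fg = [i for i, v in enumerate(max_overlaps) if v >= fg_thresh]
    bg = [i for i, v in enumerate(max_overlaps) if bg_lo <= v < bg_hi]
    fg_quota = int(round(fg_fraction * rois_per_image))

    if fg and bg:
        n_fg = min(fg_quota, len(fg))
        fg_keep = rng.sample(fg, n_fg)                        # without replacement
        bg_keep = rng.choices(bg, k=rois_per_image - n_fg)    # with replacement
    elif fg:
        fg_keep = rng.choices(fg, k=rois_per_image)           # only foreground exists
        bg_keep = []
    elif bg:
        fg_keep = []
        bg_keep = rng.choices(bg, k=rois_per_image)           # only background exists
    else:
        raise ValueError("no foreground and no background RoIs")
    return fg_keep, bg_keep

fg_keep, bg_keep = sample_rois([0.9, 0.6, 0.2, 0.05, 0.0], rois_per_image=8)
print(len(fg_keep), len(bg_keep))  # 2 6
```

Sampling background with replacement means the batch always reaches rois_per_image entries, even when an image has fewer distinct background RoIs than the quota.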