Object Detection 7: RetinaNet (1 Backbone Network, 2 Data Processing, 3 Training, 4 Prediction, 5 Model Evaluation)


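I am 壮观发带, a blogger on 靠谱客. This article covers RetinaNet for object detection: the backbone network, data processing, training, prediction, and model evaluation. I am sharing it here in the hope that it serves as a useful reference.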

RetinaNet

1 Backbone network
2 Data processing
3 Training
4 Prediction
5 Model evaluation

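1 Backbone network

Backbone network: ResNet50 + FPN + (cls, reg)

ResNet50: downsamples the inputs five times using ConvBlock and IdentityBlock and outputs three feature layers (C3, C4, C5).
FPN: builds five feature layers (P3 to P7) from the feature layers of the previous step: P5 directly from C5, P4 and P3 by upsampling and merging, P6 and P7 by stride-2 convolutions.
cls, reg: run classification and regression separately on the results of the previous step.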

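2 Data processing

Split the data into training, validation, and test sets

1.0 Set the path of the xml files and the path where the processed data is stored
2.1 Collect all xml files
2.2 From the xml files and the dataset ratios, obtain the indices that belong to each subset
3.1 Set the paths of the dataset files
3.2 Write the image names into the corresponding files
3.3 Close the files to release resources.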
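
The split steps above can be sketched as follows. This is a minimal sketch, not the article's script: the function names `split_indices` and `write_names`, the 90%/90% ratios, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def split_indices(num_files, trainval_percent=0.9, train_percent=0.9, seed=0):
    """Return (train, val, test) index lists for num_files annotation files.

    The ratios are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    indices = list(range(num_files))
    num_trainval = int(num_files * trainval_percent)
    num_train = int(num_trainval * train_percent)

    # Step 2.2: draw the train+val indices, then the train indices inside them.
    trainval = set(rng.sample(indices, num_trainval))
    train = set(rng.sample(sorted(trainval), num_train))

    train_idx = [i for i in indices if i in train]
    val_idx = [i for i in indices if i in trainval and i not in train]
    test_idx = [i for i in indices if i not in trainval]
    return train_idx, val_idx, test_idx


def write_names(path, names):
    """Steps 3.2 and 3.3: write one image name per line; the file is closed on exit."""
    with open(path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
```

Using a `with` block covers step 3.3 automatically, since the file is closed as soon as the block ends.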

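Reading the .jpg and .xml files

1 Open the dataset split file and the final .txt file that will hold the image paths and the .xml annotation information;
2 Iterate over the image names and, for each name, read the image path together with the box and class information.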
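
As an illustration of step 2, the sketch below reads boxes and classes from one annotation. It assumes the .xml files follow the Pascal VOC layout (`object`, `name`, `bndbox`), which the article does not state explicitly; `parse_voc_xml`, `annotation_line`, and the output line format are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text, classes):
    """Return a list of (xmin, ymin, xmax, ymax, class_id) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue  # skip classes we are not training on
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.find(tag).text))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append(coords + (classes.index(name),))
    return boxes


def annotation_line(image_path, boxes):
    """Format one line: image path followed by space-separated boxes."""
    parts = [",".join(str(value) for value in box) for box in boxes]
    return " ".join([image_path] + parts)


SAMPLE_XML = """
<annotation>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>bird</name>
    <bndbox><xmin>1</xmin><ymin>2</ymin><xmax>3</xmax><ymax>4</ymax></bndbox>
  </object>
</annotation>
"""
```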

3 Training

'''
1 model = retinanet.resnet_retinanet(NUM_CLASSES,inputs)
2 priors = get_anchors(model)
3 bbox_util = BBoxUtility(NUM_CLASSES, priors)
4 Set the number of training and validation samples and the training parameters
5 model.compile(), model.fit_generator()
6 Build the labels: gen = Generator(bbox_util, BATCH_SIZE, lines[:num_train], lines[num_train:],
(input_shape[0], input_shape[1]),NUM_CLASSES)
7 Loss functions: smooth_l1(), focal()
'''
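
The two loss functions named in step 7 can be sketched per element as below. This is a scalar illustration under stated assumptions, not the code the article trains with: `alpha=0.25` and `gamma=2.0` are the defaults from the RetinaNet paper, while `sigma=1.0` and the scalar signatures are choices made here for readability.

```python
import math


def smooth_l1(pred, target, sigma=1.0):
    """Smooth L1 for one regression value: quadratic near zero, linear elsewhere."""
    sigma_sq = sigma ** 2
    diff = abs(pred - target)
    if diff < 1.0 / sigma_sq:
        return 0.5 * sigma_sq * diff ** 2
    return diff - 0.5 / sigma_sq


def focal(prob, label, alpha=0.25, gamma=2.0):
    """Focal loss for one predicted probability and a 0/1 label.

    FL = -alpha_t * (1 - p_t)**gamma * log(p_t)
    """
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The factor `(1 - p_t)**gamma` shrinks the loss of well-classified examples, so the many easy background anchors contribute little to the total.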

4 Prediction

''' predict.py
1 Load the model
2 Enter the image path
3 Open the image; report an error if it cannot be opened
4 Predict
5 Display the prediction result
6 Function call map:
6.1 retinanet = Retinanet()
# instantiate the object
6.2 r_image = retinanet.detect_image(image)
# detect the objects
6.2.1 crop_img,x_offset,y_offset = letterbox_image(image, [self.model_image_size[0],self.model_image_size[1]])
# [600,600,3]
# pad the image with gray bars
6.2.2 photo = preprocess_input(np.reshape(photo,[1,self.model_image_size[0],self.model_image_size[1],self.model_image_size[2]]))
# normalize the image
6.2.3 preds = self.retinanet_model.predict(photo)
# predict
6.2.4 self.prior = self._get_prior() # get the prior boxes
6.2.5 results = self.bbox_util.detection_out(preds,self.prior,confidence_threshold=self.confidence)
# decode
6.2.6 Keep the boxes whose score is at least confidence [box_4,conf_1,label_1]
6.2.7 boxes = retinanet_correct_boxes(top_ymin,top_xmin,top_ymax,top_xmax,np.array([self.model_image_size[0],self.model_image_size[1]]),image_shape)
# remove the gray bars
6.2.8 Draw the results
'''

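4_(6.2.1) Padding the image with gray bars

# 6.2.1 Gray-bar padding: every image is brought to the same size (600,600,3) by adding gray bars, which avoids distorting the image
crop_img,x_offset,y_offset = letterbox_image(image, [self.model_image_size[0],self.model_image_size[1]])
'''
inputs : image, e.g. (1330,1330,3); self.model_image_size[0]=self.model_image_size[1]=600
outputs: crop_img,x_offset,y_offset
Basic idea: take the smaller of the target/original width ratio and height ratio; that ratio times the original size gives the new image size; the difference between the new image and the target size is filled with gray bars.
1. Compute the new image size from the target size and the original image size
2. Resize the image with bilinear interpolation
3. Create a gray canvas of the target size
4. Paste the resized image onto the gray canvas to obtain the target image
5. Compute the width and height offset ratios between the target image and the resized image
'''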
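
The size arithmetic behind steps 1 and 4 can be sketched without any image library. `letterbox_geometry` is a hypothetical helper, not the article's `letterbox_image`; it only returns the resized dimensions and the paste position on the gray canvas.

```python
def letterbox_geometry(image_size, target_size):
    """Return ((new_w, new_h), (pad_x, pad_y)) for aspect-preserving resizing.

    image_size and target_size are (width, height) pairs.
    """
    image_w, image_h = image_size
    target_w, target_h = target_size

    # Use the smaller ratio so the whole image fits inside the target.
    scale = min(target_w / image_w, target_h / image_h)
    new_w = int(image_w * scale)
    new_h = int(image_h * scale)

    # The resized image is centered; the remaining border is the gray bar.
    pad_x = (target_w - new_w) // 2
    pad_y = (target_h - new_h) // 2
    return (new_w, new_h), (pad_x, pad_y)
```

For a 1200x600 image and a 600x600 target, the image shrinks to 600x300 and gets a 150-pixel gray bar above and below. With an imaging library, the resized image would then be pasted at `(pad_x, pad_y)` on a canvas filled with gray.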

4_(6.2.2) Image normalization:

'''
1. x / 127.5 - 1 (maps pixel values from [0,255] to [-1,1])
2. (x - mean) / std
'''
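
Both normalization options are one-line formulas. The sketch below applies them to plain lists of pixel values; the function names are hypothetical, and `mean` and `std` are whatever statistics the caller supplies.

```python
def scale_to_unit_range(pixels):
    """Option 1: x / 127.5 - 1, mapping [0, 255] to [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


def standardize(pixels, mean, std):
    """Option 2: (x - mean) / std."""
    return [(p - mean) / std for p in pixels]
```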

4_(6.2.3) Prediction

# preds = self.retinanet_model.predict(photo)
# Prediction: ResNet50 + FPN + (cls,reg) + anchors
'''
retinanet_model = retinanet.resnet_retinanet(self.num_classes,inputs)
ResNet50 + FPN + (cls,reg)
Input:[-1,600,600,3]
Outputs: [] []
1. ResNet50
2. FPN
3. (cls,reg)
'''

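######################## ResNet50 ########################

Notation: Z = zero padding, C = convolution, B = batch normalization, A = activation, M = max pooling.

1 ResNet50_body
Inputs:[-1,600,600,3]--->Outputs:[y0,y1,y2,y3] = [150,150,256],[75,75,512],[38,38,1024],[19,19,2048]
inputs[600,600,3]-->
Z(3,3)C(64,(7,7),(2,2))-->[300,300,64]
BA('relu')M((3,3),(2,2))[150,150,64]-->
convblock+identityblock*2[150,150,256]-->
convblock+identityblock*3[75,75,512]-->
convblock+identityblock*5[38,38,1024]-->
convblock+identityblock*2[19,19,2048]
2 convblock: bottleneck structure; the shortcut branch has its own conv + BN, so the block can change the feature-map size and channel count
input_tensor-->CBA(1*1)-->CBA(3*3)-->CB(1*1)+CB(input_tensor)-->A
3 identityblock: bottleneck structure; the shortcut is the unchanged input, so input and output shapes are identical and the block only deepens the network
input_tensor-->CBA(1*1)-->CBA(3*3)-->CB(1*1)+(input_tensor)-->A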
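
The spatial sizes listed for the backbone follow from halving the resolution five times. The sketch below reproduces them under the assumption that each stride-2 step rounds up (which matches the 75 to 38 step); `downsample` and `resnet_feature_sizes` are hypothetical helper names.

```python
def downsample(size, stride=2):
    """Spatial size after one stride-2 step, rounding up (ceil division)."""
    return -(-size // stride)


def resnet_feature_sizes(input_size, num_downsamples=5):
    """Return the spatial size after each of the downsampling steps."""
    sizes = []
    for _ in range(num_downsamples):
        input_size = downsample(input_size)
        sizes.append(input_size)
    return sizes
```

For a 600-pixel input this yields 300, 150, 75, 38, and 19, where the last three are the resolutions of the feature layers passed on to the FPN.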

######################## FPN #############################

inputs: C3, C4, C5 = [75,75,512],[38,38,1024],[19,19,2048]
outputs:[P3, P4, P5, P6, P7] = [75,75,256],[38,38,256],[19,19,256],[10,10,256],[5,5,256]
c5[19,19,2048]-->conv2d*2-->p5 [19,19,256]
c5-->conv2d(stride=2)-->p6 [10,10,256]
p6-->conv2d(stride=2)-->p7 [5,5,256]
conv(C4)+upsample(c5)==Add1-->conv2d-->p4 [38,38,256]
upsample(Add1)+ conv2d(C3)-->conv2d-->p3 [75,75,256]
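
The top-down merge (upsample the coarser map, then add it to the finer one) can be illustrated on small single-channel maps stored as nested lists. This sketch assumes nearest-neighbor upsampling to the exact size of the finer map, which is one way to handle odd sizes such as 38 to 75; the convolutions are omitted, and `upsample_like` and `merge` are hypothetical names.

```python
def upsample_like(source, target_h, target_w):
    """Nearest-neighbor resize of a 2D list to (target_h, target_w)."""
    src_h, src_w = len(source), len(source[0])
    return [
        [source[y * src_h // target_h][x * src_w // target_w] for x in range(target_w)]
        for y in range(target_h)
    ]


def merge(lateral, top):
    """Upsample the coarser map `top` to the size of `lateral` and add elementwise."""
    upsampled = upsample_like(top, len(lateral), len(lateral[0]))
    return [
        [a + b for a, b in zip(lateral_row, up_row)]
        for lateral_row, up_row in zip(lateral, upsampled)
    ]
```

Resizing to the target shape instead of doubling avoids a size mismatch when the finer map has an odd side length.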

######################### cls,reg ########################

inputs:[P3, P4, P5, P6, P7] = [75,75,256],[38,38,256],[19,19,256],[10,10,256],[5,5,256]
outputs_regressions: [-1,4]
outputs_classifications : [-1,num_classes]
inputs--> conv2d*4(256)-->conv2d(num_anchors * 4)-->reshape(-1,4)
inputs--> conv2d*4(256)-->conv2d(num_classes * num_anchors)-->reshape(-1,num_classes)
regression_model = make_last_layer_loc(num_classes,num_anchors)
classification_model = make_last_layer_cls(num_classes,num_anchors)
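
The `-1` dimension in both head outputs is the total number of anchors over all pyramid levels. The sketch below computes it, assuming 9 anchors per location (3 scales times 3 ratios, the usual RetinaNet setting, which the article does not state); `head_output_shapes` is a hypothetical helper.

```python
def head_output_shapes(level_sizes, num_classes, num_anchors=9):
    """Return (regression_shape, classification_shape) after the reshape.

    level_sizes holds the side length of each square pyramid level.
    """
    total = sum(size * size * num_anchors for size in level_sizes)
    return (total, 4), (total, num_classes)
```

With levels of side 75, 38, 19, 10, and 5 there are 7555 locations, so 67995 anchors under the 9-per-location assumption.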

4_(6.2.4) Prior boxes self.prior = self._get_prior()

'''
1. get_anchors(model)
# get the side lengths of the boxes
# (scales*base)**2 --> areas --> sqrt(areas/ratios) --> anchors[:,2]
# anchors[:,2]*ratios --> anchors[:,3] --> anchors
2. shift(shape, stride, anchors)
# shape+stride --> shifts --> shifts+anchors --> shifted_anchors
3. generate_anchors(base_size=16, ratios=None, scales=None) # iterate over all feature layers and compute the corresponding prior boxes
'''
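
The side-length formulas and the shifting step can be sketched in plain Python. The functions below reuse the names `generate_anchors` and `shift` for readability but are simplified stand-ins, not the repository's code; the default ratios and scales, and placing anchor centers at `(index + 0.5) * stride`, are assumptions.

```python
import math


def generate_anchors(base_size=16, ratios=(0.5, 1.0, 2.0),
                     scales=(1.0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Return reference anchors (x1, y1, x2, y2) centered on the origin."""
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2
            width = math.sqrt(area / ratio)   # sqrt(areas / ratios)
            height = width * ratio            # width * ratios
            anchors.append((-width / 2, -height / 2, width / 2, height / 2))
    return anchors


def shift(shape, stride, anchors):
    """Copy the reference anchors to every cell of a (rows, cols) feature map."""
    rows, cols = shape
    shifted = []
    for row in range(rows):
        for col in range(cols):
            center_x = (col + 0.5) * stride
            center_y = (row + 0.5) * stride
            for x1, y1, x2, y2 in anchors:
                shifted.append((x1 + center_x, y1 + center_y,
                                x2 + center_x, y2 + center_y))
    return shifted
```

Every anchor keeps the area `(base_size * scale)**2` while its height-to-width ratio equals `ratio`, which is what the two formulas in the outline achieve.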

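4_(6.2.5) Decoding results = self.bbox_util.detection_out(preds,self.prior,confidence_threshold=self.confidence)

# detection_out(preds,self.prior,confidence_threshold=self.confidence)
'''
detection_out(preds,self.prior,confidence_threshold=self.confidence)
1. decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox)
(1) Get the width and height of the prior boxes
(2) Use the decoding formula to obtain the top-left and bottom-right corners of the predicted boxes
2. Filter the boxes, probabilities, and classes by the probability threshold
3. Loop over the classes so that non-maximum suppression is applied to each class separately.
(1) Sort detection [box 4, probability 1, class 1] by probability in descending order.
(2) Append the first entry of detection to best_box, compute the IoU between that box and the boxes in detection[1:], and keep only the boxes whose IoU is below the threshold
(3) Repeat the steps above until only one detection is left
'''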
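
Step 3 is greedy non-maximum suppression for one class. The following is a minimal sketch of that loop on plain tuples; the `(x1, y1, x2, y2, score)` layout, the function names, and the 0.5 threshold are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (x1, y1, x2, y2, score) tuples of a single class."""
    remaining = sorted(detections, key=lambda det: det[4], reverse=True)
    best_boxes = []
    while remaining:
        best = remaining.pop(0)          # highest remaining score
        best_boxes.append(best)
        # Keep only boxes that overlap the chosen one less than the threshold.
        remaining = [det for det in remaining if iou(best, det) < iou_threshold]
    return best_boxes
```

Running this once per class, as the outline describes, keeps overlapping boxes of different classes from suppressing each other.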

# Keep only the boxes whose score is at least confidence
'''
1. conf >= self.confidence
'''

# Remove the gray bars
'''
boxes = retinanet_correct_boxes(top_ymin,top_xmin,top_ymax,top_xmax,np.array([self.model_image_size[0],self.model_image_size[1]]),image_shape)
1. Compute new_shape, offset, and scale
2. Rescale the boxes according to offset and scale
'''
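
The two steps above undo the gray-bar padding. The sketch below shows the arithmetic for a single box, assuming the box coordinates are normalized to [0, 1] on the padded input and that shapes are given as (height, width); `correct_box` is a hypothetical helper, not the article's `retinanet_correct_boxes`.

```python
def correct_box(box, input_shape, image_shape):
    """Map (ymin, xmin, ymax, xmax) from the padded input to original pixels."""
    input_h, input_w = input_shape
    image_h, image_w = image_shape

    # Step 1: size of the resized image inside the padded input, then offset and scale.
    ratio = min(input_h / image_h, input_w / image_w)
    new_h, new_w = image_h * ratio, image_w * ratio
    offset_y = (input_h - new_h) / 2.0 / input_h
    offset_x = (input_w - new_w) / 2.0 / input_w
    scale_y = input_h / new_h
    scale_x = input_w / new_w

    # Step 2: remove the offset, stretch back, and convert to pixel units.
    ymin, xmin, ymax, xmax = box
    return (
        (ymin - offset_y) * scale_y * image_h,
        (xmin - offset_x) * scale_x * image_w,
        (ymax - offset_y) * scale_y * image_h,
        (xmax - offset_x) * scale_x * image_w,
    )
```

For a 600x1200 (height x width) image padded into a 600x600 input, a box spanning exactly the non-gray region maps back to the full original image.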