Deep Learning for Semantic Segmentation: SegNet



Overview

This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer.

Model

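Notes:
- The base model is VGG16
  - The fully connected (fc) layers are removed, which makes the encoder network smaller and easier to train
  - Parameter count drops from 134M to 14.7M
- The decoder network maps the low-resolution encoder feature maps back to the full image size
  - The decoder network is almost exactly symmetric to the encoder network
  - Finally, every pixel is classified with a multi-class soft-max
- The decoder network upsamples using the pool indices saved during max-pooling in the encoder
  - No extra parameters: the upsampled maps are sparse, and the upsampling step needs no training
  - Lower memory use: the decoder does not need to store the encoder's output feature maps
  - Better boundary delineation
  - The scheme can be extended to any encoder-decoder architecture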
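The pool-indices upsampling described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the SegNet implementation: the helper names `max_pool_2x2` and `max_unpool_2x2` are made up for this note, and a 2x2 window with stride 2 on a single-channel map is assumed.

```python
from typing import List, Tuple

Grid = List[List[float]]
Index = Tuple[int, int]


def max_pool_2x2(x: Grid) -> Tuple[Grid, List[List[Index]]]:
    """2x2 max-pooling with stride 2 that also records where each maximum came from."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h - 1, 2):
        row_vals, row_idx = [], []
        for j in range(0, w - 1, 2):
            # The four positions covered by the current 2x2 window.
            window = [(i + di, j + dj) for di in (0, 1) for dj in (0, 1)]
            best = max(window, key=lambda p: x[p[0]][p[1]])
            row_vals.append(x[best[0]][best[1]])
            row_idx.append(best)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def max_unpool_2x2(pooled: Grid, indices: List[List[Index]],
                   out_h: int, out_w: int) -> Grid:
    """Place each pooled value back at its recorded position; all other cells stay 0."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for vals, idxs in zip(pooled, indices):
        for v, (r, c) in zip(vals, idxs):
            out[r][c] = v
    return out


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 2],
    [6, 2, 3, 4],
]
pooled, indices = max_pool_2x2(feature_map)
restored = max_unpool_2x2(pooled, indices, 4, 4)

print(pooled)    # [[4, 5], [6, 9]]
for row in restored:
    print(row)   # sparse map: only the four maxima are non-zero
```

The unpooling step has no weights of its own, which is the "no extra parameters" point. The restored map is sparse, and in SegNet the trainable convolutions that follow each upsampling step turn it into a dense feature map.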
Results:
Notes:
- Max-pooling improves translation invariance:
  - Max-pooling is used to achieve translation invariance over small spatial shifts in the input image.
- Sub-sampling makes each point in a feature map correspond to a large region of the original image
  - Sub-sampling results in a large input image context (spatial window) for each pixel in the feature map.
- Repeated max-pooling and sub-sampling make classification more robust, but they lose the spatial boundary information in the image
  - While several layers of max-pooling and sub-sampling can achieve more translation invariance for robust classification correspondingly there is loss of spatial resolution of the feature maps.
- Spatial position relationships in the feature maps are very important for segmentation
  - The increasingly lossy (boundary detail) image representation is not beneficial for segmentation where boundary delineation is vital
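A toy 1D example makes the trade-off in these notes concrete. It is only a sketch under simple assumptions: `max_pool_1d` is a hypothetical helper written for this note, using non-overlapping windows of size 2.

```python
from typing import List


def max_pool_1d(signal: List[int], size: int = 2) -> List[int]:
    """Non-overlapping 1D max-pooling (stride equals the window size)."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


original = [0, 9, 0, 0, 0, 3, 0, 0]
shifted = original[1:] + [0]   # the same signal moved one step to the left

pooled_original = max_pool_1d(original)
pooled_shifted = max_pool_1d(shifted)

print(pooled_original)  # [9, 0, 3, 0]
print(pooled_shifted)   # [9, 0, 3, 0]
```

Both inputs pool to the same output, which is the invariance to small shifts. The same fact is the cost for segmentation: from the pooled values alone it is no longer possible to tell where inside each window the peak sat, so exact boundary positions are lost. The invariance is also only local, since a shift that moves a peak across a window boundary does change the pooled output.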

Analysis of Results

Experiment 1:

Comparison of different decoder variants

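Notes
- Upsampling by bilinear interpolation with fixed weights (not learned)
- Upsampling with max-pooling indices (not learned)
- Upsampling by bilinear interpolation with learnable weights
  - The weights are initialized from bilinear interpolation
- SegNet-Basic uses an FCN-like encoder-decoder layout
  - 4 encoders and 4 decoders
  - Upsampling reuses the indices recorded by the corresponding downsampling step
  - In both the encoder and the decoder, every conv layer is followed by batch normalization (BN)
  - In the decoder network, the conv layers use no ReLU non-linearity and no biases
  - With 7x7 kernels, the receptive field of VGG layer4 is 106x106
  - Each decoder conv has the same number of filters as its corresponding encoder conv
- SegNet-SingleChannelDecoder
  - The decoder conv filters are single-channel
- FCN-Basic-NoDimReduction
  - The final dimensionality matches that of the corresponding encoder
- Conclusion: FCN-Basic-NoDimReduction still gives the best results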
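The 106x106 receptive field quoted above can be checked with a short calculation. This sketch assumes each of the 4 encoders is one 7x7 convolution with stride 1 followed by a 2x2 max-pooling with stride 2 (the pooling size is an assumption, since the notes above do not state it). The function name `receptive_field` is made up for this note.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field, in input pixels, of one output unit.

    Each layer is a (kernel_size, stride) pair, listed from input to output.
    """
    rf = 1     # a single input pixel sees itself
    jump = 1   # distance in input pixels between neighbouring units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# One encoder: 7x7 conv (stride 1), then 2x2 max-pooling (stride 2).
encoder = [(7, 1), (2, 2)]

for depth in range(1, 5):
    print(depth, receptive_field(encoder * depth))
# 1 8
# 2 22
# 3 50
# 4 106
```

Under these assumptions a unit after the fourth encoder sees a 106x106 window of the input, which matches the figure in the notes.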