[COCO] COCO Dataset Analysis

I am 阔达耳机, a blogger on 靠谱客. This article gives an analysis of the COCO dataset, shared here as a reference.

COCO dataset download links

Training set

http://images.cocodataset.org/zips/train2017.zip

http://images.cocodataset.org/annotations/annotations_trainval2017.zip

Validation set

http://images.cocodataset.org/zips/val2017.zip

http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip

Test set

http://images.cocodataset.org/zips/test2017.zip

http://images.cocodataset.org/annotations/image_info_test2017.zip
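The links above can be fetched with a small script. The following is only a sketch using the Python standard library: the dictionary keys, the `coco` destination folder, and the helper names `local_path` and `download` are illustrative choices, not part of any COCO tooling. Note that the image archives are very large, so `download` is only defined here and never called.

```python
import urllib.request
from pathlib import Path

# URLs copied from the lists above; the keys are illustrative names.
COCO_2017_URLS = {
    "train_images": "http://images.cocodataset.org/zips/train2017.zip",
    "val_images": "http://images.cocodataset.org/zips/val2017.zip",
    "test_images": "http://images.cocodataset.org/zips/test2017.zip",
    "trainval_annotations": "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
    "stuff_annotations": "http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip",
    "test_image_info": "http://images.cocodataset.org/annotations/image_info_test2017.zip",
}


def local_path(url, dest_dir="coco"):
    """Return the local file path for a URL, named after its last path segment."""
    return Path(dest_dir) / url.rsplit("/", 1)[-1]


def download(key, dest_dir="coco"):
    """Download one archive unless a file with that name already exists."""
    url = COCO_2017_URLS[key]
    target = local_path(url, dest_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target
```

Calling `download("trainval_annotations")` would fetch only the annotation archive, which is a reasonable first step when the goal is to inspect the JSON format.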

COCO 2014: https://blog.csdn.net/u014734886/article/details/78830713?utm_source=blogxgwz0

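The COCO dataset was created by a team at Microsoft. It can be used to train neural networks for image detection, classification, segmentation, and captioning. See the official website for a detailed introduction.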

Annotation format

Annotation data format:

https://blog.csdn.net/zziahgf/article/details/72819043

https://blog.csdn.net/u013735511/article/details/79099483

A direct look at the contents of instances_train2014.json:

https://blog.csdn.net/hehangjiang/article/details/79084794

Object Instance Annotations

Each instance annotation contains a series of fields, including the category id and segmentation mask of the object. The segmentation format depends on whether the instance represents a single object (iscrowd=0 in which case polygons are used) or a collection of objects (iscrowd=1 in which case RLE is used). Note that a single object (iscrowd=0) may require multiple polygons, for example if occluded. Crowd annotations (iscrowd=1) are used to label large groups of objects (e.g. a crowd of people). In addition, an enclosing bounding box is provided for each object (box coordinates are measured from the top left image corner and are 0-indexed). Finally, the categories field of the annotation structure stores the mapping of category id to category and supercategory names.

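To restate the key points: each instance annotation contains a series of fields, including the category id and the segmentation mask. The format of the segmentation field depends on whether the instance is a single object (iscrowd=0, stored as polygons) or a collection of objects (iscrowd=1, stored as RLE, which is explained below). A single object may need several polygons, for example when it is occluded. Crowd annotations label groups of objects, such as a crowd of people. Each object also has an enclosing bounding box; its coordinates are measured from the top-left corner of the image and are 0-indexed, so counting starts at 0 rather than 1. Finally, the categories field maps each category id to its category name and supercategory name.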
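The fields described above can be illustrated with a tiny hand-made example. This is a sketch, not real COCO data: every value in `dataset` is invented for illustration, and the helpers `segmentation_kind` and `polygons_bbox` are hypothetical names. It assumes the usual instance layout, in which a polygon segmentation is a list of flat `[x1, y1, x2, y2, ...]` lists, a crowd segmentation is a dict with `size` and `counts`, and `bbox` is `[x, y, width, height]`.

```python
# Made-up miniature of an instances file; values are illustrative only.
dataset = {
    "annotations": [
        {   # single object: iscrowd=0, polygon segmentation
            "id": 1, "image_id": 10, "category_id": 1, "iscrowd": 0,
            "segmentation": [[10.0, 10.0, 50.0, 10.0, 50.0, 40.0, 10.0, 40.0]],
            "bbox": [10.0, 10.0, 40.0, 30.0], "area": 1200.0,
        },
        {   # group of objects: iscrowd=1, RLE segmentation
            "id": 2, "image_id": 10, "category_id": 1, "iscrowd": 1,
            "segmentation": {"size": [4, 4], "counts": [4, 8, 4]},
            "bbox": [1.0, 0.0, 2.0, 4.0], "area": 8.0,
        },
    ],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
}


def segmentation_kind(ann):
    """Tell polygon and RLE segmentations apart by their container type."""
    return "rle" if isinstance(ann["segmentation"], dict) else "polygon"


def polygons_bbox(polygons):
    """Tight [x, y, width, height] box around one or more flat polygons."""
    xs = [v for poly in polygons for v in poly[0::2]]  # even positions are x
    ys = [v for poly in polygons for v in poly[1::2]]  # odd positions are y
    x0, y0 = min(xs), min(ys)
    return [x0, y0, max(xs) - x0, max(ys) - y0]


# Mapping of category id to (category name, supercategory name).
category_names = {
    c["id"]: (c["name"], c["supercategory"]) for c in dataset["categories"]
}
```

In this toy example the box computed from the polygon matches the stored `bbox` exactly; in real annotation files the stored box need not coincide with the polygon extent to the last decimal.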

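A brief introduction to mask storage

As mentioned above, COCO stores masks in two ways: polygons and RLE. Polygons are easy to understand, since they are simply polygon outlines. But what is RLE?

Simply put, RLE (run-length encoding) is a compression method, and probably the first one that comes to mind.

For example, if M = [0,0,0,1,1,1,1,1,1,0,0], the RLE encoding of M is [3,6,2]. This is the encoding for binary data, which is the form COCO uses. RLE in general goes well beyond this simple case, but it is not the focus here, so look it up if you want more detail.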
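The example can be reproduced with a few lines of Python. This is a minimal sketch of binary run-length encoding for a flat list of 0/1 values, written for illustration; `rle_encode` is a hypothetical helper, not a function from the COCO API. It assumes the convention that the first count always refers to zeros.

```python
def rle_encode(bits):
    """Run-length encode a flat sequence of 0/1 values.

    The first count always refers to zeros, so an input that starts
    with 1 produces a leading count of 0.
    """
    counts = []
    current = 0  # value of the run being counted; runs start with zeros
    run = 0      # length of the current run
    for b in bits:
        if b == current:
            run += 1
        else:
            counts.append(run)  # close the finished run
            current = b
            run = 1
    counts.append(run)          # close the last run
    return counts


print(rle_encode([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]))  # [3, 6, 2]
```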

The comment in the code explains it as follows:

# RLE is a simple yet efficient format for storing binary masks. RLE
# first divides a vector (or vectorized image) into a series of piecewise
# constant regions and then for each piece simply stores the length of
# that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
# be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
# (note that the odd counts are always the numbers of zeros). Instead of
# storing the counts directly, additional compression is achieved with a
# variable bitrate representation based on a common scheme called LEB128.

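In other words, RLE splits a binary vector into a series of piecewise-constant runs (not fixed-length segments) and stores only the length of each run. For example, M=[0 0 1 1 1 0 1] gives the RLE counts [2 3 1 1], and M=[1 1 1 1 1 1 0] gives [0 6 1]. Note that the odd-numbered counts (the 1st, 3rd, and so on) are always counts of zeros, which is why a mask that starts with 1 gets a leading 0. Instead of storing the counts directly, further compression is obtained with a variable-bitrate representation based on a common scheme called LEB128.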
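Going the other way, the counts quoted in the comment can be expanded back into the original vectors. The sketch below is illustrative only: `rle_decode` and `rle_area` are hypothetical helper names, it works on a flat vector, and it does not cover the LEB128-based string compression mentioned in the comment.

```python
def rle_decode(counts):
    """Expand RLE counts into a flat list of 0/1 values.

    Counts alternate between runs of zeros and runs of ones,
    starting with zeros.
    """
    bits = []
    value = 0
    for n in counts:
        bits.extend([value] * n)
        value = 1 - value  # switch between 0 and 1 after each run
    return bits


def rle_area(counts):
    """Number of foreground pixels: the sum of every second count."""
    return sum(counts[1::2])


print(rle_decode([2, 3, 1, 1]))  # [0, 0, 1, 1, 1, 0, 1]
print(rle_decode([0, 6, 1]))     # [1, 1, 1, 1, 1, 1, 0]
print(rle_area([0, 6, 1]))       # 6
```

Because the runs of ones sit at every second position, the mask area can be read straight from the counts without decoding the mask.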