Straw-bear Algorithm Study

Shared by 天下 on 2025/2/26 21:34:21

Datasets for Semantic Segmentation

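The academic community currently relies on three main benchmarks (datasets) for training and testing models. The first is the Pascal VOC series, of which VOC2012 is currently the most popular; related datasets such as Pascal Context are also used. The second is Microsoft COCO. COCO has 80 categories and detailed pixel-level annotations, but it has no dedicated official evaluation for semantic segmentation. It is used mainly for instance-level segmentation and image captioning, so COCO is often treated as an additional training set. The third is Cityscapes, a dataset for assisted (autonomous) driving environments, which uses 19 of its more common classes for evaluation.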

1. Pascal VOC 2012

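The standard VOC2012 dataset has 21 classes (including background), all fairly common categories: {0=background, 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle, 6=bus, 7=car, 8=cat, 9=chair, 10=cow, 11=diningtable, 12=dog, 13=horse, 14=motorbike, 15=person, 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv-monitor}. The value 255 marks 'void' or unlabelled pixels. Among the VOC2012 segmentation images, trainval contains all the corresponding images from 2007-2011, while test contains only images from 2008-2011. trainaug has 10582 images; trainval has 2913 images, of which 1464 are used for training and 1449 for validation; the test set has 1456 images. The test-set labels are not public, so predictions must be uploaded to the Pascal Challenge test server to compute the mIoU.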
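To make the mIoU metric concrete, here is a minimal sketch of how it can be computed from flattened label masks while skipping the 255 'void' pixels. The function name `mean_iou` and its parameters are hypothetical, plain Python lists are used only to keep the sketch dependency-free, and the official server's exact protocol (for example, accumulating counts over the whole test set rather than per image) is not reproduced here.

```python
def mean_iou(pred, target, num_classes=21, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class IDs of equal length.
    Pixels whose target label equals ignore_index are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # void / unlabelled pixel: not counted at all
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, one of them void (255).
target = [0, 0, 15, 15, 255, 7]
pred = [0, 15, 15, 15, 7, 7]
score = mean_iou(pred, target)
```

In the toy example the per-class IoUs are 1/2 for background, 2/3 for person, and 1 for car, so the mean is 13/18.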

2. MS COCO

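COCO (Common Objects in Context) is a large-scale dataset for image recognition, segmentation, and captioning. It supports several competitions; the one most relevant to this field is the detection track, since part of it is devoted to the segmentation problem.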

The competition covers 80 object categories (plus background): ['background=0', 'person=1', 'bicycle=2', 'car=3', 'motorcycle=4', 'airplane=5', 'bus=6', 'train=7', 'truck=8', 'boat=9', 'traffic light=10', 'fire hydrant=11', 'stop sign=13', 'parking meter=14', 'bench=15', 'bird=16', 'cat=17', 'dog=18', 'horse=19', 'sheep=20', 'cow=21', 'elephant=22', 'bear=23', 'zebra=24', 'giraffe=25', 'backpack=27', 'umbrella=28', 'handbag=31', 'tie=32', 'suitcase=33', 'frisbee=34', 'skis=35', 'snowboard=36', 'sports ball=37', 'kite=38', 'baseball bat=39', 'baseball glove=40', 'skateboard=41', 'surfboard=42', 'tennis racket=43', 'bottle=44', 'wine glass=46', 'cup=47', 'fork=48', 'knife=49', 'spoon=50', 'bowl=51', 'banana=52', 'apple=53', 'sandwich=54', 'orange=55', 'broccoli=56', 'carrot=57', 'hot dog=58', 'pizza=59', 'donut=60', 'cake=61', 'chair=62', 'couch=63', 'potted plant=64', 'bed=65', 'dining table=67', 'toilet=70', 'tv=72', 'laptop=73', 'mouse=74', 'remote=75', 'keyboard=76', 'cell phone=77', 'microwave=78', 'oven=79', 'toaster=80', 'sink=81', 'refrigerator=82', 'book=84', 'clock=85', 'vase=86', 'scissors=87', 'teddy bear=88', 'hair drier=89', 'toothbrush=90'].
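The object IDs in the list run from 1 to 90 but skip ten numbers, so only 80 of them are used. A segmentation model typically wants contiguous class indices, so a common step is to remap the sparse IDs. The sketch below shows one way to do that; the names `MISSING_IDS`, `ID_TO_INDEX`, and `INDEX_TO_ID` are hypothetical, and keeping background at index 0 is an assumption about the training setup rather than something the dataset prescribes.

```python
# IDs in 1..90 that do not appear in the object category list.
MISSING_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}

# The 80 object IDs that are actually used, in ascending order.
COCO_THING_IDS = [i for i in range(1, 91) if i not in MISSING_IDS]

# Sparse dataset ID -> contiguous index (background stays at 0, assumed).
ID_TO_INDEX = {0: 0}
ID_TO_INDEX.update(
    {coco_id: index for index, coco_id in enumerate(COCO_THING_IDS, start=1)}
)

# Contiguous index -> sparse dataset ID, for writing results back out.
INDEX_TO_ID = {index: coco_id for coco_id, index in ID_TO_INDEX.items()}
```

With this mapping, 'stop sign=13' becomes index 12 and 'toothbrush=90' becomes index 80, and `INDEX_TO_ID` reverses the step when predictions need to be reported with the original IDs.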

The 91 stuff categories, followed by a final 'other' label, are: ['banner=92', 'blanket=93', 'branch=94', 'bridge=95', 'building-other=96', 'bush=97', 'cabinet=98', 'cage=99', 'cardboard=100', 'carpet=101', 'ceiling-other=102', 'ceiling-tile=103', 'cloth=104', 'clothes=105', 'clouds=106', 'counter=107', 'cupboard=108', 'curtain=109', 'desk-stuff=110', 'dirt=111', 'door-stuff=112', 'fence=113', 'floor-marble=114', 'floor-other=115', 'floor-stone=116', 'floor-tile=117', 'floor-wood=118', 'flower=119', 'fog=120', 'food-other=121', 'fruit=122', 'furniture-other=123', 'grass=124', 'gravel=125', 'ground-other=126', 'hill=127', 'house=128', 'leaves=129', 'light=130', 'mat=131', 'metal=132', 'mirror-stuff=133', 'moss=134', 'mountain=135', 'mud=136', 'napkin=137', 'net=138', 'paper=139', 'pavement=140', 'pillow=141', 'plant-other=142', 'plastic=143', 'platform=144', 'playingfield=145', 'railing=146', 'railroad=147', 'river=148', 'road=149', 'rock=150', 'roof=151', 'rug=152', 'salad=153', 'sand=154', 'sea=155', 'shelf=156', 'sky-other=157', 'skyscraper=158', 'snow=159', 'solid-other=160', 'stairs=161', 'stone=162', 'straw=163', 'structural-other=164', 'table=165', 'tent=166', 'textile-other=167', 'towel=168', 'tree=169', 'vegetable=170', 'wall-brick=171', 'wall-concrete=172', 'wall-other=173', 'wall-panel=174', 'wall-stone=175', 'wall-tile=176', 'wall-wood=177', 'water-other=178', 'waterdrops=179', 'window-blind=180', 'window-other=181', 'wood=182', 'other=183']. The dataset provides 118287 training images, 5000 validation images, and more than 40670 test images. Because of its scale it is now very widely used and is important to the development of the field. In fact, the competition results are announced every year at an ECCV workshop together with the ImageNet results. Its features are as follows:

1) Object segmentation
2) Recognition in context
3) Superpixel stuff segmentation
4) 330K images (more than 200K labeled)
5) 1.5 million object instances
6) 80 object categories
7) 91 stuff categories
8) 5 captions per image
9) 250,000 people with keypoints

3. Cityscapes

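The Cityscapes dataset, promoted mainly by Mercedes-Benz, provides image segmentation data for driverless-driving environments and is used to evaluate how well vision algorithms understand the semantics of urban scenes. It contains street views from 50 European cities, covering different scenes, backgrounds, and seasons, annotated with 33 classes: {'unlabeled'=0, 'ego vehicle'=1, 'rectification border'=2, 'out of roi'=3, 'static'=4, 'dynamic'=5, 'ground'=6, 'road'=7, 'sidewalk'=8, 'parking'=9, 'rail track'=10, 'building'=11, 'wall'=12, 'fence'=13, 'guard rail'=14, 'bridge'=15, 'tunnel'=16, 'pole'=17, 'polegroup'=18, 'traffic light'=19, 'traffic sign'=20, 'vegetation'=21, 'terrain'=22, 'sky'=23, 'person'=24, 'rider'=25, 'car'=26, 'truck'=27, 'bus'=28, 'caravan'=29, 'trailer'=30, 'train'=31, 'motorcycle'=32, 'bicycle'=33}. Of these 33 classes, only 19 are used in evaluation, so the 33 classes are mapped to 19 for training, and predictions must be mapped from the 19 classes back to the 33-class IDs before being uploaded to the evaluation server. Downloading the data requires a registered account. Cityscapes has two evaluation settings, fine and coarse: the former provides 5000 finely annotated images, and the latter provides those 5000 finely annotated images plus 20000 coarsely annotated images. Performance is measured with the PASCAL VOC-style intersection-over-union (IoU) score. The 5000 finely annotated images are split into 2975 training images, 500 validation images, and 1525 test images. The test-set annotations are not public, so predictions must be uploaded to the evaluation server to compute the mIoU.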
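The 33-to-19 mapping and its inverse can be sketched as two lookup tables. The text above does not say which 19 classes are evaluated; the ID list below follows the label table in the public cityscapesScripts tooling as commonly used, so treat it as an assumption to verify against the official definitions. The names `EVAL_LABEL_IDS`, `encode`, and `decode` are hypothetical, and using 255 for ignored pixels is a convention rather than a requirement.

```python
IGNORE = 255  # assumed ignore value for classes excluded from evaluation

# Label IDs of the 19 evaluated classes (assumed, see the note above):
# road, sidewalk, building, wall, fence, pole, traffic light, traffic sign,
# vegetation, terrain, sky, person, rider, car, truck, bus, train,
# motorcycle, bicycle.
EVAL_LABEL_IDS = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22,
                  23, 24, 25, 26, 27, 28, 31, 32, 33]

# Label ID (0..33) -> train ID (0..18), everything else -> IGNORE.
LABEL_TO_TRAIN = {label_id: IGNORE for label_id in range(34)}
LABEL_TO_TRAIN.update(
    {label_id: train_id for train_id, label_id in enumerate(EVAL_LABEL_IDS)}
)

# Train ID (0..18) -> label ID, used before uploading predictions.
TRAIN_TO_LABEL = dict(enumerate(EVAL_LABEL_IDS))


def encode(label_mask):
    """Map a 2-D ground-truth mask of label IDs to train IDs."""
    return [[LABEL_TO_TRAIN[v] for v in row] for row in label_mask]


def decode(train_mask):
    """Map a 2-D prediction mask of train IDs back to label IDs."""
    return [[TRAIN_TO_LABEL[v] for v in row] for row in train_mask]
```

For example, `encode([[7, 0, 26]])` gives `[[0, 255, 13]]`: road becomes train ID 0, 'unlabeled' is ignored, and car becomes train ID 13. `decode` only accepts train IDs 0 to 18, which is what a 19-class model outputs.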

4. Pascal-Context

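The Pascal-Context dataset extends the PASCAL VOC 2010 recognition challenge with pixel-level annotations for all training images. It has 540 classes in total, including the original 20 classes plus the image backgrounds from the PASCAL VOC segmentation dataset, and the classes fall into three groups: objects, stuff, and hybrids. Despite the large number of classes, only the 59 most common ones are considered relatively meaningful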