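Preface
CamVid is a dataset of urban road scenes publicly released by the University of Cambridge. It contains 701 precisely annotated images for semantic segmentation. To use CamVid for object detection, bounding-box (bbox) labels are also needed. This article provides code that extracts bbox labels from the CamVid semantic labels, so that they can be used to train object detection models.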
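Introduction to the CamVid dataset
CamVid is a dataset of urban road scenes publicly released by the University of Cambridge. Its full name is The Cambridge-driving Labeled Video Database, and it is the first collection of videos with semantic labels for object classes. The dataset contains 701 precisely annotated images for training semantic segmentation models, and it can be split into training, validation, and test sets.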
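Official download page: CamVid Dataset (cam.ac.uk)
A data sample is shown below:
Class label link: CamVid ClassLabel
The database provides 32 ground-truth semantic labels. The share of each class is shown in the figure below:
The CamVid semantic labels are RGB images. The color assigned to each class is given by name_color_dict below: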
name_color_dict = {
    'Animal': [64, 128, 64],
    'Archway': [192, 0, 128],
    'Bicyclist': [0, 128, 192],
    'Bridge': [0, 128, 64],
    'Building': [128, 0, 0],
    'Car': [64, 0, 128],
    'CartLuggagePram': [64, 0, 192],
    'Child': [192, 128, 64],
    'Column_Pole': [192, 192, 128],
    'Fence': [64, 64, 128],
    'LaneMkgsDriv': [128, 0, 192],
    'LaneMkgsNonDriv': [192, 0, 64],
    'Misc_Text': [128, 128, 64],
    'MotorcycleScooter': [192, 0, 192],
    'OtherMoving': [128, 64, 64],
    'ParkingBlock': [64, 192, 128],
    'Pedestrian': [64, 64, 0],
    'Road': [128, 64, 128],
    'RoadShoulder': [128, 128, 192],
    'Sidewalk': [0, 0, 192],
    'SignSymbol': [192, 128, 128],
    'Sky': [128, 128, 128],
    'SUVPickupTruck': [64, 128, 192],
    'TrafficCone': [0, 0, 64],
    'TrafficLight': [0, 64, 64],
    'Train': [192, 64, 128],
    'Tree': [128, 128, 0],
    'Truck_Bus': [192, 128, 192],
    'Tunnel': [64, 0, 64],
    'VegetationMisc': [192, 192, 0],
    'Void': [0, 0, 0],
    'Wall': [64, 192, 0],
}
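Because every class is identified by one exact RGB triple, a label pixel can be mapped back to its class name with a reverse lookup. The following is a minimal sketch that assumes only a three-entry excerpt of name_color_dict; the names color_name_dict and color_to_name are hypothetical helpers for illustration and are not part of the original code.

```python
# Excerpt of name_color_dict (three of the classes), repeated here
# so that the example is self-contained.
name_color_dict = {
    'Car': [64, 0, 128],
    'Pedestrian': [64, 64, 0],
    'Void': [0, 0, 0],
}

# Lists are not hashable, so each color is converted to a tuple
# before it is used as a dictionary key.
color_name_dict = {tuple(color): name for name, color in name_color_dict.items()}


def color_to_name(pixel, default='Void'):
    """Return the class name for an (R, G, B) pixel, or `default` if the color is unknown."""
    return color_name_dict.get(tuple(pixel), default)


print(color_to_name((64, 0, 128)))  # Car
print(color_to_name([1, 2, 3]))     # Void (color not in the table)
```

Falling back to 'Void' for unknown colors is a design choice made for this sketch; any other sentinel would work as long as it is handled later.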
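Converting semantic labels to bbox labels
Choose the classes to extract, for example names = ['Pedestrian', 'Car', 'Truck_Bus']. The code that extracts object detection bbox labels from the CamVid semantic segmentation labels is as follows: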
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# name_color_dict is the class-to-color table defined above
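
def mask_to_2D(label):
    """ convert an RGB label image [H, W, 3] to a class-index map [H, W] """
    color_list = list(name_color_dict.values())
    # pixels whose color matches no class default to 'Void' instead of index 0 ('Animal')
    void_index = list(name_color_dict.keys()).index('Void')
    semantic_map = np.full(label.shape[:-1], void_index, dtype=np.int64)
    for index, color in enumerate(color_list):
        equality = np.equal(label, color)
        class_map = np.all(equality, axis=-1)
        semantic_map[class_map] = index
    return semantic_map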
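
def draw_box(img, boxes, colors):
    """ draw every bounding box on image img and show the result """
    for box, color in zip(boxes, colors):
        # cast to plain Python ints, since some OpenCV versions reject NumPy integer types
        x1, y1, x2, y2 = (int(v) for v in box)
        cv2.rectangle(img, (x1, y1), (x2, y2), tuple(int(c) for c in color), thickness=2, lineType=cv2.LINE_AA)
    plt.imshow(img)
    plt.axis('off')
    plt.show()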

def get_bbox(label_file, names):
    """ extract the bboxes of the classes in names from a semantic label image, draw them, and return them """
    # convert the RGB mask to a 2D class-index mask
    mask = np.array(Image.open(label_file).convert('RGB'))
    mask_2D = mask_to_2D(mask)
    # class indices present in this image (classes, not instances)
    obj_ids = np.unique(mask_2D)
    # split the class-index mask into one binary mask per class
    masks = mask_2D == obj_ids[:, None, None]
    class_names = list(name_color_dict.keys())
    class_colors = list(name_color_dict.values())
    boxes, colors = [], []
    for i, obj_id in enumerate(obj_ids):
        name = class_names[int(obj_id)]
        if name not in names:
            continue
        # label each 8-connected region of this class; label 0 is the background
        binary = masks[i].astype(np.uint8)
        num_labels, labels = cv2.connectedComponents(binary, connectivity=8, ltype=cv2.CV_16U)
        for id_label in range(1, num_labels):
            pos = np.where(labels == id_label)
            xmin, xmax = int(np.min(pos[1])), int(np.max(pos[1]))
            ymin, ymax = int(np.min(pos[0])), int(np.max(pos[0]))
            # keep only boxes whose width and height both exceed 20 pixels
            if (xmax - xmin) > 20 and (ymax - ymin) > 20:
                boxes.append([xmin, ymin, xmax, ymax])
                colors.append(class_colors[int(obj_id)])
    # draw the bboxes on the label image
    draw_box(mask, boxes, colors)
    return boxes

if __name__ == '__main__':
    names = ['Pedestrian', 'Car', 'Truck_Bus']
    label_file = "camvid/labels/0001TP_006690_L.png"
    get_bbox(label_file, names)
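The key step in get_bbox is connected-component labeling: a semantic mask only says which pixels belong to a class, so each 8-connected blob of that class is treated as one object. The dependency-free sketch below reproduces that idea on a tiny binary grid. The function name component_boxes and the toy grid are illustrative assumptions and are not part of the original code, which relies on cv2.connectedComponents instead.

```python
from collections import deque


def component_boxes(binary):
    """Return [xmin, ymin, xmax, ymax] for each 8-connected blob of 1s in a 2D grid."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            xmin = xmax = x
            ymin = ymax = y
            while queue:
                cy, cx = queue.popleft()
                xmin, xmax = min(xmin, cx), max(xmax, cx)
                ymin, ymax = min(ymin, cy), max(ymax, cy)
                # Visit all 8 neighbours (the centre pixel is already marked as seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append([xmin, ymin, xmax, ymax])
    return boxes


grid = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(component_boxes(grid))  # [[0, 0, 1, 1], [3, 1, 4, 2]]
```

The two pixels at the right touch only diagonally, yet they end up in one box because of the 8-connectivity. For the same reason, two cars that touch or overlap in the semantic mask are merged into a single box, which is a known limitation of deriving detection labels from class-level masks.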
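The bboxes extracted from the semantic labels are shown below. The boxes for pedestrians and vehicles are extracted effectively:
Once the bbox labels are available, the CamVid dataset can be used for both semantic segmentation and object detection.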
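To train a detector, the boxes usually have to be written out in whatever label format the training framework expects. As one possible example, the sketch below converts [xmin, ymin, xmax, ymax] pixel boxes, like those collected in the boxes list inside get_bbox, into normalized "class cx cy w h" lines as used by YOLO-style tools. The function name to_yolo_lines, the choice of output format, the class id, and the 960x720 image size are assumptions made for illustration; the original article does not prescribe an output format.

```python
def to_yolo_lines(boxes, class_ids, img_w, img_h):
    """Convert pixel boxes [xmin, ymin, xmax, ymax] to normalized 'cls cx cy w h' strings.

    The box values are treated as corner coordinates, so the result can differ
    by up to one pixel from a strict inclusive-index interpretation.
    """
    lines = []
    for (xmin, ymin, xmax, ymax), cls in zip(boxes, class_ids):
        cx = (xmin + xmax) / 2 / img_w   # box centre, normalized to [0, 1]
        cy = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w        # box size, normalized to [0, 1]
        h = (ymax - ymin) / img_h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical box centred in a 960x720 image; class id 1 is arbitrary.
boxes = [[240, 180, 720, 540]]
lines = to_yolo_lines(boxes, [1], 960, 720)
print(lines[0])  # 1 0.500000 0.500000 0.500000 0.500000
```

The class ids would typically be the position of each class name in the names list, and each image would get one text file holding its lines.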
References
(1) CamVid official site: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/
(2) Segmentation and Recognition Using Structure from Motion Point Clouds, ECCV 2008
(3) Semantic Object Classes in Video: A High-Definition Ground Truth Database