Table of Contents

1. KITTI Dataset Preparation
2. create_kitti_infos
3. dataset.set_split
4. dataset.get_infos
5. dataset.create_groundtruth_database

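Before running a model with the OpenPCDet framework, the dataset has to be prepared; the KITTI dataset is used as the example here. The OpenPCDet documentation also describes how to prepare the KITTI dataset for training, as quoted below.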

1. KITTI Dataset Preparation

Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows (the road planes could be downloaded from [road plane], which are optional for data augmentation in the training):
If you would like to train CaDDN, download the precomputed depth maps for the KITTI training set.
NOTE: if you already have the data infos from pcdet v0.1, you can choose to use the old infos and set the DATABASE_WITH_FAKELIDAR option in tools/cfgs/dataset_configs/kitti_dataset.yaml as True. The second choice is that you can create the infos and gt database again and leave the config unchanged.

OpenPCDet
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes) & (optional: depth_2)
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2
├── pcdet
├── tools

Generate the data infos by running the following command:

python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml

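After this command finishes, several pkl files are generated, including kitti_infos_train.pkl and kitti_dbinfos_train.pkl. The entry point of the processing is the create_kitti_infos function in kitti_dataset.py. The rest of this article walks through how that function works.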

2. create_kitti_infos

The create_kitti_infos function mainly relies on three key methods of the KittiDataset class: dataset.set_split, dataset.get_infos, and dataset.create_groundtruth_database.

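With set_split and get_infos, the information for each point cloud frame is assembled into a dictionary with four keys, point_cloud, image, calib, and annos, which together describe the current scene (annos, the annotation information, is present for the training set).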

# Build the kitti_infos_train.pkl file
dataset.set_split(train_split)
kitti_infos_train = dataset.get_infos(num_workers=workers, has_label=True, count_inside_pts=True)
with open(train_filename, 'wb') as f:
    pickle.dump(kitti_infos_train, f)
print('Kitti info train file is saved to %s' % train_filename)

The contents of the resulting kitti_infos_train.pkl look as follows:
[Figure: contents of kitti_infos_train.pkl]
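As a rough illustration of the four-key layout described above, the sketch below builds one frame entry by hand and round-trips it through pickle. All values are made-up placeholders (the image shape, the identity matrices, the single annotation), and the file name toy_infos_train.pkl is hypothetical; the real file holds one such dictionary per frame of the split, with many more annotation fields.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical single-frame entry; every value is a placeholder, not real KITTI data.
info = {
    'point_cloud': {'num_features': 4, 'lidar_idx': '000000'},
    'image': {'image_idx': '000000', 'image_shape': np.array([375, 1242], dtype=np.int32)},
    'calib': {'P2': np.eye(4), 'R0_rect': np.eye(4), 'Tr_velo_to_cam': np.eye(4)},
    'annos': {'name': np.array(['Car']), 'gt_boxes_lidar': np.zeros((1, 7))},
}
infos = [info]  # the saved file is a list with one dict per frame

with tempfile.TemporaryDirectory() as tmp_dir:
    info_path = Path(tmp_dir) / 'toy_infos_train.pkl'
    with open(info_path, 'wb') as f:
        pickle.dump(infos, f)
    with open(info_path, 'rb') as f:
        loaded = pickle.load(f)

print(len(loaded), sorted(loaded[0].keys()))
```

Loading a real kitti_infos_train.pkl the same way and printing the keys of its first element is a quick way to check that generation succeeded.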

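With set_split and create_groundtruth_database, the gt objects of every class across the whole training set are collected, one list per class, for later gt sampling: gt objects sampled from other scenes are placed into the current scene after passing a collision test. This amounts to a copy-paste style of data augmentation.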

# Build the kitti_dbinfos_train.pkl file
dataset.set_split(train_split)
dataset.create_groundtruth_database(train_filename, split=train_split)

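The contents of the resulting kitti_dbinfos_train.pkl look as follows. The entries are organized by class name; the KITTI dataset provides 8 object classes for detection.
[Figure: contents of kitti_dbinfos_train.pkl]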

The following sections describe what each of these three methods does.

3. dataset.set_split

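The purpose of this function is to split the KITTI dataset into train, val, and test sets. Since ImageSets already contains the sample indices for train, val, and test, it is enough to read the txt file of the requested split and build a list of indices, which is then used to extract and merge the various pieces of information.

So the core of the function is just two lines: decide which split to build, then read the txt file into an index list.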

# Pick the split file for train, val, or test
split_dir = self.root_path / 'ImageSets' / (self.split + '.txt')    # /home/lab/LLC/PointCloud/OpenPCDet/data/kitti/ImageSets/train.txt
# Read the contents of the txt file; this index list is the end result
self.sample_id_list = [x.strip() for x in open(split_dir).readlines()] if split_dir.exists() else None
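The split-file logic can be tried in isolation. The sketch below is a standalone approximation, not the library code: read_split_ids is a hypothetical helper, the ImageSets folder is created in a temporary directory, and unlike the original one-liner it closes the file and skips blank lines.

```python
import tempfile
from pathlib import Path


def read_split_ids(root_path, split):
    """Return the sample ids listed in ImageSets/<split>.txt, or None if the file is missing."""
    split_file = Path(root_path) / 'ImageSets' / (split + '.txt')
    if not split_file.exists():
        return None
    with open(split_file) as f:
        return [line.strip() for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp_dir:
    image_sets = Path(tmp_dir) / 'ImageSets'
    image_sets.mkdir()
    (image_sets / 'train.txt').write_text('000000\n000003\n000007\n')
    train_ids = read_split_ids(tmp_dir, 'train')
    missing_ids = read_split_ids(tmp_dir, 'val')   # no val.txt was written

print(train_ids, missing_ids)
```

The ids are kept as zero-padded strings because they double as file-name stems (for example 000003.bin and 000003.txt).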

4. dataset.get_infos

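Once the index list is built, the information for each point cloud frame is gathered from the various directories by index, including image information, annotation information, and calibration matrices.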

point_cloud

For the point cloud, the dictionary only holds the number of point features and the index of the current frame.

pc_info = {'num_features': 4, 'lidar_idx': sample_idx}
info['point_cloud'] = pc_info

image

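For the image, the dictionary holds the frame index and the image height and width. The index here is the same as the one stored under point_cloud.

image_info = {'image_idx': sample_idx, 'image_shape': self.get_image_shape(sample_idx)}  # returns the image size
info['image'] = image_info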

calib

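The matrices held by the Calibration object need to be reshaped. The matrix parameters returned by get_calib are the ones given in the dataset's calibration files.

# calib holds 4 keys at this point: P2/P3/R0/Tr_velo2cam
self.P2 = calib['P2']  # 3 x 4
self.R0 = calib['R0']  # 3 x 3
self.V2C = calib['Tr_velo2cam']  # 3 x 4

Notes:
The intrinsic (projection) matrices P0-P3 are 3x4 in the txt file; a row is appended at the bottom to make them 4x4. The extrinsic matrix V2C is handled the same way, with a row appended at the bottom.
The rectification matrix R0 is 3x3, so one row and one column are added and the last diagonal element is set to 1, which also gives a 4x4 matrix.

With this padding, the dimensions of the matrices change while the result of multiplying coordinates by them stays the same. The implementation is shown below: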

# Get the Calibration object for this index
calib = self.get_calib(sample_idx)

# 1) Intrinsic (projection) matrix P2: (3, 4); append a row at the bottom to make it (4, 4)
P2 = np.concatenate([calib.P2, np.array([[0., 0., 0., 1.]])], axis=0)

# 2) Rectification matrix R0: (3, 3); pad one row and one column with zeros to get (4, 4), then set R0[3, 3] = 1
R0_4x4 = np.zeros([4, 4], dtype=calib.R0.dtype)
R0_4x4[3, 3] = 1.
R0_4x4[:3, :3] = calib.R0

# 3) Extrinsic matrix V2C: (3, 4); likewise append a row at the bottom to make it (4, 4)
V2C_4x4 = np.concatenate([calib.V2C, np.array([[0., 0., 0., 1.]])], axis=0)

# Build the calibration info: P2, R0_rect and Tr_velo_to_cam
"""
    image coordinates = projection matrix * rectification matrix * extrinsic matrix * LiDAR coordinates
    y = P2 * R0 * Tr_velo_to_cam * x
"""
calib_info = {'P2': P2, 'R0_rect': R0_4x4, 'Tr_velo_to_cam': V2C_4x4}
info['calib'] = calib_info
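The claim that padding leaves the projection unchanged can be checked numerically. The following is a minimal sketch with made-up calibration values (they are not taken from a real KITTI calib file) and a hypothetical helper pad_to_4x4; it compares the step-by-step 3x4 / 3x3 chain with the single product of 4x4 matrices.

```python
import numpy as np

# Made-up calibration values for illustration only (not from a real KITTI calib file).
P2 = np.array([[700.0, 0.0, 600.0, 45.0],
               [0.0, 700.0, 180.0, 0.2],
               [0.0, 0.0, 1.0, 0.003]])
R0 = np.eye(3)
V2C = np.array([[0.0, -1.0, 0.0, 0.0],
                [0.0, 0.0, -1.0, -0.08],
                [1.0, 0.0, 0.0, -0.27]])


def pad_to_4x4(mat):
    """Embed a 3x3 or 3x4 matrix in a 4x4 matrix whose last row is [0, 0, 0, 1]."""
    out = np.eye(4, dtype=mat.dtype)
    out[:mat.shape[0], :mat.shape[1]] = mat
    return out


P2_4x4, R0_4x4, V2C_4x4 = pad_to_4x4(P2), pad_to_4x4(R0), pad_to_4x4(V2C)

point_lidar = np.array([10.0, 1.0, -0.5, 1.0])  # homogeneous LiDAR point

# Unpadded chain: LiDAR -> camera -> rectified camera -> image plane
pt_cam = V2C @ point_lidar
pt_rect = R0 @ pt_cam
proj_3 = P2 @ np.append(pt_rect, 1.0)

# Padded chain: a single product of 4x4 matrices
proj_4 = P2_4x4 @ R0_4x4 @ V2C_4x4 @ point_lidar

u, v = proj_4[:2] / proj_4[2]   # pixel coordinates after the perspective divide
print(np.allclose(proj_3, proj_4[:3]), u, v)
```

The practical benefit of the 4x4 form is that the three transforms compose (and invert) as ordinary square matrices, without re-appending the homogeneous 1 between steps.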

annos

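Label files exist when building the training and validation sets; the test set has no label files, so this step is skipped for it.

One label txt file is used as an example: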

Car 0.88 3 -0.69 0.00 192.37 402.31 374.00 1.60 1.57 3.23 -2.70 1.74 3.68 -1.29
Car 0.00 1 2.04 334.85 178.94 624.50 372.04 1.57 1.50 3.68 -1.17 1.65 7.86 1.90
Car 0.34 3 -1.84 937.29 197.39 1241.00 374.00 1.39 1.44 3.08 3.81 1.64 6.15 -1.31
Car 0.00 1 -1.33 597.59 176.18 720.90 261.14 1.47 1.60 3.66 1.07 1.55 14.44 -1.25
Car 0.00 0 1.74 741.18 168.83 792.25 208.43 1.70 1.63 4.08 7.24 1.55 33.20 1.95
Car 0.00 0 -1.65 884.52 178.31 956.41 240.18 1.59 1.59 2.47 8.48 1.75 19.96 -1.25
DontCare -1 -1 -10 800.38 163.67 825.45 184.07 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 859.58 172.34 886.26 194.51 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 801.81 163.96 825.20 183.59 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 826.87 162.28 845.84 178.86 -1 -1 -1 -1000 -1000 -1000 -10

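Each line of the txt file describes one gt object in the current point cloud scene. Each line is first parsed into an object3d instance, so the whole txt file becomes a list in which every element is an object3d instance, and each gt object is stored and processed through that instance. The attributes of the objects in the list are then gathered, attribute by attribute, into separate arrays. For example: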

obj_list = self.get_label(sample_idx)
......
annotations['bbox'] = np.concatenate([obj.box2d.reshape(1, 4) for obj in obj_list], axis=0)
annotations['dimensions'] = np.array([[obj.l, obj.h, obj.w] for obj in obj_list])  # lhw(camera) format
annotations['location'] = np.concatenate([obj.loc.reshape(1, 3) for obj in obj_list], axis=0)
......
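To make the label layout concrete, here is a small parsing sketch. parse_label_line is a hypothetical helper, not the object3d class used by the framework; the field order (type, truncated, occluded, alpha, 2D bbox, dimensions as height/width/length, location, rotation_y) follows the KITTI object label format.

```python
def parse_label_line(line):
    """Split one KITTI label line into named fields (hypothetical helper)."""
    parts = line.split()
    return {
        'type': parts[0],
        'truncated': float(parts[1]),
        'occluded': int(float(parts[2])),
        'alpha': float(parts[3]),
        'bbox': [float(v) for v in parts[4:8]],             # left, top, right, bottom (pixels)
        'dimensions_hwl': [float(v) for v in parts[8:11]],  # height, width, length (m)
        'location': [float(v) for v in parts[11:14]],       # x, y, z in camera coordinates (m)
        'rotation_y': float(parts[14]),
    }


lines = [
    'Car 0.88 3 -0.69 0.00 192.37 402.31 374.00 1.60 1.57 3.23 -2.70 1.74 3.68 -1.29',
    'DontCare -1 -1 -10 800.38 163.67 825.45 184.07 -1 -1 -1 -1000 -1000 -1000 -10',
]
objects = [parse_label_line(line) for line in lines]
print(objects[0]['type'], objects[0]['dimensions_hwl'], objects[0]['location'])
```

Note that the file stores dimensions as height, width, length, while the annotations['dimensions'] array in the snippet above reorders them to l, h, w.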

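Next, the number of valid gt objects (excluding the 'DontCare' class) and the total number of gt objects are counted in order to build an index list that marks which objects are valid: invalid objects get -1 and valid objects are numbered normally, which gives index = [0, 1, 2, 3, 4, 5, -1, -1, -1, -1] for the example above. Then the location (N, 3), dimensions (N, 3), and rotation_y (N, 1) fields of the annotations are combined into a gt box matrix of shape (N, 7). The orientation has to be converted, and the box origin is moved from the bottom center of the object to the center of the object.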

......
loc_lidar[:, 2] += h[:, 0] / 2      # move the box origin from the bottom center of the object to its center
gt_boxes_lidar = np.concatenate([loc_lidar, l, w, h, -(np.pi / 2 + rots[..., np.newaxis])], axis=1)    # build the gt box matrix
annotations['gt_boxes_lidar'] = gt_boxes_lidar
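The index list and the box conversion can be reproduced on the example label file with a few lines of NumPy. This is only a sketch: it assumes the DontCare entries come after all valid objects (as in the example file), and the LiDAR-frame position loc_lidar is a placeholder rather than the result of a real calibration transform.

```python
import numpy as np

names = np.array(['Car', 'Car', 'Car', 'Car', 'Car', 'Car',
                  'DontCare', 'DontCare', 'DontCare', 'DontCare'])
num_objects = int(np.sum(names != 'DontCare'))   # valid objects
num_gt = len(names)                              # all labelled objects
index = list(range(num_objects)) + [-1] * (num_gt - num_objects)

# Dimensions and rotation of the first car in the example label file
h, w, l = 1.60, 1.57, 3.23
rotation_y = -1.29

# Placeholder LiDAR-frame position of the box *bottom* center (assumed values;
# in practice it is derived from the camera-frame location via the calibration).
loc_lidar = np.array([[3.9, 2.7, -1.7]])
loc_lidar[:, 2] += h / 2                      # bottom center -> box center
heading = -(np.pi / 2 + rotation_y)           # camera rotation_y -> LiDAR heading
gt_box_lidar = np.concatenate([loc_lidar, [[l, w, h, heading]]], axis=1)

print(index)
print(gt_box_lidar.shape)
```

The resulting row has the layout [x, y, z, dx, dy, dz, heading], with (x, y, z) at the box center.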

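When building the training and validation infos, the number of points inside each gt box is also counted. Before counting, the points within the camera FOV are selected (this filtering is applied once more during data augmentation). Whether a point lies inside a box is decided with scipy's Delaunay method. The result is a list holding, for each gt box centered at (x, y, z) with size (dx, dy, dz), the number of points it contains.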

# Whether to count the number of points inside each gt box
if count_inside_pts:
    points = self.get_lidar(sample_idx)   # load the point cloud for this index, [N, 4]
    calib = self.get_calib(sample_idx)    # get the Calibration object for this index
    pts_rect = calib.lidar_to_rect(points[:, 0:3])      # transform points from LiDAR to rect coordinates

    # Select the points within the camera FOV; this filtering is applied again during data augmentation
    fov_flag = self.get_fov_flag(pts_rect, info['image']['image_shape'], calib)
    pts_fov = points[fov_flag]  # keep the valid points selected by the mask

    # gt_boxes_lidar is (N, 7)  [x, y, z, dx, dy, dz, heading], (x, y, z) is the box center
    # the returned corners_lidar is (N, 8, 3)
    corners_lidar = box_utils.boxes_to_corners_3d(gt_boxes_lidar)

    # num_gt is the total number of objects in this frame; if it is 10,
    # then num_points_in_gt = array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1], dtype=int32)
    num_points_in_gt = -np.ones(num_gt, dtype=np.int32)
    for k in range(num_objects):    # number of valid objects
        flag = box_utils.in_hull(pts_fov[:, 0:3], corners_lidar[k])   # test which points lie inside the k-th gt box, using a scipy routine
        num_points_in_gt[k] = flag.sum()    # number of points in the k-th gt box
    annotations['num_points_in_gt'] = num_points_in_gt  # store the per-box point counts
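The hull test can be sketched with scipy directly: a point lies inside the convex hull of the 8 box corners exactly when the Delaunay triangulation of those corners can locate a simplex containing it. The helper name points_in_hull and the toy box below are assumptions for illustration, not the framework's own function.

```python
import numpy as np
from scipy.spatial import Delaunay


def points_in_hull(points, hull_corners):
    """Return a boolean mask: True where a point lies inside the convex hull of the corners."""
    triangulation = Delaunay(hull_corners)
    # find_simplex returns -1 for points outside the triangulation
    return triangulation.find_simplex(points) >= 0


# Corners of an axis-aligned 2 x 2 x 2 box centered at the origin (toy data).
corners = np.array([[x, y, z] for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)])
points = np.array([[0.1, 0.2, -0.3],    # inside
                   [0.5, -0.5, 0.9],    # inside
                   [2.0, 0.0, 0.0],     # outside
                   [0.0, 0.0, 1.5]])    # outside
mask = points_in_hull(points, corners)
print(mask, int(mask.sum()))
```

Because the test only needs the corner coordinates, it works for rotated boxes without any explicit handling of the heading angle.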

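This completes the information gathering for a single point cloud index. pcdet also creates a thread pool to speed things up: every index in the list is processed, and the results are collected in order into the infos list. Each element of infos holds all the information for one point cloud frame.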

sample_id_list = sample_id_list if sample_id_list is not None else self.sample_id_list  # indices from the txt file of the chosen split
# Create a thread pool; processing frames in multiple threads speeds things up
with futures.ThreadPoolExecutor(num_workers) as executor:   # multithreading
    infos = executor.map(process_single_scene, sample_id_list)  # apply process_single_scene to every index in the list

# infos is a list; each element is the info dictionary of one frame
return list(infos)
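The thread-pool pattern above relies on executor.map returning results in the order of its inputs, even when the worker threads finish out of order. A minimal standalone sketch, with a stand-in process_single_scene that only echoes the index:

```python
from concurrent import futures


def process_single_scene(sample_idx):
    """Stand-in for the real per-frame work; returns a tiny info dict."""
    return {'point_cloud': {'num_features': 4, 'lidar_idx': sample_idx}}


sample_id_list = ['000000', '000003', '000007', '000009']
with futures.ThreadPoolExecutor(max_workers=2) as executor:
    # map yields results in input order, regardless of completion order
    infos = list(executor.map(process_single_scene, sample_id_list))

print([info['point_cloud']['lidar_idx'] for info in infos])
```

Threads suit this step because most of the per-frame time is likely spent on file reads, where Python's global interpreter lock is released.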

# Serialize and save
with open(train_filename, 'wb') as f:
    pickle.dump(kitti_infos_train, f)

The list is then serialized and saved, which produces the kitti_infos_train.pkl file. The validation and test infos are built in a similar way.

5. dataset.create_groundtruth_database

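The steps above consolidate the dataset information. The actual data pipeline, however, involves several data augmentation methods, so the gt objects of the training set are additionally consolidated here to support the later copy-paste operation.

With the information of every point cloud frame assembled, the gt objects of each frame are now processed. The info dictionary of a single frame looks as follows:
[Figure: info dictionary of a single point cloud frame]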

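Next, the full set of point features of the current scene is loaded by its index, and the valid gt boxes of the scene are read from annos['gt_boxes_lidar']. The function roiaware_pool3d_utils.points_in_boxes_cpu (from a compiled C++ extension) then determines, for every point in the scene, whether it lies inside each gt box. Concretely, suppose there are 6 valid gt boxes, so gt_boxes has shape [6, 7], and the point cloud has shape [N, 4]; the function then returns a [6, N] matrix. The 6 rows correspond to the 6 valid gt boxes, and the N entries in each row indicate whether each point lies inside that box: 1 if it does, 0 if it does not. From this matrix we can compute the number of points inside each gt box and also extract the points belonging to each gt box.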
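The shape and meaning of that mask can be illustrated with a pure NumPy stand-in. The function points_in_boxes_numpy below is a simplified sketch written for this article, not the compiled routine: it rotates each point into the box frame and compares against the half extents, assuming boxes are given as [x, y, z, dx, dy, dz, heading] with (x, y, z) at the box center.

```python
import numpy as np


def points_in_boxes_numpy(points, boxes):
    """Return an (nboxes, npoints) 0/1 mask for boxes given as [x, y, z, dx, dy, dz, heading]."""
    mask = np.zeros((boxes.shape[0], points.shape[0]), dtype=np.int32)
    for i, (cx, cy, cz, dx, dy, dz, heading) in enumerate(boxes):
        shifted = points[:, :3] - np.array([cx, cy, cz])
        cos_h, sin_h = np.cos(-heading), np.sin(-heading)
        # Rotate into the box frame so the box becomes axis-aligned.
        local_x = shifted[:, 0] * cos_h - shifted[:, 1] * sin_h
        local_y = shifted[:, 0] * sin_h + shifted[:, 1] * cos_h
        inside = ((np.abs(local_x) < dx / 2)
                  & (np.abs(local_y) < dy / 2)
                  & (np.abs(shifted[:, 2]) <= dz / 2))
        mask[i] = inside
    return mask


boxes = np.array([[0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0],
                  [10.0, 0.0, 0.0, 4.0, 2.0, 2.0, np.pi / 2]])
points = np.array([[1.5, 0.0, 0.0, 0.3],     # inside box 0 only
                   [10.0, 1.5, 0.0, 0.1],    # inside box 1 only (box 1 is rotated by 90 degrees)
                   [11.5, 0.0, 0.0, 0.2],    # outside both
                   [50.0, 50.0, 0.0, 0.5]])  # outside both
point_indices = points_in_boxes_numpy(points, boxes)
num_points_per_box = point_indices.sum(axis=1)   # point count of each box
box0_points = points[point_indices[0] > 0]       # points belonging to box 0
print(point_indices, num_points_per_box)
```

Summing a row gives the point count of that box, and using a row as a boolean mask extracts the box's points, which is how the mask is used in the loop that follows.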

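For every object in every point cloud frame, the points are stored in their own bin file, for example '000000_Pedestrian_0.bin', and the related information of the object is collected and stored under the corresponding class in the dictionary all_db_infos. The dictionary thus ends up with 8 key-value pairs, each holding the gt point cloud entries and related information of one class.
[Figure: contents of the all_db_infos dictionary]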

The core code is shown below:

infos = pickle.load(f)
......
for k in range(len(infos)):     # len(infos): 3712
    info = infos[k]
    ......
    num_obj = gt_boxes.shape[0]     # number of valid objects
    point_indices = roiaware_pool3d_utils.points_in_boxes_cpu(    # per-box 0/1 point mask, e.g. [0 0 0 1 0 1 1...]; similar in spirit to in_hull
        torch.from_numpy(points[:, 0:3]), torch.from_numpy(gt_boxes)
    ).numpy()  # (nboxes, npoints)

    # For each valid object: save the points inside the gt box to a bin file and add its info dict to the list of its class
    for i in range(num_obj):
        filename = '%s_%s_%d.bin' % (sample_idx, names[i], i)   # '000000_Pedestrian_0.bin'
        filepath = database_save_path / filename    # absolute path of '000000_Pedestrian_0.bin'
        gt_points = points[point_indices[i] > 0]    # keep only the points inside this gt box

        gt_points[:, :3] -= gt_boxes[i, :3]     # convert the points of the i-th box to box-local coordinates
        with open(filepath, 'w') as f:          # write gt_points to the file
            gt_points.tofile(f)

        # Check whether the class is selected; ['Car', 'Pedestrian', 'Cyclist'] are the usual choices, other classes may be ignored
        if (used_classes is None) or names[i] in used_classes:
            db_path = str(filepath.relative_to(self.root_path))
            # Assemble the info of the current object
            db_info = {
                'name': names[i],
                'path': db_path,             # gt_database/xxxxx.bin
                'image_idx': sample_idx,     # index of the current frame; several gt objects may share one index
                'gt_idx': i,                 # gt number within the frame
                'box3d_lidar': gt_boxes[i],  # gt box (7, ) [xyz, dxdydz, heading]
                'num_points_in_gt': gt_points.shape[0],  # number of points inside the gt box
                'difficulty': difficulty[i],
                'bbox': bbox[i],     # 2D box on the image, taken from the label file, (4, )
                'score': annos['score'][i]
            }
            # Collect the gt entries per class
            if names[i] in all_db_infos:
                all_db_infos[names[i]].append(db_info)  # class already present: append
            else:
                all_db_infos[names[i]] = [db_info]      # class not present yet: create its list
......
# Save the serialized info, which holds the consolidated gt entries of every class
with open(db_info_save_path, 'wb') as f:
    pickle.dump(all_db_infos, f)
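One detail worth spelling out is that a raw .bin file carries no shape or dtype, and the stored points are relative to the box center. The sketch below shows the round trip with toy values; the float32 dtype and four features per point (x, y, z, intensity) are assumptions matching the usual KITTI point layout, and the file is opened in binary mode here.

```python
import tempfile
from pathlib import Path

import numpy as np

gt_box = np.array([12.0, -3.0, -0.8, 3.9, 1.6, 1.5, 0.1], dtype=np.float32)   # toy gt box
points_in_box = np.array([[12.5, -3.2, -0.5, 0.30],
                          [11.4, -2.7, -1.0, 0.12]], dtype=np.float32)        # toy points

local_points = points_in_box.copy()
local_points[:, :3] -= gt_box[:3]            # store coordinates relative to the box center

with tempfile.TemporaryDirectory() as tmp_dir:
    bin_path = Path(tmp_dir) / '000000_Car_0.bin'
    with open(bin_path, 'wb') as f:
        local_points.tofile(f)
    # No header is stored, so dtype and the number of columns must be supplied when reading.
    restored = np.fromfile(str(bin_path), dtype=np.float32).reshape(-1, 4)

restored[:, :3] += gt_box[:3]                # back to scene coordinates
print(np.allclose(restored, points_in_box, atol=1e-5))
```

Storing box-local coordinates means a sampled object can be pasted at any box position simply by adding that box's center to its points.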

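Finally, the whole dictionary is serialized and saved as kitti_dbinfos_train.pkl; only the gt objects of the training set are consolidated this way. The file is used for data augmentation in gt_sampling: a certain number of gt objects are sampled at random from each class and placed into the current scene as extra gt objects, provided they pass a collision test, that is, they do not overlap with the gt objects already in the scene.

To sum up, after running python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml, the data/kitti/ directory contains 5 new files and one new directory. The resulting directory structure is shown below: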
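The sample-then-reject idea can be sketched as follows. This is a deliberately simplified illustration, not the framework's sampler: both helper functions are hypothetical, the database entries are toy values, and the overlap test compares axis-aligned bird's-eye-view footprints while ignoring box headings, which a real collision test would have to take into account.

```python
import numpy as np


def bev_overlap_axis_aligned(box, others):
    """True for each box in `others` whose axis-aligned BEV footprint overlaps `box` (headings ignored)."""
    if len(others) == 0:
        return np.zeros(0, dtype=bool)
    overlap_x = np.abs(box[0] - others[:, 0]) < (box[3] + others[:, 3]) / 2
    overlap_y = np.abs(box[1] - others[:, 1]) < (box[4] + others[:, 4]) / 2
    return overlap_x & overlap_y


def sample_without_collision(db_infos, scene_boxes, num_to_sample, rng):
    """Pick up to `num_to_sample` database boxes that overlap neither the scene nor each other."""
    placed = scene_boxes.copy()
    chosen = []
    for idx in rng.permutation(len(db_infos)):
        if len(chosen) == num_to_sample:
            break
        box = db_infos[idx]['box3d_lidar']
        if not bev_overlap_axis_aligned(box, placed).any():
            chosen.append(db_infos[idx])
            placed = np.vstack([placed, box[None, :]])
    return chosen


# Toy database entries for one class; only the fields used here are filled in.
car_db = [
    {'name': 'Car', 'box3d_lidar': np.array([10.0, 0.0, -1.0, 4.0, 1.8, 1.5, 0.0])},   # overlaps the scene box
    {'name': 'Car', 'box3d_lidar': np.array([20.0, 5.0, -1.0, 4.0, 1.8, 1.5, 0.0])},
    {'name': 'Car', 'box3d_lidar': np.array([30.0, -5.0, -1.0, 4.0, 1.8, 1.5, 0.0])},
]
scene_boxes = np.array([[11.0, 0.5, -1.0, 4.0, 1.8, 1.5, 0.0]])
sampled = sample_without_collision(car_db, scene_boxes, num_to_sample=2,
                                   rng=np.random.default_rng(0))
print([info['box3d_lidar'][:2].tolist() for info in sampled])
```

Accepted boxes are appended to the list of placed boxes, so later candidates are also checked against objects sampled earlier in the same pass.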

kitti
├── ImageSets
│   ├── test.txt
│   ├── train.txt
│   ├── trainval.txt
│   ├── val.txt
├── testing
│   ├── calib
│   ├── image_2
│   ├── velodyne
│   ├── velodyne_reduced (optional)
├── training
│   ├── calib
│   ├── image_2
│   ├── label_2
│   ├── velodyne
│   ├── velodyne_reduced (optional)
│   ├── planes (optional)
├── gt_database
│   ├── xxxxx.bin
│   ├── xxxxx.bin
├── kitti_infos_train.pkl
├── kitti_infos_val.pkl
├── kitti_dbinfos_train.pkl
├── kitti_infos_test.pkl
├── kitti_infos_trainval.pkl

OpenPCDet Series | 4.1 Consolidating KITTI Dataset File Information and Building the GT Database