[Computer Vision] Mask-RCNN: The KeyPoints Branch, Seen Through Clothing Keypoint Detection (To Be Continued)

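The original paper mentions that Mask_RCNN can also perform keypoint detection, but the project we have been studying does not include a keypoint detection branch. Someone has extended this project to add one, as Mask_RCNN_Humanpose. In this article we take a brief look at how a keypoint detection branch is added to the model, and then go a step further and try Mask_RCNN on real data.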

I. Building the Dataset Class

1. Keypoint annotation format

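Recall the earlier introduction to datasets. For tasks without keypoint detection we need two kinds of data: 1. the original image files; 2. a mask for each instance in the image. Because Mask_RCNN post-processes the masks to derive a bounding box for each instance, in practice a third item is also needed: 3. a bounding box for each instance.

Since we now want to detect keypoints, we also need: 4. keypoint annotations for the image.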

key_points: num_keypoints coordinates and visibility (x,y,v)  [num_person,num_keypoints,3] of num_person

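First, note that keypoints belong to an instance, which is where num_person above comes from (taking human keypoint detection as the example, one instance is one person). Each instance has num_keypoints keypoints, and each keypoint consists of three values: x coordinate, y coordinate, and state. There are three states: the keypoint does not exist for this class, it is occluded, or it is visible. In COCO, 0 means the keypoint is not labelled (in which case x=y=v=0), 1 means it is labelled but not visible (occluded), and 2 means it is labelled and visible.

Different datasets may use different numbers for these three states, but for training with this framework it is best to convert everything to the COCO convention. This avoids extensive changes to the model code (mainly the keypoint loss function), which could introduce unexpected problems.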
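
As a minimal sketch of that conversion, the snippet below remaps a dataset-specific state flag to the COCO convention. The source encoding (-1 = does not exist, 0 = occluded, 1 = visible) and the names SOURCE_TO_COCO and to_coco_keypoints are assumptions made for illustration; check the documentation of your own dataset before reusing the mapping.

```python
# Assumed source convention (illustrative only):
#   -1 = keypoint does not exist, 0 = occluded, 1 = visible
# COCO convention:
#    0 = not labelled, 1 = labelled but occluded, 2 = labelled and visible
SOURCE_TO_COCO = {-1: 0, 0: 1, 1: 2}


def to_coco_keypoints(keypoints):
    """Convert a list of (x, y, v) triples to the COCO convention."""
    converted = []
    for x, y, v in keypoints:
        coco_v = SOURCE_TO_COCO[v]
        if coco_v == 0:
            # COCO stores unlabelled keypoints as x = y = v = 0
            converted.append((0, 0, 0))
        else:
            converted.append((x, y, coco_v))
    return converted


example = [(120, 45, 1), (98, 60, 0), (-1, -1, -1)]
print(to_coco_keypoints(example))
```

Keeping the mapping in a single dictionary means only one place has to change when a dataset uses different numbers.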

2. Clothing keypoint annotation

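With these basics in place, we take Tianchi's clothing keypoint localization data as an example and look at how to design the Dataset class.

For details of the data, consult the dataset description mentioned above; this section focuses on the idea behind Mask RCNN keypoint detection rather than on the data itself. Its documentation is shown below. The purpose of our Dataset class (see "[Computer Vision] Mask-RCNN, Training the Network Part 1: Datasets and the Dataset Class") is to feed the network with data based on the information in that documentation.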

a. Clothing categories and Mask RCNN

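Note that the classification, detection, and mask generation tasks in Mask RCNN are all multi-class. Keypoint detection is harder (one class has many keypoints, and the keypoint types of different classes have little or nothing in common), so it is advisable to train a separate model for each major class. Pose keypoint detection works exactly this way: it detects the box and mask of the single class person, plus the keypoints of the different body parts of each instance (each person), so the classifier has only two classes, person and background. For the clothing dataset, we need to train five times, once for each of the five clothing types.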
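
One way to picture the "one model per category" setup is to group the annotation rows by category first, so that each training run sees only background plus a single class. The sketch below uses plain dictionaries in place of CSV rows; the file names and the helper group_by_category are hypothetical, and the second category is included only to show the grouping.

```python
from collections import defaultdict


def group_by_category(rows):
    """Group annotation rows by their image_category field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["image_category"]].append(row)
    return dict(groups)


# Illustrative rows; a real run would read them from the annotation CSV files
rows = [
    {"image_id": "a.jpg", "image_category": "blouse"},
    {"image_id": "b.jpg", "image_category": "skirt"},
    {"image_id": "c.jpg", "image_category": "blouse"},
]

for category, subset in group_by_category(rows).items():
    # Each model has two classes only: background (id 0) and this category (id 1)
    class_names = ["BG", category]
    print(category, len(subset), class_names)
```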

b. Clothing bounding boxes

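The clothing data is annotated with keypoints only, but bounding boxes are required by Mask RCNN because the RPN needs them (the regression branch after the RPN can be commented out, but the RPN is a core part of the network and cannot). We therefore follow the Mask RCNN project's approach to generating boxes and derive them from the keypoints. Since keypoints are not necessarily on the edge of the garment (although they usually are), we make the box slightly larger so that it covers the garment as completely as possible. The function below is in utils.py (we do not use it yet; it is shown here only because the topic came up).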

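import numpy as np


def extract_keypoint_bboxes(keypoints, image_size):
    """
    Build one padded bounding box per instance from its keypoints.

    :param keypoints: [instances, keypoints_per_instance, 3], each row is (x, y, v)
    :param image_size: [w, h]
    :return: int32 array [instances, 4], each row is (y1, x1, y2, x2)
    """
    bboxes = np.zeros([keypoints.shape[0], 4], dtype=np.int32)
    for i in range(keypoints.shape[0]):
        # Keep positive coordinates only; unlabelled keypoints are skipped
        x = keypoints[i, :, 0][keypoints[i, :, 0] > 0]
        y = keypoints[i, :, 1][keypoints[i, :, 1] > 0]
        if x.size == 0 or y.size == 0:
            continue  # no usable keypoint: leave this box as zeros instead of raising
        # Pad the tight box and clip it to the image
        x1 = max(x.min() - 10, 0)
        y1 = max(y.min() - 10, 0)
        x2 = min(x.max() + 11, image_size[0])
        y2 = min(y.max() + 11, image_size[1])
        bboxes[i] = np.array([y1, x1, y2, x2], np.int32)
    return bboxes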
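
To make the padding and clipping concrete, here is a small worked example for a single instance, written with plain Python lists so that it runs without NumPy. The helper keypoints_to_bbox is a hypothetical stand-in that mirrors the arithmetic of the function above; it is not part of utils.py.

```python
def keypoints_to_bbox(points, image_size, pad_low=10, pad_high=11):
    """points: list of (x, y, v); image_size: (w, h). Returns (y1, x1, y2, x2)."""
    # Ignore non-positive coordinates, i.e. unlabelled keypoints
    xs = [x for x, _, _ in points if x > 0]
    ys = [y for _, y, _ in points if y > 0]
    if not xs or not ys:
        return (0, 0, 0, 0)
    w, h = image_size
    # Pad the tight box, then clip it to the image borders
    x1 = max(min(xs) - pad_low, 0)
    y1 = max(min(ys) - pad_low, 0)
    x2 = min(max(xs) + pad_high, w)
    y2 = min(max(ys) + pad_high, h)
    return (y1, x1, y2, x2)


points = [(5, 40, 2), (100, 95, 1), (0, 0, 0)]
print(keypoints_to_bbox(points, image_size=(110, 100)))  # (30, 0, 100, 110)
```

The left edge is clipped to 0 because 5 - 10 is negative, and the right and bottom edges are clipped to the image width and height. Note that the result is ordered (y1, x1, y2, x2), not (x1, y1, x2, y2).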

c. A note on masks

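The clothing data has no mask information. According to the Mask RCNN paper, a mask that is 1 at the keypoint position and 0 everywhere else is sufficient, which does not seem very reliable to me. In the COCO dataset (that is, in the reference project Mask_RCNN_Humanpose), the mask is the person's segmentation mask (see the figure below).

In the Dataset class I generate mask information for demonstration, but I removed the Mask branch when building the network. The figure below is taken from Dr. Mu Li's Dive into Deep Learning (《动手学深度学习》), and it makes it easy to see why the Mask branch can be removed.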
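
For reference, the one-hot keypoint target described in the paper (one binary grid per keypoint, with a single foreground pixel) can be sketched as follows. This is an illustration with plain nested lists, assuming COCO-style (x, y, v) keypoints; the function name one_hot_keypoint_masks is hypothetical and the code is not taken from either project.

```python
def one_hot_keypoint_masks(keypoints, height, width):
    """Return one height x width grid (a list of rows) per keypoint.

    keypoints: list of (x, y, v) triples in the COCO convention.
    """
    masks = []
    for x, y, v in keypoints:
        grid = [[0] * width for _ in range(height)]
        # Only labelled keypoints inside the image get a foreground pixel
        if v > 0 and 0 <= x < width and 0 <= y < height:
            grid[y][x] = 1  # the row index is y, the column index is x
        masks.append(grid)
    return masks


masks = one_hot_keypoint_masks([(2, 1, 2), (0, 0, 0)], height=3, width=4)
for grid in masks:
    print(grid)
```

The first grid has exactly one pixel set, at row 1 and column 2; the second keypoint is unlabelled, so its grid stays empty.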

3. class FIDataset

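As the Dataset docstring says, to run on our own dataset we first implement a method (load_shapes, or any name that suits the dataset) that collects the original images and class information. We then implement two methods, load_image and load_mask, which return the data of a single image and the masks and classes of the objects in that image, respectively. That essentially completes the dataset class.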

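For this dataset:

We use load_FI in place of load_shapes; it calls self.add_class and self.add_image to record the image and class information.
The parent class's load_image reads the "path" entry of each image in self.image_info and loads the image, so we do not need to override it as long as the path is recorded in load_FI.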
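
The bookkeeping behind self.add_class and self.add_image can be sketched with a small stand-in base class. MiniDataset and ToyFIDataset below are hypothetical simplifications written for this illustration; they only mimic the way the project's Dataset class stores class and image records, and they do not load any image.

```python
class MiniDataset:
    """Hypothetical stand-in for the project's Dataset class (bookkeeping only)."""

    def __init__(self):
        # Class id 0 is reserved for the background
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]
        self.image_info = []

    def add_class(self, source, class_id, class_name):
        self.class_info.append({"source": source, "id": class_id, "name": class_name})

    def add_image(self, source, image_id, path, **kwargs):
        info = {"id": image_id, "source": source, "path": path}
        info.update(kwargs)  # extra fields such as annotations are stored as-is
        self.image_info.append(info)


class ToyFIDataset(MiniDataset):
    def load_FI(self, records):
        """records: list of (path, keypoints) pairs, keypoints as (x, y, v) triples."""
        self.add_class(source="FI", class_id=1, class_name="blouse")
        for i, (path, keypoints) in enumerate(records):
            self.add_image(source="FI", image_id=i, path=path, annotations=keypoints)


dataset = ToyFIDataset()
dataset.load_FI([("img_0.jpg", [(10, 20, 2), (0, 0, 0)])])
print([c["name"] for c in dataset.class_info])
print(dataset.image_info[0]["path"], dataset.image_info[0]["annotations"])
```

Because the path is stored under the "path" key, an inherited load_image that looks it up in self.image_info needs no changes.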

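load_mask is replaced by load_keypoints (Mask_RCNN_Humanpose made this change and has already sorted out the related calls). Its docstring is shown below. We do not need mask information, so returning None as a placeholder is enough. The calls in the network that use mask information will have to be commented out later, which we do not cover here: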

"""
Returns:
key_points: num_keypoints coordinates and visibility (x,y,v)  [num_person,num_keypoints,3] of num_person
masks: A bool array of shape [height, width, instance count] with
    one mask per instance.
class_ids: a 1D array of class IDs of the instance masks, here is always equal to [num_person, 1]
"""

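That covers the purpose of the Dataset class. The implementation below is from FI_train.py. Training needs a validation set, but at the time of writing I have not implemented a validation split (the training set stands in for the validation set), so the train_data parameter of load_FI has no effect. Updates will be made on GitHub, and this article will not be revised afterwards: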

import os

import numpy as np
import pandas as pd

# utils (the Dataset base class), PART_INDEX, IMAGE_CATEGORY and image_size
# come from the project and from elsewhere in FI_train.py; they are not shown here.


class FIDataset(utils.Dataset):
    """Dataset for the clothing keypoint data (source name "FI").
    Each image contains a single garment instance with its keypoint annotations.
    """
    def load_FI(self, train_data=True):
        """Read the annotation CSV files and register the class and the images.
        train_data: placeholder; no validation split is implemented yet.
        """
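        if train_data:
            csv_data = pd.concat([pd.read_csv('../keypoint_data/train1.csv'),
                                  pd.read_csv('../keypoint_data/train2.csv')],
                                 axis=0,
                                 ignore_index=True  # drop the original indexes and renumber the rows
                                )
            class_data = csv_data[csv_data.image_category.isin(['blouse'])]
        else:
            # No validation split exists yet; fail clearly instead of hitting
            # an undefined class_data further down
            raise NotImplementedError("validation split is not implemented")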

        # Add classes
        self.add_class(source="FI", class_id=1, class_name='blouse')

        # Add images
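        for i in range(class_data.shape[0]):
            annotation = class_data.iloc[i]
            img_path = os.path.join("../keypoint_data", annotation.image_id)
            # Every column after the first two holds one keypoint as an "x_y_v" string
            keypoints = np.array([p.split('_')
                                  for p in annotation.iloc[2:]], dtype=int)[PART_INDEX[IMAGE_CATEGORY], :]
            # Shift the state flag by one so it follows the COCO 0/1/2 convention
            keypoints[:, -1] += 1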
            self.add_image(source="FI",
                           image_id=i,
                           path=img_path,
                           annotations=keypoints)

    def load_keypoints(self, image_id, with_mask=True):
        """
        Returns:
        key_points: num_keypoints coordinates and visibility (x,y,v)  [num_person,num_keypoints,3] of num_person
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks, here is always equal to [num_person, 1]
        """
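        key_points = np.expand_dims(self.image_info[image_id]["annotations"], 0)  # each image is known to contain a single object
        class_ids = np.array([1])

        if with_mask:
            annotations = self.image_info[image_id]["annotations"]
            w, h = image_size(self.image_info[image_id]["path"])
            # Rows index y and columns index x, so the mask shape is [height, width]
            mask = np.zeros([h, w], dtype=int)
            # Mark labelled keypoints only (v > 0); missing ones have no valid position
            labelled = annotations[:, 2] > 0
            mask[annotations[labelled, 1], annotations[labelled, 0]] = 1
            return key_points.copy(), np.expand_dims(mask, -1), class_ids
        return key_points.copy(), None, class_ids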
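
The parsing step inside load_FI can be followed on a single row with plain Python. In this sketch each cell is an "x_y_v" string, as the split('_') call above implies; the sample values, including the "-1_-1_-1" cell for a missing keypoint, and the helper name parse_keypoint_row are assumptions made for illustration.

```python
def parse_keypoint_row(cells):
    """cells: list of 'x_y_v' strings, one per keypoint column."""
    keypoints = []
    for cell in cells:
        x, y, v = (int(part) for part in cell.split("_"))
        # Shift only the state flag by one, as load_FI does
        keypoints.append([x, y, v + 1])
    return keypoints


row = ["240_130_1", "-1_-1_-1", "310_128_0"]
print(parse_keypoint_row(row))
# [[240, 130, 2], [-1, -1, 0], [310, 128, 1]]
```

Only the state column is shifted, so a missing keypoint keeps its -1 coordinates. Any code that indexes an image with these coordinates should therefore check the state flag first.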