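Preface

While studying region-based convolutional neural networks (R-CNN) recently, I found that the region proposals are generated with selective search. To understand how R-CNN works more thoroughly, I decided to implement selective search in Python.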

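Introduction

For the basic principles of selective search and a first overview, see the following post:
https://blog.csdn.net/mao_kun/article/details/50576003

Here I mainly give a brief summary based on my own understanding: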

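(1) Use Efficient Graph-Based Image Segmentation to obtain the initial regions R = {r1, r2, …, rn}; see my other post for details:
https://blog.csdn.net/u014796085/article/details/83449972
(2) Initialise the similarity set S = ∅.
(3) Compute the similarity of every pair of neighbouring regions and add it to S.
(4) Take the two most similar regions ri and rj from S and merge them into a new region rt. Remove from S every similarity that involves ri or rj, compute the similarity between rt and each of its neighbours (the regions that were adjacent to ri or rj), and add the results to S. Also add the new region rt to R.
(5) Repeat step (4) until S = ∅, at which point the last merged region rt covers the whole image.
(6) Take the bounding box of every region in R and discard those with fewer than 2000 pixels or an aspect ratio above 1.2; the remaining boxes are the candidate object locations L.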
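To make the merge loop in steps (2) to (5) concrete, here is a minimal toy sketch. It is not the implementation used later in this post: regions are reduced to pixel counts, the similarity is only the size term, and the function name `greedy_grouping` and its arguments are my own placeholders.

```python
def greedy_grouping(sizes, adjacent):
    """Toy hierarchical grouping where a region is just a pixel count.

    sizes:    {label: pixel_count} for the initial regions
    adjacent: iterable of (label_a, label_b) pairs that touch
    Similarity uses the size term only: 1 - (size_a + size_b) / total.
    """
    total = sum(sizes.values())
    R = dict(sizes)

    def sim(a, b):
        return 1.0 - (R[a] + R[b]) / total

    S = {(a, b): sim(a, b) for a, b in adjacent}
    merges = []
    while S:
        i, j = max(S, key=S.get)      # most similar pair
        t = max(R) + 1                # label of the merged region
        R[t] = R[i] + R[j]            # old regions stay in R
        merges.append((i, j, t))
        # every similarity that involves i or j is now stale
        touched = [k for k in S if i in k or j in k]
        for k in touched:
            del S[k]
        # reconnect the former neighbours of i and j to the new region
        for k in touched:
            if k == (i, j):
                continue
            n = k[1] if k[0] in (i, j) else k[0]
            S[(t, n)] = sim(t, n)
    return R, merges


R, merges = greedy_grouping({0: 10, 1: 20, 2: 30, 3: 40},
                            [(0, 1), (1, 2), (2, 3)])
print(merges)   # [(0, 1, 4), (4, 2, 5), (5, 3, 6)]
print(R[6])     # 100
```

With four connected initial regions there are three merges, so R ends up with seven entries, and the last one covers all 100 pixels.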

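Implementation and walkthrough

Initial image segmentation

import numpy
import skimage.io
import skimage.color
import skimage.feature
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# graphbased_segmentation is not defined in this post; it is assumed to come
# from the graph-based segmentation code described in the post linked above.


def _generate_segments(img_path, neighbor, sigma, scale, min_size):
    # segment the image to get a per-pixel region label mask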
    im_mask = graphbased_segmentation(img_path, neighbor, sigma, scale, min_size)
    im_orig = skimage.io.imread(img_path)
    # merge mask channel to the image as a 4th channel
    im_orig = numpy.append(
        im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
    im_orig[:, :, 3] = im_mask

    return im_orig

Segment the original image and store each pixel's region label as a 4th channel of the image.
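As a small illustration of the 4th channel idea, the sketch below attaches a label map to a toy RGB array. It uses `numpy.dstack` instead of the `numpy.append` call above; both should give an (H, W, 4) array, and the toy values are made up.

```python
import numpy as np

# toy 2x3 RGB image and a matching map of region labels
rgb = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
labels = np.array([[0, 0, 1],
                   [2, 2, 1]])

# dstack treats the 2-D label map as an (H, W, 1) slice and stacks it
# behind the three colour channels
rgbl = np.dstack([rgb.astype(float), labels])

print(rgbl.shape)     # (2, 3, 4)
print(rgbl[..., 3])   # the label map, now stored as floats
```

Because the result is a float array, the labels come back as floats, which is why the region keys in R later on are floats as well.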

Definition of region similarity

def _calc_colour_hist(img):
    """
        calculate colour histogram for each region

        the size of output histogram will be BINS * COLOUR_CHANNELS(3)

        number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]

        extract HSV
    """

    BINS = 25
    hist = numpy.array([])

    for colour_channel in (0, 1, 2):

        # extracting one colour channel
        c = img[:, colour_channel]

        # calculate histogram for each colour and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])

    # L1 normalize
    hist = hist / len(img)

    return hist


def _calc_texture_gradient(img):
    """
        calculate texture gradient for entire image

        The original SelectiveSearch algorithm proposed Gaussian derivative
        for 8 orientations, but we use LBP instead.

        output will be [height(*)][width(*)]
    """
    ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))

    for colour_channel in (0, 1, 2):
        ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
            img[:, :, colour_channel], 8, 1.0)

    return ret


def _calc_texture_hist(img):
    """
        calculate texture histogram for each region

        calculate the histogram of gradient for each colours
        the size of output histogram will be
            BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
    """
    BINS = 10

    hist = numpy.array([])

    for colour_channel in (0, 1, 2):

        # mask by the colour channel
        fd = img[:, colour_channel]

        # calculate histogram for each orientation and concatenate them all
        # and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])

    # L1 Normalize
    hist = hist / len(img)

    return hist

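_calc_colour_hist(img) computes the colour histogram of a region, which is used for the colour similarity between two regions.
_calc_texture_gradient(img) computes the texture gradient of the image (LBP here), from which the texture histograms are built.
_calc_texture_hist(img) computes the texture histogram of a region, which is used for the texture similarity between two regions.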
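Both histogram functions follow the same pattern: one histogram per channel, concatenated, then divided by the number of pixels in the region. The sketch below shows that pattern on four made-up pixels; `region_hist`, its parameters, and the choice of 5 bins are placeholders of mine, not part of the code above.

```python
import numpy as np


def region_hist(pixels, bins, value_range):
    """Concatenate one histogram per channel, then divide by the pixel count.

    pixels: (n_pixels, n_channels) array with the pixels of one region.
    """
    parts = [np.histogram(pixels[:, ch], bins, value_range)[0]
             for ch in range(pixels.shape[1])]
    return np.concatenate(parts) / len(pixels)


pixels = np.array([[0.1, 0.5, 0.9],
                   [0.1, 0.6, 0.9],
                   [0.3, 0.5, 0.2],
                   [0.8, 0.5, 0.2]])
hist = region_hist(pixels, bins=5, value_range=(0.0, 1.0))
print(hist.shape)   # (15,)
print(hist.sum())   # 3.0
```

Each channel's block sums to 1 only if every value falls inside `value_range`; `numpy.histogram` silently ignores values outside it, so the range has to match the actual scale of the data.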

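def _sim_colour(r1, r2):
    """
        calculate the colour similarity as the mean per-bin agreement
        1 - |a - b| / max(a, b) of the two colour histograms
        (the commented-out line is the plain histogram intersection)
    """
    # return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])
    return sum([1 if a == b else 1 - float(abs(a - b)) / max(a, b)
                for a, b in zip(r1["hist_c"], r2["hist_c"])]) / len(r1["hist_c"])


def _sim_texture(r1, r2):
    """
        calculate the texture similarity as the mean per-bin agreement
        1 - |a - b| / max(a, b) of the two texture histograms
        (the commented-out line is the plain histogram intersection)
    """
    # return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])
    return sum([1 if a == b else 1 - float(abs(a - b)) / max(a, b)
                for a, b in zip(r1["hist_t"], r2["hist_t"])]) / len(r1["hist_t"])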


def _sim_size(r1, r2, imsize):
    """
        calculate the size similarity over the image
    """
    return 1.0 - (r1["size"] + r2["size"]) / imsize


def _sim_fill(r1, r2, imsize):
    """
        calculate the fill similarity over the image
    """
    bbsize = (
        (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
        * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    )
    return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize


def _calc_sim(r1, r2, imsize):
    return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
            + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))

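Compute the colour, texture, size, and fill similarity of regions r1 and r2, and add the four together to get their overall similarity. See the first post linked in the Introduction for details.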
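A small worked example may help to see what the four terms measure. The sketch below recomputes the size and fill terms with the same formulas as above on two made-up regions in a 100-pixel image, and compares plain histogram intersection with a per-bin relative measure averaged over the bins. All function names and numbers here are placeholders of mine.

```python
def sim_size(r1, r2, imsize):
    # small regions score high, so they tend to be merged first
    return 1.0 - (r1["size"] + r2["size"]) / imsize


def sim_fill(r1, r2, imsize):
    # area of the joint bounding box not covered by either region
    bb = ((max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
          * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"])))
    return 1.0 - (bb - r1["size"] - r2["size"]) / imsize


def hist_intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))


def hist_relative(h1, h2):
    # per-bin 1 - |a - b| / max(a, b), averaged over the bins
    terms = [1.0 if a == b else 1.0 - abs(a - b) / max(a, b)
             for a, b in zip(h1, h2)]
    return sum(terms) / len(terms)


r1 = {"size": 10, "min_x": 0, "min_y": 0, "max_x": 5, "max_y": 2}
r2 = {"size": 20, "min_x": 5, "min_y": 0, "max_x": 10, "max_y": 4}

print(sim_size(r1, r2, 100))   # 1 - 30/100
print(sim_fill(r1, r2, 100))   # 1 - (40 - 30)/100
print(hist_intersection([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))   # 0.5
print(round(hist_relative([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]), 3))
```

The joint bounding box here is 10 by 4, so 10 of its 40 pixels belong to neither region, which gives a fill term of 0.9. The two histogram measures disagree on the same input (0.5 versus about 0.333), so swapping one for the other changes the merge order.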

Building the region dictionary R

def _extract_regions(img):
    # create the region dictionary
    R = {}
    # get hsv image
    hsv = skimage.color.rgb2hsv(img[:, :, :3])

    # pass 1: count pixel positions
    # iterate over the rows of img: y is the row index, i is a row of (r, g, b, l) pixels
    for y, i in enumerate(img):
        for x, (r, g, b, l) in enumerate(i):
            # initialize a new region
            if l not in R:
                R[l] = {
                    "min_x": 0xffff, "min_y": 0xffff,
                    "max_x": 0, "max_y": 0, "labels": [l]}
            # bounding box
            if R[l]["min_x"] > x:
                R[l]["min_x"] = x
            if R[l]["min_y"] > y:
                R[l]["min_y"] = y
            if R[l]["max_x"] < x:
                R[l]["max_x"] = x
            if R[l]["max_y"] < y:
                R[l]["max_y"] = y

    # pass 2: calculate texture gradient
    tex_grad = _calc_texture_gradient(img)

    # pass 3: calculate colour histogram of each region
    for k, v in list(R.items()):
        # colour histogram
        masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
        R[k]["size"] = len(masked_pixels)  # number of pixels in the region
        R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
        # texture histogram
        R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])

    return R

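Build the region dictionary R. It has the form {l_1: {…}, …, l_n: {…}}, where n is the number of regions, each key l is a region label, and each value is a dictionary holding that region's min_x, min_y, max_x, max_y, labels, size, hist_c, and hist_t.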
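The double loop in pass 1 visits every pixel in Python, which is slow on large images. As a possible alternative, here is a minimal sketch that derives the same bounding box fields per label with NumPy. The function `bounding_boxes` is my own placeholder and only covers the box and size fields, not the histograms.

```python
import numpy as np


def bounding_boxes(labels):
    """Return {label: {"min_x", "min_y", "max_x", "max_y", "size"}}."""
    boxes = {}
    for l in np.unique(labels):
        # row (y) and column (x) indices of the pixels carrying this label
        ys, xs = np.nonzero(labels == l)
        boxes[int(l)] = {
            "min_x": int(xs.min()), "min_y": int(ys.min()),
            "max_x": int(xs.max()), "max_y": int(ys.max()),
            "size": int(len(xs)),
        }
    return boxes


labels = np.array([[0, 0, 1],
                   [2, 2, 1]])
boxes = bounding_boxes(labels)
print(boxes[1])
# {'min_x': 2, 'min_y': 0, 'max_x': 2, 'max_y': 1, 'size': 2}
```

This still loops over the labels, but the per-pixel work happens inside NumPy.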

Building the neighbour list neighbours

def _extract_neighbours(regions):

    def intersect(a, b):
        if (a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]):
            return True
        return False

    R = list(regions.items())
    neighbours = []
    for cur, a in enumerate(R[:-1]):
        for b in R[cur + 1:]:
            if intersect(a[1], b[1]):
                neighbours.append((a, b))

    return neighbours

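(1) Definition of intersection: if any corner of region b's bounding rectangle lies inside region a's bounding rectangle, a and b are considered to intersect. Intersecting regions are treated as neighbours.
(2) Every pair of regions in R is visited once; if the pair intersects, it is appended to the list neighbours.
Each entry of neighbours has the form ((l_a, {…}), (l_b, {…})), where l is a region's label and {…} is the dictionary describing that region.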
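The corner test has a few edge cases that are easy to miss. The sketch below is a compact rewrite of the same four conditions, with a placeholder name `corner_inside` and a helper `box` of my own, run on three toy situations.

```python
def corner_inside(a, b):
    """True if any corner of b's bounding box lies strictly inside a's."""
    xs = (b["min_x"], b["max_x"])
    ys = (b["min_y"], b["max_y"])
    return any(a["min_x"] < x < a["max_x"] and a["min_y"] < y < a["max_y"]
               for x in xs for y in ys)


def box(min_x, min_y, max_x, max_y):
    return {"min_x": min_x, "min_y": min_y, "max_x": max_x, "max_y": max_y}


overlapping = corner_inside(box(0, 0, 10, 10), box(5, 5, 15, 15))
touching = corner_inside(box(0, 0, 10, 10), box(10, 0, 20, 10))
small_in_big = corner_inside(box(0, 0, 20, 20), box(5, 5, 10, 10))
big_in_small = corner_inside(box(5, 5, 10, 10), box(0, 0, 20, 20))

print(overlapping, touching, small_in_big, big_in_small)
# True False True False
```

Two things stand out. Boxes that only share an edge are not counted, because the comparisons are strict. The test is also not symmetric: a small box inside a big one is detected only when the big one is passed as a. Since each pair is tested in one order only, some nested regions may not be registered as neighbours.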

Defining region merging

def _merge_regions(r1, r2):
    new_size = r1["size"] + r2["size"]
    rt = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": r1["labels"] + r2["labels"]
    }
    return rt

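Merge regions r1 and r2 into a new region rt. Note that the dictionaries of r1 and r2 in R are neither deleted nor modified; only the dictionary of the new region rt is added.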
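The size-weighted average of the two histograms is what lets the merged histogram be computed without going back to the pixels. A quick check with made-up numbers, assuming both histograms were normalised by their region's pixel count:

```python
import numpy as np

h1, s1 = np.array([0.5, 0.5, 0.0]), 10    # histogram and size of r1
h2, s2 = np.array([0.0, 0.5, 0.5]), 30    # histogram and size of r2

# merged histogram as in _merge_regions
merged = (h1 * s1 + h2 * s2) / (s1 + s2)

# the same histogram computed from the pooled raw bin counts
pooled_counts = h1 * s1 + h2 * s2         # [5, 20, 15]
from_pixels = pooled_counts / pooled_counts.sum()

print(merged)                             # [0.125 0.5   0.375]
print(np.allclose(merged, from_pixels))   # True
```

Multiplying a normalised histogram by the region size recovers the raw counts, so the weighted average equals the histogram of the pooled pixels.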

Selective search

def selective_search(
        img_path, neighbor, sigma, scale, min_size):

    # load image and get smallest regions
    # region label is stored in the 4th value of each pixel [r,g,b,(region)]
    img = _generate_segments(img_path, neighbor, sigma, scale, min_size)

    if img is None:
        return None, {}

    imsize = img.shape[0] * img.shape[1]
    R = _extract_regions(img)
    # print(R[0])

    # extract neighbouring information
    neighbours = _extract_neighbours(R)
    # print(neighbours[0])
    # calculate initial similarities
    # create the similarity dictionary
    S = {}
    for (ai, ar), (bi, br) in neighbours:
        # print(ai)
        # print(bi)
        S[(ai, bi)] = _calc_sim(ar, br, imsize)

    # hierarchal search
    while S != {}:

        # get highest similarity
        i, j = sorted(S.items(), key=lambda i: i[1])[-1][0]

        # merge corresponding regions
        t = max(R.keys()) + 1.0
        R[t] = _merge_regions(R[i], R[j])

        # mark similarities for regions to be removed
        key_to_delete = []
        for k, v in list(S.items()):
            if (i in k) or (j in k):
                # drop the similarities between these two regions and their neighbours
                key_to_delete.append(k)

        # remove old similarities of related regions
        for k in key_to_delete:
            del S[k]

        # calculate similarity set with the new region
        # compute the similarity between the merged region and its neighbours
        for k in [a for a in key_to_delete if a != (i, j)]:
            n = k[1] if k[0] in (i, j) else k[0]
            S[(t, n)] = _calc_sim(R[t], R[n], imsize)

    regions = []
    for k, r in list(R.items()):
        regions.append({
            'rect': (
                r['min_x'], r['min_y'],
                r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
            'size': r['size'],
            'labels': r['labels']
        })

    return img, regions

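(1) Segment the image and build the initial region dictionary R.
(2) Create the similarity dictionary S and compute the similarity of every pair of neighbouring regions. S has the form {(l_a, l_b): sim, …}, where sim is the overall similarity of the two neighbouring regions.
(3) Take the two neighbouring regions with the highest similarity and merge them into a new region; remove from S every similarity involving either of the two regions, then compute the similarity between the merged region and its neighbours. Repeat until S is empty, at which point the last merged region is the whole image.
(4) Build the list regions, storing one dictionary {'rect': (x, y, w, h), 'size': size, 'labels': labels} for every region in R.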

Main function

def main():

    img_path = "2.jpg"
    # load the image for display
    img = skimage.io.imread(img_path)

    # perform selective search
    img_lbl, regions = selective_search(
        img_path, neighbor = 8 , sigma = 0.5, scale = 200, min_size = 20)

    # create the candidate set
    candidates = set()
    for r in regions:
        # excluding same rectangle (with different segments)
        if r['rect'] in candidates:
            continue
        # excluding regions smaller than 2000 pixels
        if r['size'] < 2000:
            continue
        # distorted rects (skip degenerate boxes to avoid dividing by zero)
        x, y, w, h = r['rect']
        if w == 0 or h == 0:
            continue
        if w / h > 1.2 or h / w > 1.2:
            continue
        candidates.add(r['rect'])

    # draw rectangles on the original image
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
    ax.imshow(img)
    for x, y, w, h in candidates:
        print(x, y, w, h)
        rect = mpatches.Rectangle(
            (x, y), w, h, fill=False, edgecolor='red', linewidth=1)
        ax.add_patch(rect)

    plt.show()


if __name__ == "__main__":
    main()

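This yields the list regions, which holds a dictionary for every region, both the original ones and the newly merged ones. Regions with at least 2000 pixels and a width-to-height ratio (in either direction) of at most 1.2 are kept as candidate boxes.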
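The filtering step can be pulled out into a small function so the thresholds are easy to change. This is a sketch on made-up region entries; `filter_candidates` and its keyword arguments are my own placeholders, and the zero check is an addition that skips boxes with zero width or height.

```python
def filter_candidates(regions, min_size=2000, max_ratio=1.2):
    """Keep unique rects that are large enough and roughly square."""
    candidates = set()
    for r in regions:
        x, y, w, h = r["rect"]
        if r["size"] < min_size:
            continue
        if w == 0 or h == 0:          # degenerate box, ratio undefined
            continue
        if w / h > max_ratio or h / w > max_ratio:
            continue
        candidates.add(r["rect"])     # a set drops duplicate rects
    return candidates


toy_regions = [
    {"rect": (0, 0, 50, 50), "size": 2500},    # kept
    {"rect": (0, 0, 50, 50), "size": 2400},    # same rect, deduplicated
    {"rect": (0, 0, 100, 30), "size": 3000},   # too elongated
    {"rect": (5, 5, 10, 10), "size": 100},     # too small
    {"rect": (0, 0, 0, 2500), "size": 2500},   # zero width
]
print(filter_candidates(toy_regions))   # {(0, 0, 50, 50)}
```

The 1.2 ratio is quite strict and only keeps near-square boxes, so for elongated objects `max_ratio` would need to be raised.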

Open questions

I do not yet fully understand the region similarity measures (colour, texture, size, fill) and need to study them further.
