cs231n_目标检测

1. 图像任务：

分类、语义分割、目标检测、实例分割

2. 目标检测任务：

在2D的图像中，画框，框出目标物体。因此既有分类任务，也有定位的任务。

输入：图像 HxWx3

输出：带类别的2D框，cx,cy,width,height, class

挑战：1. 输出不定数量的bbox

2. 需要输出类别和定位尺寸

3. 输入图像尺寸分辨率一般会比较大（相对于纯分类任务）

2.1 单目标检测任务

如果确定某个样本中只有一个目标，那就直接分别预测类别，回归bbox的位置和尺寸，通过multi task的加权loss去优化网络。

分类用ce/loss

定位用 l1/l2 loss

2.2 多目标检测任务_滑动窗口

滑动的提取图像的小的区域，然后针对这个小区域做图像分类任务。

待检测区域数量太多，可能性的尺寸太多，每一个都要输入分类网络进行计算，不现实。

2.3 Region-Based CNN

通过图像传统算法，获得一定数量的推荐区域；

然后在这些推荐区域上进行分类任务；

推荐的区域尺寸不是很准，同时进行一个regression任务，调整推荐区域的尺寸。

2.4 Fast-RCNN

RCNN的优化，特征提取通过backbone实现，selective search和之前一样在raw image上获得区域，这些区域会提取backbone的feature，然后在这些带feature的区域通过轻量级的convnet进行后续的任务。

如何区域性的提取feature

可以通过简单的pooling，把region对应的特征塞到对应的grid里

通过插值的方式，不是通过pooling的方式

2.5 Faster-RCNN

Fast-RCNN的进一步优化，去除了selective search的工作，利用网络获取roi区域。

RPN，region propose network

通过anchor的方式生成结果，每个位置放置预设的，不同大小、不同比例的anchor。

anchor是否是object。通过focal loss来约束。

gt box相对于anchor的偏移。通过l1/l2来约束。

2.6 single stage detector

大趋势

2.7 Mask-RCNN

2.8 CornetNet

2.9 CenterNet

2.9 Yolo series

3. 常用辅助算法

3.1 IOU for axis aligned bbox

from collections import namedtuple
import numpy as np
import cv2


def bb_intersection_over_union(boxA, boxB):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])

    # compute the area of intersection rectangle
    # 交集
    interArea = (xB - xA + 1) * (yB - yA + 1)

    # compute the area of both the prediction and ground-truth rectangles
    # 并集
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the interesection area
    iou = interArea / float(boxAArea + boxBArea - interArea)

    # return the intersection over union value
    return iou


# define the `Detection` object
Detection = namedtuple("Detection", ["image_path", "gt", "pred"])

# define the list of example detections
examples = [
    Detection("dog.png", [39, 63, 203, 112], [54, 66, 198, 114]),
    Detection("dog.png", [49, 75, 203, 125], [42, 78, 186, 126]),
    Detection("dog.png", [31, 69, 201, 125], [18, 63, 235, 135]),
    Detection("dog.png", [50, 72, 197, 121], [54, 72, 198, 120]),
    Detection("dog.png", [35, 51, 196, 110], [36, 60, 180, 108])]

# loop over the example detections
for detection in examples:
    # load the image
    image = cv2.imread(detection.image_path)

    # draw the ground-truth bounding box along with the predicted
    # bounding box
    cv2.rectangle(image, tuple(detection.gt[:2]),
                  tuple(detection.gt[2:]), (0, 255, 0), 2)
    cv2.rectangle(image, tuple(detection.pred[:2]),
                  tuple(detection.pred[2:]), (0, 0, 255), 2)

    # compute the intersection over union and display it
    iou = bb_intersection_over_union(detection.gt, detection.pred)
    cv2.putText(image, "IoU: {:.4f}".format(iou), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    print("{}: {:.4f}".format(detection.image_path, iou))

    # show the output image
    cv2.imshow("Image", image)
    cv2.waitKey(0)

3.2 IOU for rotated bbox

Rotated IoU 计算_旋转iou_THE@JOKER的博客-CSDN博客

问了chatgpt，

import math
from shapely.geometry import Polygon

def get_corners(rbbox):
    x, y, w, h, theta = rbbox
    xmin = x - w/2
    xmax = x + w/2
    ymin = y - h/2
    ymax = y + h/2
    cos_theta = math.cos(theta)
    sin_theta = math.sin(theta)
    x1 = x + (xmin - x) * cos_theta + (ymin - y) * sin_theta
    y1 = y - (xmin - x) * sin_theta + (ymin - y) * cos_theta
    x2 = x + (xmax - x) * cos_theta + (ymin - y) * sin_theta
    y2 = y - (xmax - x) * sin_theta + (ymin - y) * cos_theta
    x3 = x + (xmax - x) * cos_theta + (ymax - y) * sin_theta
    y3 = y - (xmax - x) * sin_theta + (ymax - y) * cos_theta
    x4 = x + (xmin - x) * cos_theta + (ymax - y) * sin_theta
    y4 = y - (xmin - x) * sin_theta + (ymax - y) * cos_theta
    return [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]

def intersection_area(rbbox1, rbbox2):
    corners1 = get_corners(rbbox1)
    corners2 = get_corners(rbbox2)
    poly1 = Polygon(corners1)
    poly2 = Polygon(corners2)
    intersection = poly1.intersection(poly2)
    if intersection.is_empty:
        return 0
    else:
        return intersection.area

def union_area(rbbox1, rbbox2):
    corners1 = get_corners(rbbox1)
    corners2 = get_corners(rbbox2)
    poly1 = Polygon(corners1)
    poly2 = Polygon(corners2)
    union = poly1.union(poly2)
    return union.area - intersection_area(rbbox1, rbbox2)

def iou(rbbox1, rbbox2):
    return intersection_area(rbbox1, rbbox2) / union_area(rbbox1, rbbox2)

rbbox1 = (100, 100, 200, 100, math.pi/4)
rbbox2 = (120, 120, 150, 80, math.pi/6)
iou_value = iou(rbbox1, rbbox2)
print(iou_value)

3.3 NMS

因为有很多的重叠的bbox，很多都是冗余的，需要去除，大体的思路怎么做？

选取某个类别，所有类别下bbox排序，

计算所有bbox之间的iou，

从最大的score，开始遍历bbox，去除和这个bbox的iou大于某个nms threshold的bbox

直到所有的都遍历完成

问题：实际场景目标很多，的确有很多的bbox，会去除正确的bbox

python code from chatgpt

def nms(boxes, scores, threshold):
    """Apply Non-Maximum Suppression (NMS) to a set of boxes given their corresponding scores.

    Args:
        boxes (numpy.ndarray): A numpy array of shape `(N, 4)` containing the coordinates of N bounding boxes. 
            Each row is of the form `[x1, y1, x2, y2]` where `(x1, y1)` and `(x2, y2)` are the top-left and 
            bottom-right corners of the bounding box, respectively.
        scores (numpy.ndarray): A numpy array of shape `(N,)` containing the corresponding confidence scores 
            for each bounding box.
        threshold (float): A float value specifying the overlap threshold for suppression.

    Returns:
        numpy.ndarray: A numpy array of indices of the selected bounding boxes after NMS.

    """
    # 1. Initialize an empty list to store the selected bounding boxes
    selected_boxes = []

    # 2. Sort the bounding boxes by their corresponding scores in descending order
    order = np.argsort(scores)[::-1]

    # 3. Loop over the sorted indices while there are still indices left in the order array
    while order.size > 0:
        # 4. Get the index of the box with the highest score (i.e., the first index in the order array)
        index = order[0]

        # 5. Add the corresponding box to the selected boxes list
        selected_boxes.append(index)

        # 6. Compute the overlap of the remaining boxes with the selected box using the compute_overlap function
        overlap = compute_overlap(boxes[index], boxes[order[1:]])

        # 7. Get the indices of the boxes that overlap with the selected box more than the threshold
        indices = np.where(overlap > threshold)[0] + 1

        # 8. Remove the selected box and the overlapping boxes from the order array
        order = np.delete(order, indices)

    # 9. Return the indices of the selected boxes after NMS
    return np.array(selected_boxes)

3.4 mAP计算

1. 预测目标检测结果

2. 根据score，对结果进行排序

3. 从score最大的box开始遍历，计算当前的precision和recall，保存到list中

4. 计算AP

5. 计算mAP水电费

def calculate_precision_recall(gt_boxes, pred_boxes, iou_thresh=0.5):
    # Initialize true positives, false positives, and false negatives
    tp = 0
    fp = 0
    fn = len(gt_boxes)
    
    # Match predictions to ground truth
    for pred_box in pred_boxes:
        best_iou = 0
        best_gt_box = None
        for gt_box in gt_boxes:
            iou = calculate_iou(gt_box, pred_box)
            if iou > best_iou:
                best_iou = iou
                best_gt_box = gt_box
        if best_iou >= iou_thresh:
            # Match found
            tp += 1
            fp -= 1
            fn -= 1
            gt_boxes.remove(best_gt_box)
        else:
            # No match found
            fp += 1
    
    # Calculate precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    
    return precision, recall

def calculate_ap(gt_boxes, pred_boxes, iou_thresh=0.5):
    # Sort predictions by confidence score in descending order
    pred_boxes = sorted(pred_boxes, key=lambda x: x[4], reverse=True)
    
    # Calculate precision and recall for each prediction
    precisions = []
    recalls = []
    for i, pred_box in enumerate(pred_boxes):
        precision, recall = calculate_precision_recall(gt_boxes.copy(), pred_boxes[:i+1], iou_thresh=iou_thresh)
        precisions.append(precision)
        recalls.append(recall)
    
    # Calculate average precision using the 11-point method
    ap = 0
    for t in np.arange(0, 1.1, 0.1):
        if len([recall for recall in recalls if recall >= t]) == 0:
            p = 0
        else:
            p = max([precision for precision, recall in zip(precisions, recalls) if recall >= t])
        ap += p / 11
    
    return ap

def calculate_mAP(gt_boxes_list, pred_boxes_list, iou_thresh=0.5):
    # Calculate AP for each class
    aps = []
    for c in range(len(gt_boxes_list)):
        gt_boxes = gt_boxes_list[c]
        pred_boxes = pred_boxes_list[c]
        ap = calculate_ap(gt_boxes, pred_boxes, iou_thresh=iou_thresh)
        aps.append(ap)

3.5 Focal Loss

在目标检测任务中，分类分支用于判定某个anchor和这center是否是前景，是哪种前景色。如果简单的使用cross entropy loss，会导致模型更偏向去学习数量众多的简单的背景任务，反而在数量少的前景任务上表现不佳。为了平衡这种难易、数量少多的样本分布，引入focal loss。

FL(p_t) = - (1 - p_t)^gamma * log(p_t)

where p_t is the predicted probability of the true class (either foreground or background), and gamma is the focusing parameter. When gamma is set to 0, the focal loss becomes equivalent to the standard cross-entropy loss.

class FocalLoss(nn.Module):
    def __init__(self, gamma=2, alpha=None, reduction='mean'):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha
        self.reduction = reduction

    def forward(self, inputs, targets):
        BCE_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
        pt = torch.exp(-BCE_loss)
        focal_loss = ((1-pt)**self.gamma * BCE_loss)
        
        if self.alpha is not None:
            alpha = self.alpha.to(inputs.device)
            focal_loss = alpha * focal_loss
        
        if self.reduction == 'mean':
            return torch.mean(focal_loss)
        elif self.reduction == 'sum':
            return torch.sum(focal_loss)
        else:
            return focal_loss

1. 图像任务：

2. 目标检测任务：

3. 常用辅助算法

猜你喜欢