Meanshift and Camshift
In this section we learn how to use the Meanshift and Camshift algorithms to track objects in videos.
I. Meanshift
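The idea behind meanshift is simple. Suppose you have a set of points (it can be a pixel distribution such as a histogram back projection). You are given a small window (perhaps a circle), and you have to move that window to the area of maximum pixel density (or maximum number of points). This is illustrated in the figure below:
The initial window is shown as a blue circle named "C1". Its original center is marked with a blue rectangle named "C1_o". If you compute the centroid of the points inside that window, however, you get the point "C1_r" (marked with a small blue circle), which is the true centroid of the window. The two almost certainly do not coincide. So move the window so that the center of the new circle coincides with that centroid, then compute the centroid again. Most likely it still will not match, so move the window again and keep iterating until the window center and its centroid fall on the same location (or within a small desired error). What you finally obtain is the window with the maximum pixel distribution. It is marked with a green circle named "C2", and as the image shows, it contains the maximum number of points. The whole process is demonstrated on a static image below:
In practice we pass the histogram back-projected image and the initial target location. When the object moves, the movement is reflected in the back-projected image. As a result, the meanshift algorithm moves our window to the new location of maximum density.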
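The iteration described above can be sketched in plain NumPy. This is only an illustrative sketch, not OpenCV's implementation: the function name `mean_shift_window`, the rectangular window, and the rounding and clamping rules are assumptions made for the example. The `max_iter` and `eps` arguments play the role of a termination criterion (stop after a fixed number of iterations, or once the window moves by less than `eps` pixels).

```python
import numpy as np

def mean_shift_window(weights, window, max_iter=10, eps=1.0):
    """Shift a rectangular window toward the densest region of `weights`.

    weights: 2-D non-negative array (for example a back-projection image).
    window:  (x, y, w, h), top-left corner and size in pixels.
    Returns (iterations_used, new_window).
    """
    x, y, w, h = window
    rows, cols = weights.shape
    for iteration in range(1, max_iter + 1):
        patch = weights[y:y + h, x:x + w].astype(np.float64)
        total = patch.sum()
        if total == 0:
            # Nothing to follow inside the window, so stay where we are.
            return iteration, (x, y, w, h)
        ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        cx = (xs * patch).sum() / total   # centroid, relative to the window
        cy = (ys * patch).sum() / total
        # Move the window so that its center lands on the centroid.
        dx = int(round(cx - (patch.shape[1] - 1) / 2.0))
        dy = int(round(cy - (patch.shape[0] - 1) / 2.0))
        new_x = min(max(x + dx, 0), cols - w)
        new_y = min(max(y + dy, 0), rows - h)
        moved = np.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < eps:
            break
    return iteration, (x, y, w, h)

density = np.zeros((100, 100))
density[60:70, 70:80] = 1.0   # a dense blob of "points"
iters, win = mean_shift_window(density, (50, 45, 30, 30), max_iter=20)
print(iters, win)
```

In this toy case the window starts off-center, jumps onto the blob in the first iteration, and the second iteration confirms that the window center and the centroid coincide.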
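1. Meanshift in OpenCV
To use meanshift in OpenCV, we first need to set up the target and compute its histogram, so that we can back-project the target on each frame to calculate meanshift. We also need to provide the initial location of the window. For the histogram, only Hue is considered here. In addition, to avoid false values caused by low light, low-light values are discarded with the cv.inRange() function.
- Usage: cv.inRange(src, lowerb, upperb[, dst]) -> dst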
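To make the masking step concrete, here is a NumPy sketch of the test that cv.inRange() performs: a pixel is set to 255 when every channel lies inside the inclusive bounds, and to 0 otherwise. The helper `in_range` is a hypothetical stand-in written for this illustration, not part of OpenCV, and the three sample pixels are made up.

```python
import numpy as np

def in_range(src, lowerb, upperb):
    """Per-pixel inclusive range test on an H x W x C image (NumPy sketch)."""
    lowerb = np.asarray(lowerb)
    upperb = np.asarray(upperb)
    inside = np.all((src >= lowerb) & (src <= upperb), axis=-1)
    return inside.astype(np.uint8) * 255

hsv_pixels = np.array([[[30, 200, 120],    # saturated and bright: kept
                        [30,  20, 120],    # saturation below 60: dropped
                        [30, 200,  10]]],  # value below 32: dropped
                      dtype=np.uint8)
mask = in_range(hsv_pixels, (0, 60, 32), (180, 255, 255))
print(mask)
```

With the bounds used in this tutorial, only pixels with enough saturation and brightness contribute to the hue histogram.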
import numpy as np
import cv2 as cv
import argparse
# parser = argparse.ArgumentParser(description='This sample demonstrates the meanshift algorithm. \
#                                               The example file can be downloaded from: \
#                                               https://www.bogotobogo.com/python/OpenCV_Python/images/mean_shift_tracking/slow_traffic_small.mp4')
# parser.add_argument('image', type=str, help='path to image file')
# args = parser.parse_args(args = [])
# cap = cv.VideoCapture(args.image)
cap = cv.VideoCapture('slow_traffic_small.mp4')
# take first frame of the video
ret, frame = cap.read()
# setup initial location of window
x, y, w, h = 300, 200, 100, 50  # simply hardcoded the values
track_window = (x, y, w, h)
# set up the ROI for tracking
roi = frame[y:y+h, x:x+w]
hsv_roi = cv.cvtColor(roi, cv.COLOR_BGR2HSV)
# keep only pixels with enough saturation and brightness
mask = cv.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv.normalize(roi_hist, roi_hist, 0, 255, cv.NORM_MINMAX)
# Setup the termination criteria: stop after 10 iterations or when the window moves by less than 1 pt
term_crit = (cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 1)
while True:
    ret, frame = cap.read()
    if ret:
        hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)
        dst = cv.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        # apply meanshift to get the new location
        ret, track_window = cv.meanShift(dst, track_window, term_crit)
        # Draw it on image
        x, y, w, h = track_window
        img2 = cv.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
        cv.imshow('img2', img2)
        k = cv.waitKey(30) & 0xff
        if k == 27:  # ESC
            break
    else:
        break
cap.release()
cv.destroyAllWindows()
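The back-projection step used inside the loop can be sketched in NumPy as a simple lookup: every hue value is replaced by the value of the histogram bin it falls into. This is a simplified illustration under stated assumptions, not OpenCV's implementation: `back_project_hue` is a hypothetical helper, it handles a single channel only, and it takes a 1-D histogram, whereas cv.calcHist() returns a column array of shape (180, 1).

```python
import numpy as np

def back_project_hue(hue, hist, hue_range=(0, 180)):
    """Replace each hue value by the histogram value of the bin it falls into."""
    bins = hist.shape[0]
    low, high = hue_range
    bin_index = ((hue.astype(np.int64) - low) * bins) // (high - low)
    valid = (bin_index >= 0) & (bin_index < bins)
    out = np.zeros(hue.shape, dtype=np.float64)
    out[valid] = hist[bin_index[valid]]
    return np.clip(out, 0, 255).astype(np.uint8)

hist = np.zeros(180)
hist[20:30] = 255            # the target's hues lie between 20 and 29
hue = np.array([[25, 90],
                [29, 30]], dtype=np.uint8)
prob = back_project_hue(hue, hist)
print(prob)
```

Pixels whose hue matches the target histogram become bright, and everything else stays dark. That brightness map is the density image the tracking window climbs.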
II. Camshift
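Did you look closely at the last result? There is a problem. Our window always has the same size, whether the car is far from the camera or very close to it. That is not good. We need to adapt the window size to the size and rotation of the target. The solution again came from "OpenCV Labs": it is called CAMshift (Continuously Adaptive Meanshift), published by Gary Bradski in his 1998 paper "Computer Vision Face Tracking for Use in a Perceptual User Interface". It applies meanshift first. Once meanshift converges, it updates the size of the window as s = 2 × sqrt(M00 / 256), where M00 is the zeroth moment (the sum of the back-projection values) inside the window. It also calculates the orientation of the best-fitting ellipse. It then applies meanshift again with the newly scaled search window and the previous window location. The process continues until the required accuracy is met.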
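The size and orientation update can be sketched with image moments. The sketch below assumes an 8-bit back projection (values 0 to 255), uses the window-size rule s = 2 × sqrt(M00 / 256) from Bradski's paper, and estimates the orientation from the second central moments. The helper name `camshift_window_stats` is hypothetical, and OpenCV's own cv.CamShift() differs in its details.

```python
import numpy as np

def camshift_window_stats(prob):
    """Return (size, angle in radians) for a back-projection patch."""
    prob = prob.astype(np.float64)
    m00 = prob.sum()                          # zeroth moment
    if m00 == 0:
        return 0.0, 0.0
    ys, xs = np.mgrid[0:prob.shape[0], 0:prob.shape[1]]
    xc = (xs * prob).sum() / m00              # centroid
    yc = (ys * prob).sum() / m00
    # second central moments, normalized by the zeroth moment
    mu20 = ((xs - xc) ** 2 * prob).sum() / m00
    mu02 = ((ys - yc) ** 2 * prob).sum() / m00
    mu11 = ((xs - xc) * (ys - yc) * prob).sum() / m00
    size = 2.0 * np.sqrt(m00 / 256.0)
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return size, angle

patch = np.zeros((40, 40), dtype=np.uint8)
patch[18:22, 5:35] = 255                      # elongated horizontal blob
size, angle = camshift_window_stats(patch)
print(size, angle)
```

For this horizontal blob the angle comes out close to zero, and the size grows with the amount of probability mass inside the window, which is what lets the window follow an object as it approaches the camera.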
1. Camshift in OpenCV
import numpy as np
import cv2 as cv
import argparse
# parser = argparse.ArgumentParser(description='This sample demonstrates the camshift algorithm. \
#                                               The example file can be downloaded from: \
#                                               https://www.bogotobogo.com/python/OpenCV_Python/images/mean_shift_tracking/slow_traffic_small.mp4')
# parser.add_argument('image', type=str, help='path to image file')
# args = parser.parse_args()
# cap = cv.VideoCapture(args.image)
cap = cv.VideoCapture('slow_traffic_small.mp4')
# take first frame of the video
ret, frame = cap.read()
# setup initial location of window
x, y, w, h = 300, 200, 100, 50  # simply hardcoded the values
track_window = (x, y, w, h)
# set up the ROI for tracking
roi = frame[y:y+h, x:x+w]
hsv_roi = cv.cvtColor(roi, cv.COLOR_BGR2HSV)
# keep only pixels with enough saturation and brightness
mask = cv.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv.normalize(roi_hist, roi_hist, 0, 255, cv.NORM_MINMAX)
# Setup the termination criteria: stop after 10 iterations or when the window moves by less than 1 pt
term_crit = (cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 1)
while True:
    ret, frame = cap.read()
    if ret:
        hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)
        dst = cv.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        # apply camshift to get the new location
        # ret is a rotated rectangle: ((cx, cy), (w, h), angle)
        ret, track_window = cv.CamShift(dst, track_window, term_crit)
        # Draw it on image
        pts = cv.boxPoints(ret)
        pts = np.int32(pts)  # np.int0 was removed in NumPy 2.0
        img2 = cv.polylines(frame, [pts], True, 255, 2)
        cv.imshow('img2', img2)
        k = cv.waitKey(30) & 0xff
        if k == 27:  # ESC
            break
    else:
        break
cap.release()
cv.destroyAllWindows()
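cv.CamShift() returns a rotated rectangle of the form ((cx, cy), (w, h), angle), with the angle in degrees, and cv.boxPoints() turns it into four corner points for drawing. The NumPy sketch below shows the geometry behind that conversion. The helper `rotated_box_corners` is hypothetical, and the corner order it produces is an assumption that should not be relied on to match OpenCV.

```python
import numpy as np

def rotated_box_corners(box):
    """Corners of a rotated rectangle ((cx, cy), (w, h), angle_in_degrees)."""
    (cx, cy), (w, h), angle_deg = box
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    # Corner offsets from the center before rotation.
    offsets = np.array([[-w, h], [-w, -h], [w, -h], [w, h]], dtype=np.float64) / 2.0
    rot = np.array([[c, -s],
                    [s,  c]])
    return offsets @ rot.T + np.array([cx, cy], dtype=np.float64)

corners = rotated_box_corners(((50.0, 40.0), (20.0, 10.0), 0.0))
pts = np.round(corners).astype(np.int32)   # integer pixel coordinates for drawing
print(pts)
```

The final conversion to `np.int32` mirrors what the tracking loop does before passing the points to cv.polylines(), which expects integer coordinates.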
III. Additional Resources
- French Wikipedia page on Camshift. (The two animations are taken from there)
- Bradski, G.R., "Real time face and object tracking as a component of a perceptual user interface," Applications of Computer Vision, 1998. WACV '98. Proceedings, Fourth IEEE Workshop on, pp. 214-219, 19-21 Oct. 1998
IV. Exercises
#!/usr/bin/env python
'''
Camshift tracker
================
This is a demo that shows mean-shift based tracking
You select a colored object such as your face and it tracks it.
This reads from video camera (0 by default, or the camera number the user enters)
[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.7673
Usage:
------
    camshift.py [<video source>]
    To initialize tracking, select the object with mouse
Keys:
-----
    ESC   - exit
    b     - toggle back-projected probability visualization
'''
# Python 2/3 compatibility
from __future__ import print_function
import sys
PY3 = sys.version_info[0] == 3
if PY3:
    xrange = range
import numpy as np
import cv2 as cv
# local module
# video.py must be placed in the same directory so that it can be imported
import video
from video import presets

class App(object):
    def __init__(self, video_src):
        self.cam = video.create_capture(video_src, presets['cube'])
        _ret, self.frame = self.cam.read()
        cv.namedWindow('camshift')
        cv.setMouseCallback('camshift', self.onmouse)
        self.selection = None
        self.drag_start = None
        self.show_backproj = False
        self.track_window = None

    def onmouse(self, event, x, y, flags, param):
        if event == cv.EVENT_LBUTTONDOWN:
            self.drag_start = (x, y)
            self.track_window = None
        if self.drag_start:
            xmin = min(x, self.drag_start[0])
            ymin = min(y, self.drag_start[1])
            xmax = max(x, self.drag_start[0])
            ymax = max(y, self.drag_start[1])
            self.selection = (xmin, ymin, xmax, ymax)
        if event == cv.EVENT_LBUTTONUP:
            self.drag_start = None
            self.track_window = (xmin, ymin, xmax - xmin, ymax - ymin)

    def show_hist(self):
        bin_count = self.hist.shape[0]
        bin_w = 24
        img = np.zeros((256, bin_count*bin_w, 3), np.uint8)
        for i in xrange(bin_count):
            h = int(self.hist[i])
            cv.rectangle(img, (i*bin_w+2, 255), ((i+1)*bin_w-2, 255-h), (int(180.0*i/bin_count), 255, 255), -1)
        img = cv.cvtColor(img, cv.COLOR_HSV2BGR)
        cv.imshow('hist', img)

    def run(self):
        while True:
            _ret, self.frame = self.cam.read()
            if not _ret:
                break
            vis = self.frame.copy()
            hsv = cv.cvtColor(self.frame, cv.COLOR_BGR2HSV)
            mask = cv.inRange(hsv, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
            if self.selection:
                # while the user is dragging, rebuild the hue histogram of the selection
                x0, y0, x1, y1 = self.selection
                hsv_roi = hsv[y0:y1, x0:x1]
                mask_roi = mask[y0:y1, x0:x1]
                hist = cv.calcHist( [hsv_roi], [0], mask_roi, [16], [0, 180] )
                cv.normalize(hist, hist, 0, 255, cv.NORM_MINMAX)
                self.hist = hist.reshape(-1)
                self.show_hist()
                vis_roi = vis[y0:y1, x0:x1]
                cv.bitwise_not(vis_roi, vis_roi)
                vis[mask == 0] = 0
            if self.track_window and self.track_window[2] > 0 and self.track_window[3] > 0:
                # a selection has been completed: track it with camshift
                self.selection = None
                prob = cv.calcBackProject([hsv], [0], self.hist, [0, 180], 1)
                prob &= mask
                term_crit = ( cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 1 )
                track_box, self.track_window = cv.CamShift(prob, self.track_window, term_crit)
                if self.show_backproj:
                    vis[:] = prob[...,np.newaxis]
                try:
                    cv.ellipse(vis, track_box, (0, 0, 255), 2)
                except:
                    print(track_box)
            cv.imshow('camshift', vis)
            ch = cv.waitKey(10)
            if ch == 27:
                break
            if ch == ord('b'):
                self.show_backproj = not self.show_backproj
        cv.destroyAllWindows()

if __name__ == '__main__':
    print(__doc__)
    import sys
    try:
        # video_src = sys.argv[1]
        video_src = 'slow_traffic_small.mp4'
    except:
        video_src = 0
    App(video_src).run()