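1. Preface
These are my personal notes on Chapter 4, "Depth Estimation and Segmentation", of 《OpenCV 3 计算机视觉 Python语言实现》 (the Chinese edition of Learning OpenCV 3 Computer Vision with Python).
No depth camera is used here. The notes cover three parts: computing a disparity map with epipolar geometry (using the StereoSGBM algorithm), the GrabCut algorithm, and the watershed algorithm.
I had some trouble understanding the watershed algorithm. It is sorted out now, so I am writing it down here.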
2. Computing a disparity map with epipolar geometry
2.1 StereoSGBM code
import numpy as np
import cv2


def update(val=0):
    # Trackbar callback: OpenCV passes the new slider position as `val`.
    # Copy the current slider values into the matcher, then recompute.
    stereo.setBlockSize(cv2.getTrackbarPos('window_size', 'disparity'))
    stereo.setUniquenessRatio(cv2.getTrackbarPos('uniquenessRatio', 'disparity'))
    stereo.setSpeckleWindowSize(cv2.getTrackbarPos('speckleWindowSize', 'disparity'))
    stereo.setSpeckleRange(cv2.getTrackbarPos('speckleRange', 'disparity'))
    stereo.setDisp12MaxDiff(cv2.getTrackbarPos('disp12MaxDiff', 'disparity'))

    print('computing disparity...')
    # compute() returns fixed-point disparities scaled by 16
    disp = stereo.compute(imgL, imgR).astype(np.float32) / 16.0

    cv2.imshow('left', imgL)
    cv2.imshow('right', imgR)
    # Normalize so that imshow can display the float image
    cv2.imshow('disparity', (disp - min_disp) / num_disp)


if __name__ == "__main__":
    # Initial parameter values
    window_size = 3
    min_disp = 16
    num_disp = 192 - min_disp  # 176, divisible by 16 as required
    blockSize = window_size
    uniquenessRatio = 1
    speckleRange = 12
    speckleWindowSize = 3
    disp12MaxDiff = 200
    P1 = 600
    P2 = 2400

    # Read the left and right images and bring them to the same size
    imgL = cv2.imread('depth1.jpg')
    imgR = cv2.imread('depth2.jpg')
    imgL = cv2.resize(imgL, (800, 450))
    imgR = cv2.resize(imgR, (800, 450))
    print(imgL.shape)
    print(imgR.shape)

    # Create the 'disparity' window and its five trackbars
    cv2.namedWindow('disparity')
    cv2.createTrackbar('speckleRange', 'disparity', speckleRange, 50, update)
    cv2.createTrackbar('window_size', 'disparity', window_size, 10, update)
    cv2.createTrackbar('speckleWindowSize', 'disparity', speckleWindowSize, 200, update)
    cv2.createTrackbar('uniquenessRatio', 'disparity', uniquenessRatio, 50, update)
    cv2.createTrackbar('disp12MaxDiff', 'disparity', disp12MaxDiff, 250, update)

    '''
    cv2.createStereoSGBM(minDisparity, numDisparities, blockSize[, P1[, P2[, disp12MaxDiff[, preFilterCap[, uniquenessRatio[, speckleWindowSize[, speckleRange[, mode]]]]]]]]) → retval
    Parameters:
    minDisparity – Minimum possible disparity value. Normally, it is zero but sometimes rectification algorithms can shift images, so this parameter needs to be adjusted accordingly.
    numDisparities – Maximum disparity minus minimum disparity. The value is always greater than zero. In the current implementation, this parameter must be divisible by 16.
    blockSize – Matched block size. It must be an odd number >=1 . Normally, it should be somewhere in the 3..11 range.
    P1 – The first parameter controlling the disparity smoothness. See below.
    P2 – The second parameter controlling the disparity smoothness. The larger the values are, the smoother the disparity is. P1 is the penalty on the disparity change by plus or minus 1 between neighbor pixels. P2 is the penalty on the disparity change by more than 1 between neighbor pixels. The algorithm requires P2 > P1 . See stereo_match.cpp sample where some reasonably good P1 and P2 values are shown (like 8*number_of_image_channels*SADWindowSize*SADWindowSize and 32*number_of_image_channels*SADWindowSize*SADWindowSize , respectively).
    disp12MaxDiff – Maximum allowed difference (in integer pixel units) in the left-right disparity check. Set it to a non-positive value to disable the check.
    preFilterCap – Truncation value for the prefiltered image pixels. The algorithm first computes x-derivative at each pixel and clips its value by [-preFilterCap, preFilterCap] interval. The result values are passed to the Birchfield-Tomasi pixel cost function.
    uniquenessRatio – Margin in percentage by which the best (minimum) computed cost function value should “win” the second best value to consider the found match correct. Normally, a value within the 5-15 range is good enough.
    speckleWindowSize – Maximum size of smooth disparity regions to consider their noise speckles and invalidate. Set it to 0 to disable speckle filtering. Otherwise, set it somewhere in the 50-200 range.
    speckleRange – Maximum disparity variation within each connected component. If you do speckle filtering, set the parameter to a positive value, it will be implicitly multiplied by 16. Normally, 1 or 2 is good enough.
    mode – Set it to StereoSGBM::MODE_HH to run the full-scale two-pass dynamic programming algorithm. It will consume O(W*H*numDisparities) bytes, which is large for 640x480 stereo and huge for HD-size pictures. By default, it is set to false .
    '''
    stereo = cv2.StereoSGBM_create(
        minDisparity=min_disp,
        numDisparities=num_disp,
        blockSize=window_size,
        uniquenessRatio=uniquenessRatio,
        speckleRange=speckleRange,
        speckleWindowSize=speckleWindowSize,
        disp12MaxDiff=disp12MaxDiff,
        P1=P1,
        P2=P2
    )

    update()
    cv2.waitKey(0)
    cv2.destroyAllWindows()
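2.2 How the StereoSGBM code works
1. Import the numpy and cv2 modules.
2. Main code:
   - Set the initial parameter values.
   - Read the two images.
   - Create the disparity window with five trackbars in it.
   - Create a StereoSGBM instance, stereo.
3. The update function:
   - Pass the values returned by the trackbars to the stereo instance.
   - Call the compute method to compute the disparity map.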
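The parameter notes quoted in the code say that blockSize must be odd, while the window_size trackbar can return any integer from 0 to 10. They also suggest P1 and P2 values derived from the block size. A minimal sketch of both rules, assuming two hypothetical helper names (`to_odd_block_size` and `suggested_penalties`) that are not part of OpenCV:

```python
def to_odd_block_size(pos, minimum=1):
    """Snap a trackbar position to the next odd value >= minimum (hypothetical helper)."""
    pos = max(int(pos), minimum)
    return pos if pos % 2 == 1 else pos + 1


def suggested_penalties(channels, block_size):
    """P1 and P2 as suggested in the quoted parameter notes (hypothetical helper)."""
    p1 = 8 * channels * block_size * block_size
    p2 = 32 * channels * block_size * block_size
    return p1, p2


block_size = to_odd_block_size(4)            # an even slider value is bumped to 5
p1, p2 = suggested_penalties(3, block_size)  # 3 channels for a colour image
print(block_size, p1, p2)
```

For a 3-channel image and a block size of 5 the formula gives 600 and 2400, which are the P1 and P2 values used in the code. In update(), the slider value could be passed through to_odd_block_size before it reaches stereo.setBlockSize.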
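2.3 Open questions
- Why is the computed disp divided by 16?
  I am not certain of the exact reason. A likely explanation, from reference 1: for precision, all disparities are scaled up by 16 (2^4) on output. In other words, compute returns a fixed-point disparity map with 4 fractional bits, and dividing by 16 recovers the disparity in pixels.
- Why is the disparity normalized at the end?
  From reference 4: in an integer color representation the values range over 0-255, but in a floating-point representation they range over 0-1. cv2.imshow maps a floating-point image from [0, 1] to [0, 255], so the disparity has to be scaled into that range before it is displayed.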
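The two steps in question can be checked on a tiny array without running the matcher. The sketch below assumes made-up raw values in the 16x fixed-point format and reuses min_disp = 16 and num_disp = 176 from the code above; `decode_disparity` is a hypothetical helper name.

```python
import numpy as np


def decode_disparity(raw, min_disp, num_disp):
    """Convert raw fixed-point disparities to pixels, then scale them for display."""
    disp = raw.astype(np.float32) / 16.0   # 4 fractional bits -> divide by 2**4
    shown = (disp - min_disp) / num_disp   # maps [min_disp, min_disp + num_disp) to [0, 1)
    return disp, shown


# Made-up raw values standing in for the int16 output of stereo.compute()
raw = np.array([[256, 1536, 3056]], dtype=np.int16)
disp, shown = decode_disparity(raw, min_disp=16, num_disp=176)
print(disp)   # disparities in pixels
print(shown)  # values that imshow can display as gray levels
```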
3. GrabCut code
import numpy as np
import cv2
from matplotlib import pyplot as plt

# Read the image and create a mask of the same height and width
img = cv2.imread('statue_small.jpg')
mask = np.zeros(img.shape[:2], np.uint8)
print(img.shape)

# Create the background and foreground models
bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)

# Give the rectangle (x, y, width, height) to segment, then run GrabCut
rect = (100, 50, 421, 378)
cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 10, cv2.GC_INIT_WITH_RECT)

# Read the mask: background (0) and probable background (2) become 0, everything else 1
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')

# Add a third axis to the mask and multiply it with img to keep only the foreground
result = img * mask2[:, :, np.newaxis]

# Show the result; OpenCV stores BGR while matplotlib expects RGB
plt.subplot(121), plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.title("original"), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
plt.title("grabcut"), plt.xticks([]), plt.yticks([])
plt.show()
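Reference 5 explains the GrabCut function and its parameters in enough detail, and the code above is commented throughout, so I will not walk through the algorithm separately.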
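The one step that is easy to misread is how the mask is interpreted after grabCut returns. A small NumPy-only sketch, assuming a made-up 3x3 mask and image in place of real grabCut output; the four constants carry the same numeric values as cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD and cv2.GC_PR_FGD:

```python
import numpy as np

# Same numeric values as cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

# Made-up mask standing in for what grabCut writes
mask = np.array([[0, 2, 3],
                 [2, 3, 1],
                 [0, 3, 3]], dtype=np.uint8)

# Sure and probable background -> 0, sure and probable foreground -> 1
mask2 = np.where((mask == GC_PR_BGD) | (mask == GC_BGD), 0, 1).astype('uint8')

# Made-up 3x3 BGR image; multiplying by the mask zeroes the background pixels
img = np.full((3, 3, 3), 200, dtype=np.uint8)
result = img * mask2[:, :, np.newaxis]
print(mask2)
```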
4. Watershed algorithm
4.1 Watershed code
This code is my own adaptation: I started from the original source and modified it for my needs.
import numpy as np
import cv2
from matplotlib import pyplot as plt

# Read the image and convert it to grayscale
img = cv2.imread('basil.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Inverse binary threshold with Otsu: pixels above the threshold become black (0),
# the rest become white (255)
_, ret = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Remove noise
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(ret, cv2.MORPH_OPEN, kernel, iterations=2)

# Sure background area
sure_bg = cv2.dilate(opening, kernel, iterations=3)

# Sure foreground area
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)

# Unknown region: sure background minus sure foreground
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

# Show the different regions
plt.figure("ret")
plt.subplot(2, 2, 1), plt.title('ret')
plt.imshow(ret, cmap='gray'), plt.axis('off')
plt.subplot(2, 2, 2), plt.title('sure_bg')
plt.imshow(sure_bg, cmap='gray'), plt.axis('off')
plt.subplot(2, 2, 3), plt.title('sure_fg')
plt.imshow(sure_fg, cmap='gray'), plt.axis('off')
plt.subplot(2, 2, 4), plt.title('unknown')
plt.imshow(unknown, cmap='gray'), plt.axis('off')
plt.show()

# Marker labelling; num is the number of regions, background included
num, markers = cv2.connectedComponents(sure_fg)
print(num)

# Multiply the labels by 30 so that they are easier to see
markers2 = (markers * 30).astype(np.uint8)

# Add one to all labels so that sure background is 1 instead of 0
markers = markers + 1

# Mark the unknown region with 0
markers[unknown == 255] = 0
markers3 = (markers * 30).astype(np.uint8)

# Marker-based segmentation with the watershed algorithm
markers = cv2.watershed(img, markers)
# Casting to uint8 wraps the -1 boundary label around to 255
markers4 = markers.astype(np.uint8)

# Compare the markers at the different stages
plt.figure("markers")
plt.subplot(2, 2, 1)
plt.imshow(markers2, cmap='gray'), plt.axis('off')
plt.subplot(2, 2, 2)
plt.imshow(markers3, cmap='gray'), plt.axis('off')
plt.subplot(2, 2, 3)
plt.imshow(markers4, cmap='gray'), plt.axis('off')
plt.subplot(2, 2, 4)
plt.imshow(markers, cmap='gray'), plt.axis('off')
plt.show()

# Boundaries between regions are -1; draw them in red (img is BGR)
img[markers == -1] = [0, 0, 255]

# Show the image, converted to RGB for matplotlib
plt.figure("img")
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()
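4.2 Notes on parts of the code
I only came across reference 7 after working through the watershed algorithm with the book.
The material it quotes is very much to the point. I wish I had seen it earlier.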
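4.2.1 Thresholding the grayscale image gray
The threshold is chosen with Otsu's method. Pixels above the threshold are set to 0 and all others to 255, which gives ret.
_, ret = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)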
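To see what the two flags do together, here is a minimal NumPy sketch of Otsu's method followed by the inverse binary rule. It is an illustration under simplifying assumptions, not OpenCV's implementation: the function names `otsu_threshold` and `threshold_binary_inv` are hypothetical, the input is a made-up two-level image, and on real images the chosen threshold may differ slightly from what cv2.threshold returns.

```python
import numpy as np


def otsu_threshold(gray):
    """Pick the threshold that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of the class "pixel <= t"
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0              # thresholds that leave one class empty
    return int(np.argmax(sigma_b))


def threshold_binary_inv(gray, t, maxval=255):
    """Inverse binary rule: above the threshold -> 0, otherwise -> maxval."""
    return np.where(gray > t, 0, maxval).astype(np.uint8)


# Made-up image with a dark class (50) and a bright class (200)
gray = np.array([[50, 50, 200],
                 [50, 200, 200]], dtype=np.uint8)
t = otsu_threshold(gray)
ret = threshold_binary_inv(gray, t)
print(t)
print(ret)
```

The dark pixels end up white and the bright pixels end up black.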
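4.2.2 Creating the markers and printing the number of regions (background included)
We create a marker image and label regions in it: the regions we are sure about (foreground or background) get distinct positive integers, and the regions we are unsure about stay 0. cv2.connectedComponents() does the labelling. It labels the image background 0 and the other objects with integers starting from 1 (1, 2, 3, ...).
# Marker labelling; num is the number of regions, background included
num, markers = cv2.connectedComponents(sure_fg)
print(num)
These are our seeds: the water floods outward from them.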
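What the labelling step produces can be shown with a small pure-Python stand-in. This is a sketch, not OpenCV's algorithm: `label_components` is a hypothetical helper that uses a breadth-first search with 8-connectivity (the default connectivity of cv2.connectedComponents) on a made-up sure_fg image, and the order in which it numbers the regions is not guaranteed to match OpenCV's.

```python
from collections import deque

import numpy as np


def label_components(binary):
    """Label 8-connected non-zero regions with 1, 2, 3, ...; background stays 0."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] == 0 or labels[y, x] != 0:
                continue
            # Unlabelled foreground pixel: start a new region and grow it
            count += 1
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] != 0 and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    # Like num in the code above, the count includes the background
    return count + 1, labels


# Made-up sure foreground with two separate blobs
sure_fg = np.zeros((5, 7), dtype=np.uint8)
sure_fg[1:3, 1:3] = 255
sure_fg[2:4, 4:6] = 255
num, markers = label_components(sure_fg)
print(num)
print(markers)
```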
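4.2.3 Setting the barriers
Using the unknown region unknown, we mark in markers where the barriers may go, and we add the background region to the seeds so that it floods as well.
watershed expects the region where barriers may be built (the unknown region) to be 0, so the background region in markers (originally 0, which would interfere) has to become some other integer. The fix is to add 1 to the whole of markers.
If the background label stayed 0, the watershed algorithm would treat it as unknown region.
markers = markers + 1
markers[unknown == 255] = 0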
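The effect of these two lines is easiest to see on a single made-up row of pixels. The sketch below assumes a row in which connectedComponents found two seeds (labels 1 and 2) and in which three pixels are still unknown:

```python
import numpy as np

# Made-up row: output of connectedComponents (0 = not sure foreground)
markers = np.array([0, 0, 1, 0, 2, 0, 0], dtype=np.int32)
# Made-up unknown mask for the same row (255 = unknown)
unknown = np.array([0, 255, 0, 255, 0, 255, 0], dtype=np.uint8)

markers = markers + 1          # sure background 0 -> 1, seeds 1, 2 -> 2, 3
markers[unknown == 255] = 0    # only the unknown pixels are left at 0
print(markers)
```

After both steps, 1 means sure background, 2 and 3 are the foreground seeds, and 0 is reserved for the pixels that watershed still has to assign.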
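4.2.4 Flooding and finding the barriers
Starting from the seeds, the water rises until it reaches the last overflow points (the barrier boundaries). Past those points the water in neighbouring valleys would begin to merge. Note that watershed sets the barriers it finds to -1 in markers.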
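The flooding idea can be illustrated in one dimension. The sketch below is a simplified marker-based flooding with a priority queue on a made-up height profile. It is not what cv2.watershed does internally; the function name `watershed_1d` and the data are assumptions for illustration. Pixels are flooded in order of increasing height, each takes the label of its already-flooded neighbour, and a pixel reached from two different basins becomes a -1 barrier.

```python
import heapq

import numpy as np


def watershed_1d(height, markers):
    """Flood a 1-D profile from seeds (>0); unknown pixels are 0, barriers become -1."""
    labels = markers.copy()
    n = len(height)
    queued = np.zeros(n, dtype=bool)
    heap = []

    def push_neighbours(i):
        # Queue the unknown neighbours of a flooded pixel, lowest height first
        for j in (i - 1, i + 1):
            if 0 <= j < n and labels[j] == 0 and not queued[j]:
                queued[j] = True
                heapq.heappush(heap, (int(height[j]), j))

    for i in range(n):
        if labels[i] > 0:
            push_neighbours(i)

    while heap:
        _, i = heapq.heappop(heap)
        basins = {int(labels[j]) for j in (i - 1, i + 1)
                  if 0 <= j < n and labels[j] > 0}
        if len(basins) == 1:
            labels[i] = basins.pop()   # water from a single basin reaches this pixel
            push_neighbours(i)
        else:
            labels[i] = -1             # two basins meet here: build the barrier
    return labels


# Two valleys (indices 1 and 7) separated by a ridge at index 4
height = np.array([1, 0, 1, 3, 5, 3, 1, 0, 1])
markers = np.array([0, 1, 0, 0, 0, 0, 0, 2, 0], dtype=np.int32)
labels = watershed_1d(height, markers)
print(labels)
```

In this example the barrier lands on the ridge between the two valleys, and every other pixel is assigned to the seed whose water reached it first.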
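5. Closing remarks
This post covers Chapter 4, "Depth Estimation and Segmentation", of 《OpenCV 3 计算机视觉 Python语言实现》. I sorted out my thinking and wrote up my own understanding as notes.
While writing it I learned something new and solved a problem I still had before I started, which feels very lucky.
I had some trouble understanding the watershed algorithm. Before I started writing I thought I understood it, but while writing I found that my earlier understanding was still off.
Maybe that is one of the main reasons I write a blog. Looking back on the road so far, I am very grateful!
(Mostly it is to keep a record of the past, and to show off a little, hahaha.)
Still, I remain curious: why does the background take part in the flooding as well? And how does the final flooding step actually produce the watershed lines?
I will stop here and not dig deeper. These ready-made functions really are powerful.
Oh, and happy Qixi Festival, everyone~
From this half programmer, half engineer: may all lovers end up together~❤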
References
1. 双目匹配与视差计算 (Stereo matching and disparity computation)
   https://blog.csdn.net/pinbodexiaozhu/article/details/45585361
2. OpenCV3-Python深度估计—基于图像 (OpenCV3-Python depth estimation, image-based)
   http://www.yyearth.com/article/18-06/227.html
   This article also provides a StereoBM() implementation, which I have not tested.
3. cv::StereoSGBM Class Reference
   https://docs.opencv.org/master/d2/d85/classcv_1_1StereoSGBM.html
   The definition of StereoSGBM in the OpenCV documentation.
4. Opencv中convertTo函数 (The convertTo function in OpenCV)
   https://blog.csdn.net/liuhuicsu/article/details/70994840?fps=1&locationNum=8
5. OpenCV(EmguCV)2.1新特性介绍之图像分割GrabCut(GrabCut Of OpenCV 2.1)
   https://www.cnblogs.com/xrwang/archive/2010/04/27/GrabCut.html
6. Structural Analysis and Shape Descriptors
   https://docs.opencv.org/3.1.0/d3/dc0/group__imgproc__shape.html#gac2718a64ade63475425558aa669a943a
   The definition of connectedComponents in the OpenCV documentation.
7. OpenCV—分水岭算法 (OpenCV: the watershed algorithm)
   https://www.cnblogs.com/ssyfj/p/9278815.html
   This page contains many other useful links. I recommend opening them all.