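While debugging a USB camera earlier, with the application going through RK's USB camera HAL, I ran into a problem: the USB camera did not support hot-plugging. If the camera was unplugged while the apk was previewing, the apk hung and never returned, so the apk preview could not be opened again after the camera was plugged back in. Tracing it to the root cause showed that the HAL blocked in DQBUF and never returned. This post briefly covers how the problem was diagnosed and fixed.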
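(1) Problem description
With the USB camera plugged in and the apk previewing, unplugging the camera cuts off the data transfer. The application stops responding and the process has to be killed.
(2) Cause
During preview, unplugging the USB camera breaks the data stream, but the upper-layer application keeps fetching frames. The DQBUF operation blocks and never returns, which causes the hang.
(3) Tracing the code
VIDIOC_DQBUF is the ioctl command an application uses to fetch data from the driver node. The call eventually reaches vb2_dqbuf, which the kernel uses to hand a buffer filled with data from the driver back to the application.
The flow is shown below: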
vb2_dqbuf
/**
 * vb2_dqbuf() - Dequeue a buffer to the userspace
 * @q:           videobuf2 queue
 * @b:           buffer structure passed from userspace to VIDIOC_DQBUF() handler
 *               in driver
 * @nonblocking: if true, this call will not sleep waiting for a buffer if no
 *               buffers ready for dequeuing are present. Normally the driver
 *               would be passing (file->f_flags & O_NONBLOCK) here
 *
 * Should be called from VIDIOC_DQBUF() ioctl handler of a driver.
 *
 * This function:
 *
 * #) verifies the passed buffer,
 * #) calls buf_finish callback in the driver (if provided), in which
 *    driver can perform any additional operations that may be required before
 *    returning the buffer to userspace, such as cache sync,
 * #) the buffer struct members are filled with relevant information for
 *    the userspace.
 *
 * The return values from this function are intended to be directly returned
 * from VIDIOC_DQBUF() handler in driver.
 */
int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking)
{
    int ret;

    if (vb2_fileio_is_active(q)) {
        dprintk(1, "file io in progress\n");
        return -EBUSY;
    }

    if (b->type != q->type) {
        dprintk(1, "invalid buffer type\n");
        return -EINVAL;
    }

    ret = vb2_core_dqbuf(q, NULL, b, nonblocking);

    /*
     * After calling the VIDIOC_DQBUF V4L2_BUF_FLAG_DONE must be
     * cleared.
     */
    b->flags &= ~V4L2_BUF_FLAG_DONE;

    return ret;
}
The intermediate calls are omitted here. The place where the call finally ends up is __vb2_wait_for_done_vb:
/**
 * __vb2_wait_for_done_vb() - wait for a buffer to become available
 * for dequeuing
 *
 * Will sleep if required for nonblocking == false.
 */
static int __vb2_wait_for_done_vb(struct vb2_queue *q, int nonblocking)
{
    for (;;) {
        int ret;

        /*
         * After STREAMON has been called, streaming is 1.
         * If it is 0, STREAMOFF has been executed and no more
         * data will be produced.
         */
        if (!q->streaming) {
            dprintk(1, "streaming off, will not wait for buffers\n");
            return -EINVAL;
        }

        if (q->error) {
            dprintk(1, "Queue in error state, will not wait for buffers\n");
            return -EIO;
        }

        if (q->last_buffer_dequeued) {
            dprintk(3, "last buffer dequeued already, will not wait for buffers\n");
            return -EPIPE;
        }

        /*
         * If done_list holds a buffer, break out of the loop and
         * continue below. When the caller waited with select() first,
         * the function should return from here without sleeping.
         */
        if (!list_empty(&q->done_list)) {
            /*
             * Found a buffer that we were waiting for.
             */
            break;
        }

        if (nonblocking) {
            dprintk(3, "nonblocking and no buffers to dequeue, will not wait\n");
            return -EAGAIN;
        }

        /*
         * We are streaming and blocking, wait for another buffer to
         * become ready or for streamoff. Driver's lock is released to
         * allow streamoff or qbuf to be called while waiting.
         */
        call_void_qop(q, wait_prepare, q);

        /*
         * All locks have been released, it is safe to sleep now.
         */
        dprintk(3, "will sleep waiting for buffers\n");

        /*
         * wait_event_interruptible(wq, condition):
         *   condition == 0: keep sleeping
         *   condition != 0: wake up and return
         * The condition is checked once before sleeping and then again
         * only after the queue is woken with wake_up_interruptible().
         */
        ret = wait_event_interruptible(q->done_wq,
                !list_empty(&q->done_list) || !q->streaming ||
                q->error);

        /*
         * We need to reevaluate both conditions again after reacquiring
         * the locks or return an error if one occurred.
         */
        call_void_qop(q, wait_finish, q);
        if (ret) {
            dprintk(1, "sleep was interrupted\n");
            return ret;
        }
    }
    return 0;
}
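The hang happens in wait_event_interruptible, which never returns. In the failing scenario, the upper layer was still fetching data when the data flow underneath broke, so no buffer ever arrived on the queue and the call stayed blocked at this point. The fundamental fix is to resolve the missing data at the lower level.
The application flow, however, should time out and return an error when it gets stuck instead of hanging. In the end, select was used to monitor the file descriptor.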
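The return paths of __vb2_wait_for_done_vb above suggest how a caller could react to each failure. The following is only a sketch under assumptions: the enum and the function name are hypothetical and not part of the HAL or of V4L2, and the grouping into "retry", "stop" and "device error" is one possible policy. Treating every other code (for example ENODEV, which a driver might report after a disconnect; this is an assumption not shown in the code above) as a device error is also an assumption.

```c
#include <errno.h>

/* Hypothetical policy values; the names are illustrative only. */
enum dq_action {
    DQ_RETRY,      /* transient condition: wait and try DQBUF again */
    DQ_STOP,       /* stream stopped or finished: leave the capture loop */
    DQ_DEVICE_ERR  /* queue or device error: close and reopen the device */
};

/*
 * Map the errno seen after a failed VIDIOC_DQBUF to an action, following
 * the return paths of __vb2_wait_for_done_vb():
 *   -EAGAIN  nonblocking and no buffer on done_list
 *   -EINVAL  streaming is off
 *   -EPIPE   last buffer already dequeued
 *   -EIO     queue in error state
 * An interrupted sleep is assumed to reach user space as EINTR.
 */
enum dq_action classify_dqbuf_errno(int err)
{
    switch (err) {
    case EAGAIN:
    case EINTR:
        return DQ_RETRY;
    case EINVAL:
    case EPIPE:
        return DQ_STOP;
    case EIO:
    default:
        /* Assumption: anything unrecognized is treated as a device error. */
        return DQ_DEVICE_ERR;
    }
}
```

A capture loop could call classify_dqbuf_errno(errno) right after a failed ioctl and return an error to the application for anything other than DQ_RETRY, rather than waiting again.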
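(4) Solution
Use select to monitor the fd before dequeuing. If select times out because no buffer has become ready, DQBUF is skipped and an error is returned, which avoids the blocking described above.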
@@ -3916,11 +3949,28 @@ sp<V4L2Frame> ExternalCameraDeviceSession::dequeueV4l2FrameLocked(/*out*/nsecs_t
         buffer.m.planes = planes;
         buffer.length = PLANES_NUM;
     }
-
+    ALOGE("@%s(%d) VIDIOC_DQBUF begin", __FUNCTION__, __LINE__);
+    int ts;
+    fd_set fds;
+    struct timeval tv;
+
+    FD_ZERO(&fds);
+    FD_SET(mV4l2Fd.get(), &fds);
+    tv.tv_sec = 2;
+    tv.tv_usec = 0;
+
+    ts = select(mV4l2Fd.get() + 1, &fds, NULL, NULL, &tv);
+    ALOGE("@%s(%d) select time", __FUNCTION__, __LINE__);
+    if (ts <= 0)
+    {
+        ALOGE("@%s(%d) select timed out or failed", __FUNCTION__, __LINE__);
+        return ret;
+    }
     if (TEMP_FAILURE_RETRY(ioctl(mV4l2Fd.get(), VIDIOC_DQBUF, &buffer)) < 0) {
         ALOGE("%s: DQBUF fails: %s", __FUNCTION__, strerror(errno));
         return ret;
     }
+    ALOGE("@%s(%d) VIDIOC_DQBUF done", __FUNCTION__, __LINE__);
 #endif
     ATRACE_END();
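① If the timeout argument is NULL, select() blocks until the state of a descriptor changes.
② If timeout points to a timeval set to zero, select() does not block and returns immediately.
③ If timeout holds a specific value, select() blocks for at most that long, until a descriptor's state changes. If no descriptor changes state within that time, it returns 0 on timeout.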
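The three timeout modes above can be tried without a camera. The sketch below is not the HAL code: wait_readable and wait_readable_demo are hypothetical names, and a pipe stands in for the V4L2 fd on the assumption that "readable" behaves the same way for this purpose. A negative timeout_ms maps to case ①, zero to case ②, and a positive value to case ③.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd becomes readable.
 *   timeout_ms < 0  : NULL timeout, block until the fd state changes
 *   timeout_ms == 0 : zero timeval, check once and return immediately
 *   timeout_ms > 0  : block for at most timeout_ms milliseconds
 * Returns 1 if readable, 0 on timeout, -1 on error with errno set.
 */
int wait_readable(int fd, int timeout_ms)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;   /* FD_SET is undefined outside this range */
        return -1;
    }
    for (;;) {
        fd_set fds;
        struct timeval tv;
        struct timeval *ptv = NULL;

        /* select() may modify the set and the timeout, so rebuild both. */
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        if (timeout_ms >= 0) {
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            ptv = &tv;
        }

        int n = select(fd + 1, &fds, NULL, NULL, ptv);
        if (n < 0 && errno == EINTR)
            continue;     /* interrupted by a signal: retry with a fresh timeout */
        if (n < 0)
            return -1;
        return n > 0 ? 1 : 0;
    }
}

/*
 * Usage example on a pipe: results[0] and results[1] are taken while the
 * pipe is empty, results[2] after one byte has been written.
 * Returns 0 on success, -1 if the pipe could not be set up.
 */
int wait_readable_demo(int results[3])
{
    int p[2];

    if (pipe(p) != 0)
        return -1;
    results[0] = wait_readable(p[0], 0);    /* case 2: immediate return */
    results[1] = wait_readable(p[0], 50);   /* case 3: times out after 50 ms */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    results[2] = wait_readable(p[0], -1);   /* case 1: returns since data is ready */
    close(p[0]);
    close(p[1]);
    return 0;
}
```

In a capture path, a return value of 0 from such a helper would be the point to skip DQBUF and report an error, as the patch does with its 2 second timeout.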