The KITTI Vision Benchmark Suite - home

A project of Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago
http://www.cvlibs.net/datasets/kitti/index.php

Welcome to the KITTI Vision Benchmark Suite!

We take advantage of our autonomous driving platform Annieway to develop novel challenging real-world computer vision benchmarks. Our tasks of interest are: stereo, optical flow, visual odometry, 3D object detection and 3D tracking. For this purpose, we equipped a standard station wagon with two high-resolution color and grayscale video cameras. Accurate ground truth is provided by a Velodyne laser scanner and a GPS localization system. Our datasets are captured by driving around the mid-size city of Karlsruhe, in rural areas and on highways. Up to 15 cars and 30 pedestrians are visible per image. Besides providing all data in raw format, we extract benchmarks for each task. For each of our benchmarks, we also provide an evaluation metric and this evaluation website. Preliminary experiments show that methods ranking high on established benchmarks such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias and complement existing benchmarks by providing real-world benchmarks with novel difficulties to the community.
KITTI was created jointly by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago, and is currently the largest computer vision benchmark suite for autonomous driving scenarios. The dataset is used to evaluate how well computer vision techniques such as stereo, optical flow, visual odometry, 3D object detection, and 3D tracking perform in a vehicle-mounted setting. KITTI contains real image data collected in urban, rural, and highway scenes, with up to 15 cars and 30 pedestrians per image and varying degrees of occlusion. Because the road scenes are diverse and the lighting varies widely, with vehicles parked along the roadside, vehicles driving in the lane, and pedestrians all acting as distractors, and with some scenes at turns or intersections, detection is very challenging.
In KITTI, object detection consists of three sub-tasks: car, pedestrian, and cyclist detection. Object tracking consists of two sub-tasks: car tracking and pedestrian tracking. Road segmentation consists of four sub-tasks: the urban unmarked, urban marked, and urban multiple marked scenarios, plus urban road, the average over the first three.
Methods are evaluated along six dimensions, including precision, recall, error rate, and miss rate. The underlying technology can be applied to traffic flow statistics and to detecting congestion and debris on the road, and can be extended to extracting moving objects from surveillance video, monitoring crowd density, and so on.
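
For readers who want to work with the detection ground truth directly, the sketch below parses one label file, assuming the standard 15-column training format of the KITTI object development kit; the file path is hypothetical:

```python
# Minimal sketch: parse one KITTI object-detection label file
# (standard 15-column training format from the object devkit).

def parse_kitti_label_line(line):
    """Parse one line of a KITTI object label file into a dict."""
    f = line.strip().split()
    return {
        "type": f[0],                               # Car, Pedestrian, Cyclist, DontCare, ...
        "truncated": float(f[1]),                   # 0 (fully visible) .. 1 (fully truncated)
        "occluded": int(f[2]),                      # 0..3, higher = more occluded
        "alpha": float(f[3]),                       # observation angle in [-pi, pi]
        "bbox": [float(v) for v in f[4:8]],         # 2D box: left, top, right, bottom (px)
        "dimensions": [float(v) for v in f[8:11]],  # 3D box: height, width, length (m)
        "location": [float(v) for v in f[11:14]],   # x, y, z in camera coordinates (m)
        "rotation_y": float(f[14]),                 # yaw around camera Y axis in [-pi, pi]
    }

with open("training/label_2/000000.txt") as fh:  # hypothetical sample file
    objects = [parse_kitti_label_line(l) for l in fh if l.strip()]
print([o["type"] for o in objects])
```
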
To get started, grab a cup of your favorite beverage and watch our video trailer (5 minutes):

This video: in high resolution (720 MB) or on YouTube
http://www.cvlibs.net/datasets/kitti/video/kitti_trailer.zip

Copyright

All datasets and benchmarks on this page are copyright by us and published under the Creative Commons Attribution - NonCommercial - ShareAlike 3.0 License. This means that you must attribute the work in the manner specified by the authors, you may not use this work for commercial purposes and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license.

Citation

When using this dataset in your research, we will be happy if you cite us! (or bring us some self-made cake or ice-cream)
For the stereo 2012, flow 2012, odometry, object detection or tracking benchmarks, please cite:

@INPROCEEDINGS{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}
} 
For the raw dataset, please cite:
@ARTICLE{Geiger2013IJRR,
  author = {Andreas Geiger and Philip Lenz and Christoph Stiller and Raquel Urtasun},
  title = {Vision meets Robotics: The KITTI Dataset},
  journal = {International Journal of Robotics Research (IJRR)},
  year = {2013}
} 
For the road benchmark, please cite:
@INPROCEEDINGS{Fritsch2013ITSC,
  author = {Jannik Fritsch and Tobias Kuehnl and Andreas Geiger},
  title = {A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms},
  booktitle = {International Conference on Intelligent Transportation Systems (ITSC)},
  year = {2013}
} 
For the stereo 2015, flow 2015 and scene flow 2015 benchmarks, please cite:
@INPROCEEDINGS{Menze2015CVPR,
  author = {Moritz Menze and Andreas Geiger},
  title = {Object Scene Flow for Autonomous Vehicles},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2015}
}

Privacy

This dataset is made available for academic use only. However, we take your privacy seriously! If you find yourself or personal belongings in this dataset and feel unwell about it, please contact us and we will immediately remove the respective data from our server.

Credits

We thank Karlsruhe Institute of Technology (KIT) and Toyota Technological Institute at Chicago (TTI-C) for funding this project and Jan Cech (CTU) and Pablo Fernandez Alcantarilla (UoA) for providing initial results. We further thank our 3D object labeling task force for doing such a great job: Blasius Forreiter, Michael Ranjbar, Bernhard Schuster, Chen Guo, Arne Dersein, Judith Zinsser, Michael Kroeck, Jasmin Mueller, Bernd Glomb, Jana Scherbarth, Christoph Lohr, Dominik Wewers, Roman Ungefuk, Marvin Lossa, Linda Makni, Hans Christian Mueller, Georgi Kolev, Viet Duc Cao, Bünyamin Sener, Julia Krieg, Mohamed Chanchiri, Anika Stiller. Many thanks also to Qianli Liao (NYU) for helping us in getting the don’t care regions of the object detection benchmark correct. Special thanks for providing the voice to our video go to Anja Geiger!

Wordbook

Karlsruhe Institute of Technology (German: Karlsruher Institut für Technologie), KIT: a public research university in Karlsruhe, Germany
Toyota Technological Institute at Chicago, TTIC: a computer science research institute in Chicago, USA
University of Toronto, U of T, UToronto, or Toronto: a public research university in Toronto, Canada
optical flow or optic flow: the apparent motion of brightness patterns between consecutive images
stereo [ˈsterɪəʊ]: here, stereo vision, i.e., recovering depth from two camera viewpoints
visual odometry: estimating a camera's ego-motion from its own image sequence
Velodyne LiDAR: the Velodyne laser scanner, used here for depth ground truth
Karlsruhe ['kɑ:rls,ru:ə]: a city in southwestern Germany
Middlebury: Middlebury, Vermont, USA; home of the widely used Middlebury stereo benchmark

APPENDIX

Video surveillance and image recognition have become a key technical pillar of autonomous driving; other pillars include sensor applications such as radar and GPS.

Cameras have several advantages: they capture a large amount of information (visible light) and are inexpensive. A further advantage is that their frame rate is very high compared with other sensors; an ordinary camera works at 30 to 60 Hz, which is a clear advantage for fast-moving objects.
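
As a back-of-the-envelope illustration of that claim (my own numbers, not from the original), the distance an object travels between consecutive frames shrinks as the frame rate rises:

```python
# Sketch: why a high frame rate matters for fast-moving objects.
# Distance an object travels between consecutive frames at a given speed.

def metres_per_frame(speed_kmh, frame_rate_hz):
    return speed_kmh / 3.6 / frame_rate_hz  # km/h -> m/s, then per frame

for hz in (10, 30, 60):  # 10 Hz ~ a typical LiDAR spin rate; 30-60 Hz ~ a camera
    print(f"{hz:>2} Hz: {metres_per_frame(120, hz):.2f} m between frames at 120 km/h")
# 10 Hz: 3.33 m, 30 Hz: 1.11 m, 60 Hz: 0.56 m
```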

Today's autonomous driving platforms are equipped with multiple cameras, facing forward, sideways, and backward, so that the cameras record the surrounding environment over 360 degrees. From the camera input, combined with AI algorithms, the system obtains a perception of the surrounding objects, and on top of this perception it performs the next steps, such as path planning.

Another sensor is what we call laser radar, or LiDAR, often simply called radar. LiDAR works by emitting a laser pulse; the laser is not visible light and cannot be seen by the naked eye. When the laser hits an object it is reflected back, and measuring the time that elapses between emission and the return of the reflection gives the object's depth. One laser beam yields the depth of one point; multiple beams yield the depths of multiple points, which is what the "number of lines" of a LiDAR refers to.
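
The time-of-flight principle described above reduces to a one-line formula: the pulse covers the distance twice, so distance = c * t / 2. A minimal sketch:

```python
# Sketch of the LiDAR time-of-flight principle described above:
# the pulse travels to the object and back, so distance = c * t / 2.

C = 299_792_458.0  # speed of light, m/s

def tof_distance_m(round_trip_time_s):
    return C * round_trip_time_s / 2.0

# A return received 400 ns after emission corresponds to roughly 60 m:
print(tof_distance_m(400e-9))  # ~59.96 m
```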

Compared with a camera, LiDAR has one problem: although it provides a lot of depth information, even the best LiDAR today has only 64 lines, so its resolution is far below that of, say, a high-resolution camera, and it struggles to capture rich detail on an object. For example, a person standing tens of meters away may be covered by only 50 to 100 points, so you get only a rough outline of the person.
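
To make the point-count claim concrete, here is a rough estimate of my own (not from the original); the sensor figures are assumed nominal values for a 64-line scanner and vary by model and spin rate:

```python
import math

# Rough sketch: how many LiDAR returns land on a pedestrian at range r.
# Assumed nominal figures for a 64-line scanner: ~26.9 deg vertical FOV,
# ~0.17 deg horizontal step between returns.

V_FOV_DEG, N_LINES, H_STEP_DEG = 26.9, 64, 0.17
V_STEP_DEG = V_FOV_DEG / N_LINES  # ~0.42 deg between adjacent lines

def points_on_target(width_m, height_m, range_m):
    h_extent = math.degrees(width_m / range_m)   # small-angle approximation
    v_extent = math.degrees(height_m / range_m)
    return max(1, int(h_extent / H_STEP_DEG)) * max(1, int(v_extent / V_STEP_DEG))

for r in (10, 30, 50):
    print(f"{r} m: ~{points_on_target(0.6, 1.7, r)} points on a pedestrian")
# A few hundred points at 10 m, dropping to a few dozen by 30-50 m,
# consistent with the 50-100 points mentioned above.
```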

How cameras perceive distance. There are generally several ways for a camera to perceive distance. The more traditional one is to use a binocular (stereo) camera: two cameras observe the same object, and if you know the object's position in each camera's image, and the two cameras are separated by some distance (the baseline), you can compute the disparity of that position; from the disparity, plus some geometry, you can recover the object's actual position.

However, depth measurement with a binocular camera is clearly constrained by a few factors. The first is the distance between the two cameras. Another is that stereo depth estimation is limited by the particular matching algorithm, which introduces a certain amount of error. Overall, though, current stereo algorithms are fairly accurate, and can perceive depth quite well within a range of ten or twenty meters.
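
The geometry sketched above comes down to one formula: depth Z = f * B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity in pixels. A minimal sketch, with values chosen to be roughly in the range of the KITTI setup (f around 721 px, B around 0.54 m):

```python
# Minimal sketch of the stereo geometry described above: Z = f * B / d.
# f: focal length (px), B: baseline (m), d: disparity (px).

def depth_from_disparity(f_px, baseline_m, disparity_px):
    return f_px * baseline_m / disparity_px

f, B = 721.0, 0.54  # illustrative values, roughly KITTI's calibration
for d in (40.0, 20.0, 10.0, 5.0):
    print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(f, B, d):6.2f} m")
# A fixed 1 px matching error matters more as disparity shrinks,
# which is why stereo accuracy degrades with distance.
```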

Building on accumulated work with deep learning algorithms, an optimal multi-task joint model is constructed, covering sample decomposition, model adjustment, and parameter tuning.

A context-aware multi-task deep neural network based on region-fusion decisions is proposed for vehicle detection in complex scenes, focusing on problems such as multiple viewpoints, multiple poses, and vehicle occlusion.

In the network design, deconvolution (transposed convolution) is used to raise the recall of small objects, and features from multiple layers are concatenated to fuse low-level local information with high-level semantic information, improving the accuracy of bounding-box localization. The training process also borrows the adversarial training scheme of GANs (generative adversarial networks).
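
As an illustrative sketch of those two ideas, upsampling a deep feature map with a transposed convolution and concatenating it with a shallow one, here is a minimal PyTorch-style block. This is not the network described above; channel counts and sizes are arbitrary:

```python
import torch
import torch.nn as nn

class FuseBlock(nn.Module):
    """Upsample coarse, semantically rich features and fuse them with
    fine, shallow features by channel-wise concatenation."""

    def __init__(self, low_ch=64, high_ch=256, out_ch=128):
        super().__init__()
        # 2x upsampling of the high-level map ("deconvolution")
        self.up = nn.ConvTranspose2d(high_ch, low_ch, kernel_size=2, stride=2)
        # mix the concatenated maps
        self.fuse = nn.Conv2d(low_ch * 2, out_ch, kernel_size=3, padding=1)

    def forward(self, low, high):
        high_up = self.up(high)               # match the low-level spatial size
        x = torch.cat([low, high_up], dim=1)  # fuse local detail with semantics
        return torch.relu(self.fuse(x))

low = torch.randn(1, 64, 80, 80)    # fine, shallow features
high = torch.randn(1, 256, 40, 40)  # coarse, deep features
print(FuseBlock()(low, high).shape)  # torch.Size([1, 128, 80, 80])
```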

Reposted from blog.csdn.net/chengyq116/article/details/82659732