Paper List
Computer Vision
2D Vision
- (NeurIPS 2015) Spatial Transformer Networks
- Blog: CSDN, "A Detailed Explanation of STN" (STN 详解)
3D Vision
Object Generation
- (arXiv, Dec. 2023) Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
Neural Radiance Fields
Novel View Synthesis
Scene Generation
Learning
Visual Reinforcement Learning
Robotics
Robotic Manipulation
Mobile Manipulation
- (arXiv, Jan. 2024) Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Pre-training with Large Model
- (arXiv 2023) Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
- The paper introduces GR-1, a straightforward GPT-style model designed for multi-task, language-conditioned visual robot manipulation.
RL + Robotics
- (arXiv 2023) Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots
- Main takeaways: 1. Never discard any data. 2. Use multi-task training to stabilize offline RL. 3. Run iterative training phases to hill-climb performance.
Dexterous Hand Manipulation
- (arXiv 2023) Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation
- (CoRL 2023) General In-Hand Object Rotation with Vision and Touch
- TLDR main takeaways:
Sim2real
- (IEEE 2023) A Real2Sim2Real Method for Robust Object Grasping with Neural Surface Reconstruction (Webpage)
Human Motion
- (arXiv, Dec. 2023) PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction
- (arXiv, Dec. 2023) Perpetual Humanoid Control for Real-time Simulated Avatars (Webpage)
Benchmark
- (arXiv, Dec. 2023) Open-Source Reinforcement Learning Environments Implemented in MuJoCo with Franka Manipulator
- (arXiv, Dec. 2023) EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Utilizing LLMs
- (arXiv, Dec. 2023) ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
- Takeaways: 1. Learning-based robot manipulation trained on a limited set of categories within a simulator often struggles to generalize. 2. The paper introduces an approach to robot manipulation that leverages the reasoning capabilities of Multimodal Large Language Models (MLLMs) to improve generalization. 3. Fine-tuning only the injected adapters preserves the MLLM's inherent common sense and reasoning ability while equipping it with the ability to manipulate. 4. For the real world, the authors design a test-time adaptation (TTA) strategy that helps the model adapt better to the real-world scene.
Survey
- (arXiv 2023) Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
- Blog: WeChat official account of the CAAI Technical Committee on Cognitive Systems and Information Processing (CAAI认知系统与信息处理专委会)