Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking

Xi Wang*,2    Tianxing Chen*,1,2    Qiaojun Yu*,3    Tianling Xu4
Zanxin Chen2    Yiting Fu2    Cewu Lu3,†    Yao Mu1,†    Ping Luo1,†
1University of Hong Kong     2Shenzhen University
3Shanghai Jiao Tong University
4Southern University of Science and Technology

*Equal Contributions     †Corresponding Authors

Abstract

Articulated object manipulation requires precise object interaction in which the object's articulation axis must be carefully considered. Previous research has employed interactive perception to manipulate articulated objects, but these typically open-loop approaches often overlook the dynamics of the interaction. To address this limitation, we present a closed-loop pipeline that integrates interactive perception with online axis estimation from segmented 3D point clouds. Our method builds on any interactive perception technique, inducing slight object movement to generate point cloud frames of the evolving dynamic scene. These frames are segmented with Segment Anything Model 2 (SAM2), and the moving part of the object is masked for accurate online motion-axis estimation, which guides subsequent robotic actions. Our approach significantly improves the precision and efficiency of manipulation tasks involving articulated objects. Experiments in simulated environments demonstrate that our method outperforms baseline approaches, especially in tasks that demand precise axis-based control.



Our Pipeline

In our pipeline, an RGB-D camera captures the dynamic scene whose motion is induced by the slight movement produced by the Interactive Perception & Init-Manipulation Module. The captured scene is then processed by the Tracking & Segmentation Module, which tracks and segments the moving part of the articulated object at the 3D level. The segmented data are then passed to the Axis Estimation & Manipulation Module, where the motion axis is explicitly computed, providing informed guidance for the robot's manipulation policy.
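
To make the data flow concrete, below is a minimal Python sketch of this closed loop. All interfaces (`env`, `perception`, `tracker`, `controller`, `estimate_axis`) are hypothetical placeholders standing in for the modules described above; this is an illustrative outline, not our released code.

```python
# Minimal sketch of the closed-loop pipeline; all module interfaces are
# hypothetical placeholders, not the actual released API.
def closed_loop_manipulation(env, perception, tracker, controller, n_steps=20):
    # Interactive Perception & Init-Manipulation: grasp the part and induce
    # a slight movement so that the articulated part starts to move.
    grasp = perception.propose_grasp(env.get_rgbd())
    controller.init_manipulation(grasp)

    part_clouds = []
    for _ in range(n_steps):
        rgbd = env.get_rgbd()
        # Tracking & Segmentation: SAM2 tracks the moving part in the image;
        # the mask is lifted to a 3D point cloud using the depth channel.
        mask = tracker.track(rgbd.rgb)
        part_clouds.append(rgbd.back_project(mask))
        # Axis Estimation & Manipulation: the motion axis is re-estimated
        # online from the segmented clouds and guides the next action.
        if len(part_clouds) >= 2:
            axis = estimate_axis(part_clouds[-2], part_clouds[-1])
            controller.step_along_axis(axis)
```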

Video


Results

We conduct our experiments in the SAPIEN simulator. Tasks involve opening doors or drawers to different extents.

We select an object from each category to visualize the manipulation process with online axis estimation refinement. The initial estimated axis is represented by a lighter shade of red, while the progressively refined axis is indicated by increasingly darker shades of red.
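
For intuition, the snippet below sketches one standard way such a motion axis can be estimated and refined online: fit a rigid transform between two frames of the segmented moving-part point cloud (with correspondences provided by tracking) via SVD, then recover the revolute axis from that transform. This is a sketch under those assumptions, not a reproduction of our implementation.

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """Kabsch/SVD estimate of the rigid transform mapping point set P to Q.

    P, Q: (N, 3) arrays of corresponding 3D points, e.g. the tracked
    moving-part point cloud before and after a slight motion.
    Returns rotation R (3x3) and translation t (3,).
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # correct an improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t

def revolute_axis_from_transform(R, t):
    """Recover the revolute joint axis (direction + a point on it) from (R, t).

    The axis direction is the eigenvector of R with eigenvalue 1; for a pure
    rotation about a fixed axis, a point p on the axis satisfies (I - R) p = t.
    """
    w, v = np.linalg.eig(R)
    direction = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    direction /= np.linalg.norm(direction)
    # Minimum-norm least-squares point on the axis (I - R is singular along it).
    p, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)
    return direction, p
```

For a prismatic joint (e.g., a drawer), the translation direction t / ||t|| can be used directly as the motion axis instead of the rotational component.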
Success Rate for Basic Tasks
For each task, we evaluate our method against RGBManip and other baselines, separately on RGBManip's training set and testing set. The success rate over the first 100 trials is used as the metric for each comparison.
Experimental results show that both our method and RGBManip outperform the other baselines in nearly all cases, while ours consistently surpasses RGBManip on basic tasks.
Success Rate for More Challenging Tasks
Experimental results show that ours consistently outperforms RGBManip, with a significant improvement in success rates. This clearly demonstrates the superiority of our online axis estimation approach over traditional methods, especially for tasks that demand large-amplitude manipulation and precise axis-based control.

Acknowledgements

Our code is built upon RGBManip, GroundingDINO and SAM2. We would like to thank the authors for their excellent work.

BibTeX

@misc{wang2024articulatedobjectmanipulationusing,
      title={Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking}, 
      author={Xi Wang and Tianxing Chen and Qiaojun Yu and Tianling Xu and Zanxin Chen and Yiting Fu and Cewu Lu and Yao Mu and Ping Luo},
      year={2024},
      eprint={2409.16287},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2409.16287}, 
}