Beyond instance-level reasoning in object pose estimation and tracking for robotic manipulation
Description
Title: Beyond instance-level reasoning in object pose estimation and tracking for robotic manipulation
Date Created: 2022
Other Date: 2022-10 (degree)
Extent: 1 online resource (181 pages) : illustrations
Description: This thesis deals with object pose estimation and tracking in order to solve robot manipulation tasks. It aims to address uncertainty due to dynamics and to generalize to novel object instances by reducing the dependency on instance-level or category-level 3D models.
Robot object manipulation often requires reasoning about object poses given visual data. For instance, pose estimation can be used to initiate pick-and-drop manipulation and has been studied extensively. Purposeful manipulation, however, such as precise assembly or within-hand re-orientation, requires sustained reasoning about an object's state, since dynamic effects due to contacts and slippage may alter the relative configuration between the object and the robotic hand. This motivates the temporal tracking of object poses over image sequences, which reduces computational latency while maintaining or even enhancing pose quality relative to single-shot pose estimation. Most existing techniques in this domain assume instance-level 3D models. This complicates generalization to novel, unseen instances, and thus hinders deployment to novel environments. Even if instance-level 3D models are unavailable, however, it may be possible to access category-level models. Thus, it is desirable to learn category-level priors, which can be used for the visual understanding of novel, unknown object instances. In the most general case, where the robot has to deal with out-of-distribution instances or cannot access category-level priors, object-agnostic perception methods are needed.
Given this context, this thesis proposes a category-level representation, called NUNOCS, to unify the representation of various intra-class object instances and facilitate the transfer of category-level knowledge across such instances. This work also integrates the strengths of modern deep learning and pose graph optimization to achieve generalizable object tracking in SE(3), without needing either instance-level or category-level 3D models. When instance-level object models are available, a synthetic data generation pipeline is developed to learn relative motion along manifolds by reasoning over image residuals. This achieves state-of-the-art SE(3) pose tracking results while circumventing manual effort in data collection and annotation. It also demonstrates that the developed solutions for object tracking provide efficient solutions to multiple manipulation challenges.
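The incremental tracking idea above, predicting a relative motion between consecutive frames and composing it with the previous pose, can be sketched on SE(3) as follows. This is a minimal illustration, not the thesis's actual implementation: the network that maps image residuals to a 6-DoF twist is omitted, and the names `se3_exp` and `track_step` are hypothetical.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector (so(3) hat operator)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map from a 6-vector twist (v, w) to a 4x4 SE(3) matrix."""
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-9:
        # First-order approximation near the identity.
        R, V = np.eye(3) + W, np.eye(3)
    else:
        R = (np.eye(3) + np.sin(theta) / theta * W
             + (1.0 - np.cos(theta)) / theta**2 * W @ W)
        V = (np.eye(3) + (1.0 - np.cos(theta)) / theta**2 * W
             + (theta - np.sin(theta)) / theta**3 * W @ W)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def track_step(T_prev, delta_xi):
    """Compose the previous pose with a predicted incremental twist.

    In a residual-based tracker, delta_xi would come from a network
    fed with the image residual between consecutive frames.
    """
    return T_prev @ se3_exp(delta_xi)
```

A zero twist leaves the pose unchanged, so tracking degrades gracefully when no relative motion is detected between frames.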
Specifically, this thesis starts from a single-image object pose estimation approach that deals with severe occlusions during manipulation. It then moves to long-term object pose tracking via reasoning over image residuals between consecutive frames, while training exclusively over synthetic data. In the case of object tracking along a video sequence, the dependency on either instance-level or category-level CAD models is reduced by leveraging multi-view consistency, in the form of a memory-augmented pose graph optimization, to achieve spatio-temporal consistency. For initializing pose estimates in video sequences involving novel, unseen objects, category-level priors are extracted by taking advantage of easily accessible virtual 3D model databases. Following these ideas, frameworks for category-level, task-relevant grasping and vision-based, closed-loop manipulation are developed, which solve complicated, high-precision tasks. The learning process is scalable, as training is performed exclusively over synthetic data or through a robot's self-interaction process conducted solely in simulation.
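The memory-augmented pose graph optimization is described only at a high level here. As a loose illustration of how pairwise relative-pose constraints between stored frames enforce multi-view consistency, the following toy sketch optimizes translations only (rotations are omitted for brevity, and `optimize_translations` is an illustrative name, not the thesis's method):

```python
import numpy as np

def optimize_translations(n, edges):
    """Toy pose graph over translations only.

    Each measurement (i, j, t_ij) says pose j sits at offset t_ij
    from pose i; pose 0 is anchored at the origin. All constraints
    are stacked into one linear system solved in least squares, so
    a noisy loop-closure edge is reconciled with the odometry edges
    rather than trusted outright.
    """
    rows, rhs = [], []
    anchor = np.zeros(n)
    anchor[0] = 1.0                 # soft anchor: x_0 ≈ 0
    rows.append(anchor)
    rhs.append(np.zeros(3))
    for i, j, t_ij in edges:
        r = np.zeros(n)
        r[i], r[j] = -1.0, 1.0      # x_j - x_i ≈ t_ij
        rows.append(r)
        rhs.append(np.asarray(t_ij, dtype=float))
    A = np.vstack(rows)             # (m, n)
    B = np.vstack(rhs)              # (m, 3)
    X, *_ = np.linalg.lstsq(A, B, rcond=None)
    return X                        # (n, 3) optimized positions
```

For example, two unit steps along x plus a slightly inconsistent loop-closure edge of 2.2 yield an optimized trajectory that distributes the 0.2 discrepancy across all constraints.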
The proposed methods are evaluated first over public computer vision benchmarks, boosting the previous state-of-the-art tracking accuracy from 33.3% to 87.4% on the NOCS dataset, despite reducing the dependency on category-level 3D models for training. When applied to real robotic setups, they significantly improve category-level manipulation performance, validating their effectiveness and robustness. In addition, this thesis unlocks and demonstrates multiple complex manipulation skills in open-world environments. This is despite limited input assumptions, such as training solely over synthetic data, dealing with novel, unknown objects, or learning from a single visual demonstration.
Note: Ph.D.
Note: Includes bibliographical references
Genre: theses
Language: English
Collection: School of Graduate Studies Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.