Award Abstract # 2024057
Collaborative Research: NRI: FND: Graph Neural Networks for Multi-Object Manipulation

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF WASHINGTON
Initial Amendment Date: September 10, 2020
Latest Amendment Date: May 19, 2022
Award Number: 2024057
Award Instrument: Standard Grant
Program Manager: Cang Ye
cye@nsf.gov
 (703)292-4702
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2020
End Date: December 31, 2023 (Estimated)
Total Intended Award Amount: $405,037.00
Total Awarded Amount to Date: $429,037.00
Funds Obligated to Date: FY 2020 = $405,037.00
FY 2021 = $8,000.00
FY 2022 = $16,000.00
History of Investigator:
  • Dieter Fox (Principal Investigator)
    fox@cs.washington.edu
Recipient Sponsored Research Office: University of Washington
4333 BROOKLYN AVE NE
SEATTLE
WA  US  98195-1016
(206)543-4043
Sponsor Congressional District: 07
Primary Place of Performance: University of Washington
185 Stevens Way, Computer Science
Seattle
WA  US  98195-2500
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): HD1WMN6945W6
Parent UEI:
NSF Program(s): IIS Special Projects, NRI-National Robotics Initiative
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVITIES
01002021DB NSF RESEARCH & RELATED ACTIVITIES
01002122DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 8086, 9251
Program Element Code(s): 748400, 801300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

For robots to act as ubiquitous assistants in daily life, they must regularly contend with environments involving many objects and objects built of many constituent parts. Current robotics research focuses on providing solutions to isolated manipulation tasks, developing specialized representations that do not readily work across tasks. This project seeks to enable robots to learn to represent and understand the world from multiple sensors, across many manipulation tasks. Specifically, the project will examine tasks in heavily cluttered environments that require multiple distinct picking and placing actions. This project will develop autonomous manipulation methods suitable for use in robotic assistants. Assistive robots stand to make a substantial impact in increasing the quality of life of older adults and persons with certain degenerative diseases. These methods also apply to manipulation in natural or man-made disaster areas, where explicit object models are not available. The tools developed in this project can also improve robot perception, grasping, and multi-step manipulation skills for manufacturing.

With their ability to learn powerful representations from raw perceptual data, deep neural networks provide the most promising framework to approach key perceptual and reasoning challenges underlying autonomous robot manipulation. Despite their success, existing approaches scale poorly to the diverse set of scenarios autonomous robots will handle in natural environments. These limitations arise from training on isolated tasks, the use of different architectures for different problems, and an inability to scale to complex scenes containing a varying or large number of objects. This project hypothesizes that graph neural networks provide a powerful framework that can encode multiple sensor streams over time to provide robots with rich and scalable representations for multi-object and multi-task perception and manipulation. This project examines a number of extensions to graph neural networks in order to address current limitations for their use in autonomous manipulation. Furthermore, this project examines novel ways of leveraging learned graph neural networks for manipulation planning and control in clutter and for multi-step, multi-object manipulation tasks. To train these large-scale graph network representations, this project will use extremely large-scale, physically accurate, photo-realistic simulation. All perceptual and behavior generation techniques developed in this project will be experimentally validated on a set of challenging real-world manipulation tasks.
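
To illustrate the scalability hypothesis, the following sketch (a minimal, assumed architecture, not the project's actual model) shows a graph neural network layer over a multi-object scene: each object is a node embedding, and fully connected message passing updates every node from its pairwise relations, so the same parameters apply to scenes with any number of objects.

# A minimal sketch (assumed architecture, not the project's actual model) of a
# graph neural network layer over a multi-object scene.
import torch
import torch.nn as nn

class SceneGNNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, nodes):
        # nodes: (N, dim) -- one embedding per object (e.g., pose + shape features)
        n = nodes.shape[0]
        receivers = nodes.unsqueeze(1).expand(n, n, -1)   # receivers[i, j] = nodes[i]
        senders = nodes.unsqueeze(0).expand(n, n, -1)     # senders[i, j]   = nodes[j]
        messages = self.edge_mlp(torch.cat([receivers, senders], dim=-1))
        incoming = messages.sum(dim=1)                    # aggregate over senders j
        return self.node_mlp(torch.cat([nodes, incoming], dim=-1))

# The same layer works for 3 objects or 30, which is the scalability property
# the project relies on for cluttered scenes.
layer = SceneGNNLayer(64)
updated = layer(torch.randn(5, 64))   # (5, 64)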

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Xiang, Yu and Xie, Christopher and Mousavian, Arsalan and Fox, Dieter. "Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation." Conference on Robot Learning (CoRL), 2021.
Xie, Christopher and Mousavian, Arsalan and Xiang, Yu and Fox, Dieter. "RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks." Conference on Robot Learning (CoRL), 2021.
Xie, Christopher and Xiang, Yu and Mousavian, Arsalan and Fox, Dieter. "Unseen Object Instance Segmentation for Robotic Environments." IEEE Transactions on Robotics, 2021.
Yuan, Wentao and Murali, Adithyavairavan and Mousavian, Arsalan. "M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place." 7th Annual Conference on Robot Learning, 2023.
Yuan, Wentao and Paxton, Chris and Desingh, Karthik and Fox, Dieter. "SORNet: Spatial Object-Centric Representations for Sequential Manipulation." Conference on Robot Learning, 2022.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Robots that can robustly perform a wide range of manipulation tasks can have significant impact on many important application domains, including industrial manufacturing, warehouse logistics, health care, and home care for the elderly. The goal of this project was to develop a learning framework that provides robust solutions to key challenges in manipulation. To achieve this, we developed new deep learning techniques and showed how to leverage photorealistic, physics-based simulation tools to provide training experiences for robot manipulators.


Our initial effort focused on developing an object-centric representation that enables spatial reasoning for robot manipulation of common household objects. The ability to infer spatial relations among objects from sensor observations is a crucial step towards automatic planning for complex tasks. Our approach, called SORNet, was at the forefront of generative AI techniques for robotics reasoning, learning powerful representations of objects in manipulation scenes. We showed that SORNet generalizes well to unseen objects without any additional training, and that it makes good predictions on real-world scenes despite being trained only in simulation. Toward the end of this project, we revisited the topic of learning spatial reasoning for robot manipulation. Our approach, called RoboPoint, took advantage of the most recent advances in large language and vision models, fine-tuning them on manipulation-specific reasoning tasks. Our results demonstrate significant improvements even over the most advanced models such as GPT-4o.
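
The object-centric readout idea behind this line of work can be sketched as follows (a hypothetical head, not the released SORNet code): given per-object embeddings from a scene encoder, a small network scores spatial predicates such as left_of or on_top_of for every ordered pair of objects.

# A minimal sketch (hypothetical readout head, not the released SORNet code):
# per-object embeddings in, pairwise spatial-predicate logits out.
import torch
import torch.nn as nn

class PairwisePredicateHead(nn.Module):
    def __init__(self, embed_dim, num_predicates):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, num_predicates))

    def forward(self, obj_embeddings):
        # obj_embeddings: (N, embed_dim), one vector per object in the scene
        n = obj_embeddings.shape[0]
        a = obj_embeddings.unsqueeze(1).expand(n, n, -1)  # object i
        b = obj_embeddings.unsqueeze(0).expand(n, n, -1)  # object j
        # logits[i, j, k]: does predicate k hold between object i and object j?
        return self.mlp(torch.cat([a, b], dim=-1))

head = PairwisePredicateHead(embed_dim=128, num_predicates=4)
logits = head(torch.randn(6, 128))     # (6, 6, 4) predicate logits
relations = logits.sigmoid() > 0.5     # thresholded boolean relations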


Toward real-world manipulation tasks, we also developed the Multi-Task Masked Transformer (M2T2), a unified model for learning multiple action primitives for manipulation. Given a point cloud observation of a scene, M2T2 predicts collision-free gripper poses for two types of actions: 6-DoF grasping and 3-DoF placing, eliminating the need to use different methods for different actions. M2T2 generates a diverse set of goal poses that provide sufficient options for low-level motion planners, and it can be combined with high-level reasoning models such as RoboPoint to solve complex tasks specified in natural language.
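
The unified pick-and-place contract described above can be sketched as follows (a hypothetical interface, not the released M2T2 code): one model maps a scene point cloud to scored candidate grasp and placement poses, so a single network serves both action types.

# A minimal sketch (hypothetical interface, not the released M2T2 code) of a
# unified pick-and-place model: point cloud in, scored grasp and placement
# candidates out.
import torch
import torch.nn as nn

class PickPlaceModel(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Stand-in per-point encoder; the real model uses a point-cloud backbone
        # with transformer decoding.
        self.encoder = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(),
                                     nn.Linear(feat_dim, feat_dim))
        self.grasp_head = nn.Linear(feat_dim, 7)   # 6-DoF grasp: translation + quaternion
        self.place_head = nn.Linear(feat_dim, 3)   # 3-DoF placement: x, y, yaw
        self.score_head = nn.Linear(feat_dim, 1)   # per-candidate confidence

    def forward(self, points):
        # points: (N, 3) scene point cloud
        feats = self.encoder(points)
        return {
            "grasps": self.grasp_head(feats),              # (N, 7)
            "placements": self.place_head(feats),          # (N, 3)
            "scores": self.score_head(feats).squeeze(-1),  # (N,)
        }

model = PickPlaceModel()
out = model(torch.rand(1024, 3))
best_grasp = out["grasps"][out["scores"].argmax()]  # hand the best candidate to a motion planner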


Overall, the combination of robust manipulation skills enabled through tools such as SORNet and M2T2, along with the improved language-based understanding capabilities via the RoboPoint approach, greatly improves the ability of robot manipulators to operate in real-world settings such as home environments or hospitals.

Last Modified: 06/25/2024
Modified by: Dieter Fox
