
NSF Org: IIS, Division of Information & Intelligent Systems
Initial Amendment Date: September 10, 2020
Latest Amendment Date: May 19, 2022
Award Number: 2024057
Award Instrument: Standard Grant
Program Manager: Cang Ye, cye@nsf.gov, (703) 292-4702, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2020
End Date: December 31, 2023 (Estimated)
Total Intended Award Amount: $405,037.00
Total Awarded Amount to Date: $429,037.00
Funds Obligated to Date: FY 2021 = $8,000.00; FY 2022 = $16,000.00
Recipient Sponsored Research Office: 4333 Brooklyn Ave NE, Seattle, WA 98195-1016, US; (206) 543-4043
Primary Place of Performance: 185 Stevens Way, Computer Science, Seattle, WA 98195-2500, US
NSF Program(s): IIS Special Projects; NRI-National Robotics Initiative
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVITIES; 01002122DB NSF RESEARCH & RELATED ACTIVITIES
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
For robots to act as ubiquitous assistants in daily life, they must regularly contend with environments involving many objects and objects built of many constituent parts. Current robotics research focuses on providing solutions to isolated manipulation tasks, developing specialized representations that do not readily work across tasks. This project seeks to enable robots to learn to represent and understand the world from multiple sensors, across many manipulation tasks. Specifically, the project will examine tasks in heavily cluttered environments that require multiple distinct picking and placing actions. The project will develop autonomous manipulation methods suitable for use in robotic assistants. Assistive robots stand to make a substantial impact in increasing the quality of life of older adults and persons with certain degenerative diseases. These methods also apply to manipulation in natural or man-made disaster areas, where explicit object models are not available. The tools developed in this project can also improve robot perception, grasping, and multi-step manipulation skills for manufacturing.
With their ability to learn powerful representations from raw perceptual data, deep neural networks provide the most promising framework for approaching the key perceptual and reasoning challenges underlying autonomous robot manipulation. Despite their success, existing approaches scale poorly to the diverse set of scenarios autonomous robots will handle in natural environments. These limitations arise from networks being trained on isolated tasks, the use of different architectures for different problems, and an inability to scale to complex scenes containing a varying or large number of objects. This project hypothesizes that graph neural networks provide a powerful framework that can encode multiple sensor streams over time to give robots rich and scalable representations for multi-object and multi-task perception and manipulation. The project examines a number of extensions to graph neural networks that address current limitations on their use in autonomous manipulation. Furthermore, the project examines novel ways of leveraging learned graph neural networks for manipulation planning and control in clutter and for multi-step, multi-object manipulation tasks. To train these large-scale graph network representations, the project will use extremely large-scale, physically accurate, photo-realistic simulation. All perceptual and behavior-generation techniques developed in this project will be experimentally validated on a set of challenging real-world manipulation tasks.
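As a concrete illustration of the representation the abstract describes, the following sketch implements one message-passing step of a graph neural network over object-centric nodes. It is a minimal, hypothetical example under assumed names and dimensions, not the project's actual architecture.

```python
# Minimal sketch (not the project's architecture): one message-passing step of
# a graph neural network whose nodes are per-object features and whose edges
# connect every pair of objects in the scene. Names and sizes are assumptions.
import torch
import torch.nn as nn


class MessagePassingLayer(nn.Module):
    def __init__(self, node_dim: int, edge_dim: int):
        super().__init__()
        # Message function: combines the features of the two endpoint nodes.
        self.msg_mlp = nn.Sequential(
            nn.Linear(2 * node_dim, edge_dim), nn.ReLU(),
            nn.Linear(edge_dim, edge_dim),
        )
        # Update function: combines a node with its aggregated incoming messages.
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + edge_dim, node_dim), nn.ReLU(),
            nn.Linear(node_dim, node_dim),
        )

    def forward(self, nodes: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # nodes: (N, node_dim) per-object features for one scene
        # edges: (E, 2) integer (sender, receiver) index pairs
        senders, receivers = edges[:, 0], edges[:, 1]
        messages = self.msg_mlp(torch.cat([nodes[senders], nodes[receivers]], dim=-1))
        # Sum the messages arriving at each receiver node.
        agg = torch.zeros(nodes.size(0), messages.size(-1), device=nodes.device)
        agg.index_add_(0, receivers, messages)
        return self.node_mlp(torch.cat([nodes, agg], dim=-1))


# Usage: a 4-object scene with a fully connected directed graph.
num_objects, node_dim = 4, 32
nodes = torch.randn(num_objects, node_dim)
edges = torch.tensor([(i, j) for i in range(num_objects) for j in range(num_objects) if i != j])
layer = MessagePassingLayer(node_dim, edge_dim=64)
print(layer(nodes, edges).shape)  # torch.Size([4, 32]) refined per-object features
```

Stacking several such layers lets information propagate between interacting objects, and because the same learned functions are applied to every node and edge, this style of network handles scenes with a varying number of objects.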
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Robots that can robustly perform a wide range of manipulation tasks can have a significant impact on many important application domains, including industrial manufacturing, warehouse logistics, health care, and home care for the elderly. The goal of this project was to develop a learning framework that provides robust solutions to key challenges in manipulation. To achieve this, we advanced deep learning techniques and showed how to leverage photorealistic, physics-based simulation tools to provide training experience for robot manipulators.
Our initial effort focused on developing an object-centric representation that enables spatial reasoning for robot manipulation of common household objects. The ability to infer spatial relations among objects from sensor observations is a crucial step toward automatic planning for complex tasks. Our approach, called SORNet, was at the forefront of generative AI techniques for robotics reasoning, learning powerful representations of objects in manipulation scenes. We showed that SORNet generalizes well to unseen objects without any additional training and that it makes good predictions on real-world scenes despite being trained only in simulation. Toward the end of this project, we revisited the topic of learning spatial reasoning for robot manipulation. Our approach, called RoboPoint, took advantage of the most recent advances in large language and vision models, fine-tuning such models on manipulation-specific reasoning tasks. Our results demonstrate significant improvements even over the most advanced models, such as GPT-4o.
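As a hedged illustration of the kind of spatial-relation readout described above, the sketch below scores a small set of pairwise predicates from two object-centric embeddings. It is not the released SORNet code; the visual encoder is omitted, and the dimensions and predicate set are assumptions.

```python
# Illustrative sketch only (not SORNet's implementation): a pairwise
# spatial-relation head over object-centric embeddings. A visual encoder
# (omitted here; random features stand in below) would produce one embedding
# per queried object; the head scores each spatial predicate independently.
import torch
import torch.nn as nn


class RelationHead(nn.Module):
    def __init__(self, embed_dim: int, num_predicates: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, num_predicates),
        )

    def forward(self, obj_a: torch.Tensor, obj_b: torch.Tensor) -> torch.Tensor:
        # Logits over predicates (e.g., left_of, behind, on_top_of)
        # for the ordered pair (obj_a, obj_b).
        return self.mlp(torch.cat([obj_a, obj_b], dim=-1))


# Usage: embeddings for two queried objects in a scene.
embed_dim = 256
obj_embeddings = torch.randn(2, embed_dim)  # stand-in for encoder output
head = RelationHead(embed_dim, num_predicates=3)
print(torch.sigmoid(head(obj_embeddings[0:1], obj_embeddings[1:2])))  # per-relation probabilities
```

Because such a head consumes only per-object embeddings, generalizing to unseen objects reduces to producing good embeddings for them, which is the property highlighted in the paragraph above.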
Toward real-world manipulation tasks, we also developed the Multi-Task Masked Transformer (M2T2), a unified model for learning multiple action primitives for manipulation tasks. Given a point cloud observation of a scene, M2T2 predicts collision-free gripper poses for two types of actions, 6-DoF grasping and 3-DoF placing, eliminating the need to use different methods for different actions. M2T2 generates a diverse set of goal poses that provide sufficient options for low-level motion planners. It can also be combined with high-level reasoning models such as RoboPoint to solve complex tasks based on natural language input.
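The sketch below illustrates only the input/output contract described in the previous paragraph: a single network maps a scene point cloud to both 6-DoF grasp and 3-DoF placement candidates. It is a placeholder, not the actual M2T2 implementation (which, as its name indicates, is a masked transformer); the encoder, heads, and dimensions here are assumptions.

```python
# Hedged interface sketch (not the actual M2T2 model): one forward pass
# produces candidate poses for both action types from a scene point cloud.
import torch
import torch.nn as nn


class UnifiedPickPlaceModel(nn.Module):
    def __init__(self, num_queries: int = 8):
        super().__init__()
        self.num_queries = num_queries
        # Placeholder encoder: per-point MLP followed by max-pooling into a scene code.
        self.encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
        self.grasp_head = nn.Linear(128, num_queries * 7)  # 6-DoF grasp: position (3) + quaternion (4)
        self.place_head = nn.Linear(128, num_queries * 3)  # 3-DoF placement: planar position + yaw

    def forward(self, points: torch.Tensor) -> dict:
        # points: (P, 3) scene point cloud in the robot frame
        scene = self.encoder(points).max(dim=0).values  # (128,) pooled scene feature
        grasps = self.grasp_head(scene).view(self.num_queries, 7)
        placements = self.place_head(scene).view(self.num_queries, 3)
        return {"grasps": grasps, "placements": placements}


# Usage: a random 2048-point cloud standing in for a depth-camera observation.
model = UnifiedPickPlaceModel()
out = model(torch.randn(2048, 3))
print(out["grasps"].shape, out["placements"].shape)  # (8, 7) and (8, 3)
```

Returning several candidates per action type is what gives a downstream motion planner the "sufficient options" mentioned above: candidates that turn out to be unreachable or in collision can simply be discarded.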
Overall, the combination of robust manipulation skills enabled through tools such as SORNet and M2T2, along with the improved language-based understanding capabilities of the RoboPoint approach, greatly improves the ability of robot manipulators to operate in real-world settings such as home environments or hospitals.
Last Modified: 06/25/2024
Modified by: Dieter Fox