
NSF Org: ECCS Division of Electrical, Communications and Cyber Systems
Recipient:
Initial Amendment Date: August 4, 2014
Latest Amendment Date: August 4, 2014
Award Number: 1407925
Award Instrument: Standard Grant
Program Manager: Usha Varshney, ECCS Division of Electrical, Communications and Cyber Systems, ENG Directorate for Engineering
Start Date: August 15, 2014
End Date: July 31, 2017 (Estimated)
Total Intended Award Amount: $154,244.00
Total Awarded Amount to Date: $154,244.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 107 S INDIANA AVE, BLOOMINGTON, IN, US 47405-7000, (317) 278-3473
Sponsor Congressional District:
Primary Place of Performance: LOCKEFIELD 2232, 980 INDIANA AVE, Indianapolis, IN, US 46202-2915
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): EPCN-Energy-Power-Ctrl-Netwrks
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.041
ABSTRACT
This project aims to develop better methods for Reinforcement Learning and Approximate Dynamic Programming (RLADP) in order to handle decision tasks with greater complexity in both time and space. Reinforcement learning systems learn to maximize a measure of performance or satisfaction based on experience: they observe their environment, act on it, and receive feedback on performance, much like the pain or pleasure that reinforces animal behavior. Current reinforcement learning methods do not learn fast enough to perform well when their environment is too complex in space or in time. This project will develop new methods to handle that kind of complexity. The team will also collaborate with IBM Research and will address a testbed problem involving the management of a fleet of plug-in hybrid cars.
Complexity in time will be handled with a multiple-model approach that connects various options or skills by evaluating and updating the landmark states that mark transitions between different regions of state space, as sketched below. This is similar to earlier work on decision blocks and modified Bellman equations presented at the PI's workshop on learning and adaptive systems, but otherwise represents a unique and important new direction. Complexity in space is addressed by a multiagent approach based on a kind of spatial decomposition.
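The landmark-state idea can be pictured with a rough, assumption-laden sketch of a semi-Markov ("options"-style) value update between landmark states. The names landmarks, options, and option_model, and the deterministic option model itself, are hypothetical illustrations, not material from the award.

    # Hypothetical sketch of a landmark-state value update in the spirit of the
    # multiple-model / options idea above; all names and the option_model
    # function are illustrative assumptions, not the PI's code.

    GAMMA = 0.95  # discount factor per primitive time step

    def landmark_value_iteration(landmarks, options, option_model, sweeps=100):
        """Value iteration over landmark states connected by temporally extended options.

        option_model(l, o) -> (reward, duration, next_landmark): the accumulated
        reward, the number of primitive time steps, and the landmark reached when
        option o is executed from landmark l (assumed deterministic here).
        """
        V = {l: 0.0 for l in landmarks}
        for _ in range(sweeps):
            for l in landmarks:
                # Semi-Markov Bellman backup: discount by the option's duration.
                V[l] = max(
                    reward + (GAMMA ** duration) * V[next_l]
                    for reward, duration, next_l in (option_model(l, o) for o in options)
                )
        return V

In this sketch each backup discounts by the option's duration, which is what lets temporally extended skills connect distant regions of the state space through the landmark states.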
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without a charge during the embargo (administrative interval). Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Reinforcement learning refers to a set of techniques by which an (artificial) agent (such as a robot) learns optimal actions or strategies in an unknown, dynamic environment through trial and error. Although these techniques hold great promise for developing intelligent autonomous devices and software, their slow learning or convergence limits their practical applicability.
This NSF-funded project investigated the feasibility of speeding up the convergence of reinforcement learning algorithms by means of state decomposition, i.e., decomposing a complex problem (system) into many smaller, simpler ones and assigning a separate learning agent to each sub-problem.
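As a concrete, purely illustrative sketch of that decomposition, the code below assigns an independent tabular Q-learning agent to each subsystem. The SubAgent class, its interface, and the parameter values are assumptions for illustration, not code from the project.

    # Minimal sketch of state decomposition with one learner per subsystem,
    # assuming each subsystem exposes its own local state, action, and reward.

    import random
    from collections import defaultdict

    class SubAgent:
        """Tabular Q-learning agent assigned to one subsystem (sub-problem)."""

        def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.q = defaultdict(float)            # (state, action) -> estimated value
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, state):
            if random.random() < self.epsilon:     # occasional exploration
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def learn(self, state, action, reward, next_state):
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    # One agent per subsystem; each learns from its own local experience only.
    agents = [SubAgent(actions=[0, 1]) for _ in range(4)]

Each agent's table is indexed only by its own local state and action, which is what keeps the sub-problems small and the learning fast.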
While a fully decentralized approach (without any communication among the agents assigned to the sub-problems) can achieve a fast response, when the sub-problems are not completely independent the learned actions may be sub-optimal, because they ignore the interconnections between the subsystems.
To address this problem, the project developed an approach called “selective decentralization,” in which each agent selectively interacts with some (but not all) of the other agents and learns the interconnection pattern (i.e., whom to communicate with) by employing multiple models, one for each possible interconnection pattern. Such selectively decentralized reinforcement learning algorithms have the potential to realize solutions that are both fast and accurate (optimal). The attached Figure 1 compares completely centralized, completely decentralized, and selectively decentralized algorithms in a typical simulation study; the response with selective decentralization is among the fastest and most accurate.
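The multiple-model selection step can be pictured with a short, hedged sketch: each agent keeps one predictive model per candidate interconnection pattern and favors the pattern whose model best explains its observed local dynamics. The InterconnectionSelector class, the model.predict interface, and the prediction-error rule below are assumptions about the published method, not a reproduction of it.

    # Hedged sketch of the multiple-model selection step in selective
    # decentralization; the class, the model.predict interface, and the
    # error-based rule are illustrative assumptions, not the published code.

    import numpy as np

    class InterconnectionSelector:
        def __init__(self, patterns, models):
            # patterns[i] is a tuple of neighbour indices; models[i] predicts the
            # agent's next local state from its own state plus those neighbours' states.
            self.patterns = patterns
            self.models = models
            self.errors = np.zeros(len(patterns))

        def update(self, local_state, neighbour_states, next_local_state):
            # Score every candidate interconnection pattern on the new observation.
            for i, (pattern, model) in enumerate(zip(self.patterns, self.models)):
                inputs = [local_state] + [neighbour_states[j] for j in pattern]
                prediction = model.predict(inputs)
                self.errors[i] += np.linalg.norm(
                    np.asarray(next_local_state) - np.asarray(prediction))

        def best_pattern(self):
            # Communicate only with the neighbours in the best-explaining pattern.
            return self.patterns[int(np.argmin(self.errors))]

Once a pattern wins, the agent communicates only with the neighbours in that pattern, which is what makes the scheme selectively, rather than completely, decentralized.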
Similar behavior was observed in numerous other simulation studies reported in peer-reviewed research papers published during the project. The theoretical and experimental studies indicate that selectively decentralized reinforcement learning algorithms hold great promise for many complex, practical applications.
Last Modified: 10/03/2017
Modified by: Snehasis Mukhopadhyay
Please report errors in award information by writing to: awardsearch@nsf.gov.