
NSF Org: ECCS Division of Electrical, Communications and Cyber Systems
Recipient:
Initial Amendment Date: August 4, 2014
Latest Amendment Date: August 4, 2014
Award Number: 1407925
Award Instrument: Standard Grant
Program Manager: Usha Varshney, ECCS Division of Electrical, Communications and Cyber Systems, ENG Directorate for Engineering
Start Date: August 15, 2014
End Date: July 31, 2017 (Estimated)
Total Intended Award Amount: $154,244.00
Total Awarded Amount to Date: $154,244.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 107 S INDIANA AVE, BLOOMINGTON, IN, US 47405-7000, (317) 278-3473
Sponsor Congressional District:
Primary Place of Performance: LOCKEFIELD 2232, 980 INDIANA AVE, Indianapolis, IN, US 46202-2915
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): EPCN-Energy-Power-Ctrl-Netwrks
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.041
ABSTRACT
This project aims to develop better methods for Reinforcement Learning and Approximate Dynamic Programming (RLADP) in order to handle decision tasks with greater complexity in both time and space. Reinforcement learning systems learn to maximize a measure of performance or satisfaction based on experience: they observe their environment, act on it, and receive feedback on performance, much like the pain or pleasure that reinforces animal behavior. Current reinforcement learning methods do not learn fast enough to perform well when their environment is too complex in space or in time. This project will develop new methods to handle that kind of complexity. The team will also collaborate with IBM Research and will address a testbed problem involving the management of a fleet of plug-in hybrid cars.
Complexity in time will be handled with a multiple-model approach that connects various options or skills by evaluating and updating the landmark states that mark transitions between different regions of state space, as sketched below. This is similar to earlier work on decision blocks and modified Bellman equations presented at the PI's workshop on learning and adaptive systems, but otherwise represents a unique and important new direction. Complexity in space is addressed by a multiagent approach based on a kind of spatial decomposition.
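The landmark-state idea can be pictured with a rough, assumption-laden sketch of a semi-Markov ("options"-style) value update between landmark states. The names landmarks, options, and option_model, and the deterministic option model itself, are hypothetical illustrations, not material from the award.

    # Hypothetical sketch of a landmark-state value update in the spirit of the
    # multiple-model / options idea above; all names and the option_model
    # function are illustrative assumptions, not the PI's code.

    GAMMA = 0.95  # discount factor per primitive time step

    def landmark_value_iteration(landmarks, options, option_model, sweeps=100):
        """Value iteration over landmark states connected by temporally extended options.

        option_model(l, o) -> (reward, duration, next_landmark): the accumulated
        reward, the number of primitive time steps, and the landmark reached when
        option o is executed from landmark l (assumed deterministic here).
        """
        V = {l: 0.0 for l in landmarks}
        for _ in range(sweeps):
            for l in landmarks:
                # Semi-Markov Bellman backup: discount by the option's duration.
                V[l] = max(
                    reward + (GAMMA ** duration) * V[next_l]
                    for reward, duration, next_l in (option_model(l, o) for o in options)
                )
        return V

In this sketch each backup discounts by the option's duration, which is what lets temporally extended skills connect distant regions of the state space through the landmark states.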
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without a charge during the embargo (administrative interval). Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Reinforcement learning refers to a set of techniques by which an (artificial) agent (such as a robot) learns optimal actions or strategies in an unknown, dynamic environment through trial and error. Although these techniques hold great promise for developing intelligent autonomous devices and software, their slow learning or convergence limits their practical applicability.
This NSF-funded project investigated the feasibility of speeding up the convergence of reinforcement learning algorithms by means of state decomposition, i.e., decomposing a complex problem (system) into many smaller, simpler ones and assigning a separate learning agent to each sub-problem.
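As a concrete, purely illustrative sketch of that decomposition, the code below assigns an independent tabular Q-learning agent to each subsystem. The SubAgent class, its interface, and the parameter values are assumptions for illustration, not code from the project.

    # Minimal sketch of state decomposition with one learner per subsystem,
    # assuming each subsystem exposes its own local state, action, and reward.

    import random
    from collections import defaultdict

    class SubAgent:
        """Tabular Q-learning agent assigned to one subsystem (sub-problem)."""

        def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.q = defaultdict(float)            # (state, action) -> estimated value
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, state):
            if random.random() < self.epsilon:     # occasional exploration
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def learn(self, state, action, reward, next_state):
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    # One agent per subsystem; each learns from its own local experience only.
    agents = [SubAgent(actions=[0, 1]) for _ in range(4)]

Each agent's table is indexed only by its own local state and action, which is what keeps the sub-problems small and the learning fast.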
While a fully decentralized approach (without any communication among the agents assigned to the sub-problems) can achieve a fast response, when the sub-problems are not completely independent the learned actions may be sub-optimal, because they ignore the interconnections between the subsystems.
To address this problem, the project developed an approach called “selective decentralization,” in which each agent selectively interacts with some (but not all) of the other agents and learns the interconnection pattern (i.e., whom to communicate with) by employing multiple models, one for each possible interconnection pattern. Such selectively decentralized reinforcement learning algorithms have the potential to realize solutions that are both fast and accurate (optimal). The attached Figure 1 compares completely centralized, completely decentralized, and selectively decentralized algorithms in a typical simulation study; the response with selective decentralization is among the fastest and most accurate.
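The multiple-model selection step can be pictured with a short, hedged sketch: each agent keeps one predictive model per candidate interconnection pattern and favors the pattern whose model best explains its observed local dynamics. The InterconnectionSelector class, the model.predict interface, and the prediction-error rule below are assumptions about the published method, not a reproduction of it.

    # Hedged sketch of the multiple-model selection step in selective
    # decentralization; the class, the model.predict interface, and the
    # error-based rule are illustrative assumptions, not the published code.

    import numpy as np

    class InterconnectionSelector:
        def __init__(self, patterns, models):
            # patterns[i] is a tuple of neighbour indices; models[i] predicts the
            # agent's next local state from its own state plus those neighbours' states.
            self.patterns = patterns
            self.models = models
            self.errors = np.zeros(len(patterns))

        def update(self, local_state, neighbour_states, next_local_state):
            # Score every candidate interconnection pattern on the new observation.
            for i, (pattern, model) in enumerate(zip(self.patterns, self.models)):
                inputs = [local_state] + [neighbour_states[j] for j in pattern]
                prediction = model.predict(inputs)
                self.errors[i] += np.linalg.norm(
                    np.asarray(next_local_state) - np.asarray(prediction))

        def best_pattern(self):
            # Communicate only with the neighbours in the best-explaining pattern.
            return self.patterns[int(np.argmin(self.errors))]

Once a pattern wins, the agent communicates only with the neighbours in that pattern, which is what makes the scheme selectively, rather than completely, decentralized.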
Similar behavior was observed in numerous other simulation studies reported in peer-reviewed research papers published during the project. The theoretical and experimental studies indicate that selectively decentralized reinforcement learning algorithms hold great promise for many complex, practical applications.
Last Modified: 10/03/2017
Modified by: Snehasis Mukhopadhyay
Please report errors in award information by writing to: awardsearch@nsf.gov.