
NSF Org: IIS Division of Information & Intelligent Systems
Recipient:
Initial Amendment Date: March 17, 2020
Latest Amendment Date: March 17, 2020
Award Number: 1948017
Award Instrument: Standard Grant
Program Manager: Cang Ye, cye@nsf.gov, (703) 292-4702, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: April 1, 2020
End Date: March 31, 2024 (Estimated)
Total Intended Award Amount: $174,951.00
Total Awarded Amount to Date: $174,951.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 4400 MASSACHUSETTS AVE NW, WASHINGTON, DC, US 20016-8003, (202) 885-3440
Sponsor Congressional District:
Primary Place of Performance: 4400 Massachusetts Ave NW, Washington, DC, US 20016-8002
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
There are a wide variety of artificial intelligence (AI) algorithms designed to make decisions for a number of different real-world problems. One important task of AI research is to determine how well these algorithms solve various problems. Researchers often use smaller problems such as games to study algorithmic decision-making. For example, the game Go can be used to test strategic decision-making, and arcade games can be used to test tactical decision-making. How hard these test problems are may vary for different algorithms, and can depend on factors such as how much computation time is available. The purpose of this project is to systematically understand the difficulty that AI challenge problems pose to standard decision-making algorithms, as well as how robust such conclusions are to variations in problem design, problem size, computational resources, and algorithm configuration.
This project will use three methods to develop metrics for algorithm-relative benchmark difficulty, studying standard decision-making algorithms for both real-time statistical planning and reinforcement learning. First, it will systematically generate scaling curves on each benchmark problem, showing how performance scales with the computational resources given to an agent, as well as with problem size, the size of the action space, and other configurable parameters. Second, it will identify problems that reliably differentiate algorithm performance, i.e., those on which some algorithms perform very well but others very poorly, illuminating their relative strengths. Third, it will apply recent algorithms that scale analytical solution methods up to larger problems, potentially approaching those used as more recent AI benchmarks, in order to compare scaling curves against optimal performance when optima can be computed. Doing so has the potential to improve our understanding of broadly used AI and machine-learning algorithms, particularly how certain problem features impact the performance of these algorithms. Such information can potentially be used to design better and more robust algorithms that perform well across a variety of problem settings.
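As a concrete illustration of the first method, a compute-scaling experiment of this kind could be sketched as follows. This is a minimal Python sketch under assumed, simplified agent and environment interfaces; run_episode, scaling_curve, and the agent/environment methods are illustrative placeholders, not the project's actual experimental harness.

    # Illustrative sketch of a compute-scaling experiment: run the same agent on the
    # same benchmark while varying its per-decision computation budget, and record
    # the mean episode return at each budget. Agent/environment interfaces are placeholders.
    from statistics import mean

    def run_episode(env, agent, budget):
        """Play one episode, giving the agent `budget` units of compute per decision."""
        state, total_reward, done = env.reset(), 0.0, False
        while not done:
            action = agent.act(state, budget=budget)   # e.g. number of planning simulations
            state, reward, done = env.step(action)
            total_reward += reward
        return total_reward

    def scaling_curve(env, agent, budgets, episodes_per_budget=30):
        """Return a list of (budget, mean return) points for plotting."""
        curve = []
        for budget in budgets:
            returns = [run_episode(env, agent, budget) for _ in range(episodes_per_budget)]
            curve.append((budget, mean(returns)))
        return curve

    # Example: sweep simulation budgets over several orders of magnitude.
    # curve = scaling_curve(my_env, my_agent, budgets=[10, 100, 1000, 10000])

Plotting the resulting curves for several algorithms on the same benchmark then shows directly how their relative performance depends on the compute budget.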
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project investigated how AI decision-making algorithms scale, with a particular focus on how their decision-making performance scales with the computational resources available to them.
One of the main algorithms studied in the project was Monte Carlo Tree Search (MCTS). This is a statistical sampling version of tree search whose first high-profile success was in playing the board game Go. It has since been applied to a number of other problems, from compiler optimization selection to robot decision-making, but it is not always as successful in these other domains. This project implemented and released a new open-source library, MCTSLib, supporting MCTS and popular variants, to investigate those issues. Through experiments using MCTSLib, we identified two main reasons for poor scaling in some domains. One reason is that many real-world domains have much longer action horizons, which MCTS struggles with, especially without a good intermediate state evaluation function. The other reason is that many domains have pairs of actions that "undo" each other, such as moving right twice and then left once: this may put you in the same end state as moving right once, but MCTS will often get bogged down evaluating a combinatorial explosion of such action interleavings. We developed an algorithm that mitigates the latter problem in approximately half of the domains we tested.
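For reference, the core loop of vanilla MCTS with the standard UCT selection rule can be sketched roughly as follows. This is an illustrative simplification, not the MCTSLib API; the `game` interface it assumes is a hypothetical placeholder.

    # Simplified sketch of Monte Carlo Tree Search with UCT selection (single-agent
    # maximization for simplicity; adversarial games would need sign handling during
    # backpropagation). The `game` object is assumed to expose legal_actions(state),
    # next_state(state, action), is_terminal(state), and result(state).
    import math
    import random

    class Node:
        def __init__(self, state, parent=None, action=None):
            self.state = state
            self.parent = parent
            self.action = action          # action that led here from the parent
            self.children = []
            self.untried = None           # actions not yet expanded (filled lazily)
            self.visits = 0
            self.value = 0.0              # sum of rollout results

        def uct_child(self, c=1.4):
            # Pick the child maximizing the UCB1 score.
            return max(self.children,
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(self.visits) / ch.visits))

    def mcts(game, root_state, n_simulations=1000):
        root = Node(root_state)
        for _ in range(n_simulations):
            node = root
            # 1. Selection: descend while the node is fully expanded and has children.
            while node.untried == [] and node.children:
                node = node.uct_child()
            # 2. Expansion: add one untried action as a new child.
            if node.untried is None:
                node.untried = list(game.legal_actions(node.state))
            if node.untried and not game.is_terminal(node.state):
                action = node.untried.pop()
                child = Node(game.next_state(node.state, action), parent=node, action=action)
                node.children.append(child)
                node = child
            # 3. Rollout: play randomly to the end of the episode.
            state = node.state
            while not game.is_terminal(state):
                state = game.next_state(state, random.choice(game.legal_actions(state)))
            result = game.result(state)
            # 4. Backpropagation: update statistics along the path back to the root.
            while node is not None:
                node.visits += 1
                node.value += result
                node = node.parent
        # Recommend the most-visited action at the root.
        return max(root.children, key=lambda ch: ch.visits).action

In this vanilla form, every distinct sequence of actions gets its own branch of the tree, which is why interleavings of actions that undo each other can inflate the search so quickly.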
In addition to MCTS, we investigated whether recent advances in large language models (LLMs) can help improve the scaling of decision-making, for example by taking candidate solutions and rapidly improving them into better ones. An initial algorithm in this direction, which we call language model crossover (LMX), was developed and published in collaboration with a team of other researchers in industry and academia.
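The basic idea behind LMX is to use the language model itself as a variation operator: a few parent solutions are concatenated into a prompt, and the model's completion is read off as a new child solution. The following rough sketch illustrates that idea; `generate` stands in for whatever text-completion call is available and is a hypothetical placeholder rather than a specific LLM API, and the prompt format shown is only one simple possibility.

    # Illustrative sketch of language model crossover (LMX): parents are strings
    # (e.g., candidate programs or designs); the LLM's completion becomes the child.
    # `generate` is a placeholder for an available text-completion call.
    import random

    def lmx_crossover(parents, generate, n_children=1):
        """Produce child solutions by prompting an LLM with parent solutions."""
        # Few-shot prompt: each parent on its own line, then an open slot for a child.
        prompt = "".join(f"Solution: {p}\n" for p in parents) + "Solution:"
        children = []
        for _ in range(n_children):
            completion = generate(prompt)
            # Take the completion up to the next "Solution:" marker (if any) as the child.
            children.append(completion.split("Solution:")[0].strip())
        return children

    def evolve(population, fitness, generate, generations=10, parents_per_cross=3):
        """Minimal evolutionary loop using LMX as the sole variation operator."""
        for _ in range(generations):
            parents = random.sample(population, parents_per_cross)
            child = lmx_crossover(parents, generate)[0]
            # Replace the worst individual if the child is an improvement.
            worst = min(population, key=fitness)
            if fitness(child) > fitness(worst):
                population[population.index(worst)] = child
        return population

Because the variation step is just a prompt, a loop like this can in principle operate on any candidate solutions that can be written down as text.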
As this project was awarded through the CISE Research Initiation Initiative (CRII), in addition to the core scientific work of the project, it also had a goal of jumpstarting a new faculty member's research group. To that end, it supported two MS students who have graduated and are now applying skills developed during this project in their industry jobs. One is at Google working on machine learning compiler infrastructure, and one is CTO of a health-tech startup.
Last Modified: 07/30/2024
Modified by: Mark J Nelson