In recent years, the Go-playing artificial intelligence AlphaGo and its successors AlphaGo Zero and AlphaZero have made international headlines with their incredible successes in game playing. They are part of a line of AI systems developed to beat humans at games like Go, checkers, chess, Scrabble and Jeopardy. Each successive challenge extends the boundaries of machine learning and its capabilities. The programs have been touted as evidence of the immense potential of artificial intelligence, and in particular, machine learning.

At the core of AlphaGo and its successors are ideas related to adaptive multistage sampling (AMS), a simulation-based algorithm for Markov decision processes (MDPs) first explored by four University of Maryland researchers in a 2005 Operations Research paper. Now one of those researchers, Professor Michael C. Fu (BMGT/ISR), has written “Simulation-Based Algorithms for Markov Decision Processes: Monte Carlo Tree Search from AlphaGo to AlphaZero,” a review of the original ideas and the developments that followed, published in the Asia-Pacific Journal of Operational Research, Vol. 36, No. 06, 1940009 (2019).

The deep neural networks of AlphaGo, AlphaZero, and all their incarnations are trained using a technique called Monte Carlo tree search (MCTS), whose roots can be traced back to an AMS simulation-based algorithm for MDPs published in Operations Research in 2005.

“An adaptive sampling algorithm for solving Markov decision processes” was written by Institute for Systems Research (ISR) Postdoctoral Researcher Hyeong Soo Chang, Professor Michael C. Fu, Electrical and Computer Engineering (ECE) Ph.D. student Jiaqiao Hu, and Professor Steven I. Marcus (ECE/ISR). The idea was first introduced even earlier, in 2002.
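
At the heart of both the AMS algorithm and MCTS is an upper confidence bound (UCB) rule for deciding which action to sample next: it trades off actions that have looked good so far against actions that have rarely been tried. The sketch below illustrates that rule on a simple multi-armed bandit; it is a minimal illustration, not the paper's pseudocode, and the function name, exploration constant, and toy reward probabilities are assumptions.

```python
# Illustrative sketch of the UCB1-style selection rule underlying AMS and
# MCTS: prefer actions with high estimated value, plus a bonus for actions
# that have been sampled only rarely.
import math
import random


def ucb_select(counts, means, total, c=math.sqrt(2)):
    """Return the index of the action maximizing mean + exploration bonus.

    counts[i] -- times action i has been sampled so far
    means[i]  -- running average of returns observed for action i
    total     -- total number of samples taken at this state
    c         -- exploration constant (sqrt(2) is the classic UCB1 value)
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # sample every action at least once
    scores = [m + c * math.sqrt(math.log(total) / n)
              for n, m in zip(counts, means)]
    return scores.index(max(scores))


# Toy demonstration on a 3-armed bandit with assumed hidden reward
# probabilities: sampling concentrates on the best arm over time.
hidden = [0.2, 0.5, 0.8]
counts, means = [0, 0, 0], [0.0, 0.0, 0.0]
for t in range(1, 2001):
    a = ucb_select(counts, means, t)
    r = 1.0 if random.random() < hidden[a] else 0.0
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]  # incremental average
print(counts)  # the last arm should receive most of the samples
```

In AMS, a rule of this kind is applied recursively at each state and stage of an MDP, so the sampling budget adaptively concentrates on the most promising actions, which is the same principle MCTS later applied to game trees.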

In the review article, Fu recounts the history and background of AlphaGo through AlphaZero, traces the origins of MCTS back to simulation-based algorithms for MDPs, and examines MCTS's role in training the neural networks that carry out the value/policy function approximation used in approximate dynamic programming, reinforcement learning, and neuro-dynamic programming. Fu also discusses recently proposed enhancements that build on statistical ranking and selection research in the operations research simulation community.
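
In AlphaGo-style systems, the network's policy output enters the search as a prior over moves, and its value output scores the positions the search reaches. Below is a hedged sketch of a selection rule in the spirit of the PUCT rule described in the AlphaGo/AlphaZero papers; the function name, the constant value, the "+1" inside the square root, and the toy numbers are illustrative assumptions, not the published implementation.

```python
# Hedged sketch of a PUCT-style selection rule: the network's policy prior
# P(s,a) biases the exploration bonus, so little-visited moves the network
# likes are tried first; as visit counts grow, the observed values Q(s,a)
# dominate the choice.
import math


def puct_select(priors, counts, q_values, c_puct=1.5):
    """Return the action index maximizing Q(s,a) + U(s,a)."""
    total_visits = sum(counts)
    scores = [
        # "+1" under the root keeps the prior active before any visits --
        # an illustrative choice, not necessarily the published variant.
        q + c_puct * p * math.sqrt(total_visits + 1) / (1 + n)
        for p, n, q in zip(priors, counts, q_values)
    ]
    return scores.index(max(scores))


# Toy usage: with no visits yet, the search follows the network's
# preferred move (index 1); after many visits, Q values take over.
print(puct_select([0.1, 0.7, 0.2], [0, 0, 0], [0.0, 0.0, 0.0]))      # -> 1
print(puct_select([0.1, 0.7, 0.2], [50, 400, 50], [0.6, 0.3, 0.2]))  # -> 0
```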



December 17, 2019

