A Computational Note on Markov Decision Processes Without Discounting

Date
1987-07
Abstract

The Markov decision process is treated in a variety of forms or cases: finite or infinite horizon, with or without discounting. The finite-horizon cases and the infinite-horizon case with discounting have received considerable attention. In the infinite-horizon case with discounting, the problem is either given a linear programming treatment or solved by the elegant and effective policy-iteration procedure of Ronald Howard. In the undiscounted case, however, a special form of this procedure is required, which detracts from the directness and elegance of the method. The difficulty arises in the step generally called the value-determination procedure. The equations used in this step are linearly dependent, so the system of linear equations cannot be solved without some adjustment. We propose a new computational procedure which avoids this difficulty and works directly with the average next-period gains and powers of the transition probability matrix. The fundamental computational tools are matrix multiplication and addition.
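The abstract does not spell out the procedure, so the following is only an illustrative sketch, not the authors' algorithm. It assumes a single fixed policy with an ergodic chain, and it uses hypothetical names (`mat_vec`, `average_gain`) and illustrative values for the transition matrix `P` and the next-period gains `q`. It shows two points made above: the rows of I - P sum to zero, which is why the value-determination equations are linearly dependent, and the long-run average gain can be approximated from powers of P applied to q using only multiplication and addition.

```python
def mat_vec(P, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def average_gain(P, q, n_steps=5000):
    """Approximate the gain by the average (1/n) * sum_{k<n} P^k q.

    Assumption: the chain under the fixed policy is ergodic, so this
    average converges to the same gain in every state.
    """
    term = list(q)              # starts as P^0 q
    total = [0.0] * len(q)
    for _ in range(n_steps):
        total = [t + x for t, x in zip(total, term)]
        term = mat_vec(P, term)  # advance to the next power of P applied to q
    return [t / n_steps for t in total]


# Illustrative two-state data (assumed, not taken from the report).
P = [[0.5, 0.5],
     [0.4, 0.6]]
q = [6.0, -3.0]

# Each row of I - P sums to zero, so the system (I - P) v = q - g is singular.
n = len(P)
row_sums = [sum((1.0 if i == j else 0.0) - P[i][j] for j in range(n))
            for i in range(n)]

gain = average_gain(P, q)
print(row_sums)  # entries are zero up to rounding
print(gain)      # both entries are close to 1.0
```

For these values the stationary distribution is (4/9, 5/9), so the exact gain is 6(4/9) - 3(5/9) = 1, which the averaged powers approach at a rate of roughly 1/n.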

Type
Technical report
Citation

Pfeiffer, Paul E. and Dennis, J.E. Jr. "A Computational Note on Markov Decision Processes Without Discounting." (1987) https://hdl.handle.net/1911/101629.
