A Computational Note on Markov Decision Processes Without Discounting

Pfeiffer, Paul E.; Dennis, J.E. Jr.

Files

TR87-19.pdf (136.25 KB)

Date

1987-07

Authors

Pfeiffer, Paul E.

Dennis, J.E. Jr.

Abstract

The Markov decision process is treated in a variety of forms or cases: finite or infinite horizon, with or without discounting. The finite-horizon cases and the infinite-horizon case with discounting have received considerable attention. In the infinite-horizon case with discounting, the problem is treated either by linear programming or by Ronald Howard's elegant and effective policy-iteration procedure. In the undiscounted case, however, a special form of this procedure is required, which detracts from the directness and elegance of the method. The difficulty arises in the step generally called the value-determination procedure. The equations used in this step are linearly dependent, so the system of linear equations must be adjusted before it can be solved. We propose a new computational procedure that avoids this difficulty and works directly with the average next-period gains and powers of the transition probability matrix. The fundamental computational tools are matrix multiplication and addition.
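The value-determination difficulty can be made concrete. For a fixed policy with transition matrix P and expected next-period gains q, Howard's undiscounted equations read g + v_i = q_i + sum_j p_ij v_j for each state i. These are n equations in the n + 1 unknowns g, v_1, ..., v_n, and adding any constant to v leaves them satisfied, so the usual adjustment fixes one relative value at zero. The Python sketch below (standard library only) shows that adjustment next to one way of obtaining the gain and relative values from matrix products and sums alone. The gain is taken from P^N q, and the relative values come from the centered partial sum (I + P + ... + P^(N-1)) q - N g, with N doubled by repeated squaring. This second routine is an illustrative assumption, not necessarily the procedure developed in the report. It assumes a unichain, aperiodic policy so that P^N converges. The function names and the two-state example data are hypothetical.

```python
"""Undiscounted value determination for one fixed policy (illustrative sketch).

Assumptions: pure-Python lists as matrices, a unichain and aperiodic policy,
hypothetical function names and example data (not taken from the report).
"""


def matmul(A, B):
    """Matrix product A B for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def howard_value_determination(P, q):
    """Solve g + v_i = q_i + sum_j P[i][j] v_j with the adjustment v_last = 0.

    The unknowns are v_0..v_{n-2} followed by the scalar gain g.
    """
    n = len(P)
    M = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n - 1)] + [1.0]
         for i in range(n)]
    z = solve(M, list(q))
    return z[-1], z[:-1] + [0.0]


def gain_and_bias_by_doubling(P, q, doublings=16):
    """Gain and relative values from matrix products and sums only.

    With N = 2**doublings, S = I + P + ... + P^(N-1) is built by
    S_2N = S_N + P^N S_N and P^2N = P^N P^N. Then g ~ P^N q and
    h ~ S q - N g, the partial sum of (P^k q - g) over k < N.
    """
    n = len(P)
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # S_1 = I
    Pn = [list(row) for row in P]  # P^1
    for _ in range(doublings):
        PnS = matmul(Pn, S)
        S = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(S, PnS)]
        Pn = matmul(Pn, Pn)
    N = 2 ** doublings
    g = matvec(Pn, q)  # P^N q approximates the limiting matrix applied to q
    h = [s - N * gi for s, gi in zip(matvec(S, q), g)]
    return g, h


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.4, 0.6]]  # hypothetical two-state policy
    q = [6.0, -3.0]               # hypothetical expected next-period gains
    g, v = howard_value_determination(P, q)
    print(round(g, 6), [round(x, 6) for x in v])  # 1.0 [10.0, 0.0]
    gv, h = gain_and_bias_by_doubling(P, q)
    print([round(x, 4) for x in gv], [round(x, 4) for x in h])  # [1.0, 1.0] [5.5556, -4.4444]
```

Both routes give the same gain, and their relative values differ only by an additive constant: h[0] - h[1] = 5.5556 - (-4.4444) = 10, which matches Howard's v once the last state is pinned at zero. The doubling loop uses 2 log2 N matrix products. For a periodic policy, P^N does not converge, so this sketch would need modification.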

Type

Technical report

Citation

Pfeiffer, Paul E. and Dennis, J.E. Jr. "A Computational Note on Markov Decision Processes Without Discounting." (1987) https://hdl.handle.net/1911/101629.

Citable link to this page

https://hdl.handle.net/1911/101629

Collections

CMOR Technical Reports
