Publication details
| Category | Text publication |
| Reference type | Journals |
| DOI | 10.1109/LCSYS.2025.3547629 |
| Full text | Author's version |
| Title (primary) | Off-policy temporal difference learning for perturbed Markov Decision Processes: theoretical insights and extensive simulations |
| Author(s) | Forootani, A.; Iervolino, R.; Tipaldi, M.; Khosravi, M. |
| Source | IEEE Control Systems Letters |
| Year of publication | 2024 |
| Department | BIOENERGIE |
| Volume | 8 |
| Page from | 3488 |
| Page to | 3493 |
| Language | English |
| Topic | T5 Future Landscapes |
| Keywords | Reinforcement Learning; Markov Decision Processes; Temporal Difference Learning; Perturbed Probability Transition Matrix |
| Abstract | Dynamic Programming suffers from the curse of dimensionality due to large state and action spaces, a challenge further compounded by uncertainties in the environment. To mitigate these issues, we explore an off-policy-based Temporal Difference Approximate Dynamic Programming approach that preserves the contraction mapping property when projecting the problem onto a subspace of selected features, accounting for the probability distribution of the perturbed transition probability matrix. We further demonstrate how this Approximate Dynamic Programming approach can be implemented as a particular variant of the Temporal Difference learning algorithm, adapted for handling perturbations. To validate our theoretical findings, we provide a numerical example using a Markov Decision Process corresponding to a resource allocation problem. |
| Permanent UFZ link | https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=30507 |
Forootani, A., Iervolino, R., Tipaldi, M., Khosravi, M. (2024): Off-policy temporal difference learning for perturbed Markov Decision Processes: theoretical insights and extensive simulations. IEEE Control Syst. Lett. 8, 3488 - 3493. DOI: 10.1109/LCSYS.2025.3547629
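
The abstract above describes off-policy temporal difference learning with a feature-subspace projection under a perturbed transition probability matrix. The following is a minimal illustrative sketch of that general idea, not the paper's algorithm: standard off-policy TD(0) with linear function approximation and importance-sampling ratios, run on a small randomly generated MDP whose transition tensor is a nominal matrix plus a renormalized random perturbation. All dimensions, policies, and parameter values below are assumptions chosen for illustration only.

```python
import numpy as np

# Illustrative sketch only (not the paper's code): off-policy TD(0) with
# linear function approximation on a small MDP whose transition matrix is
# a nominal matrix plus a random perturbation.

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 6, 2, 3      # assumed problem sizes
gamma, alpha, n_steps = 0.9, 0.05, 20000       # assumed discount, step size, horizon

# Nominal transition tensor P[a, s, s'] plus a small random perturbation,
# renormalized so each row remains a probability distribution.
P_nominal = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
noise = 0.05 * rng.random((n_actions, n_states, n_states))
P = P_nominal + noise
P /= P.sum(axis=2, keepdims=True)

# State-action rewards and the feature matrix used to project the value function.
R = rng.random((n_states, n_actions))
Phi = rng.random((n_states, n_features))

# Behavior policy (uniform) generates the data; the target policy is a soft
# greedy policy so that importance-sampling ratios stay bounded.
behavior = np.full((n_states, n_actions), 1.0 / n_actions)
target = np.zeros((n_states, n_actions))
target[np.arange(n_states), R.argmax(axis=1)] = 0.9
target += 0.1 / n_actions

theta = np.zeros(n_features)   # weights of the linear value approximation
s = 0
for _ in range(n_steps):
    a = rng.choice(n_actions, p=behavior[s])
    s_next = rng.choice(n_states, p=P[a, s])
    r = R[s, a]
    rho = target[s, a] / behavior[s, a]                      # importance-sampling ratio
    td_error = r + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += alpha * rho * td_error * Phi[s]                 # off-policy TD(0) update
    s = s_next

print("Approximate state values Phi @ theta:", np.round(Phi @ theta, 3))
```

Note that plain off-policy TD(0) with function approximation is not guaranteed to converge in general; the paper's stated contribution concerns preserving the contraction mapping property of the projected operator under the perturbed transition matrix, which this sketch does not attempt to reproduce.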
|