2024
Optimization of utility-based shortfall risk: A non-asymptotic viewpoint
Sumedh Gupte, Prashanth L.A., Sanjay Bhat
CDC
Online Estimation and Optimization of Utility-Based Shortfall Risk
Vishwajit Hegde, Arvind S. Menon, Prashanth L.A., Krishna Jagannathan
Mathematics of Operations Research
Policy Evaluation for Variance in Risk-Sensitive Average Reward Reinforcement Learning
Shubhada Agrawal, Prashanth L.A., Siva Theja Maguluri
ICML
Risk Estimation in a Markov Cost Process: Lower and Upper Bounds
Gugan Thoppe, Prashanth L.A., Sanjay Bhat
ICML
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
Mizhaan Prajit Maniyar, Prashanth L.A., Akash Mondal, Shalabh Bhatnagar
AISTATS
A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization
Akash Mondal, Prashanth L.A., Shalabh Bhatnagar
Automatica
Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP
Tejaram Sangadi, Prashanth L.A., Krishna Jagannathan
Draft.
Concentration Bounds for Optimized Certainty Equivalent Risk Estimation
Ayon Ghosh, Prashanth L.A., Krishna Jagannathan
Under review.
Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias
Soumen Pachal, S.Bhatnagar and Prashanth L.A.
Under review.
Adaptive Estimation of Random Vectors with Bandit Feedback: A mean-squared error viewpoint
Dipayan Sen, Prashanth L.A., Aditya Gopalan
Draft.
2023
Adaptive Estimation of Random Vectors with Bandit Feedback
Dipayan Sen, Prashanth L.A., Aditya Gopalan
ICC
A policy gradient approach for optimization of smooth risk measures
N. Vijayan and Prashanth L.A.
UAI
Generalized Simultaneous Perturbation Stochastic Approximation with Reduced Estimator Bias
S.Bhatnagar and Prashanth L.A.
CISS
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
Gandharv Patil, Prashanth L.A., Dheeraj Nagaraj, Doina Precup
AISTATS
Bandit algorithms to emulate human decision making using probabilistic distortions
Ravi Kumar Kolla, Prashanth L.A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus
Draft.
2022
A Wasserstein distance approach for concentration of empirical risk estimates
Prashanth L.A. and Sanjay P. Bhat
Journal of Machine Learning Research
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Prashanth L.A. and Michael Fu
Foundations and Trends in Machine Learning
A Survey of Risk-Aware Multi-Armed Bandits
Vincent Y. F. Tan, Prashanth L.A., and Krishna Jagannathan
International Joint Conference on Artificial Intelligence (IJCAI) (Survey track)
Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles
Nirav Bhavsar and Prashanth L.A.
IEEE Transactions on Automatic Control
2021
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint
N. Vijayan and Prashanth L.A.
Systems & Control Letters
Estimation of Spectral Risk Measures
Ajay Kumar Pandey, Prashanth L.A. and Sanjay P. Bhat
AAAI
Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling
Prashanth L.A., Nathaniel Korda and Remi Munos
Machine Learning
2020
Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions
Prashanth L.A., Krishna Jagannathan and Ravi Kumar Kolla
International Conference on Machine Learning (ICML)
Random directions stochastic approximation with deterministic perturbations
Prashanth L.A., S.Bhatnagar, Nirav Bhavsar, Michael Fu and Steve Marcus
IEEE Transactions on Automatic Control
2019
Concentration of risk measures: A Wasserstein distance approach
Sanjay P. Bhat and Prashanth L.A.
Neural Information Processing Systems (NeurIPS)
Correlated bandits or: How to minimize mean-squared error online
V.P. Boda and Prashanth L.A.
International Conference on Machine Learning (ICML)
Concentration bounds for empirical conditional value-at-risk: The unbounded case
Ravi Kumar Kolla, Prashanth L.A., Sanjay P. Bhat, Krishna Jagannathan
Operations Research Letters
2018
Stochastic optimization in a cumulative prospect theory framework
Jie Cheng, Prashanth L.A., Michael Fu, Steve Marcus and Csaba Szepesvari
IEEE Transactions on Automatic Control, Vol. 63, No. 9, pp. 2867-2882.
2017
Adaptive system optimization using random directions stochastic approximation
Prashanth L.A., S.Bhatnagar, Michael Fu and Steve Marcus
IEEE Transactions on Automatic Control, Vol. 62, Issue 5, pp.2223–2238.
Weighted bandits or: How bandits learn distorted values that are not expected
Aditya Gopalan, Prashanth L.A., Michael Fu and Steve Marcus
AAAI Conference on Artificial Intelligence
2016
(Bandit) Convex Optimization with Biased Noisy Gradient Oracles
Xiaowei Hu, Prashanth L.A., Andras Gyorgy and Csaba Szepesvari
Draft.
Improved Hessian estimation for adaptive random directions stochastic approximation
D. Sai Koti Reddy, Prashanth L.A. and S.Bhatnagar
IEEE Conference on Decision and Control (CDC)
Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
Prashanth L.A., Jie Cheng, Michael Fu, Steve Marcus and Csaba Szepesvari
International Conference on Machine Learning (ICML)
Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
Prashanth L.A. and Mohammad Ghavamzadeh
Machine Learning
A constrained optimization perspective on actor critic algorithms and application to network routing
Prashanth L.A., H.L.Prasad, S.Bhatnagar and Prakash Chandra
Systems & Control Letters.
(Bandit) Convex Optimization with Biased Noisy Gradient Oracles
Xiaowei Hu, Prashanth L.A., Andras Gyorgy and Csaba Szepesvari
International Conference on Artificial Intelligence and Statistics (AISTATS)
2015
On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence
Nathaniel Korda and Prashanth L.A.
International Conference on Machine Learning (ICML)
[Proof has a bug, rendering the bounds invalid. A fix will happen later rather than sooner.]
Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games
H.L.Prasad, Prashanth L.A. and S.Bhatnagar
International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
Fast gradient descent for drifting least squares regression, with application to bandits
Nathaniel Korda, Prashanth L.A. and Remi Munos
AAAI Conference on Artificial Intelligence
Simultaneous Perturbation Methods for Adaptive Labor Staffing in Service Systems
Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta
Simulation, DOI: 10.1177/0037549715581198, pp. 1-24.
Simultaneous Perturbation Newton Algorithms for Simulation Optimization
S.Bhatnagar and Prashanth L.A.
Journal of Optimization Theory and Applications, Vol. 164, Issue. 2, pp. 621-643.
Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games
Prashanth L.A., H.L.Prasad and S.Bhatnagar
Draft
2014
Simultaneous Perturbation Algorithms for Batch Off-Policy Search
Raphael Fonteneau and Prashanth L.A.
IEEE Conference on Decision and Control (CDC)
Policy Gradients for CVaR-Constrained MDPs
Prashanth L.A.
International Conference on Algorithmic Learning Theory (ALT)
Fast LSTD using stochastic approximation: Finite time analysis and application to traffic control
Prashanth L.A., Nathaniel Korda and Remi Munos
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)
Two Timescale Convergent Q-learning for Sleep–Scheduling in Wireless Sensor Networks
Prashanth L.A., A. Chatterjee and S.Bhatnagar
Wireless Networks, Vol. 20, Issue. 8, pp. 2589-2604.
2013
Actor-Critic Algorithms for Risk-Sensitive MDPs
Prashanth L.A. and Mohammad Ghavamzadeh
Neural Information Processing Systems (NIPS) (Full oral presentation)
Mechanisms for Hostile Agents with Capacity Constraints
Prashanth L.A., H.L.Prasad, N.Desai and S.Bhatnagar
International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods
S.Bhatnagar, H.L.Prasad and Prashanth L.A.
Lecture Notes in Control and Information Sciences Series, Vol. 434, Springer, ISBN 978-1-4471-4284-3, Edition: 2013, 302 pages.
Adaptive Smoothed Functional Algorithms for Optimal Staffing Levels in Service Systems
H.L.Prasad, L.A.Prashanth, S.Bhatnagar and N.Desai
Service Science (INFORMS), Vol. 5, No. 1, pp. 29-55.
2012
Threshold Tuning using Stochastic Optimization for Graded Signal Control
Prashanth L.A. and S.Bhatnagar
IEEE Transactions on Vehicular Technology, Vol. 61, No. 9, pp.3865-3880.
Adaptive feature pursuit: Online adaptation of features in reinforcement learning
S.Bhatnagar, V.S.Borkar and Prashanth L.A.
Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (Ed. F. Lewis and D. Liu), IEEE Press Computational Intelligence Series, pp. 517-534
Resource Allocation for Sequential Decision Making under Uncertainty: Studies in Vehicular Traffic Control, Service Systems, Sensor Networks and Mechanism Design
Prashanth L.A.
Ph.D. thesis, Indian Institute of Science (IEEE ITSS Best Ph.D. Dissertation 2014 - Third Prize).
2011
Stochastic optimization for adaptive labor staffing in service systems
Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta
International Conference on Service Oriented Computing (ICSOC)
Reinforcement Learning with Average Cost for Adaptive Control of Traffic Lights at Intersections
Prashanth L.A. and S.Bhatnagar
IEEE Intelligent Transportation Systems Conference
Reinforcement learning with function approximation for traffic signal control
Prashanth L.A. and S.Bhatnagar
IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 2, pp.412-421.
Copyright Notice: Since most of these papers are published, the copyright has been transferred to the respective publishers. The following is IEEE's copyright notice; other publishers have similar ones.
IEEE Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of the works published in IEEE publications in other works must be obtained from the IEEE.