2025
- Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP 
 Tejaram Sangadi, Prashanth L.A., Krishna Jagannathan
 Reinforcement Learning Conference (RLC) (Accepted)
- Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms 
 Meltem Tatlı, Arpan Mukherjee, Prashanth L.A., Karthikeyan Shanmugam, Ali Tajer
 AISTATS
- Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias 
 Soumen Pachal, S.Bhatnagar and Prashanth L.A.
 IEEE Transactions on Automatic Control
- Preference-centric Bandits: Optimality of Mixtures and Regret-efficient Algorithms 
 Meltem Tatlı, Arpan Mukherjee, Prashanth L.A., Karthikeyan Shanmugam, Ali Tajer
 Under review.
- Adaptive Estimation of Random Vectors with Bandit Feedback: A mean-squared error viewpoint 
 Ayon Ghosh, Prashanth L.A., Dipayan Sen, Aditya Gopalan
 Under review.
- Optimizing Shortfall Risk Metric for Learning Regression Models 
 Harish G. Ramaswamy and Prashanth L.A.
 Under review.
- Learning to optimize convex risk measures: The cases of utility-based shortfall risk and optimized certainty equivalent risk 
 Sumedh Gupte, Prashanth L.A., Sanjay Bhat
 Under review.
- Concentration Bounds for Optimized Certainty Equivalent Risk Estimation 
 Ayon Ghosh, Prashanth L.A., Krishna Jagannathan
 Under review.
- Policy Newton methods for Distortion Riskmetrics 
 Soumen Pachal, Mizhaan Prajit Maniyar, Prashanth L.A.
 Under review.
2024
- Optimization of utility-based shortfall risk: A non-asymptotic viewpoint 
 Sumedh Gupte, Prashanth L.A., Sanjay Bhat
 CDC
- Online Estimation and Optimization of Utility-Based Shortfall Risk 
 Vishwajit Hegde, Arvind S. Menon, Prashanth L.A., Krishna Jagannathan
 Mathematics of Operations Research
- Policy Evaluation for Variance in Risk-Sensitive Average Reward Reinforcement Learning 
 Shubhada Agrawal, Prashanth L.A., Siva Theja Maguluri
 ICML
- Risk Estimation in a Markov Cost Process: Lower and Upper Bounds 
 Gugan Thoppe, Prashanth L.A., Sanjay Bhat
 ICML
- A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning 
 Mizhaan Prajit Maniyar, Prashanth L.A., Akash Mondal, Shalabh Bhatnagar
 AISTATS
- A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization 
 Akash Mondal, Prashanth L.A., Shalabh Bhatnagar
 Automatica
2023
- Adaptive Estimation of Random Vectors with Bandit Feedback 
 Dipayan Sen, Prashanth L.A., Aditya Gopalan
 ICC
- A policy gradient approach for optimization of smooth risk measures 
 N. Vijayan and Prashanth L.A.
 UAI
- Generalized Simultaneous Perturbation Stochastic Approximation with Reduced Estimator Bias 
 S.Bhatnagar and Prashanth L.A.
 CISS
- Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation 
 Gandharv Patil, Prashanth L.A., Dheeraj Nagaraj, Doina Precup
 AISTATS
- Bandit algorithms to emulate human decision making using probabilistic distortions 
 Ravi Kumar Kolla, Prashanth L.A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus
 Draft.
2022
- A Wasserstein distance approach for concentration of empirical risk estimates 
 Prashanth L.A. and Sanjay P. Bhat
 Journal of Machine Learning Research
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search 
 Prashanth L.A. and Michael Fu
 Foundations and Trends in Machine Learning
- A Survey of Risk-Aware Multi-Armed Bandits 
 Vincent Y. F. Tan, Prashanth L.A., and Krishna Jagannathan
 International Joint Conference on Artificial Intelligence (IJCAI) (Survey track)
- Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles 
 Nirav Bhavsar and Prashanth L.A.
 IEEE Transactions on Automatic Control
2021
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint 
 N. Vijayan and Prashanth L.A.
 Systems & Control Letters
- Estimation of Spectral Risk Measures 
 Ajay Kumar Pandey, Prashanth L.A. and Sanjay P. Bhat
 AAAI
- Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling 
 Prashanth L.A., Nathaniel Korda and Remi Munos
 Machine Learning
2020
- Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions 
 Prashanth L.A., Krishna Jagannathan and Ravi Kumar Kolla
 International Conference on Machine Learning (ICML)
- Random directions stochastic approximation with deterministic perturbations 
 Prashanth L.A., S.Bhatnagar, Nirav Bhavsar, Michael Fu and Steve Marcus
 IEEE Transactions on Automatic Control
2019
- Concentration of risk measures: A Wasserstein distance approach 
 Sanjay P. Bhat and Prashanth L.A.
 Neural Information Processing Systems (NeurIPS)
- Correlated bandits or: How to minimize mean-squared error online 
 V.P. Boda and Prashanth L.A.
 International Conference on Machine Learning (ICML)
- Concentration bounds for empirical conditional value-at-risk: The unbounded case 
 Ravi Kumar Kolla, Prashanth L.A., Sanjay P. Bhat, Krishna Jagannathan
 Operations Research Letters
2018
- Stochastic optimization in a cumulative prospect theory framework 
 Jie Cheng, Prashanth L.A., Michael Fu, Steve Marcus and Csaba Szepesvari
 IEEE Transactions on Automatic Control, Vol. 63, No. 9, pp. 2867-2882.
2017
- Adaptive system optimization using random directions stochastic approximation 
 Prashanth L.A., S.Bhatnagar, Michael Fu and Steve Marcus
 IEEE Transactions on Automatic Control, Vol. 62, Issue 5, pp. 2223-2238.
- Weighted bandits or: How bandits learn distorted values that are not expected 
 Aditya Gopalan, Prashanth L.A., Michael Fu and Steve Marcus
 AAAI Conference on Artificial Intelligence
2016
- (Bandit) Convex Optimization with Biased Noisy Gradient Oracles 
 Xiaowei Hu, Prashanth L.A., Andras Gyorgy and Csaba Szepesvari
 Draft.
- Improved Hessian estimation for adaptive random directions stochastic approximation 
 D. Sai Koti Reddy, Prashanth L.A. and S.Bhatnagar
 IEEE Conference on Decision and Control (CDC)
- Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control 
 Prashanth L.A., Jie Cheng, Michael Fu, Steve Marcus and Csaba Szepesvari
 International Conference on Machine Learning (ICML)
- Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs 
 Prashanth L.A. and Mohammad Ghavamzadeh
 Machine Learning
- A constrained optimization perspective on actor critic algorithms and application to network routing 
 Prashanth L.A., H.L.Prasad, S.Bhatnagar and Prakash Chandra
 Systems & Control Letters.
- (Bandit) Convex Optimization with Biased Noisy Gradient Oracles 
 Xiaowei Hu, Prashanth L.A., Andras Gyorgy and Csaba Szepesvari
 International Conference on Artificial Intelligence and Statistics (AISTATS)
2015
- On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence 
 Nathaniel Korda and Prashanth L.A.
 International Conference on Machine Learning (ICML)
 [Proof has a bug, rendering the bounds invalid. A fix will happen later rather than sooner.]
- Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games 
 H.L.Prasad, Prashanth L.A. and S.Bhatnagar
 International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
- Fast gradient descent for drifting least squares regression, with application to bandits 
 Nathaniel Korda, Prashanth L.A. and Remi Munos
 AAAI Conference on Artificial Intelligence
- Simultaneous Perturbation Methods for Adaptive Labor Staffing in Service Systems 
 Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta
 Simulation, DOI: 10.1177/0037549715581198, pp. 1-24.
- Simultaneous Perturbation Newton Algorithms for Simulation Optimization 
 S.Bhatnagar and Prashanth L.A.
 Journal of Optimization Theory and Applications, Vol. 164, Issue 2, pp. 621-643.
- Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games 
 Prashanth L.A., H.L.Prasad and S.Bhatnagar
 Draft
2014
- Simultaneous Perturbation Algorithms for Batch Off-Policy Search 
 Raphael Fonteneau and Prashanth L.A.
 IEEE Conference on Decision and Control (CDC)
- Policy Gradients for CVaR-Constrained MDPs 
 Prashanth L.A.
 International Conference on Algorithmic Learning Theory (ALT)
- Fast LSTD using stochastic approximation: Finite time analysis and application to traffic control 
 Prashanth L.A., Nathaniel Korda and Remi Munos
 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)
- Two Timescale Convergent Q-learning for Sleep–Scheduling in Wireless Sensor Networks 
 Prashanth L.A., A. Chatterjee and S.Bhatnagar
 Wireless Networks, Vol. 20, Issue 8, pp. 2589-2604.
2013
- Actor-Critic Algorithms for Risk-Sensitive MDPs 
 Prashanth L.A. and Mohammad Ghavamzadeh
 Neural Information Processing Systems (NIPS) (Full oral presentation)
- Mechanisms for Hostile Agents with Capacity Constraints 
 Prashanth L.A., H.L.Prasad, N.Desai and S.Bhatnagar
 International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
- Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods 
 S.Bhatnagar, H.L.Prasad and Prashanth L.A.
 Lecture Notes in Control and Information Sciences Series, Vol. 434, Springer, ISBN 978-1-4471-4284-3, Edition: 2013, 302 pages.
- Adaptive Smoothed Functional Algorithms for Optimal Staffing Levels in Service Systems 
 H.L.Prasad, L.A.Prashanth, S.Bhatnagar and N.Desai
 Service Science (INFORMS), Vol. 5, No. 1, pp. 29-55.
2012
- Threshold Tuning using Stochastic Optimization for Graded Signal Control 
 Prashanth L.A. and S.Bhatnagar
 IEEE Transactions on Vehicular Technology, Vol. 61, No. 9, pp. 3865-3880.
- Adaptive feature pursuit: Online adaptation of features in reinforcement learning 
 S.Bhatnagar, V.S.Borkar and Prashanth L.A.
 Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (Ed. F. Lewis and D. Liu), IEEE Press Computational Intelligence Series, pp. 517-534
- Resource Allocation for Sequential Decision Making under Uncertainty: Studies in Vehicular Traffic Control, Service Systems, Sensor Networks and Mechanism Design 
 Prashanth L.A.
 Ph.D. thesis, Indian Institute of Science (IEEE ITSS Best Ph.D. Dissertation 2014 - Third Prize).
2011
- Stochastic optimization for adaptive labor staffing in service systems 
 Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta
 International Conference on Service Oriented Computing (ICSOC)
- Reinforcement Learning with Average Cost for Adaptive Control of Traffic Lights at Intersections 
 Prashanth L.A. and S.Bhatnagar
 IEEE Intelligent Transportation Systems Conference
- Reinforcement learning with function approximation for traffic signal control 
 Prashanth L.A. and S.Bhatnagar
 IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 2, pp. 412-421.
Copyright Notice: Since most of these papers are published, the copyright has been transferred to the respective publishers. The following is IEEE's copyright notice; other publishers have similar ones.
IEEE Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of the works published in IEEE publications in other works must be obtained from the IEEE.
