Zhuoran Yang
Title
Cited by
Year
Multi-agent reinforcement learning: A selective overview of theories and algorithms
K Zhang, Z Yang, T Başar
Handbook of reinforcement learning and control, 321-384, 2021
Cited by 1574; year 2021
Provably efficient reinforcement learning with linear function approximation
C Jin, Z Yang, Z Wang, MI Jordan
Mathematics of Operations Research 48 (3), 1496-1521, 2023
Cited by 816; year 2023
A Theoretical Analysis of Deep Q-Learning
Cited by 781*; year 2020
Fully decentralized multi-agent reinforcement learning with networked agents
K Zhang, Z Yang, H Liu, T Zhang, T Başar
International conference on machine learning, 5872-5881, 2018
Cited by 689; year 2018
Is pessimism provably efficient for offline RL?
Y Jin, Z Yang, Z Wang
International Conference on Machine Learning, 5084-5096, 2021
Cited by 408; year 2021
Provably efficient exploration in policy optimization
Q Cai, Z Yang, C Jin, Z Wang
International Conference on Machine Learning, 1283-1294, 2020
Cited by 306; year 2020
A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic
M Hong, HT Wai, Z Wang, Z Yang
SIAM Journal on Optimization 33 (1), 147-180, 2023
Cited by 278; year 2023
Neural policy gradient methods: Global optimality and rates of convergence
L Wang, Q Cai, Z Yang, Z Wang
arXiv preprint arXiv:1909.01150, 2019
Cited by 256; year 2019
Neural trust region/proximal policy optimization attains globally optimal policy
B Liu, Q Cai, Z Yang, Z Wang
Advances in neural information processing systems 32, 2019
Cited by 218; year 2019
Multi-agent reinforcement learning via double averaging primal-dual optimization
HT Wai, Z Yang, Z Wang, M Hong
Advances in Neural Information Processing Systems 31, 2018
Cited by 202; year 2018
Provably efficient safe exploration via primal-dual policy optimization
D Ding, X Wei, Z Yang, Z Wang, M Jovanovic
International conference on artificial intelligence and statistics, 3304-3312, 2021
Cited by 177; year 2021
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Q Xie, Y Chen, Z Wang, Z Yang
Mathematics of Operations Research 48 (1), 433-462, 2023
Cited by 154; year 2023
Neural temporal difference and Q-learning provably converge to global optima
Q Cai, Z Yang, JD Lee, Z Wang
Mathematics of Operations Research 49 (1), 619-651, 2024
Cited by 153*; year 2024
Provably global convergence of actor-critic: A case for linear quadratic regulator with ergodic cost
Z Yang, Y Chen, M Hong, Z Wang
Advances in neural information processing systems 32, 2019
Cited by 149; year 2019
Policy optimization provably converges to Nash equilibria in zero-sum linear quadratic games
K Zhang, Z Yang, T Başar
Advances in Neural Information Processing Systems 32, 2019
Cited by 143; year 2019
A near-optimal algorithm for stochastic bilevel optimization via double-momentum
P Khanduri, S Zeng, M Hong, HT Wai, Z Wang, Z Yang
Advances in neural information processing systems 34, 30271-30283, 2021
Cited by 131; year 2021
Convergent policy optimization for safe reinforcement learning
M Yu, Z Yang, M Kolar, Z Wang
Advances in Neural Information Processing Systems 32, 2019
Cited by 128; year 2019
On function approximation in reinforcement learning: Optimism in the face of large state spaces
Z Yang, C Jin, Z Wang, M Wang, MI Jordan
arXiv preprint arXiv:2011.04622, 2020
Cited by 120*; year 2020
Networked multi-agent reinforcement learning in continuous spaces
K Zhang, Z Yang, T Başar
2018 IEEE conference on decision and control (CDC), 2771-2776, 2018
Cited by 117; year 2018
Neural certificates for safe control policies
W Jin, Z Wang, Z Yang, S Mou
arXiv preprint arXiv:2006.08465, 2020
Cited by 92; year 2020
Articles 1–20