Wenhao Zhan
Graduate Student, Princeton University
Verified email at princeton.edu - Homepage
Title
Cited by
Year
Offline reinforcement learning with realizability and single-policy concentrability
W Zhan, B Huang, A Huang, N Jiang, J Lee
Conference on Learning Theory, 2730-2775, 2022
Cited by: 127, Year: 2022
Policy mirror descent for regularized reinforcement learning: A generalized framework with linear convergence
W Zhan*, S Cen*, B Huang, Y Chen, JD Lee, Y Chi
SIAM Journal on Optimization 33 (2), 1061-1091, 2023
Cited by: 86, Year: 2023
Provable Offline Preference-Based Reinforcement Learning
W Zhan, M Uehara, N Kallus, JD Lee, W Sun
The Twelfth International Conference on Learning Representations, 2024
Cited by: 51*, Year: 2024
PAC reinforcement learning for predictive state representations
W Zhan, M Uehara, W Sun, JD Lee
The Eleventh International Conference on Learning Representations, 2023
Cited by: 46, Year: 2023
Provable Reward-Agnostic Preference-Based Reinforcement Learning
W Zhan, M Uehara, W Sun, JD Lee
The Twelfth International Conference on Learning Representations, 2024
Cited by: 27*, Year: 2024
Dataset Reset Policy Optimization for RLHF
JD Chang, W Zhan, O Oertell, K Brantley, D Misra, JD Lee, W Sun
arXiv preprint arXiv:2404.08495, 2024
Cited by: 19, Year: 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Z Gao, JD Chang, W Zhan, O Oertell, G Swamy, K Brantley, T Joachims, ...
arXiv preprint arXiv:2404.16767, 2024
Cited by: 15, Year: 2024
Decentralized optimistic hyperpolicy mirror descent: Provably no-regret learning in Markov games
W Zhan, JD Lee, Z Yang
The Eleventh International Conference on Learning Representations, 2023
Cited by: 13, Year: 2023
Reward-agnostic fine-tuning: Provable statistical benefits of hybrid reinforcement learning
G Li, W Zhan, JD Lee, Y Chi, Y Chen
Advances in Neural Information Processing Systems 36, 2023
Cited by: 11, Year: 2023
Optimal multi-distribution learning
Z Zhang, W Zhan, Y Chen, SS Du, JD Lee
The Thirty Seventh Annual Conference on Learning Theory, 5220-5223, 2024
Cited by: 10, Year: 2024
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
A Huang, W Zhan, T Xie, JD Lee, W Sun, A Krishnamurthy, DJ Foster
arXiv preprint arXiv:2407.13399, 2024
Cited by: 3*, Year: 2024
Provably Efficient CVaR RL in Low-rank MDPs
Y Zhao, W Zhan, X Hu, H Leung, F Farnia, W Sun, JD Lee
The Twelfth International Conference on Learning Representations, 2024
Cited by: 3, Year: 2024
Over-the-Air Statistical Estimation of Sparse Models
CZ Lee, LP Barnes, W Zhan, A Özgür
2021 IEEE Global Communications Conference (GLOBECOM), 1-6, 2021
Cited by: 1, Year: 2021
Delay Optimal Cross-Layer Scheduling Over Markov Channels with Power Constraint
W Zhan, H Tang, J Wang
2020 IEEE International Symposium on Broadband Multimedia Systems and …, 2020
Cited by: 1, Year: 2020
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Z Gao, W Zhan, JD Chang, G Swamy, K Brantley, JD Lee, W Sun
arXiv preprint arXiv:2410.04612, 2024
Year: 2024
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
W Zhan, S Fujimoto, Z Zhu, JD Lee, DR Jiang, Y Efroni
arXiv preprint arXiv:2410.01101, 2024
Year: 2024
Articles 1–16