Zhaohan Daniel Guo
DeepMind
Verified email at google.com - Homepage
Title
Cited by
Year
Bootstrap your own latent: a new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, P Richemond, E Buchatskaya, ...
Advances in neural information processing systems 33, 21271-21284, 2020
Cited by: 7033; Year: 2020
Agent57: Outperforming the Atari human benchmark
AP Badia, B Piot, S Kapturowski, P Sprechmann, A Vitvitskyi, ZD Guo, ...
International conference on machine learning, 507-517, 2020
Cited by: 699; Year: 2020
Bootstrap your own latent: a new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, P Richemond, E Buchatskaya, ...
Advances in neural information processing systems 33, 21271-21284, 2020
Cited by: 511; Year: 2020
Never give up: Learning directed exploration strategies
AP Badia, P Sprechmann, A Vitvitskyi, D Guo, B Piot, S Kapturowski, ...
arXiv preprint arXiv:2002.06038, 2020
Cited by: 361; Year: 2020
A general theoretical paradigm to understand learning from human preferences
MG Azar, ZD Guo, B Piot, R Munos, M Rowland, M Valko, D Calandriello
International Conference on Artificial Intelligence and Statistics, 4447-4455, 2024
Cited by: 309; Year: 2024
Joint semantic utterance classification and slot filling with recursive neural networks
D Guo, G Tur, W Yih, G Zweig
2014 IEEE Spoken Language Technology Workshop (SLT), 554-559, 2014
Cited by: 253; Year: 2014
Bootstrap latent-predictive representations for multitask reinforcement learning
ZD Guo, BA Pires, B Piot, JB Grill, F Altché, R Munos, MG Azar
International Conference on Machine Learning, 3875-3886, 2020
Cited by: 161; Year: 2020
Neural predictive belief representations
ZD Guo, MG Azar, B Piot, BA Pires, R Munos
arXiv preprint arXiv:1811.06407, 2018
Cited by: 93; Year: 2018
Nash learning from human feedback
R Munos, M Valko, D Calandriello, MG Azar, M Rowland, ZD Guo, Y Tang, ...
arXiv preprint arXiv:2312.00886, 2023
Cited by: 80; Year: 2023
A PAC RL algorithm for episodic POMDPs
ZD Guo, S Doroudi, E Brunskill
Artificial Intelligence and Statistics, 510-518, 2016
Cited by: 71; Year: 2016
BYOL-Explore: Exploration by bootstrapped prediction
Z Guo, S Thakoor, M Pîslar, B Avila Pires, F Altché, C Tallec, A Saade, ...
Advances in neural information processing systems 35, 31855-31870, 2022
Cited by: 68; Year: 2022
Generalized preference optimization: A unified approach to offline alignment
Y Tang, ZD Guo, Z Zheng, D Calandriello, R Munos, M Rowland, ...
arXiv preprint arXiv:2402.05749, 2024
Cited by: 52; Year: 2024
Using options and covariance testing for long horizon off-policy policy evaluation
Z Guo, PS Thomas, E Brunskill
Advances in Neural Information Processing Systems 30, 2017
Cited by: 52; Year: 2017
Bootstrap your own latent: A new approach to self-supervised learning
JB Grill, F Strub, F Altché, C Tallec, PH Richemond, E Buchatskaya, ...
arXiv preprint arXiv:2006.07733, 2020
Cited by: 44; Year: 2020
Geometric entropic exploration
ZD Guo, MG Azar, A Saade, S Thakoor, B Piot, BA Pires, M Valko, ...
arXiv preprint arXiv:2101.02055, 2021
Cited by: 40; Year: 2021
Understanding self-predictive learning for reinforcement learning
Y Tang, ZD Guo, PH Richemond, BA Pires, Y Chandak, R Munos, ...
International Conference on Machine Learning, 33632-33656, 2023
Cited by: 32; Year: 2023
Concurrent PAC RL
Z Guo, E Brunskill
Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015
Cited by: 31; Year: 2015
Understanding the performance gap between online and offline alignment algorithms
Y Tang, DZ Guo, Z Zheng, D Calandriello, Y Cao, E Tarassov, R Munos, ...
arXiv preprint arXiv:2405.08448, 2024
Cited by: 29; Year: 2024
PAC continuous state online multitask reinforcement learning with identification
Y Liu, Z Guo, E Brunskill
Proceedings of the 2016 International Conference on Autonomous Agents …, 2016
Cited by: 22; Year: 2016
Human alignment of large language models through online preference optimisation
D Calandriello, D Guo, R Munos, M Rowland, Y Tang, BA Pires, ...
arXiv preprint arXiv:2403.08635, 2024
Cited by: 18; Year: 2024
Articles 1–20