Nino Vieillard
Google DeepMind
Verified email at google.com
Title · Cited by · Year
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...
arXiv preprint arXiv:2312.11805, 2023
1373 · 2023
Acme: A research framework for distributed reinforcement learning
MW Hoffman, B Shahriari, J Aslanides, G Barth-Maron, N Momchev, ...
arXiv preprint arXiv:2006.00979, 2022
249 · 2022
Leverage the average: an analysis of KL regularization in reinforcement learning
N Vieillard, T Kozuno, B Scherrer, O Pietquin, R Munos, M Geist
Advances in Neural Information Processing Systems 33, 12163-12174, 2020
119* · 2020
Munchausen reinforcement learning
N Vieillard, O Pietquin, M Geist
Advances in Neural Information Processing Systems 33, 4235-4246, 2020
94 · 2020
On-policy distillation of language models: Learning from self-generated mistakes
R Agarwal*, N Vieillard*, Y Zhou, P Stanczyk, SR Garea, M Geist, ...
The Twelfth International Conference on Learning Representations, 2024
65* · 2024
Offline reinforcement learning as anti-exploration
S Rezaeifar, R Dadashi, N Vieillard, L Hussenot, O Bachem, O Pietquin, ...
Proceedings of the AAAI Conference on Artificial Intelligence 36 (7), 8106-8114, 2022
54 · 2022
Factually consistent summarization via reinforcement learning with textual entailment feedback
P Roit, J Ferret, L Shani, R Aharoni, G Cideron, R Dadashi, M Geist, ...
arXiv preprint arXiv:2306.00186, 2023
53 · 2023
Momentum in reinforcement learning
N Vieillard, B Scherrer, O Pietquin, M Geist
International Conference on Artificial Intelligence and Statistics, 2529-2538, 2020
39 · 2020
Offline reinforcement learning with pseudometric learning
R Dadashi, S Rezaeifar, N Vieillard, L Hussenot, O Pietquin, M Geist
International Conference on Machine Learning, 2307-2318, 2021
38 · 2021
WARM: On the benefits of weight averaged reward models
A Ramé, N Vieillard, L Hussenot, R Dadashi, G Cideron, O Bachem, ...
arXiv preprint arXiv:2401.12187, 2024
33 · 2024
Deep conservative policy iteration
N Vieillard, O Pietquin, M Geist
Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 6070-6077, 2020
29 · 2020
Gemma 2: Improving open language models at a practical size
G Team, M Riviere, S Pathak, PG Sessa, C Hardin, S Bhupatiraju, ...
arXiv preprint arXiv:2408.00118, 2024
20 · 2024
On connections between constrained optimization and reinforcement learning
N Vieillard, O Pietquin, M Geist
arXiv preprint arXiv:1910.08476, 2019
18 · 2019
Implicitly regularized RL with implicit Q-values
N Vieillard, M Andrychowicz, A Raichuk, O Pietquin, M Geist
arXiv preprint arXiv:2108.07041, 2021
8 · 2021
KL-entropy-regularized RL with a generative model is minimax optimal
T Kozuno, W Yang, N Vieillard, T Kitamura, Y Tang, J Mei, P Ménard, ...
arXiv preprint arXiv:2205.14211, 2022
7 · 2022
BOND: Aligning LLMs with best-of-N distillation
PG Sessa, R Dadashi, L Hussenot, J Ferret, N Vieillard, A Ramé, ...
arXiv preprint arXiv:2407.14622, 2024
3 · 2024
Regularization and variance-weighted regression achieves minimax optimality in linear MDPs: theory and practice
T Kitamura, T Kozuno, Y Tang, N Vieillard, M Valko, W Yang, J Mei, ...
International Conference on Machine Learning, 17135-17175, 2023
2 · 2023
Training reinforcement learning agents using augmented temporal difference learning
MF Geist, N Vieillard, OC Pietquin
US Patent App. 17/347,264, 2021
2 · 2021
WARP: On the Benefits of Weight Averaged Rewarded Policies
A Ramé, J Ferret, N Vieillard, R Dadashi, L Hussenot, PL Cedoz, ...
arXiv preprint arXiv:2406.16768, 2024
1 · 2024
Imitating Language via Scalable Inverse Reinforcement Learning
M Wulfmeier, M Bloesch, N Vieillard, A Ahuja, J Bornschein, S Huang, ...
arXiv preprint arXiv:2409.01369, 2024
— · 2024
Articles 1–20