Grandmaster level in StarCraft II using multi-agent reinforcement learning O Vinyals, I Babuschkin, WM Czarnecki, M Mathieu, A Dudzik, J Chung, ... Nature 575 (7782), 350-354, 2019 | 5443* | 2019 |
Rainbow: Combining improvements in deep reinforcement learning M Hessel, J Modayil, H Van Hasselt, T Schaul, G Ostrovski, W Dabney, ... Thirty-Second AAAI Conference on Artificial Intelligence, 2018 | 2860 | 2018 |
Gemini: a family of highly capable multimodal models Gemini Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2249 | 2023 |
Deep Q-learning from Demonstrations T Hester, M Vecerik, O Pietquin, M Lanctot, T Schaul, B Piot, D Horgan, ... Association for the Advancement of Artificial Intelligence (AAAI), 2018 | 1330 | 2018 |
Universal Value Function Approximators T Schaul, D Horgan, K Gregor, D Silver Proceedings of the 32nd International Conference on Machine Learning (ICML …, 2015 | 1283 | 2015 |
Distributed Prioritized Experience Replay D Horgan, J Quan, D Budden, G Barth-Maron, M Hessel, H van Hasselt, ... International Conference on Learning Representations 2018, 2018 | 946 | 2018 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 724 | 2024 |
Distributed distributional deterministic policy gradients G Barth-Maron, MW Hoffman, D Budden, W Dabney, D Horgan, D TB, ... arXiv preprint arXiv:1804.08617, 2018 | 679 | 2018 |
Observe and look further: Achieving consistent performance on Atari T Pohlen, B Piot, T Hester, MG Azar, D Horgan, D Budden, G Barth-Maron, ... arXiv preprint arXiv:1805.11593, 2018 | 143 | 2018 |
Unicorn: Continual learning with a universal, off-policy agent DJ Mankowitz, A Žídek, A Barreto, D Horgan, M Hessel, J Quan, J Oh, ... arXiv preprint arXiv:1802.08294, 2018 | 50 | 2018 |
Vision-language models as a source of rewards K Baumli, S Baveja, F Behbahani, H Chan, G Comanici, S Flennerhag, ... arXiv preprint arXiv:2312.09187, 2023 | 18 | 2023 |
Selecting reinforcement learning actions using goals and observations T Schaul, DG Horgan, K Gregor, D Silver US Patent 10,628,733, 2020 | 15 | 2020 |
Reinforcement learning using distributed prioritized replay D Budden, G Barth-Maron, J Quan, DG Horgan US Patent 11,625,604, 2023 | 11 | 2023 |