Rémi Munos
DeepMind
Verified email at inria.fr
Title
Cited by
Year
Unifying count-based exploration and intrinsic motivation
M Bellemare, S Srinivasan, G Ostrovski, T Schaul, D Saxton, R Munos
Advances in Neural Information Processing Systems, 1471-1479, 2016
Cited by: 568; Year: 2016
Online optimization in X-armed bandits
S Bubeck, G Stoltz, C Szepesvári, R Munos
Advances in Neural Information Processing Systems, 201-208, 2009
Cited by: 483; Year: 2009
Modification of UCT with patterns in Monte-Carlo Go
S Gelly, Y Wang, R Munos, O Teytaud
INRIA, 2006
Cited by: 475; Year: 2006
Exploration–exploitation tradeoff using variance estimates in multi-armed bandits
JY Audibert, R Munos, C Szepesvári
Theoretical Computer Science 410 (19), 1876-1902, 2009
Cited by: 474; Year: 2009
Best arm identification in multi-armed bandits
JY Audibert, S Bubeck
Cited by: 457; Year: 2010
Thompson sampling: An asymptotically optimal finite-time analysis
E Kaufmann, N Korda, R Munos
International Conference on Algorithmic Learning Theory, 199-213, 2012
Cited by: 436; Year: 2012
A distributional perspective on reinforcement learning
MG Bellemare, W Dabney, R Munos
arXiv preprint arXiv:1707.06887, 2017
Cited by: 402; Year: 2017
IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures
L Espeholt, H Soyer, R Munos, K Simonyan, V Mnih, T Ward, Y Doron, ...
arXiv preprint arXiv:1802.01561, 2018
Cited by: 386; Year: 2018
Sample efficient actor-critic with experience replay
Z Wang, V Bapst, N Heess, V Mnih, R Munos, K Kavukcuoglu, ...
arXiv preprint arXiv:1611.01224, 2016
Cited by: 364; Year: 2016
Variable resolution discretization in optimal control
R Munos, A Moore
Machine learning 49 (2-3), 291-323, 2002
Cited by: 351; Year: 2002
Pure exploration in multi-armed bandits problems
S Bubeck, R Munos, G Stoltz
International Conference on Algorithmic Learning Theory, 23-37, 2009
Cited by: 332; Year: 2009
Learning to reinforcement learn
JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer, JZ Leibo, R Munos, ...
arXiv preprint arXiv:1611.05763, 2016
Cited by: 315; Year: 2016
Noisy networks for exploration
M Fortunato, MG Azar, B Piot, J Menick, I Osband, A Graves, V Mnih, ...
arXiv preprint arXiv:1706.10295, 2017
Cited by: 294; Year: 2017
Safe and efficient off-policy reinforcement learning
R Munos, T Stepleton, A Harutyunyan, M Bellemare
Advances in Neural Information Processing Systems, 1054-1062, 2016
Cited by: 288; Year: 2016
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
A Antos, C Szepesvári, R Munos
Machine Learning 71 (1), 89-129, 2008
Cited by: 277; Year: 2008
Finite-time bounds for fitted value iteration
R Munos, C Szepesvári
Journal of Machine Learning Research 9 (May), 815-857, 2008
Cited by: 267; Year: 2008
Kullback–Leibler upper confidence bounds for optimal sequential allocation
O Cappé, A Garivier, OA Maillard, R Munos, G Stoltz
The Annals of Statistics 41 (3), 1516-1541, 2013
Cited by: 241; Year: 2013
Gaussian process dynamic programming
MP Deisenroth, CE Rasmussen, J Peters
Neurocomputing 72 (7-9), 1508-1524, 2009
Cited by: 232; Year: 2009
Count-based exploration with neural density models
G Ostrovski, MG Bellemare, A Oord, R Munos
arXiv preprint arXiv:1703.01310, 2017
Cited by: 225; Year: 2017
Bandit algorithms for tree search
PA Coquelin, R Munos
arXiv preprint cs/0703062, 2007
Cited by: 220; Year: 2007
Articles 1–20