Shengyi Huang
Hugging Face
A closer look at invalid action masking in policy gradient algorithms
S Huang, S Ontañón
The International FLAIRS Conference 2022 35, 2022
Zephyr: Direct distillation of LM alignment
L Tunstall, E Beeching, N Lambert, N Rajani, K Rasul, Y Belkada, ...
arXiv preprint arXiv:2310.16944, 2023
CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms
S Huang, RFJ Dossa, C Ye, J Braga, D Chakraborty, K Mehta, ...
Journal of Machine Learning Research 23 (274), 1-18, 2022
TRL: Transformer reinforcement learning
L von Werra, Y Belkada, L Tunstall, E Beeching, T Thrush, N Lambert, ...
GitHub. Available online at: https://github.com/lvwerra/trl, 2020
The 37 Implementation Details of Proximal Policy Optimization
S Huang, RFJ Dossa, A Raffin, A Kanervisto, W Wang
International Conference on Learning Representations Blog Track, 2022
EnvPool: A highly parallel reinforcement learning environment execution engine
J Weng, M Lin, S Huang, B Liu, D Makoviichuk, V Makoviychuk, Z Liu, ...
Advances in Neural Information Processing Systems 35, 22409-22421, 2022
Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning
S Huang, S Ontañón, C Bamford, L Grela
Proceedings of the 3rd IEEE Conference on Games, 2021
A2C is a special case of PPO
S Huang, A Kanervisto, A Raffin, W Wang, S Ontañón, RFJ Dossa
arXiv preprint arXiv:2205.09123, 2022
An empirical investigation of early stopping optimizations in proximal policy optimization
RFJ Dossa, S Huang, S Ontañón, T Matsubara
IEEE Access 9, 117981-117992, 2021
Action guidance: Getting the best of sparse rewards and shaped rewards for real-time strategy games
S Huang, S Ontañón
AIIDE-20 Workshop on Artificial Intelligence for Strategy Games, 2020
Comparing Observation and Action Representations for Deep Reinforcement Learning in RTS
S Huang, S Ontañón
AIIDE-19 Workshop on Artificial Intelligence for Strategy Games, 2019
MEDCOD: A medically-accurate, emotive, diverse, and controllable dialog system
R Compton, I Valmianski, L Deng, C Huang, N Katariya, X Amatriain, ...
Machine Learning for Health, 110-129, 2021
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
S Huang, Q Gallouédec, F Felten, A Raffin, RFJ Dossa, Y Zhao, ...
arXiv preprint arXiv:2402.03046, 2024
The N implementation details of RLHF with PPO
S Huang, T Liu, L von Werra
The Third Blogpost Track at ICLR 2024, 2024
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
S Huang, M Noukhovitch, A Hosseini, K Rasul, W Wang, L Tunstall
arXiv preprint arXiv:2403.17031, 2024
Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks
R Sullivan, A Kumar, S Huang, J Dickerson, J Suarez
Advances in Neural Information Processing Systems 36, 2024
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform
S Huang, J Weng, R Charakorn, M Lin, Z Xu, S Ontañón
The Twelfth International Conference on Learning Representations, 2023
Measuring Generalization of Deep Reinforcement Learning with Real-time Strategy Games
S Huang, S Ontañón
AAAI-21 Workshop on Reinforcement Learning in Games, 2021
StreetTraffic: A Library for Traffic Flow Data Collection and Analysis
S Huang, C Healy
Proceedings of the ACMSE 2018 Conference, 1-3, 2018