AI alignment: A comprehensive survey | J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ... | arXiv preprint arXiv:2310.19852, 2023 | 170 | 2023 |
Foundational challenges in assuring alignment and safety of large language models | U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ... | arXiv preprint arXiv:2404.09932, 2024 | 72 | 2024 |
ProAgent: building proactive cooperative agents with large language models | C Zhang, K Yang, S Hu, Z Wang, G Li, Y Sun, C Zhang, Z Zhang, A Liu, ... | Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17591 …, 2024 | 63* | 2024 |
Heterogeneous Value Alignment Evaluation for Large Language Models | Z Zhang, N Liu, S Qi, C Zhang, Z Rong, SC Zhu, S Cui, Y Yang | AAAI 2024 Workshop: Public Sector LLMs (Oral), 2023 | 13* | 2023 |
Contextual Transformer for Offline Meta Reinforcement Learning | R Lin, Y Li, X Feng, Z Zhang, XHW Fung, H Zhang, J Wang, Y Du, Y Yang | Foundation Models for Decision Making Workshop at Neural Information …, 2022 | 11 | 2022 |
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents | S Qi, S Chen, Y Li, X Kong, J Wang, B Yang, P Wong, Y Zhong, X Zhang, ... | ICLR 2024 (Spotlight), 2024 | 8 | 2024 |
Measuring Value Understanding in Language Models through Discriminator-Critique Gap | Z Zhang, F Bai, J Gao, Y Yang | arXiv preprint arXiv:2310.00378, 2023 | 6 | 2023 |
STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning | S Chen, Z Zhang, Y Yang, Y Du | Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17337 …, 2024 | 3* | 2024 |
Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects | Z Zhang, F Bai, M Wang, H Ye, C Ma, Y Yang | arXiv preprint arXiv:2402.12907, 2024 | 2 | 2024 |
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment | M Wang, C Ma, Q Chen, L Meng, Y Han, J Xiao, Z Zhang, J Huo, WJ Su, ... | arXiv preprint arXiv:2410.16714, 2024 | | 2024 |
Efficient Model-agnostic Alignment via Bayesian Persuasion | F Bai, M Wang, Z Zhang, B Chen, Y Xu, Y Wen, Y Yang | arXiv preprint arXiv:2405.18718, 2024 | | 2024 |