Zhaowei Zhang
Other names: 张钊为
Verified email at stu.pku.edu.cn
Title
Cited by
Year
AI alignment: A comprehensive survey
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ...
arXiv preprint arXiv:2310.19852, 2023
Cited by: 170; Year: 2023
Foundational challenges in assuring alignment and safety of large language models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
arXiv preprint arXiv:2404.09932, 2024
Cited by: 72; Year: 2024
ProAgent: building proactive cooperative agents with large language models
C Zhang, K Yang, S Hu, Z Wang, G Li, Y Sun, C Zhang, Z Zhang, A Liu, ...
Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17591 …, 2024
Cited by: 63*; Year: 2024
Heterogeneous Value Alignment Evaluation for Large Language Models
Z Zhang, N Liu, S Qi, C Zhang, Z Rong, SC Zhu, S Cui, Y Yang
AAAI 2024 Workshop: Public Sector LLMs (Oral), 2023
Cited by: 13*; Year: 2023
Contextual Transformer for Offline Meta Reinforcement Learning
R Lin, Y Li, X Feng, Z Zhang, XHW Fung, H Zhang, J Wang, Y Du, Y Yang
Foundation Models for Decision Making Workshop at Neural Information …, 2022
Cited by: 11; Year: 2022
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents
S Qi, S Chen, Y Li, X Kong, J Wang, B Yang, P Wong, Y Zhong, X Zhang, ...
ICLR 2024 (Spotlight), 2024
Cited by: 8; Year: 2024
Measuring Value Understanding in Language Models through Discriminator-Critique Gap
Z Zhang, F Bai, J Gao, Y Yang
arXiv preprint arXiv:2310.00378, 2023
Cited by: 6; Year: 2023
STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning
S Chen, Z Zhang, Y Yang, Y Du
Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17337 …, 2024
Cited by: 3*; Year: 2024
Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects
Z Zhang, F Bai, M Wang, H Ye, C Ma, Y Yang
arXiv preprint arXiv:2402.12907, 2024
Cited by: 2; Year: 2024
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment
M Wang, C Ma, Q Chen, L Meng, Y Han, J Xiao, Z Zhang, J Huo, WJ Su, ...
arXiv preprint arXiv:2410.16714, 2024
Year: 2024
Efficient Model-agnostic Alignment via Bayesian Persuasion
F Bai, M Wang, Z Zhang, B Chen, Y Xu, Y Wen, Y Yang
arXiv preprint arXiv:2405.18718, 2024
Year: 2024