Fazl Barez
Verified email at robots.ox.ac.uk - Homepage
Title
Cited by
Year
The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python
AVM Barone*, F Barez*, I Konstas, SB Cohen
The 61st Annual Meeting Of The Association For Computational Linguistics, 2023
Cited by 23*, 2023
PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
P Li, H Tang, T Yang, X Hao, T Sang, Y Zheng, J Hao, ME Taylor, Z Wang, ...
arXiv preprint arXiv:2203.08553, 2022
Cited by 23, 2022
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
J Hoelscher-Obermaier*, J Persson*, E Kran, I Konstas, F Barez*
Findings of the Association for Computational Linguistics 2023, 11548–11559, 2023
Cited by 22, 2023
Neuron to Graph: Interpreting Language Model Neurons at Scale
A Foote*, N Nanda, E Kran, I Konstas, S Cohen, F Barez*
arXiv preprint arXiv:2305.19911, 2023
Cited by 10, 2023
Sleeper agents: Training deceptive LLMs that persist through safety training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by 9, 2024
Understanding Addition in Transformers
P Quirke, F Barez
International Conference on Learning Representations (ICLR), 2023
Cited by 5, 2023
System III: Learning with Domain Knowledge for Safety Constraints
F Barez, H Hasanbeig, A Abate
NeurIPS ML Safety Workshop, 2022
Cited by 5, 2022
Benchmarking specialized databases for high-frequency data
F Barez, P Bilokon, R Xiong
arXiv preprint arXiv:2301.12561, 2023
Cited by 4, 2023
Discovering topics and trends in the UK Government web archive
D Beavan, F Barez, M Bel, J Fitzgerald, E Goudarouli, K Kollnig, ...
Data Study Group Final Report. Alan Turing Institute, London, 2021
Cited by 4*, 2021
Large language models relearn removed concepts
M Lo, SB Cohen, F Barez
arXiv preprint arXiv:2401.01814, 2024
Cited by 3, 2024
Exploring the advantages of transformers for high-frequency trading
F Barez, P Bilokon, A Gervais, N Lisitsyn
arXiv preprint arXiv:2302.13850, 2023
Cited by 3, 2023
Identifying a preliminary circuit for predicting gendered pronouns in GPT-2 small
C Mathwin, G Corlouer, E Kran, F Barez, N Nanda
URL: https://itch.io/jam/mechint/rate/1889871, 2023
Cited by 3, 2023
Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models
M Luke, A Amir, N Clement, A Rauno, T Philip, B Fazl
https://arxiv.org/abs/2310.08164, 2024
Cited by 2*, 2024
Interpreting Shared Circuits for Ordered Sequence Prediction in a Large Language Model
M Lan, F Barez
https://arxiv.org/abs/2311.04131, 2023
Cited by 2*, 2023
Increasing Trust in Language Models through the Reuse of Verified Circuits
P Quirke, C Neo, F Barez
arXiv preprint arXiv:2402.02619, 2024
Cited by 1, 2024
Measuring Value Alignment
F Barez, P Torr
arXiv preprint arXiv:2312.15241, 2023
Cited by 1, 2023
AI Systems of Concern
K Matteucci, S Avin, F Barez, SÓ hÉigeartaigh
arXiv preprint arXiv:2310.05876, 2023
Cited by 1, 2023
ED2: an environment dynamics decomposition framework for world model construction
C Wang, T Yang, J Hao, Y Zheng, H Tang, F Barez, J Liu, J Peng, H Piao, ...
arXiv preprint arXiv:2112.02817, 2021
Cited by 1, 2021
Near to Mid-term Risks and Opportunities of Open Source Generative AI
F Eiras, A Petrov, B Vidgen, CS de Witt, F Pizzati, K Elkins, ...
arXiv preprint arXiv:2404.17047, 2024
2024
The Scaling Behavior of Large Language Models
AV Miceli-Barone, F Barez, SB Cohen, E Voita, U Germann, M Lukasik
Proceedings of the First edition of the Workshop on the Scaling Behavior of …, 2024
2024
Articles 1–20