Fazl Barez

Cited by

	All	Since 2019
Citations	122	122
h-index	5	5
i10-index	4	4

Citations per year: 2022: 5, 2023: 69, 2024: 47

Co-authors

Shay Cohen, University of Edinburgh. Verified email at inf.ed.ac.uk
Philip Torr, Professor, University of Oxford. Verified email at eng.ox.ac.uk
David Duvenaud, Associate Professor, University of Toronto. Verified email at cs.toronto.edu
Ethan Perez, Anthropic; New York University. Verified email at anthropic.com
Roger Grosse, Associate Professor, University of Toronto. Verified email at cs.toronto.edu
Mrinank Sharma, Anthropic. Verified email at anthropic.com
Sören Mindermann, University of Oxford, OATML. Verified email at cs.ox.ac.uk
Jan Brauner, University of Oxford. Verified email at cs.ox.ac.uk
Jesse Mu, Anthropic. Verified email at anthropic.com
Paul Christiano, National Institute of Standards and Technology. Verified email at nist.gov
Samuel R. Bowman, NYU and Anthropic. Verified email at nyu.edu


Fazl Barez

University of Oxford

Verified email at robots.ox.ac.uk

Machine Learning Safety Interpretability Alignment


Title	Cited by	Year
The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python. AVM Barone, F Barez, I Konstas, SB Cohen. The 61st Annual Meeting Of The Association For Computational Linguistics, 2023	23*	2023
PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration. P Li, H Tang, T Yang, X Hao, T Sang, Y Zheng, J Hao, ME Taylor, Z Wang, ... arXiv preprint arXiv:2203.08553, 2022	23	2022
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark. J Hoelscher-Obermaier, J Persson, E Kran, I Konstas, F Barez*. Findings of the Association for Computational Linguistics 2023, 11548–11559, 2023	22	2023
Neuron to Graph: Interpreting Language Model Neurons at Scale. A Foote, N Nanda, E Kran, I Konstas, S Cohen, F Barez. arXiv preprint arXiv:2305.19911, 2023	10	2023
Sleeper agents: Training deceptive llms that persist through safety training. E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ... arXiv preprint arXiv:2401.05566, 2024	9	2024
Understanding Addition in Transformers. P Quirke, F Barez. International Conference on Learning Representations (ICLR), 2023	5	2023
System III: Learning with Domain Knowledge for Safety Constraints. F Barez, H Hasanbieg, A Abbate. NeurIPS ML Safety Workshop, 2022	5	2022
Benchmarking specialized databases for high-frequency data. F Barez, P Bilokon, R Xiong. arXiv preprint arXiv:2301.12561, 2023	4	2023
Discovering topics and trends in the UK Government web archive. D Beavan, F Barez, M Bel, J Fitzgerald, E Goudarouli, K Kollnig, ... Data Study Group Final Report. Alan Turing Institute, London, 2021	4*	2021
Large language models relearn removed concepts. M Lo, SB Cohen, F Barez. arXiv preprint arXiv:2401.01814, 2024	3	2024
Exploring the advantages of transformers for high-frequency trading. F Barez, P Bilokon, A Gervais, N Lisitsyn. arXiv preprint arXiv:2302.13850, 2023	3	2023
Identifying a preliminary circuit for predicting gendered pronouns in gpt-2 small. C Mathwin, G Corlouer, E Kran, F Barez, N Nanda. URL: https://itch.io/jam/mechint/rate/1889871, 2023	3	2023
Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models. M Luke, A Amir, N Clement, A Rauno, T Philip, B Fazl. https://arxiv.org/abs/2310.08164, 2024	2*	2024
Interpreting Shared Circuits for Ordered Sequence Prediction in a Large Language Model. M Lan, F Barez. https://arxiv.org/abs/2311.04131, 2023	2*	2023
Increasing Trust in Language Models through the Reuse of Verified Circuits. P Quirke, C Neo, F Barez. arXiv preprint arXiv:2402.02619, 2024	1	2024
Measuring Value Alignment. F Barez, P Torr. arXiv preprint arXiv:2312.15241, 2023	1	2023
AI Systems of Concern. K Matteucci, S Avin, F Barez, SÓ hÉigeartaigh. arXiv preprint arXiv:2310.05876, 2023	1	2023
ED2: an environment dynamics decomposition framework for world model construction. C Wang, T Yang, J Hao, Y Zheng, H Tang, F Barez, J Liu, J Peng, H Piao, ... arXiv preprint arXiv:2112.02817, 2021	1	2021
Near to Mid-term Risks and Opportunities of Open Source Generative AI. F Eiras, A Petrov, B Vidgen, CS de Witt, F Pizzati, K Elkins, ... arXiv preprint arXiv:2404.17047, 2024		2024
The Scaling Behavior of Large Language Models. AV Miceli-Barone, F Barez, SB Cohen, E Voita, U Germann, M Lukasik. Proceedings of the First edition of the Workshop on the Scaling Behavior of …, 2024		2024
