Shashwat Goel

I am an AI researcher, currently co-advised by Jonas Geiping and Douwe Kiela through the ELLIS PhD program. I want to find novel ways to scale supervision for models, to make them more useful and safe. I believe any capability that can be measured can be optimized. Thus, my work focuses on evaluating model capabilities beyond knowledge. Currently, I’m working on tasks that require reasoning inductively to plan under uncertainty, asking better questions, and executing over long horizons.

See this blog for research I’m excited to work on towards building generally intelligent agents. If you’re interested in any of these problems, reach out; I like getting emails!

News

13 July 2025	Presenting 5 projects (1 spotlight, 1 workshop oral) at ICML 2025
23 June 2025	Started as a Research Scientist intern at Meta GenAI London
2 September 2024	Starting a PhD in Tübingen (Germany), co-advised by Jonas Geiping (ELLIS, MPI-IS) and Douwe Kiela (Contextual AI, Stanford)
24 May 2024	Defended my master's thesis on New Frontiers for Machine Unlearning at IIIT Hyderabad
23 February 2024	Outstanding Paper Award at AAAI 2024 for our paper Proportional Aggregation of Preferences for Sequential Decision Making
1 December 2023	Completed the SERI MATS program (and extension), contributing to the Center for AI Safety (RepE, WMDP).
1 June 2023	First Quantitative Research Intern in India at Millennium (an American hedge fund). Worked on AutoML for trading.

Selected Publications

Answer Matching Outperforms Multiple Choice for Language Model Evaluations
Nikhil Chandak*, Shashwat Goel*, Ameya Prabhu, Moritz Hardt, Jonas Geiping
ICML Assessing World Models Workshop, 2025.

Pitfalls in Evaluating Language Model Forecasters
Daniel Paleka*, Shashwat Goel*, Jonas Geiping, Florian Tramèr
ICML Assessing World Models Workshop, 2025.

Measuring Belief Updates in Curious Agents
Joschka Strüber, Ilze Amanda Auzina, Shashwat Goel, Susanne Keller, Jonas Geiping, Ameya Prabhu, Matthias Bethge
(Oral) ICML Assessing World Models Workshop, 2025.

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, Ameya Prabhu
(Oral) ICLR Scaling Self Improving Models Workshop, COLM, 2025.
[webpage], [code], [data]

Great Models Think Alike and this Undermines AI Oversight
Shashwat Goel, Joschka Strüber, Ilze Amanda Auzina, Karuna Chandra, P. Kumaraguru, Douwe Kiela, Ameya Prabhu, Matthias Bethge, Jonas Geiping
(Spotlight) ICML, 2025.
[code], [tool], [data]

Corrective Machine Unlearning
Shashwat Goel*, Ameya Prabhu*, Philip Torr, P. Kumaraguru, Amartya Sanyal
TMLR, 2024.
[twitter], [code]

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Center for AI Safety, Scale AI
ICML, 2024.
[media], [webpage], [code]

Proportional Aggregation of Preferences for Sequential Decision Making
Nikhil Chandak, Shashwat Goel, Dominik Peters
(Outstanding Paper Award) AAAI, 2024.
[twitter], [talk]

Representation Engineering: A Top-Down Approach to AI Transparency
Center for AI Safety
arXiv, 2023.
[talk], [webpage], [code]

* denotes equal contribution.

I love problem solving, and learnt a lot from ecosystems like the International Olympiads of Informatics and Linguistics, Exun Clan, and the broader tech circuit in Delhi. I’m also grateful for mentorship from Amartya Sanyal, Ameya Prabhu, Dan Hendrycks, Dominik Peters, Mikel Forcada, Mukesh Kumar, Jérôme Lang, Jorge Gracia, Ponnurangam Kumaraguru, Saujas Vaduguru, and Tanmoy Bhattacharya. Thanks to them, I have been able to explore a diverse range of research areas, including Machine Learning, Interpretability of LLMs, Social Choice Theory, Machine Translation, Semantic Evolution, and Algorithm Design.