Curriculum Vitae

Education

PhD in Artificial Intelligence, 2024–present
ELLIS, Max Planck Institute for Intelligent Systems, Tübingen
Topic: Scaling Supervision for AI
Advisors: Jonas Geiping and Douwe Kiela

B.Tech. and M.S. (by Research) in Computer Science Engineering, 2019–2024
International Institute of Information Technology (IIIT), Hyderabad
GPA: 9.60/10
Thesis: New Frontiers for Machine Unlearning, advised by Prof. Ponnurangam K.

Experience

(Incoming) Research Scientist Intern, Meta GenAI, London, June–October 2025
Project: Scalable Oversight

Researcher, Stanford Existential Risks Initiative ML Alignment Theory Scholars (SERI MATS), July–Dec 2023
Mentor: Dan Hendrycks

Quantitative Research Intern, Central Research Team, Millennium India, May–June 2023
Project: AutoML for tree-based and linear ensembles to find alpha across datasets

Research Intern, Social Choice Theory, LAMSADE, CNRS, May–July 2022
Advisors: Jerome Lang, Dominik Peters

Research Assistant, Language Evolution, Santa Fe Institute, July–Sept 2021
Mentor: Tanmoy Chakroborty

Developer, Distributed Computing Laboratory, Summer@EPFL, May–June 2021
Mentors: Matteo Monti, Rachid Guerraroui

Research Developer, Apertium, Google Summer of Code, April–Aug 2020
Mentors: Mikel Forcada, Jorge Gracia

Publications

Measuring Belief Updates in Curious Agents
Joschka Strüber, Ilze Amanda Auzina, Shashwat Goel, Susanne Keller, Jonas Geiping, Ameya Prabhu, Matthias Bethge
ICML Workshop on Assessing World Models, 2025.

Pitfalls in Evaluating Language Model Forecasters
Daniel Paleka*, Shashwat Goel*, Jonas Geiping, Florian Tramèr
ICML Workshop on Assessing World Models, 2025.

Answer Matching Outperforms Multiple Choice for Language Model Evaluations
Nikhil Chandak*, Shashwat Goel*, Ameya Prabhu, Moritz Hardt, Jonas Geiping
ICML Workshop on Assessing World Models, 2025.

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, Ameya Prabhu
(Oral) ICLR Scaling Self Improving Foundation Models Workshop, 2025.
[webpage], [code], [data]

Great Models Think Alike and this Undermines AI Oversight
Shashwat Goel, Joschka Strüber, Ilze Amanda Auzina, Karuna Chandra, P. Kumaraguru, Douwe Kiela, Ameya Prabhu, Matthias Bethge, Jonas Geiping
(Spotlight) ICML, 2025.
[code], [tool], [data]

Corrective Machine Unlearning
Shashwat Goel*, Ameya Prabhu*, Philip Torr, P. Kumaraguru, Amartya Sanyal
Transactions on Machine Learning Research (TMLR), 2024.
Workshop on Data-centric Machine Learning (DMLR) - Recommended for Journal (Top 15) at the 12th International Conference on Learning Representations (ICLR), 2024.
[twitter], [code]

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Center for AI Safety, Scale AI
International Conference on Machine Learning (ICML), 2024.
[media], [webpage], [code]

Proportional Aggregation of Preferences for Sequential Decision Making
Nikhil Chandak, Shashwat Goel, Dominik Peters
Outstanding Paper Award (top 3 out of 12,000+ submissions) at 38th Annual Conference of the Association for the Advancement of Artificial Intelligence (AAAI), 2024.
[twitter], [talk]

Representation Engineering: A Top-Down Approach to AI Transparency
Center for AI Safety
ArXiv, 2023.
[talk], [webpage], [code]

Probing Negation in Language Models
Shashwat Singh*, Shashwat Goel*, Saujas Vaduguru, Ponnurangam Kumaraguru
8th Workshop on Representation Learning for NLP (RepL4NLP) at the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
[code]

Towards Adversarial Evaluations of Inexact Machine Unlearning
Shashwat Goel*, Ameya Prabhu*, Amartya Sanyal, Ser-Nam Lim, Philip Torr, Ponnurangam Kumaraguru
ArXiv, 2023.
[code]

* denotes equal contribution.

Honours and Awards

Outstanding Paper Award (Top 3/12,000+), AAAI 2024
Outstanding Reviewer (Top 10%): ICML 2022, ICLR DMLR Workshop 2024
Finalist (Top 50/3000+), ACM-ICPC Indian Regionals, 2020
Honorable Mention, International Linguistics Olympiad, 2019
National Rank 6, International Olympiad in Informatics Indian Team Selection, 2019
Grand Prize Winner (1/1500+), NASA Ames Space Settlement Design Contest, 2017

Teaching Experience

Head Teaching Assistant, Responsible and Safe AI, IIIT Hyderabad, Spring 2024
Facilitator, AI Safety Fundamentals, BlueDot Impact, Spring 2023
Teaching Assistant, Topics in DL (Graph Neural Networks), IIIT Hyderabad, Spring 2023
Teaching Assistant, Automata Theory, IIIT Hyderabad, Fall 2022

Academic Service and Outreach

Reviewer: CoLLAs 2024, ICLR DMLR Workshop 2024, AISTATS 2024, CoLLAs 2023, CODS-COMAD 2023, ICML 2022
Trainer, Indian Team Selection for the International Olympiad in Informatics (IOI), 2020

University Groups

ML Reading Group @IIIT-H (Founder)
Effective Altruism Group @IIIT-H (Founder)
Theory Group @IIIT-H (Former Admin)
Programming Club @IIIT-H (Former Admin)
Parliamentary Debate Team @IIIT-H
Ping! Student Magazine @IIIT-H (Editor)

last updated: July 23, 2024

Shashwat Goel