Shashwat Goel
I am an AI researcher, currently co-advised by Jonas Geiping and Douwe Kiela through the ELLIS PhD program. I am interested in making ML systems learn when humans can’t provide unambiguous ground truth, whether that means handling conflicting preferences, asking information-seeking questions, or solving novel problems. I like getting email, so reach out! I’m excited about:
- Evaluations and Post-training data
- Enabling iterative improvement of ML models
- Using AI to democratize education and research
News
6 February 2025 | The first paper of my PhD, Great Models Think Alike and this Undermines AI Oversight, is out on arXiv |
13 October 2024 | Our paper Corrective Machine Unlearning accepted at TMLR |
2 September 2024 | Starting a PhD in Tübingen (Germany), co-advised by Jonas Geiping (ELLIS, MPI-IS) and Douwe Kiela (Contextual AI, Stanford) |
24 May 2024 | Defended my master's thesis on New Frontiers for Machine Unlearning at IIIT Hyderabad |
1 May 2024 | Our paper The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning accepted at ICML 2024 |
10 April 2024 | Recognized as an Exceptional Reviewer at the ICLR Data-Centric ML Workshop |
23 February 2024 | Outstanding Paper Award at AAAI 2024 for our paper Proportional Aggregation of Preferences for Sequential Decision Making |
Selected Publications
Great Models Think Alike and this Undermines AI Oversight
Shashwat Goel, Joschka Strüber, Ilze Amanda Auzina, Karuna Chandra, P. Kumaraguru, Douwe Kiela, Ameya Prabhu, Matthias Bethge, Jonas Geiping
arXiv preprint, 2025.
[code], [tool], [data]
Corrective Machine Unlearning
Shashwat Goel*, Ameya Prabhu*, Philip Torr, P. Kumaraguru, Amartya Sanyal
Transactions on Machine Learning Research (TMLR) 2024
Recommended for journal publication (top 15) at the Workshop on Data-centric Machine Learning Research (DMLR), 12th International Conference on Learning Representations (ICLR), 2024.
[twitter], [code]
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Center for AI Safety, Scale AI
International Conference on Machine Learning (ICML), 2024.
[media], [webpage], [code]
Proportional Aggregation of Preferences for Sequential Decision Making
Nikhil Chandak, Shashwat Goel, Dominik Peters
Outstanding Paper Award (top 3 out of 12,000+ submissions) at the 38th Annual Conference of the Association for the Advancement of Artificial Intelligence (AAAI), 2024.
[twitter], [talk]
Representation Engineering: A Top-Down Approach to AI Transparency
Center for AI Safety
arXiv, 2023.
[talk], [webpage], [code]
Probing Negation in Language Models
Shashwat Singh*, Shashwat Goel*, Saujas Vaduguru, Ponnurangam Kumaraguru
8th Workshop on Representation Learning for NLP (RepL4NLP) at the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
[code]
Towards Adversarial Evaluations of Inexact Machine Unlearning
Shashwat Goel*, Ameya Prabhu*, Amartya Sanyal, Ser-Nam Lim, Philip Torr, Ponnurangam Kumaraguru
arXiv, 2023.
[code]
I have been fortunate to have excellent collaborators from whom I’ve learnt a lot. I’m grateful for mentorship from Amartya Sanyal, Ameya Prabhu, Dan Hendrycks, Dominik Peters, Mikel Forcada, Mukesh Kumar, Jérôme Lang, Jorge Gracia, Ponnurangam Kumaraguru, Saujas Vaduguru, and Tanmoy Bhattacharya. Thanks to them, I have been able to explore a diverse range of research areas, including Machine Learning, Interpretability of LLMs, Social Choice Theory, Machine Translation, Semantic Evolution and Algorithm Design.