Shashwat Goel

I am an AI researcher, currently co-advised by Jonas Geiping and Douwe Kiela through the ELLIS PhD program. I am interested in how to make ML systems learn when humans can’t provide unambiguous ground truth: whether it be handling conflicting preferences, asking information-seeking questions, or solving novel problems. Feel free to get in touch if you want to chat about ideas! I’m excited about:

  • The science of pretraining task design, data, and evaluations
  • Enabling iterative improvement of ML models
  • Using AI to democratize education and research

News

2 September 2024: Starting a PhD in Tübingen (Germany), co-advised by Jonas Geiping (ELLIS, MPI-IS) and Douwe Kiela (Contextual AI, Stanford)
24 May 2024: Defended my master’s thesis on New Frontiers for Machine Unlearning at IIIT Hyderabad
1 May 2024: Our paper on measuring and removing dual-use knowledge from Large Language Models was accepted at ICML 2024
10 April 2024: Recognized as an Exceptional Reviewer at the ICLR Data-Centric ML Workshop
7 March 2024: Our paper on Corrective Machine Unlearning was accepted at the ICLR Data-Centric ML Workshop 2024
23 February 2024: Outstanding Paper Award at AAAI 2024 for our paper on Proportional Aggregation of Preferences for Sequential Decision Making
1 September 2023: Granted funding through the SERI MATS program to continue my work with the Center for AI Safety, after my 2-month research visit to Berkeley, California

Publications

Corrective Machine Unlearning
Shashwat Goel*, Ameya Prabhu*, Philip Torr, Ponnurangam Kumaraguru, Amartya Sanyal
Workshop on Data-centric Machine Learning (DMLR) - Recommended for Journal (Top 15)
12th International Conference on Learning Representations (ICLR), 2024.

[twitter], [code]

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Center for AI Safety, Scale AI
International Conference on Machine Learning (ICML), 2024.
[media], [webpage], [code]

Proportional Aggregation of Preferences for Sequential Decision Making
Nikhil Chandak, Shashwat Goel, Dominik Peters
38th Annual Conference of the Association for the Advancement of Artificial Intelligence (AAAI)
Outstanding Paper Award (top 3 out of 12,000+ submissions), 2024.

[twitter], [talk]

Representation Engineering: A Top-Down Approach to AI Transparency
Center for AI Safety
ArXiv, 2023.
[talk], [webpage], [code]

Probing Negation in Language Models
Shashwat Singh*, Shashwat Goel*, Saujas Vaduguru, Ponnurangam Kumaraguru
8th Workshop on Representation Learning for NLP (RepL4NLP)
61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.

[code]

Towards Adversarial Evaluations of Inexact Machine Unlearning
Shashwat Goel*, Ameya Prabhu*, Amartya Sanyal, Ser-Nam Lim, Philip Torr, Ponnurangam Kumaraguru
ArXiv, 2023.
[code]

Bilingual Dictionary Generation and Enrichment via Graph Exploration
Shashwat Goel, Jorge Gracia, Mikel L. Forcada
Special Issue on Latest Advancements in Linguistic Linked Data
Semantic Web Journal, 2022.

[code]

Low Impact Agency: Review and Discussion
Danilo Naiff, Shashwat Goel
ArXiv, 2022.

Modelling and Optimizing the Allocation of COVID-19 Swabs to Labs
Nikhil Chandak, Shashwat Goel, Kunal Jain, Arpan Dasgupta
Student Abstract at 18th Mixed Integer Programming Workshop
Winner, Covid-19 Swabs2Labs Hackathon by Ministry of Health Karnataka, 2021.

[code]

From Pivots to Graphs: Augmented Cycle Density as a generalization to One Time Inverse Consultation
Shashwat Goel, Kunwar Shanjeet Grover
4th Shared Task on Translation Inference Across Dictionaries
3rd Conference on Language, Data and Knowledge, 2021.

I have been fortunate to have excellent collaborators from whom I’ve learnt a lot. I’m grateful for mentorship from Amartya Sanyal, Ameya Prabhu, Dan Hendrycks, Dominik Peters, Mikel Forcada, Mukesh Kumar, Jérôme Lang, Jorge Gracia, Ponnurangam Kumaraguru, Saujas Vaduguru, and Tanmoy Bhattacharya. Thanks to them, I have been able to explore a diverse range of research areas, including Machine Learning, Interpretability of LLMs, Social Choice Theory, Machine Translation, Semantic Evolution and Algorithm Design.