Curriculum Vitae

shashwatnow@gmail.com

Education

PhD in Artificial Intelligence, 2024–present
ELLIS, Max Planck Institute for Intelligent Systems, Tübingen
Topic: Scaling Supervision for AI
Advisors: Jonas Geiping and Douwe Kiela
B.Tech. and M.S. (by Research) in Computer Science Engineering, 2019–2024
International Institute of Information Technology (IIIT), Hyderabad
GPA: 9.60/10
Thesis: New Frontiers for Machine Unlearning, advised by Prof. Ponnurangam K.

Experience

(Incoming) Research Scientist Intern, Meta GenAI, London, June–October 2025
Project: Scalable Oversight
Researcher, Stanford Existential Risks Initiative ML Alignment Theory Scholars (SERI MATS), July–Dec 2023
Mentor: Dan Hendrycks
Quantitative Research Intern, Central Research Team, Millennium India, May–June 2023
Project: AutoML for tree-based and linear ensembles to find alpha across datasets
Research Intern, Social Choice Theory, LAMSADE, CNRS, May–July 2022
Advisors: Jerome Lang, Dominik Peters
Research Assistant, Language Evolution, Santa Fe Institute, July–Sept 2021
Mentor: Tanmoy Chakroborty
Developer, Distributed Computing Laboratory, Summer@EPFL, May–June 2021
Mentors: Matteo Monti, Rachid Guerraroui
Research Developer, Apertium, Google Summer of Code, April–Aug 2020
Mentors: Mikel Forcada, Jorge Gracia

Publications

  1. Measuring Belief Updates in Curious Agents
    Joschka Strüber, Ilze Amanda Auzina, Shashwat Goel, Susanne Keller, Jonas Geiping, Ameya Prabhu, Matthias Bethge
    ICML Workshop on Assessing World Models, 2025.

  2. Pitfalls in Evaluating Language Model Forecasters
    Daniel Paleka*, Shashwat Goel*, Jonas Geiping, Florian Tramèr
    ICML Workshop on Assessing World Models, 2025.

  3. Answer Matching Outperforms Multiple Choice for Language Model Evaluations
    Nikhil Chandak*, Shashwat Goel*, Ameya Prabhu, Moritz Hardt, Jonas Geiping
    ICML Workshop on Assessing World Models, 2025.

  4. Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
    Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, Ameya Prabhu
    (Oral) ICLR Scaling Self Improving Foundation Models Workshop, 2025.
    [webpage], [code], [data]

  5. Great Models Think Alike and this Undermines AI Oversight
    Shashwat Goel, Joschka Strüber, Ilze Amanda Auzina, Karuna Chandra, P. Kumaraguru, Douwe Kiela, Ameya Prabhu, Matthias Bethge, Jonas Geiping
    (Spotlight) ICML, 2025.
    [code], [tool], [data]

  6. Corrective Machine Unlearning
    Shashwat Goel*, Ameya Prabhu*, Philip Torr, P. Kumaraguru, Amartya Sanyal
    Transactions on Machine Learning Research (TMLR), 2024.
    Workshop on Data-centric Machine Learning Research (DMLR) - Recommended for Journal (Top 15) at the 12th International Conference on Learning Representations (ICLR), 2024.
    [twitter], [code]

  7. The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
    Center for AI Safety, Scale AI
    International Conference on Machine Learning (ICML), 2024.
    [media], [webpage], [code]

  8. Proportional Aggregation of Preferences for Sequential Decision Making
    Nikhil Chandak, Shashwat Goel, Dominik Peters
    Outstanding Paper Award (top 3 out of 12,000+ submissions) at the 38th Annual Conference of the Association for the Advancement of Artificial Intelligence (AAAI), 2024.
    [twitter], [talk]

  9. Representation Engineering: A Top-Down Approach to AI Transparency
    Center for AI Safety
    ArXiv, 2023.
    [talk], [webpage], [code]

  10. Probing Negation in Language Models
    Shashwat Singh*, Shashwat Goel*, Saujas Vaduguru, Ponnurangam Kumaraguru
    8th Workshop on Representation Learning for NLP (RepL4NLP) at the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
    [code]

  11. Towards Adversarial Evaluations of Inexact Machine Unlearning
    Shashwat Goel*, Ameya Prabhu*, Amartya Sanyal, Ser-Nam Lim, Philip Torr, Ponnurangam Kumaraguru
    ArXiv, 2023.
    [code]

Honours and Awards

Teaching Experience

Academic Service and Outreach

University Groups

last updated: July 23, 2024