Asma Ghandeharioun

Research Scientist, Google Research, NYC

I am Asma Ghandeharioun, a research scientist on the People + AI Research team at Google Research. I work on aligning AI with human values through better understanding [1] and control [2] of (language) models, in particular by demystifying their inner workings [3] and correcting collective misconceptions along the way [4, 5].

I received my Ph.D. from the Affective Computing Group, MIT Media Lab. I am fortunate to have had Roz as my advisor. I have also had research experience at Google Research, Microsoft Research, and EPFL, much of which has evolved into exciting long-term collaborations.

You can download my résumé here.

Selected Publications

  1. Patchscopes: A unifying framework for inspecting hidden representations of language models
    Asma Ghandeharioun*, Avi Caciularu*, Adam Pearce, Lucas Dixon, and Mor Geva
    arXiv preprint arXiv:2401.06102, 2024
  2. Does localization inform editing? Surprising differences in causality-based localization vs. knowledge editing in language models
    Peter Hase, Mohit Bansal, Been Kim, and Asma Ghandeharioun
    Advances in Neural Information Processing Systems (NeurIPS), 2023
    (Spotlight)
  3. Interpretability illusions in the generalization of simplified models
    Dan Friedman, Andrew Kyle Lampinen, Lucas Dixon, Danqi Chen, and Asma Ghandeharioun
    arXiv preprint arXiv:2312.03656, 2024
  4. Do machine learning models memorize or generalize?
    Adam Pearce, Asma Ghandeharioun, Nada Hussein, Nithum Thain, Martin Wattenberg, and Lucas Dixon
    In IEEE VISxAI, 2023
    (Best paper)
  5. Post Hoc Explanations of Language Models Can Improve Language Models
    Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, and Himabindu Lakkaraju
    In Advances in Neural Information Processing Systems (NeurIPS), 2023
  6. DISSECT: Disentangled simultaneous explanations via concept traversals
    Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff, and Rosalind W Picard
    In International Conference on Learning Representations (ICLR), 2021
  7. Approximating interactive human evaluation with self-play for open-domain dialog systems
    Asma Ghandeharioun*, Judy Hanwen Shen*, Natasha Jaques*, Craig Ferguson, Noah Jones, Agata Lapedriza, and Rosalind W Picard
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  8. Towards Human-Centered Optimality Criteria
    Asma Ghandeharioun
    Massachusetts Institute of Technology, 2021