Johns Hopkins University, Est. 1876

America’s First Research University

CLSP Seminar Series: Auditing Memorization, Dissecting Mechanisms, and Evaluating Behavior of Large Language Models

Sept 26, 2025
12:00pm - 1:15pm EDT
Registration is required
This event is free

Who can attend?

  • Faculty
  • Staff
  • Students

Contact

Center for Language and Speech Processing

Description

Robin Jia, an assistant professor of computer science at the University of Southern California, will give a talk titled "Auditing Memorization, Dissecting Mechanisms, and Evaluating Behavior of Large Language Models" for the Center for Language and Speech Processing.

Abstract:

The widespread adoption of large language models (LLMs) places a responsibility on the AI research community to rigorously study and understand them. In this talk, I will describe my group's research on analyzing LLMs' memorization of pre-training data, their internal mechanisms, and their downstream behavior. First, I will introduce the Hubble project, in which we have pre-trained LLMs (up to 8B parameters) on controlled pre-training corpora to understand when and how they memorize sensitive data related to copyright risks, privacy leakage, and test set contamination; we envision these models as a valuable open-source resource for scientific inquiry into LLM memorization. Next, I will describe my group's work on understanding how language models work internally, including vignettes about how they perform arithmetic with Fourier features and how they can learn optimization subroutines for in-context learning. Finally, I will highlight a recent collaboration with USC oncologists in which we uncover LLM sycophancy issues that arise when patients ask these models for medical advice.

Registration

Please register in advance