ECE Department Seminar: Efficient Finetuning of Large Language Models via Large-Width Analysis
Description
Soufiane Hayou, an assistant professor in the Department of Applied Mathematics and Statistics at Johns Hopkins University and a member of the university's Data Science and AI Institute, will give a talk titled "Efficient Finetuning of Large Language Models via Large-Width Analysis" for the Department of Electrical and Computer Engineering.
Abstract:
Finetuning Large Language Models (LLMs) improves their performance on downstream tasks, which is desirable when the model is deployed for a specific task. Parameter-efficient finetuning methods such as LoRA (Low-Rank Adaptation) are popular because they allow large models to be finetuned at relatively low cost. When using LoRA, two hyperparameters critically shape learning: the learning rate and the initialization. In this talk, I will present two results. First, we prove and demonstrate that the two "zero-product" initializations (A random with B = 0, versus B random with A = 0) are not equivalent: initializing B to zero and A randomly permits larger stable learning rates and yields better performance; an infinite-width stability analysis explains the gap, and LLM experiments confirm it. Second, LoRA+ shows that using the same learning rate for the A and B matrices is suboptimal at large width; a simple scheme that assigns different learning rates to A and B yields more efficient feature learning, delivering consistent accuracy gains and up to roughly 2× faster convergence at the same compute. Finally, I will distill these insights into practical defaults.
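For readers unfamiliar with the setup, the following minimal sketch (in PyTorch, not taken from the speaker's work) illustrates the two choices the abstract refers to: a LoRA adapter with B initialized to zero and A initialized randomly, and an optimizer that assigns B a larger learning rate than A in the spirit of LoRA+. The layer sizes, rank, and the learning-rate ratio of 16 are arbitrary illustrative values, not recommendations from the talk.

```python
# Illustrative sketch of a LoRA adapter with the B = 0, A random initialization
# and separate learning rates for A and B (LoRA+-style). Values are placeholders.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)                     # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) / d_in ** 0.5)  # A: random initialization
        self.B = nn.Parameter(torch.zeros(d_out, r))               # B: zero initialization
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B A x; the product B A is zero at initialization,
        # so the adapter starts as an exact no-op on top of the frozen base layer.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(d_in=1024, d_out=1024, r=8)

# Asymmetric learning rates: B gets a larger rate than A (ratio chosen for illustration).
base_lr = 1e-4
optimizer = torch.optim.AdamW([
    {"params": [layer.A], "lr": base_lr},
    {"params": [layer.B], "lr": 16 * base_lr},
])
```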
Who can attend?
- General public
- Faculty
- Staff
- Students