CLSP Fall Seminar Series: Sebastian Nehrdich

Oct 25, 2024
12 - 1:15pm EDT
This event is free

Who can attend?

  • Faculty
  • Staff
  • Students

Contact

Center for Language and Speech Processing

Description

Sebastian Nehrdich from the University of California, Berkeley, will present a Center for Language and Speech Processing seminar titled "MITRA: Beyond Just Machine Translation for Premodern Asian Low-Resource Languages."

Abstract:

Recent years have seen the rise of multilingual language models that achieve high levels of performance on a large number of tasks, with some of them handling hundreds of languages at once. Premodern languages are usually underrepresented in such models, leading to poor performance in downstream applications. In my talk, I will introduce the Dharmamitra project, which aims to develop a diverse set of language models to address these shortcomings for the classical Asian low-resource languages Sanskrit, Tibetan, Classical Chinese, and Pali. These models provide solutions for low-level NLP tasks such as word segmentation and morpho-syntactic tagging, as well as high-level tasks such as semantic search, machine translation, and general chatbot interaction. I will talk about the individual challenges and unique characteristics of the data involved, and the strategies we deploy to address them. I will also demonstrate how these different tools can be combined in an application that goes beyond simple sentence-to-sentence machine translation and instead provides detailed grammatical explanations and corpus-wide search, giving users as much relevant information as possible. This application is helpful both for early-stage language learners and for experienced researchers with a high level of language knowledge and very specific demands.
