Artificial intelligence is used to solve tough problems such as triaging COVID-19 patients in hospitals or helping law enforcement identify participants in the Jan. 6 insurrection at the U.S. Capitol. But AI has also enabled the proliferation of "deepfakes": manipulated videos and other media created by AI algorithms and often used to spread disinformation online.
In Machine Learning: Deep Learning, a Johns Hopkins course offered last fall by computer science Assistant Professor Mathias Unberath, undergraduate and graduate students took on the challenge of building AI systems from scratch with an eye toward solving contemporary problems. The course, which teaches students how to design, use, and think about AI systems, is also being offered this spring, and more than 100 students have enrolled so far.
The class introduces students to the basic concepts of deep learning and its emerging applications in fields such as computer vision, language processing, and health care, says Unberath, who has taught the course for the past three years.
"Recent advances in deep neural networks have had a sweeping impact on the field of artificial intelligence," Unberath says. "Hopkins students will most certainly play an important role in shaping this technology in the future. We want to provide students with the right skills and conceptual frameworks to be conscientious designers of AI-powered technologies, which is why—in addition to purely technical content—we include lectures on human-centered design, algorithmic bias, and ethical considerations."
More than 90 students and faculty members recently took part in the course's virtual showcase, in which 22 student teams presented their final projects. After the presentations, Intuitive Surgical, a leading surgical robotics company, gave two teams its Best Project Award, which came with a cash prize. Below are some of the projects that caught the judges' attention.
Deepfakes for Good
Although deepfake technology was popularized by social media and face-swapping apps, it has many positive uses in art, education, and even health care. For example, deepfake speech synthesis can benefit patients who have lost the ability to speak, such as those with locked-in syndrome caused by amyotrophic lateral sclerosis (ALS). These patients sometimes communicate through speech brain-computer interfaces, in which deep neural networks (DNNs) translate brain activity into speech rendered in a generic voice.
Through the use of deepfake techniques, a model can be trained to generate a humanlike custom "voice," says Steve Luo, a biomedical engineering PhD student who researches communication strategies for locked-in syndrome. With this problem in mind, Luo and robotics graduate students Jessica Soong and Cindy Huang built a text-to-speech system using deep convolutional neural networks.
"This problem is challenging because of the limited amount of training data. In cases where a patient has already lost their voice, voice samples just aren't available or good quality," Soong says.
They trained the model on two datasets: publicly available voice samples from 24 speakers, and a custom dataset of voice samples from Unberath, taken from his prerecorded course lectures. In the final presentation, the team demonstrated that the system can generate deepfake voices that closely resemble the real speaker, including their own professor, even with limited training data.
Face Mask Detection and Classification
Masks and social distancing have proven to be effective methods for decreasing the spread of COVID-19. The need to encourage people to abide by such public health measures is more pressing than ever, as the global death toll from the coronavirus surpasses 2 million.
For their project, John Morkos, Andy Ding, and Zach Murphy, all Hopkins medical students interested in health care technology, built an algorithm to automatically assess mask usage from images. The team used the Kaggle Face Mask Detection dataset, which comprises 853 images of people wearing masks correctly, wearing them incorrectly, or not wearing them at all. Using deep learning computer vision techniques, the team implemented an object detection model that locates faces in those images and classifies each face's mask usage.
The model showed promising results in identifying correctly masked and unmasked faces. However, the team found that detecting incorrectly worn masks required more detailed images, because the model must also locate distinct facial features, such as the nose, relative to the mask. With more images, they say, the model could be useful for monitoring mask usage in real time.
"We've all seen people at retail stores screening customers for masks, and that inspired us to pursue this project. An automated vision system like ours would make it easier to screen for mask compliance in stadiums, airports, or other crowded spaces," Ding says.
Fall Detection
According to the U.S. Centers for Disease Control and Prevention, 3 million adults 65 and older are treated in emergency departments for fall injuries each year. Yet many falls go unreported, even in care facilities, because care providers are not always aware that a fall has occurred.
To address this problem, computer science graduate students Maia Stiber and Catalina Gomez and robotics graduate students Erica Tevere and Kinjal Shah created DetectaTrip, a tool that can pinpoint falls in video footage. A major hurdle for the team was the lack of realistic fall videos, because many existing datasets include only simulated falls, Stiber says. To get around this, the students trained the model on a small dataset of genuine, unexpected falls, which they collected from YouTube "fail" compilations.
The team reported that DetectaTrip could identify the onset of a fall in video with a high degree of accuracy. Stiber says a possible extension of the project is to use deep learning to identify the cause of a fall. If fully realized, DetectaTrip could help providers in care facilities and hospitals detect falls, assess their severity, and determine the urgency of medical intervention.