Detecting utterance scenes of a specific person
Author
Sato, Kunihiko and Rekimoto, Jun
Abstract
We propose a system that detects the scenes in a video where a specific speaker is speaking and displays their locations as a heat map on the video's timeline. By detecting such scenes in a drama, talk show, or TV discussion program, the system lets users skip directly to the parts of the timeline they want to hear. To detect a specific speaker's utterances, we develop a deep neural network (DNN) that extracts only that speaker's voice from the original sound source. We also implement a detection algorithm based on the output of the proposed DNN, and an interface for displaying the detection results.
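The abstract does not specify the detection algorithm, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes the DNN output has already been reduced to a per-frame energy of the extracted speaker signal. Frames above a threshold are grouped into utterance segments, and the segments are then binned into timeline heat values. The function names, the threshold, the minimum duration, and the bin count are all hypothetical.

```python
from typing import List, Tuple


def detect_segments(frame_energy: List[float], frame_sec: float,
                    threshold: float, min_sec: float) -> List[Tuple[float, float]]:
    """Return (start, end) times of runs where energy stays above threshold.

    Assumption: frame_energy is the per-frame energy of the signal that the
    speaker-extraction DNN produced for the target speaker.
    """
    segments = []
    start = None
    # A trailing -inf sentinel closes a run that reaches the last frame.
    for i, energy in enumerate(list(frame_energy) + [float("-inf")]):
        active = energy > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            # Keep the run only if it lasts long enough (hypothetical rule).
            if (i - start) * frame_sec >= min_sec:
                segments.append((start * frame_sec, i * frame_sec))
            start = None
    return segments


def timeline_heat(segments: List[Tuple[float, float]],
                  total_sec: float, n_bins: int) -> List[float]:
    """Fraction of each timeline bin covered by detected speech (0.0 to 1.0)."""
    bin_sec = total_sec / n_bins
    heat = []
    for b in range(n_bins):
        lo, hi = b * bin_sec, (b + 1) * bin_sec
        covered = sum(max(0.0, min(hi, end) - max(lo, begin))
                      for begin, end in segments)
        heat.append(covered / bin_sec)
    return heat


# Toy input: 8 frames of 0.5 s each (made-up values, not real data).
energy = [0.0, 0.9, 0.8, 0.7, 0.0, 0.6, 0.0, 0.0]
segments = detect_segments(energy, frame_sec=0.5, threshold=0.5, min_sec=1.0)
heat = timeline_heat(segments, total_sec=4.0, n_bins=4)
print(segments)  # [(0.5, 2.0)]
print(heat)      # [0.5, 1.0, 0.0, 0.0]
```

In this toy input the isolated active frame at 2.5 s is dropped because it is shorter than the minimum duration, so only the 0.5 s to 2.0 s run contributes to the heat map. An interface could map each heat value to a color intensity at the corresponding position on the timeline.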