SkillsInterpreter: A case study of automatic annotation of flowcharts to support browsing instructional videos in modern martial arts using large language models
Author
Oomori, Kotaro and Ishiguro, Yoshio and Rekimoto, Jun
Abstract
Video is increasingly used to learn physical skills such as modern martial arts. These skills require decisions that depend on the situation, for example selecting an appropriate off-balance technique based on the position of the opponent’s feet. However, existing video interfaces do not support browsing based on the structure of a physical skill, that is, the situations that arise and the decisions that should be made in each. We hypothesize that browsing based on this structure can help users comprehend the skill. In this paper, we propose SkillsInterpreter, a structure-based video browsing method that uses large language models (LLMs) to automatically generate a flowchart from a skill instruction video that contains speech. With the generated flowchart, users can explore desired scenes, check the current chapter, and review the skill structure while watching the video. We conducted two studies: interviews with experts and an evaluation with learners of modern martial arts. The results suggest that SkillsInterpreter can support video-based skill learning in modern martial arts, especially in Brazilian Jiu-Jitsu, which requires situation-specific decision making.
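As an illustration only, the following minimal sketch shows one way a flowchart linked to video timestamps could support the three browsing tasks named in the abstract. It is not the authors' implementation: the `FlowNode` fields, the helper functions `current_node` and `seek_time`, the node labels, and all timestamps are hypothetical, and the LLM step that would produce the nodes from the spoken instruction is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowNode:
    """One flowchart node tied to a span of the video (hypothetical schema)."""
    node_id: str
    label: str
    kind: str          # "decision" or "action" (assumed categories)
    start_sec: float   # where the matching explanation starts in the video
    end_sec: float     # where it ends (exclusive)
    next_ids: list[str] = field(default_factory=list)


def current_node(nodes: list[FlowNode], playback_sec: float) -> Optional[FlowNode]:
    """Return the node whose span covers the playback position, if any."""
    for node in nodes:
        if node.start_sec <= playback_sec < node.end_sec:
            return node
    return None


def seek_time(nodes: list[FlowNode], node_id: str) -> float:
    """Return the time to jump to when the user selects a node."""
    for node in nodes:
        if node.node_id == node_id:
            return node.start_sec
    raise KeyError(node_id)


# Hypothetical flowchart: one decision that branches to two techniques.
flowchart = [
    FlowNode("n1", "Where are the opponent's feet?", "decision", 0.0, 42.0, ["n2", "n3"]),
    FlowNode("n2", "Off-balance technique A", "action", 42.0, 95.0),
    FlowNode("n3", "Off-balance technique B", "action", 95.0, 150.0),
]

# Checking the current chapter while the video plays at 50 seconds.
playing = current_node(flowchart, 50.0)
# Exploring a desired scene by selecting node "n3".
jump_to = seek_time(flowchart, "n3")
```

In this sketch, reviewing the skill structure amounts to following `next_ids` from a decision node to the techniques it leads to.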