MermaidLLM: Dataflow Diagrams for Explainable Skill Formalization and Real-time Support with Multimodal LLMs
Authors
Yokoi, Sotaro and Rekimoto, Jun
Abstract
Traditional skill acquisition methods require laborious manual authoring, which hinders scalability. In contrast, advances in Multimodal Large Language Models (MLLMs), combined with wearable sensors that capture egocentric vision, gaze, and voice data, create new opportunities for AI to understand and transfer human skills. We propose “MermaidLLM”, in which MLLMs process this multimodal input to automatically generate explainable dataflow diagrams of the task in Mermaid notation. This intermediate task representation improves shared human-AI understanding, facilitates verification of the MLLM's comprehension, and simplifies analytical comparisons. During skill transfer, MermaidLLM operates as a real-time agent, using these diagrams to offer trainees immediate, context-sensitive guidance through wearable devices. We evaluated this approach through user studies, demonstrating the effectiveness of Mermaid diagrams for skill understanding and the potential of the real-time agent.
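To make the notion of a Mermaid dataflow diagram concrete, below is a minimal hypothetical sketch of the kind of diagram such a system might emit for a simple pour-over coffee task; the task and all node names are illustrative assumptions, not taken from the paper.

```mermaid
%% Hypothetical example: a task expressed as a Mermaid dataflow diagram.
%% Nodes are materials or steps; edges carry the flow between steps.
flowchart TD
    beans["Coffee beans"] --> grind["Grind to medium-fine"]
    water["Water"] --> boil["Boil to ~92 °C"]
    grind --> brew["Pour-over brew"]
    boil --> brew
    brew --> serve["Serve in cup"]
```

Because the representation is plain text, a diagram like this can be rendered for a trainee, checked against the MLLM's understanding, and diffed across task variants.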