MermaidLLM: Dataflow Diagrams for Explainable Skill Formalization and Real-time Support with Multimodal LLMs
Authors
Yokoi, Sotaro and Rekimoto, Jun
Abstract
Traditional skill acquisition methods require laborious manual authoring, which hinders scalability. In contrast, advances in Multimodal Large Language Models (MLLMs), combined with wearable sensors that capture egocentric vision, gaze, and voice data, create new opportunities for AI to understand and transfer human skills. We propose “MermaidLLM”, in which MLLMs process this multimodal input to automatically generate explainable dataflow diagrams of the task in Mermaid notation. This intermediate task representation improves shared human-AI understanding, facilitates verification of the MLLM's comprehension, and simplifies analytical comparisons. During skill transfer, MermaidLLM operates as a real-time agent, using these diagrams to offer trainees immediate, context-sensitive guidance through wearable devices. We evaluated this approach through user studies, demonstrating the effectiveness of Mermaid diagrams for skill understanding and the potential of the real-time agent.
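To make the notion of a Mermaid dataflow diagram concrete, below is a minimal hypothetical sketch of the kind of diagram such a system might emit for a simple pour-over coffee task; the task and all node names are illustrative assumptions, not taken from the paper.

```mermaid
%% Hypothetical example: a task expressed as a Mermaid dataflow diagram.
%% Nodes are materials or steps; edges carry the flow between steps.
flowchart TD
    beans["Coffee beans"] --> grind["Grind to medium-fine"]
    water["Water"] --> boil["Boil to ~92 °C"]
    grind --> brew["Pour-over brew"]
    boil --> brew
    brew --> serve["Serve in cup"]
```

Because the representation is plain text, a diagram like this can be rendered for a trainee, checked against the MLLM's understanding, and diffed across task variants.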