{"id":1557,"date":"2025-08-13T16:21:19","date_gmt":"2025-08-13T07:21:19","guid":{"rendered":"https:\/\/www.sonycsl.co.jp\/kyoto\/?post_type=project&#038;p=1557"},"modified":"2025-08-13T16:21:19","modified_gmt":"2025-08-13T07:21:19","slug":"gazellm_en","status":"publish","type":"project","link":"https:\/\/www.sonycsl.co.jp\/kyoto\/projects\/gazellm_en\/","title":{"rendered":"GazeLLM: Multimodal LLMs incorporating Human Visual Attention"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\"><strong>Member\uff1a Jun Rekimoto<\/strong><\/h3>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Multimodal Large Language Models (MLLMs) are capable of understanding human activities through images, audio, and video, and are applicable to a wide range of human-computer interaction scenarios, including activity support, real-world agents, and skill transfer to robots or other individuals. However, processing high-resolution and long-duration videos consumes substantial computational resources. <\/p>\n\n\n\n<p>In this study, we propose a method that segments first-person perspective videos based on eye-tracking data and selectively processes image regions where gaze is concentrated. This approach reduces the number of pixels to approximately one-tenth while maintaining or even improving the model\u2019s comprehension performance.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Published Paper :&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2504.00221v1\" title=\"\">[Download from here]<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"314\" src=\"https:\/\/www.sonycsl.co.jp\/kyoto\/wp-content\/uploads\/2025\/08\/image-1-1024x314.png\" alt=\"\" class=\"wp-image-1538\" srcset=\"https:\/\/www.sonycsl.co.jp\/kyoto\/wp-content\/uploads\/2025\/08\/image-1-1024x314.png 1024w, https:\/\/www.sonycsl.co.jp\/kyoto\/wp-content\/uploads\/2025\/08\/image-1-300x92.png 300w, https:\/\/www.sonycsl.co.jp\/kyoto\/wp-content\/uploads\/2025\/08\/image-1-768x236.png 768w, https:\/\/www.sonycsl.co.jp\/kyoto\/wp-content\/uploads\/2025\/08\/image-1.png 1412w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"template":"","class_list":["post-1557","project","type-project","status-publish","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.sonycsl.co.jp\/kyoto\/wp-json\/wp\/v2\/project\/1557","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sonycsl.co.jp\/kyoto\/wp-json\/wp\/v2\/project"}],"about":[{"href":"https:\/\/www.sonycsl.co.jp\/kyoto\/wp-json\/wp\/v2\/types\/project"}],"version-history":[{"count":3,"href":"https:\/\/www.sonycsl.co.jp\/kyoto\/wp-json\/wp\/v2\/project\/1557\/revisions"}],"predecessor-version":[{"id":1560,"href":"https:\/\/www.sonycsl.co.jp\/kyoto\/wp-json\/wp\/v2\/project\/1557\/revisions\/1560"}],"wp:attachment":[{"href":"https:\/\/www.sonycsl.co.jp\/kyoto\/wp-json\/wp\/v2\/media?parent=1557"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}