MLLM-Search: A Zero-Shot Approach to Finding People Using Multimodal Large Language Models

Bibliographic details
Published in: Robotics vol. 14, no. 8 (2025), p. 102-120
Main author: Fung, Angus
Other authors: Tan, Aaron Hao; Wang, Haitong; Benhabib, Bensiyon; Nejat, Goldie
Published: MDPI AG
Description
Abstract: Robotic search for people in human-centered environments, including healthcare settings, is challenging because autonomous robots must locate people with incomplete or no prior knowledge of their schedules, plans, or locations. Robots must also adapt to real-time events that can change a person’s plans in an environment. In this paper, we present MLLM-Search, a novel zero-shot person search architecture that leverages multimodal large language models (MLLMs) to address the mobile robot problem of searching for a person under event-driven scenarios with varying user schedules. Our approach introduces a novel visual prompting method that gives robots a spatial understanding of the environment by generating a spatially grounded waypoint map, which represents navigable waypoints as a topological graph and regions by semantic labels. This map is incorporated into an MLLM with a region planner, which selects the next search region based on its semantic relevance to the search scenario, and a waypoint planner, which generates a search path by considering semantically relevant objects and the local spatial context through our unique spatial chain-of-thought prompting approach. Extensive experiments in photorealistic 3D environments were conducted to validate the performance of MLLM-Search in searching for a person with a changing schedule in different environments. An ablation study was also conducted to validate the main design choices of MLLM-Search. Furthermore, a comparison study with state-of-the-art search methods demonstrated that MLLM-Search outperforms existing methods with respect to search efficiency. Real-world experiments with a mobile robot on a multi-room floor of a building showed that MLLM-Search was able to generalize to new and unseen environments.
ISSN:2218-6581
DOI:10.3390/robotics14080102
Source: Advanced Technologies & Aerospace Database