VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things

Bibliographic Details
Published in: arXiv.org (Dec 22, 2024), p. n/a
Main Author: Zhong, Yaoyao
Other Authors: Qi, Mengshi, Wang, Rui, Qiu, Yuhan, Zhang, Yang, Ma, Huadong
Published: Cornell University Library, arXiv.org
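Subjects: Qualitative analysis; Datasets; Video data; Annotations; Intelligent agents; Internet of Things; Schedules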
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2897289851
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2897289851 
045 0 |b d20241222 
100 1 |a Zhong, Yaoyao 
245 1 |a VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things 
260 |b Cornell University Library, arXiv.org  |c Dec 22, 2024 
513 |a Working Paper 
520 3 |a Video Internet of Things (VIoT) has shown full potential in collecting an unprecedented volume of video data. How to schedule the domain-specific perceiving models and analyze the collected videos uniformly, efficiently, and especially intelligently to accomplish complicated tasks is challenging. To address the challenge, we build VIoTGPT, the framework based on LLMs to correctly interact with humans, query knowledge videos, and invoke vision models to analyze multimedia data collaboratively. To support VIoTGPT and related future works, we meticulously crafted the VIoT-Tool dataset, including the training dataset and the benchmark involving 11 representative vision models across three categories based on semi-automatic annotations. To guide LLM to act as the intelligent agent towards intelligent VIoT, we resort to the ReAct instruction tuning method based on VIoT-Tool to learn the tool capability. Quantitative and qualitative experiments and analyses demonstrate the effectiveness of VIoTGPT. We believe VIoTGPT contributes to improving human-centered experiences in VIoT applications. The project website is https://github.com/zhongyy/VIoTGPT. 
653 |a Qualitative analysis 
653 |a Datasets 
653 |a Video data 
653 |a Annotations 
653 |a Intelligent agents 
653 |a Internet of Things 
653 |a Schedules 
700 1 |a Qi, Mengshi 
700 1 |a Wang, Rui 
700 1 |a Qiu, Yuhan 
700 1 |a Zhang, Yang 
700 1 |a Ma, Huadong 
773 0 |t arXiv.org  |g (Dec 22, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2897289851/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2312.00401