VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT

Bibliographic Details
Published in: Applied Sciences vol. 14, no. 5 (2024), p. 1894
Main Author: Xu, Yifang
Other Authors: Sun, Yunzhuo, Xie, Zien, Zhai, Benxiang, Du, Sidan
Published by: MDPI AG
Subjects: Language, Design, Methods, Linguistics, Annotations, Queries, Proposals, Natural language, Bias, Prejudice
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 2955469495
003 UK-CbPIL
022 |a 2076-3417 
024 7 |a 10.3390/app14051894  |2 doi 
035 |a 2955469495 
045 2 |b d20240101  |b d20241231 
084 |a 231338  |2 nlm 
100 1 |a Xu, Yifang  |u School of Electronic Science and Engineering, Nanjing University, Nanjing 210093, China; xyf@smail.nju.edu.cn (Y.X.); xze@smail.nju.edu.cn (Z.X.); zbx@smail.nju.edu.cn (B.Z.)
245 1 |a VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT 
260 |b MDPI AG  |c 2024 
513 |a Journal Article 
520 3 |a Video temporal grounding (VTG) aims to locate specific temporal segments from an untrimmed video based on a linguistic query. Most existing VTG models are trained on extensive annotated video-text pairs, a process that not only introduces human biases from the queries but also incurs significant computational costs. To tackle these challenges, we propose VTG-GPT, a GPT-based method for zero-shot VTG without training or fine-tuning. To reduce prejudice in the original query, we employ Baichuan2 to generate debiased queries. To lessen redundant information in videos, we apply MiniGPT-v2 to transform visual content into more precise captions. Finally, we devise the proposal generator and post-processing to produce accurate segments from debiased queries and image captions. Extensive experiments demonstrate that VTG-GPT significantly outperforms SOTA methods in zero-shot settings and surpasses unsupervised approaches. More notably, it achieves competitive performance comparable to supervised methods. The code is available on GitHub. 
653 |a Language 
653 |a Design 
653 |a Methods 
653 |a Linguistics 
653 |a Annotations 
653 |a Queries 
653 |a Proposals 
653 |a Natural language 
653 |a Bias 
653 |a Prejudice 
700 1 |a Sun, Yunzhuo  |u School of Physics and Electronics, Hubei Normal University, Huangshi 435002, China; sunyunzhuo98@gmail.com
700 1 |a Xie, Zien  |u School of Electronic Science and Engineering, Nanjing University, Nanjing 210093, China; xyf@smail.nju.edu.cn (Y.X.); xze@smail.nju.edu.cn (Z.X.); zbx@smail.nju.edu.cn (B.Z.)
700 1 |a Zhai, Benxiang  |u School of Electronic Science and Engineering, Nanjing University, Nanjing 210093, China; xyf@smail.nju.edu.cn (Y.X.); xze@smail.nju.edu.cn (Z.X.); zbx@smail.nju.edu.cn (B.Z.)
700 1 |a Du, Sidan  |u School of Electronic Science and Engineering, Nanjing University, Nanjing 210093, China; xyf@smail.nju.edu.cn (Y.X.); xze@smail.nju.edu.cn (Z.X.); zbx@smail.nju.edu.cn (B.Z.)
773 0 |t Applied Sciences  |g vol. 14, no. 5 (2024), p. 1894 
786 0 |d ProQuest  |t Publicly Available Content Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2955469495/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/2955469495/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/2955469495/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
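
Note: the abstract (MARC field 520 above) describes a tuning-free pipeline: an LLM (Baichuan2) debiases the query, a multimodal model (MiniGPT-v2) converts frames into captions, and a proposal generator plus post-processing turn caption-query matches into temporal segments. The sketch below is only an illustration of that idea, not the authors' implementation: the debias_query callable, the word-overlap scoring, and the threshold-based proposal merging are all placeholder assumptions.

# Minimal zero-shot grounding sketch, assuming frame captions are already available.
# Model calls and scoring are placeholders, not the paper's actual components.

from typing import Callable, List, Tuple


def ground_query(
    query: str,
    frame_captions: List[str],           # one caption per sampled frame, in time order
    debias_query: Callable[[str], str],  # placeholder for an LLM rewrite (e.g. Baichuan2)
    threshold: float = 0.3,
) -> List[Tuple[int, int, float]]:
    """Return (start_frame, end_frame, score) proposals for the query."""
    q_words = set(debias_query(query).lower().split())

    # Score each frame caption by word overlap with the debiased query
    # (a crude stand-in for the similarity scoring used in the paper).
    scores = []
    for cap in frame_captions:
        c_words = set(cap.lower().split())
        scores.append(len(q_words & c_words) / max(len(q_words), 1))

    # Proposal generation: merge consecutive frames whose score passes the threshold.
    proposals, start = [], None
    for i, s in enumerate(scores + [0.0]):      # trailing sentinel closes an open segment
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            seg = scores[start:i]
            proposals.append((start, i - 1, sum(seg) / len(seg)))
            start = None

    # Post-processing: return the highest-scoring segments first.
    return sorted(proposals, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    captions = [
        "a man walks into the kitchen",
        "the man opens the fridge",
        "the man pours a glass of milk",
        "the man sits at the table",
    ]
    # Identity function stands in for the LLM debiasing step.
    print(ground_query("person opens the fridge", captions, lambda q: q))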