All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks

Bibliographic Details
Published in: Electronics vol. 14, no. 17 (2025), p. 3487-3506
Main Author: Geldhauser Carina
Other Authors: Liljegren Johan, Nordqvist Pontus
Published: MDPI AG
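Subjects: Speech; Machine learning; Computer-generated imagery; Performance measurement; Deep learning; Computer vision; Video recordings; Neural networks; Generative adversarial networks; Synchronism; Audio data; Animation; Realism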
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3249684759
003 UK-CbPIL
022 |a 2079-9292 
024 7 |a 10.3390/electronics14173487  |2 doi 
035 |a 3249684759 
045 2 |b d20250101  |b d20251231 
084 |a 231458  |2 nlm 
100 1 |a Geldhauser Carina  |u Department Mathematik, ETH Zurich, 8092 Zurich, Switzerland 
245 1 |a All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a This exploratory study investigates the usability of performance metrics for generative adversarial network (GAN)-based models for speech-driven facial animation. These models focus on the transfer of speech information from an audio file to a still image to generate talking-head videos in a small-scale “everyday usage” setting. Two models, LipGAN and a custom implementation of a Wasserstein GAN with gradient penalty (L1WGAN-GP), are examined for their visual performance and scoring according to commonly used metrics. Quantitative comparisons using FID, SSIM, and PSNR on the GRIDTest dataset show mixed results, and these metrics fail to capture local artifacts crucial for lip synchronization, pointing to limitations in their applicability for video animation tasks. The study points towards the inadequacy of current quantitative measures and emphasizes the continued necessity of human qualitative assessment for evaluating talking-head video quality. 
653 |a Speech 
653 |a Machine learning 
653 |a Computer-generated imagery 
653 |a Performance measurement 
653 |a Deep learning 
653 |a Computer vision 
653 |a Video recordings 
653 |a Neural networks 
653 |a Generative adversarial networks 
653 |a Synchronism 
653 |a Audio data 
653 |a Animation 
653 |a Realism 
700 1 |a Liljegren Johan  |u Centre for Mathematical Sciences, Lund University, P.O. Box 118, 22100 Lund, Sweden 
700 1 |a Nordqvist Pontus  |u Centre for Mathematical Sciences, Lund University, P.O. Box 118, 22100 Lund, Sweden 
773 0 |t Electronics  |g vol. 14, no. 17 (2025), p. 3487-3506 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3249684759/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3249684759/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3249684759/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch