All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks
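| Published in: | Electronics vol. 14, no. 17 (2025), p. 3487-3506 |
|---|---|
| Main Author: | Geldhauser Carina |
| Other Authors: | Liljegren Johan, Nordqvist Pontus |
| Published: | MDPI AG |
| Subjects: | Speech; Machine learning; Computer-generated imagery; Performance measurement; Deep learning; Computer vision; Video recordings; Neural networks; Generative adversarial networks; Synchronism; Audio data; Animation; Realism |
| Online Access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |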
MARC
| LEADER | | | 00000nab a2200000uu 4500 |
|---|---|---|---|
| 001 | | | 3249684759 |
| 003 | | | UK-CbPIL |
| 022 | | | \|a 2079-9292 |
| 024 | 7 | | \|a 10.3390/electronics14173487 \|2 doi |
| 035 | | | \|a 3249684759 |
| 045 | 2 | | \|b d20250101 \|b d20251231 |
| 084 | | | \|a 231458 \|2 nlm |
| 100 | 1 | | \|a Geldhauser Carina \|u Department Mathematik, ETH Zurich, 8092 Zurich, Switzerland |
| 245 | 1 | | \|a All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks |
| 260 | | | \|b MDPI AG \|c 2025 |
| 513 | | | \|a Journal Article |
| 520 | 3 | | \|a This exploratory study investigates the usability of performance metrics for generative adversarial network (GAN)-based models for speech-driven facial animation. These models focus on the transfer of speech information from an audio file to a still image to generate talking-head videos in a small-scale “everyday usage” setting. Two models, LipGAN and a custom implementation of a Wasserstein GAN with gradient penalty (L1WGAN-GP), are examined for their visual performance and scoring according to commonly used metrics: Quantitative comparisons using FID, SSIM, and PSNR metrics on the GRIDTest dataset show mixed results, and metrics fail to capture local artifacts crucial for lip synchronization, pointing to limitations in their applicability for video animation tasks. The study points towards the inadequacy of current quantitative measures and emphasizes the continued necessity of human qualitative assessment for evaluating talking-head video quality. |
| 653 | | | \|a Speech |
| 653 | | | \|a Machine learning |
| 653 | | | \|a Computer-generated imagery |
| 653 | | | \|a Performance measurement |
| 653 | | | \|a Deep learning |
| 653 | | | \|a Computer vision |
| 653 | | | \|a Video recordings |
| 653 | | | \|a Neural networks |
| 653 | | | \|a Generative adversarial networks |
| 653 | | | \|a Synchronism |
| 653 | | | \|a Audio data |
| 653 | | | \|a Animation |
| 653 | | | \|a Realism |
| 700 | 1 | | \|a Liljegren Johan \|u Centre for Mathematical Sciences, Lund University, P.O. Box 118, 22100 Lund, Sweden |
| 700 | 1 | | \|a Nordqvist Pontus \|u Centre for Mathematical Sciences, Lund University, P.O. Box 118, 22100 Lund, Sweden |
| 773 | 0 | | \|t Electronics \|g vol. 14, no. 17 (2025), p. 3487-3506 |
| 786 | 0 | | \|d ProQuest \|t Advanced Technologies & Aerospace Database |
| 856 | 4 | 1 | \|3 Citation/Abstract \|u https://www.proquest.com/docview/3249684759/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text + Graphics \|u https://www.proquest.com/docview/3249684759/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text - PDF \|u https://www.proquest.com/docview/3249684759/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |