All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks
Guardado en:
| Publicado en: | Electronics vol. 14, no. 17 (2025), p. 3487-3506 |
|---|---|
| Autor principal: | |
| Otros Autores: | , |
| Publicado: |
MDPI AG
|
| Materias: | |
| Acceso en línea: | Citation/Abstract Full Text + Graphics Full Text - PDF |
| Etiquetas: |
Sin Etiquetas, Sea el primero en etiquetar este registro!
|
| Resumen: | This exploratory study investigates the usability of performance metrics for generative adversarial network (GAN)-based models for speech-driven facial animation. These models focus on the transfer of speech information from an audio file to a still image to generate talking-head videos in a small-scale “everyday usage” setting. Two models, LipGAN and a custom implementation of a Wasserstein GAN with gradient penalty (L1WGAN-GP), are examined for their visual performance and scoring according to commonly used metrics: Quantitative comparisons using FID, SSIM, and PSNR metrics on the GRIDTest dataset show mixed results, and metrics fail to capture local artifacts crucial for lip synchronization, pointing to limitations in their applicability for video animation tasks. The study points towards the inadequacy of current quantitative measures and emphasizes the continued necessity of human qualitative assessment for evaluating talking-head video quality. |
|---|---|
| ISSN: | 2079-9292 |
| DOI: | 10.3390/electronics14173487 |
| Fuente: | Advanced Technologies & Aerospace Database |