Human-like monocular depth biases in deep neural networks

Saved in:
Bibliographic Details
Published in: PLoS Computational Biology vol. 21, no. 8 (Aug 2025), p. e1013020-e1013063
Main Author: Kubota, Yuki
Other Authors: Fukiage, Taiki
Published: Public Library of Science
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3270579503
003 UK-CbPIL
022 |a 1553-734X 
022 |a 1553-7358 
024 7 |a 10.1371/journal.pcbi.1013020  |2 doi 
035 |a 3270579503 
045 2 |b d20250801  |b d20250831 
084 |a 174831  |2 nlm 
100 1 |a Kubota, Yuki 
245 1 |a Human-like monocular depth biases in deep neural networks 
260 |b Public Library of Science  |c Aug 2025 
513 |a Journal Article 
520 3 |a Human depth perception from 2D images is systematically distorted, yet the nature of these distortions is not fully understood. By examining error patterns in depth estimation for both humans and deep neural networks (DNNs), which have shown remarkable abilities in monocular depth estimation, we can gain insights into constructing functional models of human 3D vision and designing artificial models with improved interpretability. Here, we propose a comprehensive human-DNN comparison framework for a monocular depth judgment task. Using a novel human-annotated dataset of natural indoor scenes and a systematic analysis of absolute depth judgments, we investigate error patterns in both humans and DNNs. Employing exponential-affine fitting, we decompose depth estimation errors into depth compression, per-image affine transformations (including scaling, shearing, and translation), and residual errors. Our analysis reveals that human depth judgments exhibit systematic and consistent biases, including depth compression, a vertical bias (perceiving objects in the lower visual field as closer), and consistent per-image affine distortions across participants. Intriguingly, we find that DNNs with higher accuracy partially recapitulate these human biases, demonstrating greater similarity in affine parameters and residual error patterns. This suggests that these seemingly suboptimal human biases may reflect efficient, ecologically adapted strategies for depth inference from inherently ambiguous monocular images. However, while DNNs capture metric-level residual error patterns similar to humans, they fail to reproduce human-level accuracy in ordinal depth perception within the affine-invariant space. These findings underscore the importance of evaluating error patterns beyond raw accuracy, providing new insights into how humans and computational models resolve depth ambiguity. 
Our dataset and methodology provide a framework for evaluating the alignment between computational models and human perceptual biases, thereby advancing our understanding of visual space representation and guiding the development of models that more faithfully capture human depth perception. 
610 4 |a CNN 
653 |a Accuracy 
653 |a Human bias 
653 |a Datasets 
653 |a Image compression 
653 |a Investigations 
653 |a Perceptions 
653 |a Affine transformations 
653 |a Depth perception 
653 |a Artificial neural networks 
653 |a Shearing 
653 |a Neural networks 
653 |a Visual fields 
653 |a Data collection 
653 |a Computer applications 
653 |a Perception 
653 |a Visual field 
653 |a Compression 
653 |a Space perception 
653 |a Geometry 
653 |a Mathematical models 
653 |a Environmental 
700 1 |a Fukiage, Taiki 
773 0 |t PLoS Computational Biology  |g vol. 21, no. 8 (Aug 2025), p. e1013020-e1013063 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3270579503/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3270579503/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3270579503/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
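The exponential-affine decomposition described in the abstract (depth compression plus a per-image affine transform, leaving residual errors) could be sketched roughly as follows. This is a minimal illustration under assumed forms: the saturating compression function `exp_compress`, the grid search over `tau`, and the simple affine terms in image coordinates `x`/`y` are assumptions made for illustration, not the authors' actual fitting procedure or parameterization.

```python
import numpy as np

def exp_compress(z, tau):
    # Assumed saturating compression: far depths flatten toward tau,
    # mimicking the "depth compression" bias named in the abstract.
    return tau * (1.0 - np.exp(-z / tau))

def fit_exp_affine(x, y, z_true, z_judged, taus):
    """Fit z_judged ~ a*f(z_true; tau) + b*x + c*y + d.

    The nonlinear compression parameter tau is found by grid search;
    for each candidate tau the affine coefficients (a, b, c, d) are
    solved by linear least squares. Returns the best tau, the affine
    coefficients, and the per-point residual errors.
    """
    best = None
    for tau in taus:
        # Design matrix: compressed depth, image coordinates, intercept.
        F = np.column_stack([exp_compress(z_true, tau), x, y, np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(F, z_judged, rcond=None)
        resid = z_judged - F @ coef
        sse = float(resid @ resid)
        if best is None or sse < best[0]:
            best = (sse, tau, coef, resid)
    _, tau, coef, resid = best
    return tau, coef, resid
```

On synthetic judgments generated from known parameters, the fit recovers the compression constant and affine coefficients, and the leftover `resid` plays the role of the "residual error patterns" that the paper compares between humans and DNNs.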