Silhouette-Based Evaluation of PCA, Isomap, and t-SNE on Linear and Nonlinear Data Structures
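| Published in: | Stats vol. 8, no. 4 (2025), p. 105-157 |
|---|---|
| Main author: | Zahed Mostafa |
| Other authors: | Skafyan Maryam |
| Published: | MDPI AG |
| Subjects: | Radiomics; Machine learning; Accuracy; Principal components analysis; Data visualization; Datasets; Bioinformatics; Data mining; Signal processing; Data analysis; Natural language processing; Algorithms; Genomics; Genes; Neighborhoods; Geometry |
| Online access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |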
MARC
| Tag | Ind. 1 | Ind. 2 | Content |
|---|---|---|---|
| LEADER | | | 00000nab a2200000uu 4500 |
| 001 | | | 3286352619 |
| 003 | | | UK-CbPIL |
| 022 | | | \|a 2571-905X |
| 024 | 7 | | \|a 10.3390/stats8040105 \|2 doi |
| 035 | | | \|a 3286352619 |
| 045 | 2 | | \|b d20251001 \|b d20251231 |
| 100 | 1 | | \|a Zahed Mostafa |
| 245 | 1 | | \|a Silhouette-Based Evaluation of PCA, Isomap, and t-SNE on Linear and Nonlinear Data Structures |
| 260 | | | \|b MDPI AG \|c 2025 |
| 513 | | | \|a Journal Article |
| 520 | 3 | | \|a Dimensionality reduction is fundamental for analyzing high-dimensional data, supporting visualization, denoising, and structure discovery. We present a systematic, large-scale benchmark of three widely used methods, Principal Component Analysis (PCA), Isometric Mapping (Isomap), and t-Distributed Stochastic Neighbor Embedding (t-SNE), evaluated by average silhouette scores to quantify cluster preservation after embedding. Our full factorial simulation varies sample size $n \in \{100, 200, 300, 400, 500\}$, noise variance $\sigma^2 \in \{0.25, 0.5, 0.75, 1, 1.5, 2\}$, and feature count $p \in \{20, 50, 100, 200, 300, 400\}$ under four generative regimes: (1) a linear Gaussian mixture, (2) a linear Student-t mixture with heavy tails, (3) a nonlinear Swiss-roll manifold, and (4) a nonlinear concentric-spheres manifold, each replicated 1000 times per condition. Beyond empirical comparisons, we provide mathematical results that explain the observed rankings: under standard separation and sampling assumptions, PCA maximizes silhouettes for linear, low-rank structure, whereas Isomap dominates on smooth curved manifolds; t-SNE prioritizes local neighborhoods, yielding strong local separation but less reliable global geometry. Empirically, PCA consistently achieves the highest silhouettes for linear structure (Isomap second, t-SNE third); on manifolds the ordering reverses (Isomap > t-SNE > PCA). Increasing $\sigma^2$ and adding uninformative dimensions (larger $p$) degrade all methods, while larger $n$ improves levels and stability. To our knowledge, this is the first integrated study combining a comprehensive factorial simulation across linear and nonlinear regimes with distribution-based summaries (density and violin plots) and supporting theory that predicts method orderings. The results offer clear, practice-oriented guidance: prefer PCA when structure is approximately linear; favor manifold learning, especially Isomap, when curvature is present; and use t-SNE for exploratory visualization of local neighborhoods. Complete tables and replication materials are provided to facilitate method selection and reproducibility. |
| 653 | | | \|a Radiomics |
| 653 | | | \|a Machine learning |
| 653 | | | \|a Accuracy |
| 653 | | | \|a Principal components analysis |
| 653 | | | \|a Data visualization |
| 653 | | | \|a Datasets |
| 653 | | | \|a Bioinformatics |
| 653 | | | \|a Data mining |
| 653 | | | \|a Signal processing |
| 653 | | | \|a Data analysis |
| 653 | | | \|a Natural language processing |
| 653 | | | \|a Algorithms |
| 653 | | | \|a Genomics |
| 653 | | | \|a Genes |
| 653 | | | \|a Neighborhoods |
| 653 | | | \|a Geometry |
| 700 | 1 | | \|a Skafyan Maryam |
| 773 | 0 | | \|t Stats \|g vol. 8, no. 4 (2025), p. 105-157 |
| 786 | 0 | | \|d ProQuest \|t ABI/INFORM Global |
| 856 | 4 | 1 | \|3 Citation/Abstract \|u https://www.proquest.com/docview/3286352619/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text + Graphics \|u https://www.proquest.com/docview/3286352619/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text - PDF \|u https://www.proquest.com/docview/3286352619/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
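
The evaluation criterion named in the abstract (the average silhouette score) and the scale of its full factorial design can be illustrated with a short sketch. The Python below is a minimal, standard-library-only illustration under stated assumptions, not the authors' replication code: `average_silhouette` assumes Euclidean distances in the embedding space and averages s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points, and `factorial_conditions` enumerates the factor levels listed in the abstract. The function names and regime labels are hypothetical. The embeddings themselves (PCA, Isomap, t-SNE) are not reproduced here; one option would be scikit-learn's `PCA`, `Isomap`, and `TSNE` estimators together with `sklearn.metrics.silhouette_score`, although this record does not say which software the authors used.

```python
import itertools
import math
from statistics import mean

# Factor levels as listed in the abstract; the regime names are hypothetical labels.
SAMPLE_SIZES = [100, 200, 300, 400, 500]
NOISE_VARIANCES = [0.25, 0.5, 0.75, 1, 1.5, 2]
FEATURE_COUNTS = [20, 50, 100, 200, 300, 400]
REGIMES = ["gaussian_mixture", "student_t_mixture", "swiss_roll", "concentric_spheres"]
REPLICATES = 1000


def factorial_conditions():
    """All (regime, n, sigma2, p) combinations of the full factorial design."""
    return list(itertools.product(REGIMES, SAMPLE_SIZES, NOISE_VARIANCES, FEATURE_COUNTS))


def average_silhouette(points, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)), assuming Euclidean distance.

    a(i): mean distance from point i to the other members of its own cluster.
    b(i): smallest mean distance from point i to the members of another cluster.
    Points in singleton clusters are assigned s(i) = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster
            scores.append(0.0)
            continue
        a = mean(math.dist(points[i], points[j]) for j in own)
        b = min(
            mean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


if __name__ == "__main__":
    conditions = factorial_conditions()
    print(len(conditions), "conditions,", len(conditions) * REPLICATES, "replicates")
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(round(average_silhouette(pts, [0, 0, 1, 1]), 3))  # well-separated labels
    print(round(average_silhouette(pts, [0, 1, 0, 1]), 3))  # mismatched labels
```

For the four points `(0, 0)`, `(0, 1)`, `(10, 0)`, `(10, 1)` labeled `[0, 0, 1, 1]`, every point has a = 1 and b = (10 + √101)/2, so the average silhouette is about 0.90; relabeling them `[0, 1, 0, 1]` gives about -0.45, the kind of contrast that separates a structure-preserving embedding from a poor one. The grid has 4 × 5 × 6 × 6 = 720 conditions, so 1000 replicates per condition amounts to 720,000 replicates in total.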