Fast and Provable Algorithms for Sparse Principal Component Analysis

Bibliographic Details
Published in: PQDT - Global (2025)
Main Author: Xian, Zhuozhi
Published:
ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3273603877
003 UK-CbPIL
020 |a 9798263313760 
035 |a 3273603877 
045 2 |b d20250101  |b d20251231 
084 |a 189128  |2 nlm 
100 1 |a Xian, Zhuozhi 
245 1 |a Fast and Provable Algorithms for Sparse Principal Component Analysis 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Principal component analysis (PCA) is a well-known statistical method for feature extraction and dimension reduction, widely used for data analysis. However, traditional PCA suffers from overfitting and loss of interpretability in high-dimensional settings, particularly when the number of variables exceeds the sample size. Sparse PCA overcomes these limitations by introducing sparsity into the principal components, offering a robust alternative to PCA and yielding more interpretable results. In this thesis, we investigate the spiked covariance model and the spiked Wigner model in sparse PCA. We first explore the spiked covariance model, which aims to recover a sparse unit vector from noisy samples. From an information-theoretic perspective, Ω(k log p) observations are sufficient to recover a k-sparse p-dimensional vector v. However, existing polynomial-time methods require at least Ω(k²) samples for successful recovery, highlighting a significant gap in sample efficiency. To bridge this gap, we introduce a novel thresholding-based algorithm that requires only Ω(k log p) samples, provided the signal strength satisfies λ = Ω(1/∥v∥∞). We also propose a two-stage nonconvex algorithm that further enhances estimation performance. This approach integrates our thresholding algorithm with truncated power iteration, achieving the minimax optimal rate of statistical error under the desired sample complexity. Numerical experiments validate the superior performance of our algorithms in terms of estimation accuracy and computational efficiency. Second, we study the spiked Wigner model, which aims to recover an s-sparse d-dimensional unit vector u from a d×d noisy matrix. The information-theoretic lower bound on the signal strength required to estimate u is β = Ω(√(s log d)). In contrast, the signal strength required by existing polynomial-time methods is at least Ω̃(s), leaving a notable gap. To close this gap, we propose a new thresholding-based algorithm that requires only Ω(√(s log d)) signal strength, given ∥u∥∞ = Ω(1). We also design a two-stage nonconvex method that further improves estimation accuracy. This approach combines our thresholding algorithm with truncated power iteration, achieving constant error within a limited number of iterations under the desired signal strength. Empirical results demonstrate the strong performance of our algorithms in terms of estimation error and computational cost. 
653 |a Sparsity 
653 |a Big Data 
653 |a Sample size 
653 |a Costs 
653 |a Normal distribution 
653 |a Signal processing 
653 |a Eigenvalues 
653 |a Eigenvectors 
653 |a Mathematics 
773 0 |t PQDT - Global  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3273603877/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3273603877/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://doi.org/10.14711/thesis-hdl152460