Computational Methods for Analyzing Complex Data: Advances in Kernel Techniques and Applications in Molecular Biology

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Zong, Zixiao
Published: ProQuest Dissertations & Theses
Subjects: Computer science; Computer engineering; Molecular biology
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3228609321
003 UK-CbPIL
020 |a 9798288800528 
035 |a 3228609321 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Zong, Zixiao 
245 1 |a Computational Methods for Analyzing Complex Data: Advances in Kernel Techniques and Applications in Molecular Biology 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a This dissertation explores computational methods for analyzing complex data, presenting advances in kernel techniques for nonlinear data analysis and demonstrating the application of network-based modeling and machine learning to molecular biology. The first contribution addresses privacy challenges in crowdsourced location data. We leverage kernel methods in federated analytics and develop a Random Fourier Feature (RFF)-based framework for population density estimation that protects user location privacy. Each user’s location is irreversibly projected onto a small number of spatially delocalized basis functions, making precise localization impossible while still allowing population density estimation. The framework is evaluated on both synthetic and real-world datasets and achieves a better utility-vs-privacy tradeoff than the state-of-the-art noise-addition-based method. The second contribution aims to improve the scalability of approximate kernel methods. While random feature models offer a way to identify nonlinear patterns with linear complexity, a very large number of random features may be needed to achieve a reliable kernel approximation, so space complexity again becomes a limitation for large datasets. Here, we propose the frequent direction Kernel PCA (FD-KPCA), which combines the advantages of random features with a deterministic matrix sketching technique to enhance computational efficiency and reduce space complexity. Error bounds for the proposed method, measured in the Frobenius and spectral norms, are provided. Experiments on real-world datasets show that the proposed framework outperforms the random feature model at the same space cost, and can outperform the Nyström approximation at fixed space cost when the number of random features is large. 
Finally, we extend the application of computational techniques to molecular biology, employing Network Hamiltonian Models (NHMs) and machine learning to analyze amyloid fibril formation and infer fibril topology and kinetics from simulation data. This work demonstrates the power of combining statistical network modeling, machine learning, and Markov chain Monte Carlo (MCMC) sampling to address challenges in scientific research. 
653 |a Computer science 
653 |a Computer engineering 
653 |a Molecular biology 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3228609321/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3228609321/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch