Computational Methods for Analyzing Complex Data: Advances in Kernel Techniques and Applications in Molecular Biology

Bibliographic details
Published in: ProQuest Dissertations and Theses (2025)
Main author: Zong, Zixiao
Published: ProQuest Dissertations & Theses
Online access: Citation/Abstract
Full Text - PDF
Description
Abstract: This dissertation explores computational methods for analyzing complex data, presenting advances in kernel techniques for nonlinear data analysis and demonstrating the application of network-based modeling and machine learning to molecular biology. The first contribution addresses privacy challenges in crowdsourced location data. We leverage kernel methods in federated analytics and develop a Random Fourier Feature (RFF) based framework for population density estimation that protects user location privacy. Each user’s location is irreversibly projected onto a small number of spatially delocalized basis functions, making precise localization impossible while still allowing population density estimation. The framework is evaluated on both synthetic and real-world datasets, and achieves a better utility-vs-privacy tradeoff than the state-of-the-art noise-addition method. The second contribution aims to improve the scalability of approximate kernel methods. While random feature models offer a way to identify nonlinear patterns with linear complexity, a very large number of random features may be needed to achieve a reliable kernel approximation, so space complexity again becomes a limitation for large data sets. Here, we propose Frequent Direction Kernel PCA (FD-KPCA), which combines the advantages of random features with a deterministic matrix sketching technique to enhance computational efficiency and reduce space complexity. Error bounds for the proposed method, measured in the Frobenius and spectral norms, are provided. Experiments on real-world datasets show that the proposed framework outperforms the random feature model at the same space cost, and can outperform the Nyström approximation at fixed space cost when the number of random features is large.
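The idea of pairing random features with deterministic sketching can be illustrated with a minimal sketch. It is not the dissertation's FD-KPCA algorithm, whose details the abstract does not give. It assumes a Gaussian kernel, the standard cosine form of random Fourier features, and the buffered variant of the classic Frequent Directions sketch. All names (`rff_map`, `frequent_directions`, `sigma`, `ell`) and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier features z(x) = sqrt(2/D) * cos(W x + b).

    With rows of W drawn from N(0, I / sigma^2) and b uniform on [0, 2*pi),
    z(x) . z(y) approximates the Gaussian kernel exp(-|x - y|^2 / (2 sigma^2)).
    """
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def frequent_directions(rows, ell):
    """Stream rows into a (2*ell x D) sketch B such that B.T @ B ~ A.T @ A.

    Buffered Frequent Directions: whenever the buffer is full, shrink all
    squared singular values by the (ell+1)-th largest one, which frees
    at least ell rows of the buffer.
    """
    B = None
    next_free = 0
    for r in rows:
        if B is None:
            B = np.zeros((2 * ell, r.shape[0]))
        if next_free == 2 * ell:
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell] ** 2 if len(s) > ell else 0.0
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            k = min(ell, len(s))
            B = np.zeros_like(B)
            B[:k] = s_shrunk[:k, None] * Vt[:k]
            next_free = ell
        B[next_free] = r
        next_free += 1
    return B

# Illustrative sizes only (assumptions, not values from the dissertation).
rng = np.random.default_rng(0)
n, d, D, ell, sigma = 500, 2, 64, 16, 1.0
X = rng.normal(size=(n, d))                    # toy data
W = rng.normal(scale=1.0 / sigma, size=(D, d)) # random frequencies
b = rng.uniform(0.0, 2.0 * np.pi, size=D)      # random phases

Z = rff_map(X, W, b)             # n x D feature matrix
B = frequent_directions(Z, ell)  # sketch built one feature row at a time

# Spectral-norm covariance error versus the Frequent Directions guarantee.
err = np.linalg.norm(Z.T @ Z - B.T @ B, 2)
bound = np.linalg.norm(Z, "fro") ** 2 / ell

# Approximate principal directions in feature space come from the sketch alone.
_, _, Vt = np.linalg.svd(B, full_matrices=False)
top_components = Vt[:5]
```

In this sketch the full n x D feature matrix is formed only so that the error can be checked. In a streaming setting the feature rows would be generated and consumed one at a time, so that memory stays at 2·ell x D regardless of n.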
Finally, we extend the application of computational techniques to molecular biology, employing Network Hamiltonian Models (NHMs) and machine learning to analyze amyloid fibril formation and infer fibril topology and kinetics from simulation data. This work demonstrates the power of combining statistical network modeling, machine learning, and Markov chain Monte Carlo (MCMC) sampling to address challenges in scientific research.
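As a toy illustration of MCMC sampling over networks, the following sketch runs a Metropolis edge-toggle sampler on undirected graphs. It is not the model used in the dissertation. The Hamiltonian here is an assumed single-term form, H(G) = -theta × (number of edges), chosen because its equilibrium edge density is known in closed form and can be checked. The function name and all parameter values are hypothetical.

```python
import math
import random

def metropolis_graph_sampler(n_nodes, theta, n_steps, seed=0):
    """Sample graphs with probability proportional to exp(-H(G)),
    where H(G) = -theta * (number of edges), by toggling one dyad per step.
    Returns the edge density recorded after every step."""
    rng = random.Random(seed)
    dyads = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    edges = set()
    densities = []
    for _ in range(n_steps):
        dyad = rng.choice(dyads)
        # Removing an edge raises H by theta; adding one lowers H by theta.
        delta_h = theta if dyad in edges else -theta
        # Metropolis rule: accept with probability min(1, exp(-delta_h)).
        if delta_h <= 0 or rng.random() < math.exp(-delta_h):
            edges.symmetric_difference_update({dyad})
        densities.append(len(edges) / len(dyads))
    return densities

theta = -1.0
densities = metropolis_graph_sampler(n_nodes=8, theta=theta, n_steps=20000)
burn_in = 2000
mean_density = sum(densities[burn_in:]) / len(densities[burn_in:])

# For this edge-only Hamiltonian, each dyad is independently present
# with probability 1 / (1 + exp(-theta)).
expected_density = 1.0 / (1.0 + math.exp(-theta))
```

A realistic network Hamiltonian would include additional structural terms, in which case the change in H for a toggle has to be computed from the local graph structure instead of being a constant.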
ISBN:9798288800528
Source: ProQuest Dissertations & Theses Global