Theoretical Study of Neural Network Models: Error Bounds on Approximation, Generalization, and Optimization With Applications to Regression and Partial Differential Equations

Bibliographic Details
Published in: PQDT - Global (2025)
Main Author: Lai, Yanming
Published: ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3273657646
003 UK-CbPIL
020 |a 9798263313067 
035 |a 3273657646 
045 2 |b d20250101  |b d20251231 
084 |a 189128  |2 nlm 
100 1 |a Lai, Yanming 
245 1 |a Theoretical Study of Neural Network Models: Error Bounds on Approximation, Generalization, and Optimization With Applications to Regression and Partial Differential Equations 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a The remarkable success of machine learning in computer vision, natural language processing, and related domains has spurred vigorous development of its theoretical foundations. Neural networks are a core model of machine learning, achieving complex function approximation and automatic feature learning by simulating the connections between neurons in the human brain. Viewed as a nonparametric estimation method, machine learning admits an error analysis that typically comprises three fundamental components: approximation error, generalization error, and optimization error. This thesis presents a comprehensive investigation of these three error types for neural networks, along with convergence rate analysis when they are applied to solving both regression problems and partial differential equations (PDEs). In the first part of this thesis, we derive the convergence rate for solving regression problems using overparameterized three-layer feedforward neural networks (FNNs) with logistic activation trained with gradient descent (GD). In the second part, within the framework of the Deep Ritz Method (DRM), we derive the convergence rate for solving second-order elliptic equations with three different types of boundary conditions using overparameterized three-layer tanh FNNs trained with projected gradient descent (PGD). In both parts, following the tradition of nonparametric estimation, our error bounds are expressed in terms of the sample size n. Our results also quantify the influence of various parameters, including the depth and width of the neural network, the training step size, and the number of iterations of the optimization algorithm. In the third part of this thesis, we focus on the expressive power (approximation capability) of the Transformer model, which has recently demonstrated strong performance in large language models but remains theoretically underexplored compared to classical FNNs, for which an extensive literature exists. Specifically, we investigate the approximation of the Hölder continuous function class by Transformers and construct several Transformers that overcome the curse of dimensionality. These results demonstrate that Transformers possess super expressive power. 
653 |a Language 
653 |a Machine learning 
653 |a Sample size 
653 |a Neurons 
653 |a Partial differential equations 
653 |a Deep learning 
653 |a Investigations 
653 |a Artificial intelligence 
653 |a Network management systems 
653 |a Computer vision 
653 |a Power 
653 |a Neural networks 
653 |a Brain 
653 |a Numerical analysis 
653 |a Eigenvalues 
653 |a Natural language processing 
653 |a Error analysis 
653 |a Boundary conditions 
653 |a Optimization algorithms 
653 |a Data compression 
653 |a Mathematics 
773 0 |t PQDT - Global  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3273657646/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3273657646/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://doi.org/10.14711/thesis-hdl152378
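
The abstract in field 520 above frames the thesis around three error components: approximation, generalization, and optimization. As an illustrative point of reference only, and not taken from the record or the thesis itself, one common way to write this three-way decomposition of the excess risk is sketched below; the notation (R, R_n, f^*, \mathcal{F}, \hat{f}) is generic and assumed for the example.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative notation (assumed, not the thesis's own):
%   R               : population risk;  R_n : empirical risk over n samples
%   f^*             : target function;  \mathcal{F} : the neural network class
%   f_{\mathcal{F}} : a minimizer of R over \mathcal{F} (best in class)
%   \hat{f}_n       : a minimizer of R_n over \mathcal{F} (empirical risk minimizer)
%   \hat{f}         : the estimator actually returned by the optimizer (e.g., GD or PGD)
\begin{align*}
R(\hat{f}) - R(f^{*})
  &= \underbrace{\bigl(R(\hat{f}) - R(\hat{f}_{n})\bigr)}_{\text{optimization error}}
   + \underbrace{\bigl(R(\hat{f}_{n}) - R(f_{\mathcal{F}})\bigr)}_{\text{generalization error}}
   + \underbrace{\bigl(R(f_{\mathcal{F}}) - R(f^{*})\bigr)}_{\text{approximation error}}.
\end{align*}
\end{document}

Read this way, the abstract's convergence rates for GD- and PGD-trained FNNs correspond to bounding the total excess risk in terms of the sample size n and the network and optimizer parameters, while the Transformer results in the third part concern the approximation term alone.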