Statistical and Computational Analysis of Adversarial Training
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main Author: | |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | |
| Abstract: |
Adversarial training is a powerful tool to hedge against data perturbations and distributional shifts, and has been widely used in large language models [1, 2], computer vision [3], cybersecurity [4], and beyond. While the empirical risk minimization procedure optimizes the empirical loss, the adversarial training procedure seeks conservative solutions that optimize the worst-case loss. In general, there are two ways to define the worst-case loss: Wasserstein-distance-based [5] and perturbation-based [6, 7] (both formulations are written out in standard notation after this record).

In this thesis, we present a statistical and computational analysis of adversarial training. For the Wasserstein-distance-based adversarial training problem, also known as Wasserstein distributionally robust optimization, we explore both the computational aspects of the Wasserstein distance and the statistical properties of this framework. For perturbation-based adversarial training, our focus is primarily on its statistical properties. In particular, we establish computational and statistical foundations of adversarial training, including computational complexity, convergence rates, asymptotic distributions, and minimax optimality. Building on these insights, we propose improvements with provable theoretical guarantees. The following are more detailed descriptions of each chapter.

In Chapter 1, we focus on the Wasserstein distance. It can be shown that computing the empirical Wasserstein distance in the Wasserstein-distance-based independence test is an optimal transport (OT) problem with a special structure. This observation inspires us to study this special type of OT problem and to propose a modified Hungarian algorithm that solves it exactly. For an OT problem whose two marginals have m and n atoms (m ≥ n), respectively, the computational complexity of the proposed algorithm is O(m²n). Experimental results demonstrate that the proposed modified Hungarian algorithm compares favorably with the Hungarian algorithm, the well-known Sinkhorn algorithm, and the network simplex algorithm (a generic exact linear-programming baseline for such OT problems is sketched after this record).

In Chapter 2, we focus on Wasserstein-distance-based adversarial training. We propose an adjusted Wasserstein distributionally robust estimator, based on a nonlinear transformation of the Wasserstein distributionally robust (WDRO) estimator in statistical learning. The classic WDRO estimator is asymptotically biased, while our adjusted WDRO estimator is asymptotically unbiased, resulting in a smaller asymptotic mean squared error. Further, under certain conditions, our adjustment technique provides a general principle for de-biasing asymptotically biased estimators. Specifically, we investigate how the adjusted WDRO estimator is constructed in the generalized linear model, including logistic regression, linear regression, and Poisson regression. Numerical experiments demonstrate the favorable practical performance of the adjusted estimator over the classic one.

In Chapters 3 and 4, we focus on perturbation-based adversarial training. In Chapter 3, we study adversarial training under ℓ∞-perturbation, which has recently attracted much research attention. The asymptotic behavior of the adversarial training estimator is investigated in the generalized linear model. The results imply that the asymptotic distribution of the adversarial training estimator under ℓ∞-perturbation can put a positive probability mass at 0 when the true parameter is 0, providing a theoretical guarantee of the associated sparsity-recovery ability (the linear-regression special case is worked out after this record). Alternatively, a two-step procedure, adaptive adversarial training, is proposed, which can further improve the performance of adversarial training under ℓ∞-perturbation. Specifically, the proposed procedure achieves asymptotic variable-selection consistency and unbiasedness. Numerical experiments demonstrate the sparsity-recovery ability of adversarial training under ℓ∞-perturbation and compare the empirical performance of classic and adaptive adversarial training.

In Chapter 4, we deliver a non-asymptotic consistency analysis of the adversarial training procedure under ℓ∞-perturbation in high-dimensional linear regression. We show that, under the restricted eigenvalue condition, the convergence rate of the prediction error achieves the minimax rate, up to a logarithmic factor, over the class of sparse parameters. Additionally, the group adversarial training procedure is analyzed. Compared with classic adversarial training, the group adversarial training procedure is proved to enjoy a better prediction-error upper bound under certain group-sparsity patterns.
|
| ISBN: | 9798263326234 |
| Source: | ProQuest Dissertations & Theses Global |
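The two worst-case formulations named in the abstract can be written in standard notation as below. This is a generic rendering rather than the thesis's own notation: the loss ℓ, transport cost c, radius ρ, and perturbation budget ε are placeholders, and the thesis's exact conventions may differ.

```latex
% Empirical risk minimization (ERM): optimize the empirical loss.
\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; z_i)

% Wasserstein-distance-based adversarial training (Wasserstein DRO):
% worst case over all distributions within Wasserstein radius rho
% of the empirical distribution \hat{P}_n.
\min_{\theta}\ \sup_{Q:\ W_c(Q,\hat{P}_n)\le \rho}\ \mathbb{E}_{Z\sim Q}\bigl[\ell(\theta; Z)\bigr]

% Perturbation-based adversarial training: worst case over pointwise
% perturbations of each covariate vector within a norm ball of radius epsilon.
\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n}\ \max_{\|\delta_i\|\le \epsilon}\ \ell\bigl(\theta;\, x_i+\delta_i,\, y_i\bigr)
```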
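Chapter 1 concerns the exact computation of a discrete OT problem between marginals with m and n atoms. The sketch below is not the thesis's modified Hungarian algorithm (which exploits the special structure of the independence-test OT problem to run in O(m²n)); it is only a generic exact linear-programming baseline of the kind such algorithms are compared against, assuming NumPy and SciPy are available.

```python
import numpy as np
from scipy.optimize import linprog

def exact_ot(cost, a, b):
    """Solve the discrete optimal transport LP exactly.

    cost : (m, n) array of transport costs between the m atoms of the first
           marginal and the n atoms of the second.
    a    : (m,) probability weights of the first marginal.
    b    : (n,) probability weights of the second marginal.
    Returns the optimal transport plan (m, n) and the optimal cost.
    """
    m, n = cost.shape
    # Row-sum constraints: sum_j P[i, j] = a[i]
    A_row = np.kron(np.eye(m), np.ones((1, n)))
    # Column-sum constraints: sum_i P[i, j] = b[j]
    A_col = np.kron(np.ones((1, m)), np.eye(n))
    # One constraint is redundant (total mass), so drop the last column constraint.
    A_eq = np.vstack([A_row, A_col[:-1]])
    b_eq = np.concatenate([a, b[:-1]])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(m, n), res.fun

# Toy example: m = 4 atoms vs. n = 3 atoms with uniform weights and
# squared-distance cost between one-dimensional locations.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.5, 1.5, 2.5])
cost = (x[:, None] - y[None, :]) ** 2
plan, value = exact_ot(cost, np.full(4, 1 / 4), np.full(3, 1 / 3))
print(value)
```

For balanced problems with equally many, equally weighted atoms, the classic Hungarian algorithm (e.g., `scipy.optimize.linear_sum_assignment`) would be the natural exact baseline instead.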
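Chapters 3 and 4 study the perturbation-based objective under an ℓ∞ budget. In the linear-regression special case (one instance of the generalized linear models treated in the thesis), the inner maximization has a well-known closed form that makes the connection to sparsity visible; the derivation below is standard and uses the fact that δᵀθ ranges over [-ε‖θ‖₁, ε‖θ‖₁] when ‖δ‖∞ ≤ ε.

```latex
% Inner maximization in linear regression under an l_infinity budget epsilon:
\max_{\|\delta\|_\infty \le \epsilon} \bigl(y - (x+\delta)^{\top}\theta\bigr)^2
  = \bigl(\,|y - x^{\top}\theta| + \epsilon\,\|\theta\|_1\,\bigr)^2

% Resulting adversarial training objective: a lasso-like penalized criterion,
% consistent with the sparsity-recovery behavior described in Chapter 3.
\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n}
  \bigl(\,|y_i - x_i^{\top}\theta| + \epsilon\,\|\theta\|_1\,\bigr)^2
```

This ℓ₁-type penalty is what makes a point mass at 0 plausible for the estimator of a zero coefficient, and it is the natural starting point for the high-dimensional prediction-rate analysis in Chapter 4 (for reference, the minimax prediction rate over s-sparse parameters in this setting is of order s log(p/s)/n, up to noise-level constants).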