Automated Spectral Preprocessing via Bayesian Optimization for Chemometric Analysis of Milk Constituents

Bibliographic Details
Published in: Foods vol. 14, no. 17 (2025), p. 2996-3024
Main author: Babatunde Habeeb Abolaji
Other authors: McDougal, Owen M, Andersen, Timothy
Published: MDPI AG
Subjects:
Online access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3249681403
003 UK-CbPIL
022 |a 2304-8158 
024 7 |a 10.3390/foods14172996  |2 doi 
035 |a 3249681403 
045 2 |b d20250101  |b d20251231 
084 |a 231462  |2 nlm 
100 1 |a Babatunde Habeeb Abolaji  |u Computer Science, Boise State University, Boise, ID 83725, USA; habeebbabatunde@u.boisestate.edu 
245 1 |a Automated Spectral Preprocessing via Bayesian Optimization for Chemometric Analysis of Milk Constituents 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a The preprocessing of infrared spectra can significantly improve predictive accuracy for protein, carbohydrate, lipid, or other nutrition components, yet optimal preprocessing selection is typically empirical, tedious, and dataset specific. This study introduces a Bayesian optimization-based framework designed for the automated selection of optimal spectral preprocessing pipelines within a chemometric modeling context. The framework was applied to mid-infrared spectra of milk to predict compositional parameters for fat, protein, lactose, and total solids. A total of 385 averaged spectra corresponding to 198 unique samples was split in a 70/30 ratio (training/test) using a group-aware Kennard-Stone algorithm, resulting in 269 averaged spectra (135 unique samples) for training and 116 spectra (58 unique samples) for testing. Six regression models, namely Elastic Net, Gradient Boosting Machines (GBM), Partial Least Squares (PLS), RidgeCV Regression, LassoLarsCV, and Support Vector Regression (SVR), were evaluated across three preprocessing conditions: (1) no preprocessing, (2) literature-derived custom preprocessing (e.g., MSC, SNV, and first and second derivatives), and (3) optimized preprocessing via the proposed Bayesian framework. Optimized preprocessing consistently outperformed other methods, with RidgeCV achieving the best performance for all components except lactose, where PLS slightly outperformed it. Improvements in predictive accuracy, particularly in terms of RMSEP, were observed across all milk components. The best RMSEP results were achieved for protein (RMSEP = 0.054, R² = 0.981) and lactose (RMSEP = 0.026, R² = 0.917), followed by fat (RMSEP = 0.139, R² = 0.926) and total solids (RMSEP = 0.154, R² = 0.960).
Literature-based pipelines demonstrated inconsistent effectiveness, highlighting the limitations of transferring preprocessing methods between datasets. The Bayesian optimization approach identified relatively simple yet highly effective preprocessing pipelines, typically involving few steps. By eliminating manual trial and error, this data-driven strategy offers a robust and generalizable solution that streamlines spectral modeling in dairy analysis and can be readily applied to other types of spectroscopic data across various domains. 
653 |a Accuracy 
653 |a Datasets 
653 |a Regression analysis 
653 |a Regression models 
653 |a Optimization 
653 |a Infrared analysis 
653 |a Homogenization 
653 |a Training 
653 |a Automation 
653 |a Peptides 
653 |a Oils & fats 
653 |a Infrared spectra 
653 |a Dietary minerals 
653 |a Proteins 
653 |a Milk 
653 |a Carbohydrates 
653 |a Preprocessing 
653 |a Bayesian analysis 
653 |a Spectrum analysis 
653 |a Support vector machines 
653 |a Lactose 
653 |a Pipelines 
653 |a Process controls 
653 |a Effectiveness 
653 |a Lipids 
653 |a Meat quality 
653 |a Mathematical models 
653 |a Infrared radiation 
653 |a Chemometrics 
653 |a Uniqueness 
700 1 |a McDougal, Owen M  |u Department of Chemistry and Biochemistry, Boise State University, Boise, ID 83725, USA; owenmcdougal@boisestate.edu 
700 1 |a Andersen, Timothy  |u Computer Science, Boise State University, Boise, ID 83725, USA; habeebbabatunde@u.boisestate.edu 
773 0 |t Foods  |g vol. 14, no. 17 (2025), p. 2996-3024 
786 0 |d ProQuest  |t Agriculture Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3249681403/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3249681403/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3249681403/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
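
The abstract in field 520 names SNV as one of the literature-derived preprocessing steps and reports results as RMSEP and R². As a rough illustration of what those terms compute, the following is a minimal sketch in plain Python. It is not the authors' code and does not implement the Bayesian pipeline search. The function names `snv`, `rmsep`, and `r_squared` are hypothetical, and the use of the sample standard deviation in SNV is an assumption (some implementations use the population form).

```python
import math
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean and
    scale by its own standard deviation (sample form; an assumption)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - m) / s for x in spectrum]


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a held-out test set."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(mean(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    print(snv([1.0, 2.0, 3.0]))
    print(rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In a setting like the one the abstract describes, a function such as `snv` would be applied to each spectrum before model fitting, and `rmsep` would be computed only on the held-out test spectra.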