A Machine Learning-based Approach for Quantification of Protein Secondary Structures from Discrete Frequency Infrared Images

Bibliographic Details
Published in: bioRxiv (Jan 13, 2025)
Main Author: Edmonds, Harrison
Other Authors: Mukherjee, Sudipta; Holcombe, Brooke; Yeh, Kevin; Bhargava, Rohit; Ghosh, Ayanjeet
Published: Cold Spring Harbor Laboratory Press
Subjects:
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3154980646
003 UK-CbPIL
022 |a 2692-8205 
024 7 |a 10.1101/2025.01.08.632028  |2 doi 
035 |a 3154980646 
045 0 |b d20250113 
100 1 |a Edmonds, Harrison 
245 1 |a A Machine Learning-based Approach for Quantification of Protein Secondary Structures from Discrete Frequency Infrared Images 
260 |b Cold Spring Harbor Laboratory Press  |c Jan 13, 2025 
513 |a Working Paper 
520 3 |a Discrete frequency infrared (IR) imaging is an exciting experimental technique that has shown promise in various applications in biomedical science. This technique often involves acquiring IR absorptive images at specific frequencies of interest that enable pathologically relevant chemical contrast. However, certain applications, such as tracking the spatial variations in protein secondary structure of tissue specimens, necessary for the characterization of neurodegenerative diseases, require deeper analysis of spectral data. In such cases, the conventional analytical approach involves band fitting the hyperspectral data to extract the relative populations of different structures through their fitted areas under the curve (AUCs). While Gaussian spectral fitting for one spectrum is viable, expanding that to an image with millions of pixels, as is often the case for tissue specimens, becomes a computationally expensive process. Alternatives like Principal Component Analysis (PCA) are less structurally interpretable and incompatible with sparsely sampled data. Furthermore, band fitting detracts from the key advantages of discrete frequency imaging by necessitating acquisition of more finely sampled spectral data that is optimal for curve fitting, resulting in significantly longer data acquisition times, larger datasets and additional computational overhead. In this work we demonstrate that a simple two-step regressive neural network model can be utilized to mitigate these challenges and employ discrete frequency imaging for retrieving the results from band fitting without significant loss of fidelity. Our model reduces the data acquisition time nearly 6-fold by requiring only seven wavenumbers to accurately interpolate spectral information at a higher resolution; it then uses the upscaled spectra to accurately predict the component AUCs, which is more than 3000 times faster than spectral fitting. 
Our approach thus drastically cuts down the data acquisition and analysis time and predicts key differences in protein structure that can be vital towards broadening potential applications of discrete frequency imaging. Competing Interest Statement: The authors have declared no competing interest. 
653 |a Image processing 
653 |a Spatial variations 
653 |a Protein structure 
653 |a Information processing 
653 |a Protein folding 
653 |a Data acquisition 
653 |a Neurodegenerative diseases 
653 |a Secondary structure 
653 |a Principal components analysis 
653 |a Neural networks 
653 |a Proteins 
700 1 |a Mukherjee, Sudipta 
700 1 |a Holcombe, Brooke 
700 1 |a Yeh, Kevin 
700 1 |a Bhargava, Rohit 
700 1 |a Ghosh, Ayanjeet 
773 0 |t bioRxiv  |g (Jan 13, 2025) 
786 0 |d ProQuest  |t Biological Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3154980646/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://www.biorxiv.org/content/10.1101/2025.01.08.632028v1