Each type of material or molecule interacts with light differently, producing a unique optical signature. Through optical spectroscopy, scientists can observe the distinct pattern that is created when light interacts with a specific material. Spectroscopic data can be used to characterize materials, identify molecules, and analyze biological samples. Interpreting the rich information available through optical spectra can be tricky, especially when the differences between spectral features are slight.
To improve the speed and precision of spectral analysis, researchers at Rice University developed a machine learning algorithm tailored to this task. The new algorithm excels at interpreting the light signatures, or optical spectra, of molecules, materials, and disease biomarkers, potentially enabling faster, more accurate medical diagnoses and sample analysis. “Our tool is able to parse light-based data for very subtle signals that are usually hard to pick up on using traditional methods,” professor Shengxi Huang said.
The algorithm, which is called Peak-Sensitive Elastic-net Logistic Regression (PSE-LR), produces a peak-sensitive feature importance map that makes the algorithm’s decision-making process transparent to the user. The peak-sensitive feature importance map highlights which parts of the spectrum contribute to a classification decision. This makes the data easier to interpret and verify, so appropriate action can be taken quickly.
Photo caption: Researcher Ziyang Wang (left) and professor Shengxi Huang developed a machine learning algorithm to analyze spectral data and detect subtle spectral features in the data. Courtesy of Jeff Fitlow/Rice University.
“Our algorithm was designed to focus on the most important parts of the signal: the peaks that matter most,” researcher Ziyang Wang said.
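To make the idea of a feature importance map concrete, here is a minimal sketch in Python. It is not the PSE-LR algorithm, whose peak-sensitive component is not described in this article. It is a plain elastic-net logistic regression trained on synthetic spectra, with the absolute value of each learned weight used as a rough per-channel importance score. The data, the peak positions (channels 12 and 28), the hyperparameters, and all function names are assumptions chosen for illustration.

```python
import math
import random


def gaussian_peak(channel, center, height, width):
    """Height of a Gaussian-shaped peak at a given spectral channel."""
    return height * math.exp(-0.5 * ((channel - center) / width) ** 2)


def make_spectrum(label, rng, n_channels=40):
    """Synthetic spectrum (illustrative only): a strong peak shared by both
    classes, plus a weak extra peak near channel 28 when label == 1."""
    spectrum = []
    for ch in range(n_channels):
        value = gaussian_peak(ch, 12, 1.0, 2.0)
        if label == 1:
            value += gaussian_peak(ch, 28, 0.3, 1.5)
        spectrum.append(value + rng.gauss(0.0, 0.05))
    return spectrum


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a weight toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_lr(X, y, alpha=0.02, l1_ratio=0.5, lr=0.5, n_iter=200):
    """Logistic regression with an elastic-net penalty, fitted by proximal
    gradient descent. The intercept b is left unpenalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob - label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(d):
            # Gradient step on log-loss plus L2 term, then L1 shrinkage.
            step = w[j] - lr * (grad_w[j] + alpha * (1 - l1_ratio) * w[j])
            w[j] = soft_threshold(step, lr * alpha * l1_ratio)
    return w, b


rng = random.Random(0)
labels = [i % 2 for i in range(60)]
spectra = [make_spectrum(label, rng) for label in labels]

# Center each channel so features shared by both classes carry no signal.
means = [sum(col) / len(col) for col in zip(*spectra)]
X = [[v - m for v, m in zip(row, means)] for row in spectra]

w, b = fit_elastic_net_lr(X, labels)

# Crude importance map: magnitude of the weight on each spectral channel.
importance = [abs(wj) for wj in w]
top_channel = max(range(len(importance)), key=importance.__getitem__)

predictions = [
    1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0 for row in X
]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)

print("most important channel:", top_channel)
print("training accuracy:", accuracy)
```

In this toy setting, the largest weights fall on the channels around the weak peak that separates the two classes, while the strong peak common to both classes receives little or no weight. That is the kind of readout an importance map is meant to provide, though the published method may compute it differently.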
The researchers compared PSE-LR to other machine learning models, including k-nearest neighbors (KNN), elastic-net logistic regression (E-LR), support vector machine (SVM), principal component analysis followed by linear discriminant analysis (PCA-LDA), XGBoost, and neural network (NN). PSE-LR demonstrated improved performance, especially in identifying subtle or overlapping spectral features, and achieved an F1-score of 0.93 and a feature sensitivity of 1.0. “Most models either miss the tiny details or are too complex to understand,” Wang said. “We aimed to fix that by building something both smart and explainable.”
When the researchers applied PSE-LR to Raman and photoluminescence spectra, the algorithm showed that it could detect subtle spectral features in a variety of materials, molecules, and biosamples. For example, the algorithm identified the receptor-binding domain of SARS-CoV-2 spike protein in ultralow concentrations, and identified neuroprotective solutions in mouse brain tissue. It also classified samples of Alzheimer’s disease and suggested potential disease biomarkers. The algorithm was also able to distinguish between 2D semiconductors, classifying a WS2 monolayer and a WSe2/WS2 heterobilayer.
“Imagine being able to detect early signs of diseases like Alzheimer’s or COVID-19 just by shining a light on a drop of fluid or a tissue sample,” Wang said. “Our work makes this possible by teaching computers how to better ‘read’ the signal of light scattered from tiny molecules.”
The PSE-LR algorithm could enable the development of new diagnostics for identifying and treating disease. “The optical spectra of a tissue or other biological sample can reveal a lot about what’s happening inside the body,” Wang said. “This is important because faster and more accurate disease detection can lead to better treatments and save lives.”
In addition to serving as a tool for medical analysis, PSE-LR could be used to analyze new materials for smart sensors and small diagnostic devices. It could also facilitate the development of nanodevices such as nanosensors and miniaturized spectrometers based on nanomaterials. The algorithm can be used with various spectroscopic methods.
“These findings could help transform medical diagnostics and materials science, bringing us closer to a world where smart technologies help detect and respond to health problems faster and more effectively,” Wang said. The research was published in ACS Nano (www.doi.org/10.1021/acsnano.4c16037).