2025
Belenguer-Llorens, Albert; Sevilla-Salcedo, Carlos; Tohka, Jussi; Gómez-Verdejo, Vanessa
Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification (Journal Article)
In: Engineering Applications of Artificial Intelligence, vol. 160, art. no. 111887, 2025, ISSN: 0952-1976.
Tags: Bayesian modeling, Machine learning health applications, Multi-modal data, Wide-data
@article{BELENGUERLLORENS2025111887,
title = {Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification},
author = {Albert Belenguer-Llorens and Carlos Sevilla-Salcedo and Jussi Tohka and Vanessa G\'{o}mez-Verdejo},
url = {https://www.sciencedirect.com/science/article/pii/S0952197625018895},
doi = {10.1016/j.engappai.2025.111887},
issn = {0952-1976},
year = {2025},
date = {2025-01-01},
journal = {Engineering Applications of Artificial Intelligence},
volume = {160},
pages = {111887},
abstract = {The increasing availability of multi-modal medical data, including neuroimaging, genetic profiles, and clinical measurements, offers unprecedented opportunities for advancing disease diagnosis and prognosis. However, integrating these heterogeneous data sources poses significant challenges due to their high dimensionality, redundancy, and small sample sizes, which hinder the effectiveness of traditional machine learning models. To overcome these challenges, we present the BAyesian Latent Data Unified Representation model (BALDUR), a novel Bayesian algorithm designed to handle multi-modal datasets and small sample sizes in high-dimensional settings while providing explainable solutions. To do so, the proposed model combines the different data views within a common latent space to extract the information relevant to the classification task and to prune out irrelevant or redundant features and data views. Furthermore, to provide generalizable solutions in small sample size scenarios, BALDUR efficiently integrates dual kernels over the views with a small sample-to-feature ratio. Finally, its linear nature ensures the explainability of the model outcomes, allowing its use for biomarker identification. This model was tested on two different neurodegeneration datasets, outperforming state-of-the-art models and detecting features aligned with markers already described in the scientific literature.},
keywords = {Bayesian modeling, Machine learning health applications, Multi-modal data, Wide-data},
pubstate = {published},
tppubtype = {article}
}
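The abstract's points about using dual kernels for views with a small sample-to-feature ratio, and about explainability following from the model's linear nature, rest on a standard identity for linear models. The sketch below is not BALDUR (a Bayesian latent-variable model); it uses plain linear ridge regression as an assumed stand-in to show the idea: when a view has n samples and d ≫ n features, the fit can be computed from the n × n Gram matrix K = XXᵀ instead of a d × d system, and the dual coefficients α map back to one weight per feature via w = Xᵀα, so the solution remains inspectable. The synthetic data, the regularisation strength `lam`, and the helper names `dual_linear_ridge` and `primal_linear_ridge` are hypothetical.

```python
import numpy as np


def dual_linear_ridge(X, y, lam):
    """Linear ridge regression fitted in the dual (n x n) space.

    Returns the dual coefficients alpha and the equivalent primal
    weights w = X^T alpha (one weight per original feature).
    """
    n = X.shape[0]
    K = X @ X.T  # n x n linear kernel (Gram matrix)
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    w = X.T @ alpha  # map back to feature space for interpretation
    return alpha, w


def primal_linear_ridge(X, y, lam):
    """Same model fitted directly in the d x d feature space."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Hypothetical wide view: far fewer samples than features.
rng = np.random.default_rng(0)
n, d = 30, 2000
X = rng.standard_normal((n, d))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(n)
lam = 1.0

alpha, w_dual = dual_linear_ridge(X, y, lam)
w_primal = primal_linear_ridge(X, y, lam)
print("primal == dual weights:", np.allclose(w_dual, w_primal))

# Predictions via the kernel (test-vs-train Gram matrix) equal
# predictions via the recovered linear weights.
X_new = rng.standard_normal((5, d))
pred_kernel = (X_new @ X.T) @ alpha
pred_linear = X_new @ w_dual
print("kernel == linear predictions:", np.allclose(pred_kernel, pred_linear))

# Because w lives in feature space, features can be ranked by |w|.
top10 = np.argsort(-np.abs(w_dual))[:10]
print("largest-|w| features:", top10)
```

Both routes give the same weights up to floating-point error, by the identity (XᵀX + λI)⁻¹Xᵀ = Xᵀ(XXᵀ + λI)⁻¹, but the dual route solves an n × n system rather than a d × d one, which is what makes kernelising wide views attractive while keeping per-feature weights available.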