Original Article | Open Access

Benchmarking and optimizing microbiome-based bioinformatics workflow for non-invasive detection of intestinal tumors

Microbiome Res Rep 2025;4:[Accepted].

Abstract

Background: The human gut microbiome is closely linked to disease states, offering substantial potential for novel disease detection tools based on machine learning (ML). However, the choice of feature type, data preprocessing strategy, feature selection strategy, and classification algorithm can each affect a model’s predictive performance and robustness.

Methods: To develop an optimized workflow, we systematically evaluated ML methods for classifying colorectal cancer (CRC) and adenoma (ADA) using 4,217 fecal samples from diverse global regions. Model performance was quantified by the area under the receiver operating characteristic curve (AUROC). We benchmarked 6,468 unique analytical pipelines, each defined by a distinct combination of tools, parameters, and algorithms, under a dual validation strategy comprising cross-validation and leave-one-dataset-out validation.
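
The leave-one-dataset-out scheme and the area under the receiver operating characteristic curve (AUROC) named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid scorer is a deliberately simple stand-in for the benchmarked classifiers, and the function names, cohort labels, and toy values are all hypothetical.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the fraction of case/control pairs ranked correctly (ties count half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_dataset_out(dataset_ids):
    """Yield (held_out_id, train_indices, test_indices), one split per dataset."""
    by_dataset = defaultdict(list)
    for i, d in enumerate(dataset_ids):
        by_dataset[d].append(i)
    for held_out, test_idx in by_dataset.items():
        train_idx = [i for i, d in enumerate(dataset_ids) if d != held_out]
        yield held_out, train_idx, test_idx


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in model: higher score = closer to the case centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    case = centroid([x for x, lab in zip(X, y) if lab == 1])
    ctrl = centroid([x for x, lab in zip(X, y) if lab == 0])

    def score(x):
        d_case = sum((a - b) ** 2 for a, b in zip(x, case))
        d_ctrl = sum((a - b) ** 2 for a, b in zip(x, ctrl))
        return d_ctrl - d_case

    return score


def lodo_auroc(X, y, dataset_ids):
    """Train on all datasets but one, then report AUROC on the held-out dataset."""
    results = {}
    for held_out, train_idx, test_idx in leave_one_dataset_out(dataset_ids):
        score = fit_centroid_scorer([X[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        results[held_out] = auroc([y[i] for i in test_idx],
                                  [score(X[i]) for i in test_idx])
    return results


# Toy data (made up): two features, three cohorts, 1 = case, 0 = control.
X = [[0.90, 0.1], [0.80, 0.2], [0.20, 0.7], [0.10, 0.9],
     [0.70, 0.3], [0.85, 0.1], [0.30, 0.8], [0.15, 0.6],
     [0.95, 0.2], [0.60, 0.4], [0.25, 0.9], [0.40, 0.7]]
y = [1, 1, 0, 0] * 3
cohorts = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

results = lodo_auroc(X, y, cohorts)
print(results)
```

Unlike ordinary cross-validation, which mixes samples from every cohort into both the training and the test folds, each split here tests on a cohort the model never saw, so the score reflects transfer across studies rather than within-study fit.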

Results: Shotgun metagenomic (WGS) data generally outperformed 16S rRNA gene (16S) sequencing data, with features at the species-level genome bin (SGB), species, and genus levels showing the greatest discriminatory power. For 16S data, ASV-based features yielded the best disease classification performance. Applying specific feature selection tools, such as the Wilcoxon rank-sum test, together with appropriate data normalization further improved model performance. Finally, among the classification algorithms, the ensemble learning models XGBoost and Random Forest performed best.
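
As an illustration of the feature selection step, the following Python sketch filters taxa with a two-sided Wilcoxon rank-sum test after normalization. Several choices here are assumptions rather than details from the abstract: total-sum scaling stands in for the unspecified "appropriate" normalization, the p-value uses a normal approximation without tie correction, and the 0.05 cutoff, function names, and count table are hypothetical.

```python
import math


def total_sum_scale(counts):
    """Convert one sample's raw counts to relative abundances (assumed normalization)."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def midranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_pvalue(group_a, group_b):
    """Two-sided rank-sum p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2             # expected rank sum under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def select_features(X, y, alpha=0.05):
    """Return indices of features whose case and control values differ at level alpha."""
    keep = []
    for j, column in enumerate(zip(*X)):
        cases = [v for v, lab in zip(column, y) if lab == 1]
        controls = [v for v, lab in zip(column, y) if lab == 0]
        if rank_sum_pvalue(cases, controls) < alpha:
            keep.append(j)
    return keep


# Made-up count table: rows are samples, columns are three taxa.
# Sequencing depth differs between samples, hence the normalization step.
raw_counts = [
    [500, 400, 100], [55, 25, 20], [60, 25, 15], [52, 23, 25], [58, 30, 12],  # cases
    [40, 134, 26], [25, 57, 18], [30, 48, 22], [22, 67, 11], [28, 56, 16],    # controls
]
y = [1] * 5 + [0] * 5

X_rel = [total_sum_scale(row) for row in raw_counts]
selected = select_features(X_rel, y)
print(selected)
```

In a real evaluation the filter should be fitted on the training portion of each split only; selecting features on the full data before validation leaks label information into the test set and inflates the reported performance.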

Conclusion: Based on the comprehensive evaluation results, we developed an optimized microbiome detection framework (MiDx) and validated its robust generalizability on an independent dataset, offering a systematic and practical framework for future 16S and WGS-based intestinal disease detection.

Keywords

Colorectal cancer, adenoma, machine learning, benchmarking

Cite This Article

Sun Y, Huang Y, Li R, Zhang J, Fan X, Su X. Benchmarking and optimizing microbiome-based bioinformatics workflow for non-invasive detection of intestinal tumors. Microbiome Res Rep 2025;4:[Accept]. http://dx.doi.org/10.20517/mrr.2025.75

Copyright

© The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Microbiome Research Reports
ISSN 2771-5965 (Online)

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/
