The times they are AI-changing: AI-powered advances in the application of extracellular vesicles to liquid biopsy in breast cancer
Abstract
Artificial intelligence (AI) is revolutionizing scientific research by facilitating a paradigm shift in data analysis and discovery. This transformation is characterized by a fundamental change in scientific methods and concepts due to AI’s ability to process vast datasets with unprecedented speed and accuracy. In breast cancer research, AI aids in early detection, prognosis, and personalized treatment strategies. Liquid biopsy, a noninvasive tool for detecting circulating tumor traits, could ideally benefit from AI’s analytical capabilities, enhancing the detection of minimal residual disease and improving treatment monitoring. Extracellular vesicles (EVs), which are key elements in cell communication and cancer progression, could be analyzed with AI to identify disease-specific biomarkers. AI combined with EV analysis promises an enhancement in diagnosis precision, aiding in early detection and treatment monitoring. Studies show that AI can differentiate cancer types and predict drug efficacy, exemplifying its potential in personalized medicine. Overall, the integration of AI in biomedical research and clinical practice promises significant changes and advancements in diagnostics, personalized medicine-based approaches, and our understanding of complex diseases like cancer.
Keywords
ARTIFICIAL INTELLIGENCE: THE OLD ROAD IS RAPIDLY AGING
The application of artificial intelligence (AI) in science is triggering a real-time paradigm shift in how research and discovery are conducted. A paradigm shift, a term coined by Thomas Kuhn in “The Structure of Scientific Revolutions”[1], refers to a “fundamental change in the basic concepts and experimental practices of a scientific discipline”. Paradigm shifts occur when the accumulation of anomalies - observations or problems that cannot be explained by the current paradigm - reaches a critical point. Such shifts are revolutionary rather than evolutionary, leading to a profound rethinking of scientific concepts and methods. As of 2024, AI enhances data analysis by processing vast datasets with unprecedented speed and accuracy, uncovering patterns and insights otherwise invisible[2,3]. This capacity of AI to reveal novel, previously undetected scientific pathways is where the potential for a paradigm shift lies[4], and its ramifications are growing every day. AI is the general term for building intelligent computational agents that carry out tasks demanding human-level intelligence[5]. There are different subsets of AI. In particular, machine learning (ML) focuses on algorithms that allow systems to improve their performance by learning from data. ML algorithms, such as decision trees or support vector machines (SVMs), learn from data and make predictions or decisions[6]. While ML encompasses a wide range of techniques for teaching computers to learn from data, deep learning (DL) is a specialized subset that uses artificial neural networks with multiple layers to model complex patterns[7]. ML algorithms can predict outcomes and help optimize experimental conditions. AI-driven simulations and models enable the exploration of complex systems and facilitate interdisciplinary collaboration by integrating diverse data sources and methodologies.
Data preparation is a critical stage in ML that includes cleaning, transforming, and organizing data to make them appropriate for model training[8]. It involves handling missing values, outliers, and noise (irrelevant or inaccurate information that could negatively impact a model’s performance); feature creation and transformation; normalization or standardization of data; correcting class imbalance; and splitting data into training and testing sets. Once the data are prepared, model selection becomes crucial. This involves comparing different algorithms and tuning their hyperparameters using techniques like grid search and cross-validation. Grid search systematically explores different combinations of hyperparameters, while cross-validation helps to assess a model’s generalization performance. By carefully selecting the appropriate model and tuning its hyperparameters, one can improve performance on unseen data.
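As an illustration of how grid search and cross-validation fit together, the following minimal sketch tunes a k-nearest neighbors classifier using only the Python standard library. The synthetic two-class dataset, the function names, and the hyperparameter grid are assumptions made for this example and do not come from any study cited here.

```python
import random
from collections import Counter
from itertools import product


def knn_predict(train, query, k, p):
    """Majority vote among the k nearest training points (Minkowski distance of order p)."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1 / p), y)
        for x, y in train
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, p, n_folds=5):
    """Mean accuracy over n_folds: each fold is held out once while the rest is used for training."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct = sum(knn_predict(train, x, k, p) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / n_folds


def grid_search(data, grid, n_folds=5):
    """Score every hyperparameter combination by cross-validation and return the best one."""
    results = {
        (k, p): cross_val_accuracy(data, k, p, n_folds)
        for k, p in product(grid["k"], grid["p"])
    }
    best = max(results, key=results.get)
    return best, results


# Hypothetical toy data: two Gaussian clusters standing in for two sample classes.
random.seed(0)
data = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(50)]
data += [((random.gauss(3, 1), random.gauss(3, 1)), 1) for _ in range(50)]
random.shuffle(data)

best_params, cv_results = grid_search(data, {"k": [1, 3, 5, 7], "p": [1, 2]})
print("best (k, p):", best_params, "mean CV accuracy:", round(cv_results[best_params], 3))
```

In practice, a separate test set would be kept aside before the search, so that the selected configuration can be evaluated on data that played no part in tuning.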
Medical practice will not be an exception to the AI revolution. AI is enhancing diagnostic accuracy, treatment personalization, and healthcare efficiency[9]. AI-driven algorithms can analyze vast amounts of medical data to identify patterns and predict disease outcomes with greater precision than traditional methods. AI will soon be capable of designing personalized therapies, including drug repurposing, a promise that has been in development for decades. The new scenario connecting AI and biomedicine is a blooming field with new and exciting applications. Recent advances in predicting structure-activity relationships[10], designing self-assembly nanoparticles[11], or highly accurately predicting protein structure[12,13] are transforming our expectations on how computer science can be applied to biomedicine.
DL: ML’S NEXT EVOLUTION
ML transforms the inputs of an algorithm into outputs by using statistical, data-driven rules that are automatically inferred from a large set of examples, rather than being specified by humans[7]. DL is a form of representation learning, an ML technique in which the input is raw data and the model automatically develops the representations it needs for pattern recognition[14]. The improved performance of DL over broader ML approaches relies largely on DL’s capacity to accept multiple data types as input. Many biomedical datasets are composed of input data points (e.g., skin lesion images) and corresponding output data labels (e.g., “benign” or “malignant”). Neural networks, particularly DL, excel at processing complex biomedical data, such as genomics and proteomics, enabling the discovery of new biomarkers and therapeutic targets[15]. DL architectures can be classified into four groups: convolutional neural networks (CNNs), recurrent neural networks (RNNs), deep neural networks (DNNs), and emergent architectures.
CNNs are a type of neural network designed to process visual data, making them ideal for tasks like image and video recognition; they consist of convolution layers, non-linear layers, and pooling layers[16]. RNNs are well-suited for processing sequential data, such as time series, because cyclic connections among their building blocks allow them to use the sequential information in the input[17]. DNNs are renowned for their suitability in analyzing high-dimensional data. Their potential covers hierarchical representation learning methods, and they could discover previously unknown patterns and correlations, providing a revolutionary way of looking at the data. However, their capabilities are not fully exploited due to a lack of data standardization and persistent challenges that hinder AI applications. One of the earliest DNN applications was on breast imaging, circa 1996[18]. Unfortunately, large public digital databases, crucial for algorithm training, are unavailable, further contributing to slow advances.
Diverse datasets require distinct approaches to achieve expected results. As mentioned, image data are best handled by CNNs due to their ability to recognize spatial hierarchies, while sequential data like time series or text are better processed by RNNs[9]. Some AI techniques, such as DL, require substantial computational power and large datasets, whereas others, like SVMs or k-nearest neighbors (KNN), can be effective with fewer data and lower computational requirements[19]. Other methods, like reinforcement learning, are particularly useful in scenarios involving decision making, whereas clustering algorithms are essential for segmenting datasets into meaningful groups. ML algorithms analyze datasets, identifying patterns and predicting outcomes, which is crucial for disease diagnosis and treatment planning.
BIOMEDICINE IN AI TIMES: DO NOT CRITICIZE WHAT YOU CANNOT UNDERSTAND
AI technologies are increasingly transforming the field of biomedicine, offering a range of benefits that enhance both healthcare delivery and patient outcomes. The advancements offer many opportunities for healthcare professionals and patients alike[20]. DL-related methods have become the dominant approach in AI-based diagnosis and are used in multiple tasks, such as disease classification[21,22], region-of-interest segmentation[23], medical object or specific cell subtype detection[24,25], and image registration[26]. DL applications to predict breast cancer (BC) from mammograms are leading AI implementation in cancer diagnosis. McKinney et al. (2020) evaluated how well DL models could perform as a complement to mammography-based diagnosis[27]. They found that the models could correctly identify many cancers that had gone undiagnosed. Moreover, DL is able to classify BC histopathological images[28] as carcinoma or non-carcinoma, with 97.73% sensitivity for carcinoma classification and an overall accuracy of 95.29%[29]. However, new ML avenues are yet to be explored. A combination of ML with an AdaBoost algorithm reported higher sensitivity (98.3%) and accuracy (97.2%), together with 96.5% specificity and an increased patient survival rate[30]. Boosting is a general ensemble method that builds a strong classifier from several weak ones, increasing prediction capacity and minimizing error. Genome-wide association studies demand dedicated algorithms because they involve large patient cohorts and confounders that are often unknown. Stochastic optimization of algorithms[31] adapted for DL provides new insights that, combined with other bioinformatics tools, could identify disease-associated causal mutations and help clarify confounder influence[32]. Determination of pathogenic variants in genetic diagnosis also benefits from DL, as the prediction of protein structure has transformed medical biochemistry[12,33].
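To make the boosting idea concrete, the sketch below implements discrete AdaBoost with one-dimensional decision stumps as weak learners, using only the Python standard library. It is a toy illustration under stated assumptions, not the method of the cited study: the ten-point dataset, the function names, and the choice of three boosting rounds are all hypothetical. It also shows how sensitivity, specificity, and accuracy are derived from the predictions.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` below the threshold and the opposite label above it."""
    return polarity if x < threshold else -polarity


def fit_adaboost(xs, ys, n_rounds):
    """Discrete AdaBoost on 1-D inputs with labels in {-1, +1}."""
    n = len(xs)
    weights = [1.0 / n] * n
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    model = []  # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        # Pick the stump with the lowest weighted error under the current weights.
        best = None
        for threshold in thresholds:
            for polarity in (1, -1):
                err = sum(
                    w for x, y, w in zip(xs, ys, weights)
                    if stump_predict(x, threshold, polarity) != y
                )
                if best is None or err < best[0]:
                    best = (err, threshold, polarity)
        err, threshold, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest, then renormalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in model)
    return 1 if score >= 0 else -1


def metrics(model, xs, ys):
    """Return (sensitivity, specificity, accuracy), treating +1 as the positive class."""
    preds = [predict(model, x) for x in xs]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, ys))
    tn = sum(p == -1 and y == -1 for p, y in zip(preds, ys))
    fp = sum(p == 1 and y == -1 for p, y in zip(preds, ys))
    fn = sum(p == -1 and y == 1 for p, y in zip(preds, ys))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(ys)


# Hypothetical toy data: no single threshold separates the two classes.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = fit_adaboost(xs, ys, n_rounds=3)
print("one stump   :", metrics(model[:1], xs, ys))
print("three stumps:", metrics(model, xs, ys))
```

On this toy set a single stump misclassifies three of the ten points, whereas the weighted vote of three stumps classifies all of them correctly, which is the sense in which several weak classifiers combine into a strong one.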
DL systems could enhance targeted biomarker assays, for example, for gene expression profiles[12]. However, it is the combination of the previously described AI tools that will further advance their applicability. Functional enrichment analysis and Gene Set Enrichment Analysis are statistical techniques that complement these AI tools. They test whether a set of genes or proteins shows statistically significant alterations in expression, regulation pattern, or a specific biological function. These are enrichment score analyses and rely on known biological pathways from existing databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG)[34] or the Gene Ontology (GO) resource[35]. Once DL organizes data patterns, functional enrichment tools map them to known biological pathways and functions. This integration allows researchers to gain deeper insights into the underlying biological mechanisms and potential associations with diseases. By bridging advanced data analysis with biological context, this approach enhances our understanding of complex biological systems and supports the discovery of novel therapeutic targets or biomarkers in biomedicine. Complemented with AI machinery, enrichment analysis provides a new direction to integrative “multiomics”.
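The statistical core of functional enrichment can be sketched as an over-representation test based on the hypergeometric distribution: given a background of N genes, of which K belong to a pathway, it asks how likely it is that a selected list of n genes contains at least k pathway members by chance. The example below is a simplified assumption for illustration; Gene Set Enrichment Analysis proper uses a rank-based running-sum statistic instead, and the gene identifiers and the pathway here are hypothetical placeholders, not KEGG or GO entries.

```python
from math import comb


def enrichment_p_value(selected, pathway, background):
    """One-sided hypergeometric test for over-representation.

    Returns (k, p) where k is the overlap between `selected` and `pathway`
    and p = P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    background = set(background)
    selected = set(selected) & background
    pathway = set(pathway) & background
    N, K, n = len(background), len(pathway), len(selected)
    k = len(selected & pathway)
    # Sum the probabilities of observing k, k+1, ..., min(K, n) pathway genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Hypothetical placeholders: 10 background genes, a 4-gene pathway, 3 selected genes.
background = [f"g{i}" for i in range(10)]
pathway = ["g0", "g1", "g2", "g3"]
selected = ["g0", "g1", "g7"]

overlap, p_value = enrichment_p_value(selected, pathway, background)
print(f"overlap = {overlap}, p = {p_value:.4f}")  # p = 40/120 = 0.3333
```

With real data, the test would be repeated for every pathway in the database, and the resulting p-values would then need correction for multiple testing.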
Supervised learning models are trained on labeled medical images to detect abnormalities like tumors with high accuracy[36]. Other applications include natural language processing (NLP), which facilitates the extraction of valuable insights from unstructured clinical notes and research papers. Random forest (RF), created by Leo Breiman in 2001[37], is an ensemble learning method that combines the predictions of multiple decision trees, each trained on a different data subset and considering a random selection of features. The RF strategy enhances the accuracy and robustness of the model by reducing overfitting through the combination of multiple trees. RF is widely applied in biomedicine for its robustness and ability to handle high-dimensional data[38]. It is used in disease diagnosis and classification, helping to identify diseases such as cancer[39], diabetes[40], and cardiovascular disease[41] based on clinical and genomic data. RF also assists in biomarker discovery by identifying important variables from large datasets, enhancing personalized medicine[42]. Its versatility and accuracy make RF a valuable tool in biomedical research and clinical practice. Together, these AI techniques, summarized in Figure 1, are revolutionizing biomedicine by advancing diagnostics, personalized medicine, and biomedical research.
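The two ingredients of RF described above, bootstrap resampling of the data and random feature selection per tree, can be sketched in a few lines of standard-library Python. This is a deliberately simplified assumption: each "tree" is a depth-1 stump rather than a full decision tree, the synthetic dataset is hypothetical (two informative features and one pure-noise feature), and counting how often a feature is chosen is only a crude stand-in for the variable-importance measures used in biomarker discovery.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: best (feature, threshold) split among `feature_ids` by misclassification count."""
    best = None
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees, max_features, rng):
    """Train each stump on a bootstrap sample, using a random subset of the features."""
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), k=n)                     # sample rows with replacement
        feats = rng.sample(range(n_features), max_features)  # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] < t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: features 0 and 1 separate the classes, feature 2 is noise.
rng = random.Random(0)
rows, labels = [], []
for label, centre in ((0, 0.0), (1, 4.0)):
    for _ in range(40):
        rows.append((rng.gauss(centre, 1), rng.gauss(centre, 1), rng.gauss(0, 1)))
        labels.append(label)

forest = fit_forest(rows, labels, n_trees=25, max_features=2, rng=rng)
accuracy = sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)
usage = Counter(f for f, _, _, _ in forest)  # how often each feature was chosen for a split
print("training accuracy:", accuracy, "feature usage:", dict(usage))
```

Because every stump sees a slightly different sample and feature subset, their individual errors are only partly correlated, and the vote tends to be more stable than any single stump.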
BC AND LIQUID BIOPSY
BC is one of the most common cancers in women, with 2.3 million new cases diagnosed globally in 2022[43]. BC shows different forms of presentation. Roughly 70% of all cases of BC are classified as sporadic, 20% as familial BC, and 10% as hereditary BC. A large proportion of BCs occur in a small percentage of the population that is at increased risk of developing the disease[44]. Susceptibility to BC is multifactorial, and many genetic variants together with reproductive, hormonal, anthropometric, and lifestyle factors are considered associated risk factors. Each of these factors might have a modest effect on cancer risk, but when considered together with family history and known genetic factors, they can improve patient risk stratification. In a large study evaluating variation in putative BC susceptibility genes, the authors found strong evidence of an association with BC for protein-truncating variants in 9 genes, including ATM, BRCA1, BRCA2, CHEK2, and others[45]. Patients carrying these mutations are considered at high risk of developing hereditary BC.
BC is a heterogeneous disease classified into three major intrinsic subtypes[46] based on immunohistochemical expression of hormone receptors (estrogen/progesterone receptor positive) and amplification of human epidermal growth factor receptor 2 (HER2). Hormone receptor-positive (luminal) BC accounts for 65% of tumors and shows a low rate of distant relapse. Nearly 15% of all tumors are classified as triple-negative breast cancer (TNBC), which lacks the expression of HER2 and hormone receptors, and 15-20% are of the HER2+ (non-luminal) subtype[47]; both are associated with a worse prognosis. In addition to the outcome, treatment is determined by subtype. Thus, therapies are mainly based on endocrine therapy for luminal tumors, HER2 inhibitors for HER2+ tumors, and chemotherapy (CT) for TNBC. Despite the improvement in early detection and therapies, a group of patients will relapse over the years. Prognostic and predictive factors are useful tools to predict relapse risk and individualize anticancer drug therapies. The integration of AI shows promise in early diagnosis and in identifying patients at high risk of relapse during adjuvant and neoadjuvant treatments (NAT)[48]. Although adjuvant hormone therapy is the standard of care in early Luminal A BC, a subset of high-risk patients could also benefit from additional adjuvant CT for the prevention of recurrence. However, CT toxicity has a negative impact on the quality of life of these patients. Therefore, identifying these patients is essential to deciding the best therapeutic strategy, sparing low-risk patients unnecessary over-treatment and CT toxicity while increasing the lifesaving potential of CT treatment in high-risk patients. In order to accurately identify the genomic risk, several multigene expression profile platforms have been developed, such as Oncotype DX, Prosigna, or MammaPrint[49].
Oncotype DX is also a predictive test that provides information on the benefits of CT in adjuvant treatment[50]. Despite efforts to improve the adjuvant treatment management of these patients, between 25% and 40% of them will present a locoregional or distant relapse. In the same way, the benefit of NAT in locally advanced disease varies based on the individual risk of relapse. HER2+ and TNBC subtypes show high sensitivity to NAT. Two out of three patients with TNBC achieve pathological complete response (pCR), compared with nearly half of patients with HER2+ BC[47]. Achieving pCR after NAT identifies patients with a lower risk of relapse, as indicated by improved relapse-free survival, both in HER2+ and TNBC[51]. However, non-pCR patients are at a higher risk of experiencing cancer recurrence, and these cohorts should be treated with additional adjuvant therapies. New strategies to predict and stratify patients based on their prognosis are highly demanded by patients and clinicians.
Liquid biopsy is a noninvasive diagnostic tool that detects circulating tumor components, including circulating tumor cells, circulating tumor DNA (ctDNA), circulating miRNAs, extracellular vesicles (EVs) and particles, soluble proteins, mRNAs, and other elements present in patients’ peripheral blood or other biological fluids. This biopsy strategy provides a comprehensive view of the disease status, enabling continuous analysis and aiding clinical decision making throughout specific therapeutic approaches[52]. Monitoring dynamic alterations in liquid biopsy during treatment may provide efficacy information to facilitate personalized medicine. Tumor mutations in ctDNA are being used to identify minimal residual disease in different types of cancer, including BC, with promising results[53,54,55]. So far, only a few studies based on deep sequencing and ctDNA dynamics have addressed liquid biopsy efficacy in patients receiving NAT for TNBC and HER2+ subtypes. Butler et al. analyzed ctDNA in three TNBC and three HER2+ patients receiving NAT based only on CT; ctDNA was detected prior to the start of NAT, decreased during treatment in patients with pCR, and increased in patients with rapid recurrence[56]. McDonald et al. demonstrated a decrease in ctDNA concentrations in patients with pCR during NAT, but only 7 and 9 patients with HER2+ and TNBC, respectively, were included in the series[57]. Similarly, Li et al. observed an association between response to NAT and ctDNA detection after two cycles in 44 BC patients (of whom 6 were TNBC and 9 HER2+)[58]. Magbanua et al. evaluated ctDNA in neoadjuvant-treated high-risk early BC patients (MammaPrint high score) included in the I-SPY 2 trial[59]. Eighty-four patients treated with a standard NAT alone or in combination with an AKT inhibitor were analyzed using an NGS panel (16 highly ranked somatic mutations).
Analyzing a single alteration per patient probably underestimates the true rate of molecular disease detection in liquid biopsy, leaving room for novel strategies.
This is where AI enters the stage. The scientific community is taking multiple approaches to integrate the potential of AI applications into BC research. Combined strategies aim to distinguish healthy and cancer cells[60]. Multi-modal and “-omics” ML integration aims to enhance drug-response prediction in BC patients, distinguishing non-responders and variable responders[61]. Some strategies try to improve the prediction accuracy of magnetic resonance imaging[62] or tissue morphology correlations[63]. Retrospective studies to expand the validity of mammograms could improve diagnosis and include accurate predictive risk[64,65]. Other applications aim to address tumor evolution based on oscillating gene expression[66], improve tumor subclassification[67], or integrate genotype-phenotype association[68]. ML has also been applied to prediction models for the impact of CT. In a cohort of 4,696 patients, the propensity-score-matched method was utilized to reduce covariable imbalance. Univariable and multivariable analyses were used to compare BC-specific survival and overall survival[69].
EVs AND AI
EVs are key factors for cell-to-cell communication, playing a role in metastatic dissemination and cancer progression[70,71]. These vesicles, released from almost all cell types and organisms studied, bear a resemblance to their cell of origin in terms of their protein, lipid, and nucleic acid content[70]. Several works have reported that EV-shed DNA allows the detection of mutations that reliably reflect the mutational state in the tumor of origin[72,73,74]. Even more, EV-associated DNA shows higher accordance with conventional tissue biopsy compared to the liquid biopsy of cfDNA[75]. According to their physical functions, EVs have been used in therapeutic agents, vaccination trials, regenerative medicine, and drug delivery[76,77,78]. The application of knowledge about EVs in liquid biopsy strategies spans various biological fluids[79]. For example, serum, plasma, and cerebrospinal fluid were EV-DNA sources for predictive detection of BRAF mutations in pediatric central nervous system tumors[80]. BRAF mutation was also the target of post-lymphadenectomy seroma analysis of EV-DNA obtained from melanoma patients[81]. The results from this study indicate that EV-derived DNA from seroma fluid may provide a promising tool for the detection of minimal residual disease in BC and melanoma, where lymph node removal is frequently performed. Moreover, the investigation into circulating EV-DNA in conjunction with other factors may provide a prognosis for various types of cancer. The fact that EVs contain a diverse array of contents suggests that the development of assays that analyze DNA alongside other biomolecules may enable personalized treatments for cancer. Traditionally, the validity of EVs as biomarkers has been hampered by sample and patient heterogeneity. A typical EV proteomic data set generally includes thousands of protein identifications, with low uniqueness. 
Therefore, the lack of appropriate tools to analyze and correlate their content with other parameters has limited EV application to clinical practice. The combination of AI and the study of EVs is on the rise as it could fill that gap. As we have discussed above, the multiple strategies encompassed by the general term AI open a variety of exciting new roads. Recently, Greenberg et al. reviewed EVs’ role in emerging drug delivery approaches and how AI could contribute to the expansion of the field[82]. The authors actively recommend following the International Society for Extracellular Vesicles (ISEV) guidelines (MISEV)[83] to improve standardization and reduce the confusion associated with particle heterogeneity. The MISEV guidelines discuss AI only briefly, but the society’s effort to standardize EV research would foster the applicability of EV-related data to AI research. AI application to basic research is in its early stages, and poor model standardization could delay effective implementation. One of the highlights of the novel relationship between AI and EVs was produced by Hoshino et al.[42]. The authors applied proteomics to investigate tumor EV heterogeneity and define marker targets using RF and principal component analysis. In their study, they standardized and analyzed hundreds of EV proteomic data sets using ML to identify EV markers and populations corresponding to specific tumor origins. RF implementation could potentially enable faster cancer patient diagnoses when applied to liquid biopsy. Employing datasets of EV-containing proteins from human cell lines, tissue, plasma, serum, and urine samples from a variety of cancers, other groups propose three panels of pan-cancer EV proteins that distinguish cancer EVs from other vesicles and aid in classifying cancer subtypes using RF models[84].
Using CNN, it is now possible to detect and profile apoptotic events, aiming to overcome the limitations associated with unspecific staining, poor timing in biological process measurements, or inconsistent and late indication of programmed cell death onset[85]. Considering the long conflict involving the differentiation of EVs and apoptotic bodies, the application of AI presents promising opportunities. Multi-omics strategies provide complex information that could help to elucidate interactions contributing to pathological states. For example, this approach was used to investigate the heterogeneous and hierarchical organization of lung adenocarcinoma using EVs. Their integrative analysis identified specific roles of RNA-enriched EVs in tumorigenesis, offering a new perspective on liquid biopsy biomarkers for lung adenocarcinoma diagnosis[86]. Integration of ML algorithms with DNA cascade reaction-triggered individual EV nanoencapsulation resulted in differential diagnosis accuracy, effectively distinguishing pathological and healthy liver conditions[87]. Combinatory strategies like all the previously mentioned illustrate the possibilities for cancer research.
EVS AND BC DIAGNOSIS WITH AN AI TOUCH
AI algorithms have been paving the way for new approaches to medical diagnosis in oncology. Techniques ranging from classification to regression come into play: ML detects patterns in medical data, while DL performs well on medical imaging through CNNs and on time-series data through RNNs[88]. NLP facilitates the analysis of clinical text, which holds a wealth of information useful for decision making. By leveraging these algorithms, AI empowers oncologists to make more accurate and timely diagnoses, leading to improved patient outcomes. The paradigm shift has already reached BC research[89]. AI algorithms review large volumes of medical data, including mammograms[90,91] and genetic information, searching for patterns that could predict the disease more accurately[92]. This enables earlier detection and more personalized treatment[93]. AI-powered drug discovery accelerates the development of new therapies by screening potential drug candidates at an unprecedented rate. By automating routine tasks and providing informed insight, AI enables researchers and clinicians to make better-informed decisions that can bring about real improvements in patient outcomes.
Analysis of large EV-based datasets has the potential to identify patterns and BC biomarkers. This would enhance early detection, prognosis, and personalized treatment strategies by improving the accuracy and efficiency of diagnostic processes. The integration of AI with EV analysis represents a promising frontier in the fight against BC, offering the potential for more precise and noninvasive diagnostic options. Only a few studies published in recent years have combined AI and EVs to improve BC diagnosis and monitoring or to advance our knowledge of drug efficacy [Figure 2]. However, there are reasonable expectations on how they will impact patient outcomes and prognosis. Total internal reflection fluorescence imaging combined with CNN enables simultaneous and accurate detection of multiple miRNAs at the single-EV level. Through the evaluation of three miRNAs using this methodology, Zhang et al. confirmed the heterogeneity of EV miRNA expression, revealing that the main variation between EVs from five cancer cells and normal plasma is the triple-positive EV subpopulation[94]. The classification accuracy of single triple-positive EVs from six sources can reach above 95%. In the clinical cohort, 20 patients (breast, lung, cervical, and colon cancer, 5 patients each) and five healthy controls were predicted with an overall accuracy of 100%. Using a combination of DL and a surface-enhanced Raman spectroscopy (SERS) immunoassay against HER2-overexpressing urinary EVs, some authors are trying to improve treatment efficacy monitoring in metastatic BC. SERS is a technique that significantly amplifies the Raman scattering signal of molecules adsorbed on certain metal surfaces. It is widely used in chemical and biological sensing, material science, and environmental monitoring due to its high sensitivity and specificity[95]. Drug efficacy was monitored via SERS-DL analysis using urinary EVs from trastuzumab-treated mice[96].
Although this is a preclinical application, it clearly illuminates new possible strategies. The combination of label-free EV SERS and ML could serve as an innovative strategy for medical diagnosis and therapeutic intervention, and some authors are trying to implement it in various scenarios, such as cisplatin-induced renal injury[97]. A DL algorithm trained with SERS spectra of EVs derived from cancer cells presents high prediction accuracy for human patients with different BC subtypes who have not undergone surgery[98].
Figure 2. Graphic summary of EV-based liquid biopsy studies applied to breast cancer clinical practice. EV: Extracellular vesicles.
Hoshino et al. characterized EVs isolated from cancer patient-resected tissue and plasma samples, including BC samples. They showed that different cancer types, including pancreatic, lung, and breast cancer, could be distinguished through specific combinations of EV proteins. These cancer-type-specific EV protein signatures could be used as a liquid biopsy tool to help diagnose and guide treatments for these patients. The use of RF reduced the risk of over-fitting and made the method robust to outliers and noise in the input data[42].
SUMMARIZING THE FUTURE DIRECTIONS AND DOWNSIDES OF AI APPLICATION TO EV USAGE IN BC LIQUID BIOPSY
There is a bright, exciting and uncertain future for AI in BC research. With enhanced quality and quantity of data, refining DL architectures, and the development of explainable AI, researchers would make AI models more accurate and interpretable. These advances are poised to enable early detection with AI-enhanced imaging[90] and risk prediction, precise diagnosis using digital pathology and molecular subtyping, optimized treatment planning, and better monitoring and prognosis. Future AI research would be oriented toward improving model intelligibility to gain clinical trust, integrating multi-data platforms such as genomics, pathology, and radiology for personalized medicine, and employing federated learning to safeguard data privacy while enabling the use of larger, collaborative datasets. The implementation of AI-driven analysis of EVs in BC diagnosis faces several challenges. First, the collection and isolation of EVs from biological fluids requires standardization to ensure consistency and reliability. Advances in EV isolation, high-throughput technologies, and nanoscale engineering could facilitate the efficient and consistent collection of EV samples. Second, the development and validation of AI algorithms require extensive, high-quality datasets. Integration into clinical practice demands significant investments in infrastructure and training for healthcare professionals: clinicians need to understand and trust these systems before they can be widely adopted. Educational initiatives to train professionals in AI and bioinformatics would help integrate these technologies into clinical practice. However, it is not only a human problem. One major concern is the quality and representativeness of the training data used to generate AI applications, as biased or incomplete datasets can lead to inaccurate and non-generalizable AI models. High-quality datasets are the foundation and a critical point of robust and reliable AI models in medical diagnosis. 
To ensure the accuracy and generalizability of these models, careful attention must be paid to data collection, annotation, cleaning, and augmentation. Data should be collected from diverse populations to minimize bias and ensure that the model performs well across different patient groups. Accurate and consistent annotation of data is essential for training effective AI models. Data cleaning techniques, such as handling missing values and outliers, can significantly improve model performance. Data augmentation techniques, such as rotation, flipping, and adding noise, can help address the challenges of small and imbalanced datasets. Additionally, sharing and collaborating on datasets can facilitate the development of more powerful and reliable AI models. By addressing these key considerations, researchers can develop AI models that have the potential to improve patient outcomes. Algorithms being developed and used in health - most of them using patient data - pose critical ethical concerns[99]. AI models often rely on vast amounts of patient data for training and operation, so they must comply with applicable laws and regulations, including but not limited to the Health Insurance Portability and Accountability Act (HIPAA) in the US and the General Data Protection Regulation (GDPR) in the EU, to ensure the ethical and secure use of sensitive health information. It is crucial to balance data sharing with privacy and ethical considerations to protect patient confidentiality[100]. The opaque nature of many AI algorithms poses another challenge, as the lack of transparency in decision making could feed the already detectable trust and acceptance problems among clinicians and patients.
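The augmentation operations named above (rotation, flipping, and adding noise) can be illustrated with a minimal standard-library sketch that turns one image into several training variants. The tiny 2x2 "image", the noise level, and the function names are assumptions made for this example; real pipelines operate on full-size images or spectra and typically rely on dedicated libraries.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def add_noise(img, sigma, rng):
    """Add Gaussian noise to every pixel and clip the result to the [0, 1] range."""
    return [[min(1.0, max(0.0, px + rng.gauss(0, sigma))) for px in row] for row in img]


def augment(img, rng, sigma=0.05):
    """Return the original image plus three augmented variants."""
    return [img, flip_horizontal(img), rotate_90(img), add_noise(img, sigma, rng)]


# Hypothetical 2x2 grayscale image with pixel intensities in [0, 1].
image = [[0.0, 0.5],
         [1.0, 0.25]]

rng = random.Random(0)
variants = augment(image, rng)
print(len(variants), "training samples derived from one image")
print("flipped:", variants[1])  # [[0.5, 0.0], [0.25, 1.0]]
print("rotated:", variants[2])  # [[1.0, 0.0], [0.25, 0.5]]
```

Augmented samples inherit the label of the original, so the transformations should be limited to those that do not change the diagnostic content of the data.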
As advancements and downsides converge and evolve, AI-driven EV analysis promises to become a routine, noninvasive diagnostic tool that significantly improves early detection, patient outcomes, and personalized treatment strategies in BC. The times, they are changing.
DECLARATIONS
Authors’ contributions
Elaborated the figures: Gomez Del Pulgar ME, Guamán HM
Wrote the manuscript, reviewed the literature, and conceptualized the aim: Benito-Martin A,
All authors reviewed and edited the manuscript.
Availability of data and materials
Not applicable.
Financial support and sponsorship
This work was funded by Instituto de Salud Carlos III (ISCIII) and by Fundación Universidad Alfonso X El Sabio, which support Benito-Martin A. Benito-Martin A is supported by the Miguel Servet Program (CP23/00046) from the Instituto de Salud Carlos III, co-financed by the European Regional Development Fund (ERDF/FEDER) “A way to achieve Europe”. García-Barberán V is supported by Grant PID2022-142361OB-I00, funded by MCIN/AEI/10.13039/501100011033/ERDF, EU. The authors would like to thank all the personnel at IDISCC for providing administrative support.
Conflicts of interest
All authors declared that there are no conflicts of interest.
Ethical approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Copyright
© The Author(s) 2025.
REFERENCES
2. Yin J, Ngiam KY, Teo HH. Role of artificial intelligence applications in real-life clinical practice: systematic review. J Med Internet Res. 2021;23:e25759.
3. Bhattamisra SK, Banerjee P, Gupta P, Mayuren J, Patra S, Candasamy M. Artificial intelligence in pharmaceutical and healthcare research. BDCC. 2023;7:10.
4. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020;368:m689.
5. Mueller ST, Hoffman RR, Clancey W, Emrey A, Klein G. Explanation in human-AI systems: a literature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI. arXiv 2019;arXiv:1902.01876. Available from https://arxiv.org/abs/1902.01876 [accessed 11 February 2025].
6. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022;23:40-55.
7. Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol. 2020;9:14.
8. Nayarisseri A, Khandelwal R, Tanwar P, et al. Artificial intelligence, big data and machine learning approaches in precision medicine & drug discovery. Curr Drug Targets. 2021;22:631-55.
9. Haug CJ, Drazen JM. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med. 2023;388:1201-8.
10. Tropsha A, Golbraikh A. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr Pharm Des. 2007;13:3494-504.
11. Reker D, Rybakova Y, Kirtane AR, et al. Computationally guided high-throughput design of self-assembling drug nanoparticles. Nat Nanotechnol. 2021;16:725-33.
12. Tunyasuvunakool K, Adler J, Wu Z, et al. Highly accurate protein structure prediction for the human proteome. Nature. 2021;596:590-6.
13. Cheng J, Novati G, Pan J, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381:eadg7492.
17. Das S, Tariq A, Santos T, Kantareddy SS, Banerjee I. Recurrent neural networks (RNNs): architectures, training tricks, and introduction to influential research. In: Colliot O, Editor. Machine learning for brain disorders. New York: Springer US; 2023. pp. 117-38.
18. Sahiner B, Chan HP, Petrick N, et al. Classification of mass and normal breast tissue: a convolution neural network classifier with spatial domain and texture images. IEEE Trans Med Imaging. 1996;15:598-610.
19. Dou B, Zhu Z, Merkurjev E, et al. Machine learning methods for small data challenges in molecular science. Chem Rev. 2023;123:8736-80.
20. Alowais SA, Alghamdi SS, Alsuhebany N, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023;23:689.
21. Li X, Jia M, Islam MT, Yu L, Xing L. Self-supervised feature learning via exploiting multi-modal data for retinal disease diagnosis. IEEE Trans Med Imaging. 2020;39:4023-33.
22. Shorfuzzaman M, Hossain MS. MetaCOVID: a Siamese neural network framework with contrastive loss for n-shot diagnosis of COVID-19 patients. Pattern Recognit. 2021;113:107700.
23. Fan DP, Zhou T, Ji GP, et al. Inf-Net: automatic COVID-19 lung infection segmentation from CT images. IEEE Trans Med Imaging. 2020;39:2626-37.
24. Swiderska-Chadaj Z, Pinckaers H, van Rijthoven M, et al. Learning to detect lymphocytes in immunohistochemistry with deep learning. Med Image Anal. 2019;58:101547.
25. Mei J, Cheng MM, Xu G, Wan LR, Zhang H. SANet: a slice-aware network for pulmonary nodule detection. IEEE Trans Pattern Anal Mach Intell. 2022;44:4374-87.
26. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115-8.
27. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577:89-94.
28. Chan RC, To CKC, Cheng KCT, Yoshikazu T, Yan LLA, Tse GM. Artificial intelligence in breast cancer histopathology. Histopathology. 2023;82:198-210.
29. Nomani A, Ansari Y, Nasirpour MH, Masoumian A, Pour ES, Valizadeh A. PSOWNNs-CNN: a computational radiology for breast cancer diagnosis improvement based on image processing using machine learning methods. Comput Intell Neurosci. 2022;2022:5667264.
30. Zheng J, Lin D, Gao Z, Wang S, He M, Fan J. Deep learning assisted efficient AdaBoost algorithm for breast cancer detection and early diagnosis. IEEE Access. 2020;8:96946-54.
31. Loh PR, Tucker G, Bulik-Sullivan BK, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet. 2015;47:284-90.
32. Lee SI, Dudley AM, Drubin D, et al. Learning a prior on regulatory potential from eQTL data. PLoS Genet. 2009;5:e1000358.
33. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46:310-5.
34. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27-30.
35. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet. 2000;25:25-9.
36. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60-88.
38. Camacho DM, Collins KM, Powers RK, Costello JC, Collins JJ. Next-generation machine learning for biological networks. Cell. 2018;173:1581-92.
39. Li Y, Huang C, Ding L, Li Z, Pan Y, Gao X. Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods. 2019;166:4-21.
40. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine learning and data mining methods in diabetes research. Comput Struct Biotechnol J. 2017;15:104-16.
41. Ribeiro AH, Ribeiro MH, Paixão GMM, et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun. 2020;11:1760.
42. Hoshino A, Kim HS, Bojmar L, et al. Extracellular vesicle and particle biomarkers define multiple human cancers. Cell. 2020;182:1044-61.e18.
43. World Health Organization. Obesity and overweight. Available from https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight [accessed 11 February 2025].
44. Garcia-Closas M, Gunsoy NB, Chatterjee N. Combined associations of genetic and environmental risk factors: implications for prevention of breast cancer. J Natl Cancer Inst. 2014;106:dju305.
45. Dorling L, Carvalho S, Allen J, et al; Breast Cancer Association Consortium. Breast cancer risk genes - association analysis in more than 113,000 women. N Engl J Med. 2021;384:428-39.
46. Dai X, Li T, Bai Z, Yang Y, Liu X, Zhan J, et al. Breast cancer intrinsic subtype classification, clinical use and future trends. Am J Cancer Res. 2015;5:2929-43.
47. Korde LA, Somerfield MR, Carey LA, et al. Neoadjuvant chemotherapy, endocrine therapy, and targeted therapy for breast cancer: ASCO guideline. J Clin Oncol. 2021;39:1485-505.
48. Keup C, Kimmig R, Kasimir-Bauer S. The diversity of liquid biopsies and their potential in breast cancer management. Cancers. 2023;15:5463.
49. Munkácsy G, Santarpia L, Győrffy B. Gene expression profiling in early breast cancer-patient stratification based on molecular and tumor microenvironment features. Biomedicines. 2022;10:248.
50. Kalinsky K, Barlow WE, Gralow JR, et al. 21-gene assay to inform chemotherapy benefit in node-positive breast cancer. N Engl J Med. 2021;385:2336-47.
51. Yee D, DeMichele AM, Yau C, et al; I-SPY2 Trial Consortium. Association of event-free and distant recurrence-free survival with individual-level pathologic complete response in neoadjuvant treatment of stages 2 and 3 breast cancer: three-year follow-up analysis for the I-SPY2 adaptively randomized clinical trial. JAMA Oncol. 2020;6:1355-62.
52. Arneth B. Update on the types and usage of liquid biopsies in the clinical setting: a systematic review. BMC Cancer. 2018;18:527.
53. Li J, Jiang W, Wei J, et al. Patient specific circulating tumor DNA fingerprints to monitor treatment response across multiple tumors. J Transl Med. 2020;18:293.
54. Cescon DW, Kalinsky K, Parsons HA, et al. Therapeutic targeting of minimal residual disease to prevent late recurrence in hormone-receptor positive breast cancer: challenges and new approaches. Front Oncol. 2021;11:667397.
55. Parsons HA, Rhoades J, Reed SC, et al. Sensitive detection of minimal residual disease in patients treated for early-stage breast cancer. Clin Cancer Res. 2020;26:2556-64.
56. Butler TM, Boniface CT, Johnson-Camacho K, et al. Circulating tumor DNA dynamics using patient-customized assays are associated with outcome in neoadjuvantly treated breast cancer. Cold Spring Harb Mol Case Stud. 2019;5:a003772.
57. McDonald BR, Contente-Cuomo T, Sammut SJ, et al. Personalized circulating tumor DNA analysis to detect residual disease after neoadjuvant therapy in breast cancer. Sci Transl Med. 2019;11:eaax7392.
58. Li S, Lai H, Liu J, et al. Circulating tumor DNA predicts the response and prognosis in patients with early breast cancer receiving neoadjuvant chemotherapy. JCO Precis Oncol. 2020;4:PO.19.00292.
59. Magbanua MJM, Swigart LB, Wu HT, et al. Circulating tumor DNA in neoadjuvant-treated breast cancer reflects response and survival. Ann Oncol. 2021;32:229-39.
60. Hua H, Deng Y, Zhang J, Zhou X, Zhang T, Khoo BL. AIEgen-deep: deep learning of single AIEgen-imaging pattern for cancer cell discrimination and preclinical diagnosis. Biosens Bioelectron. 2024;253:116086.
61. Rashid MM, Selvarajoo K. Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): a case study on breast cancer clinical data. Brief Bioinform. 2024;25:bbae300.
62. Chen Q, Zhang J, Meng R, et al. Modality-specific information disentanglement from multi-parametric MRI for breast tumor segmentation and computer-aided diagnosis. IEEE Trans Med Imaging. 2024;43:1958-71.
63. Li J, Cheng J, Meng L, et al. DeepTree: pathological image classification through imitating tree-like strategies of pathologists. IEEE Trans Med Imaging. 2024;43:1501-12.
64. Liu Y, Sorkhei M, Dembrower K, Azizpour H, Strand F, Smith K. Use of an AI score combining cancer signs, masking, and risk to select patients for supplemental breast cancer screening. Radiology. 2024;311:e232535.
65. Donnelly J, Moffett L, Barnett AJ, et al. AsymMirai: interpretable mammography-based deep learning model for 1-5-year breast cancer risk prediction. Radiology. 2024;310:e232780.
66. Hossain I, Fanfani V, Fischer J, Quackenbush J, Burkholz R. Biologically informed NeuralODEs for genome-wide regulatory dynamics. Genome Biol. 2024;25:127.
67. Xi Y, Zheng K, Deng F, et al. Themis: advancing precision oncology through comprehensive molecular subtyping and optimization. Brief Bioinform. 2024;25:bbae261.
68. Yao X, Ouyang S, Lian Y, et al. PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies. Genome Med. 2024;16:56.
69. Huang K, Zhang J, Yu Y, Lin Y, Song C. The impact of chemotherapy and survival prediction by machine learning in early Elderly Triple Negative Breast Cancer (eTNBC): a population based study from the SEER database. BMC Geriatr. 2022;22:268.
70. Couch Y, Buzàs EI, Di Vizio D, et al. A brief history of nearly EV-erything - the rise and rise of extracellular vesicles. J Extracell Vesicles. 2021;10:e12144.
71. Wortzel I, Dror S, Kenific CM, Lyden D. Exosome-mediated metastasis: communication from a distance. Dev Cell. 2019;49:347-60.
72. Kahlert C, Melo SA, Protopopov A, et al. Identification of double-stranded genomic DNA spanning all chromosomes with mutated KRAS and p53 DNA in the serum exosomes of patients with pancreatic cancer. J Biol Chem. 2014;289:3869-75.
73. Vagner T, Spinelli C, Minciacchi VR, et al. Large extracellular vesicles carry most of the tumour DNA circulating in prostate cancer patient plasma. J Extracell Vesicles. 2018;7:1505403.
74. García-Romero N, Madurga R, Rackov G, et al. Polyethylene glycol improves current methods for circulating extracellular vesicle-derived DNA isolation. J Transl Med. 2019;17:75.
75. Che H, Stanley K, Jatsenko T, Thienpont B, Vermeesch JR. Expanded knowledge of cell-free DNA biology: potential to broaden the clinical utility. Extracell Vesicles Circ Nucl Acids. 2022;3:216-34.
76. Fais S, O’Driscoll L, Borras FE, et al. Evidence-based clinical use of nanoscale extracellular vesicles in nanomedicine. ACS Nano. 2016;10:3886-99.
77. Hou C, Wu Q, Xu L, et al. Exploiting the potential of extracellular vesicles as delivery vehicles for the treatment of melanoma. Front Bioeng Biotechnol. 2022;10:1054324.
78. Tamura T, Yoshioka Y, Sakamoto S, Ichikawa T, Ochiya T. Extracellular vesicles as a promising biomarker resource in liquid biopsy for cancer. Extracell Vesicles Circ Nucl Acids. 2021;2:148-74.
79. González E, Falcón-Pérez JM. Cell-derived extracellular vesicles as a platform to identify low-invasive disease biomarkers. Expert Rev Mol Diagn. 2015;15:907-23.
80. García-Romero N, Carrión-Navarro J, Areal-Hidalgo P, et al. BRAF V600E detection in liquid biopsies from pediatric central nervous system tumors. Cancers. 2019;12:66.
81. García-Silva S, Benito-Martín A, Sánchez-Redondo S, et al. Use of extracellular vesicles from lymphatic drainage as surrogate markers of melanoma progression and BRAFV600E mutation. J Exp Med. 2019;216:1061-70.
82. Greenberg ZF, Graim KS, He M. Towards artificial intelligence-enabled extracellular vesicle precision drug delivery. Adv Drug Deliv Rev. 2023;199:114974.
83. Welsh JA, Goberdhan DCI, O’Driscoll L, et al; MISEV Consortium. Minimal information for studies of extracellular vesicles (MISEV2023): from basic to advanced approaches. J Extracell Vesicles. 2024;13:e12404.
84. Li B, Kugeratski FG, Kalluri R. A novel machine learning algorithm selects proteome signature to specifically identify cancer exosomes. Elife. 2024;12:RP90390.
85. Wu KL, Martinez-Paniagua M, Reichel K, et al. Automated detection of apoptotic bodies and cells in label-free time-lapse high-throughput video microscopy using deep convolutional neural networks. Bioinformatics. 2023;39:btad584.
86. Luo HT, Zheng YY, Tang J, et al. Dissecting the multi-omics atlas of the exosomes released by human lung adenocarcinoma stem-like cells. NPJ Genom Med. 2021;6:48.
87. Li X, Liu Y, Fan Y, et al. Advanced nanoencapsulation-enabled ultrasensitive analysis: unraveling tumor extracellular vesicle subpopulations for differential diagnosis of hepatocellular carcinoma via DNA cascade reactions. ACS Nano. 2024;18:11389-403.
88. Sheth D, Giger ML. Artificial intelligence in the interpretation of breast cancer on MRI. J Magn Reson Imaging. 2020;51:1310-24.
89. Nicolis O, De Los Angeles D, Taramasco C. A contemporary review of breast cancer risk factors and the role of artificial intelligence. Front Oncol. 2024;14:1356014.
90. Le EPV, Wang Y, Huang Y, Hickman S, Gilbert FJ. Artificial intelligence in breast imaging. Clin Radiol. 2019;74:357-66.
91. Balkenende L, Teuwen J, Mann RM. Application of deep learning in breast cancer imaging. Semin Nucl Med. 2022;52:584-96.
92. Mahichi H, Ghods V, Sohrabi MK, Sabbaghi A. BreastCDNet: breast cancer detection neural network, classification and localization.
93. Ahn JS, Shin S, Yang SA, et al. Artificial intelligence in breast cancer diagnosis and personalized medicine. J Breast Cancer. 2023;26:405-35.
94. Zhang XW, Qi GX, Liu MX, et al. Deep learning promotes profiling of multiple miRNAs in single extracellular vesicles for cancer diagnosis. ACS Sens. 2024;9:1555-64.
95. Zhang Y, Chang K, Ogunlade B, et al. From genotype to phenotype: Raman spectroscopy and machine learning for label-free single-cell analysis. ACS Nano. 2024;18:18101-17.
96. Kim J, Son HY, Lee S, et al. Deep learning-assisted monitoring of trastuzumab efficacy in HER2-overexpressing breast cancer via SERS immunoassays of tumor-derived urinary exosomal biomarkers. Biosens Bioelectron. 2024;258:116347.
97. Zhuang Y, Ouyang Y, Ding L, et al. Source tracing of kidney injury via the multispectral fingerprint identified by machine learning-driven surface-enhanced Raman spectroscopic analysis. ACS Sens. 2024;9:2622-33.
98. Xie Y, Su X, Wen Y, Zheng C, Li M. Artificial Intelligent label-free SERS profiling of serum exosomes for breast cancer diagnosis and postoperative assessment. Nano Lett. 2022;22:7910-8.
99. Elendu C, Amaechi DC, Elendu TC, et al. Ethical implications of AI and robotics in healthcare: a review. Medicine. 2023;102:e36671.
How to Cite
García-Barberán, V.; Gómez Del Pulgar, M. E.; Guamán H. M.; Benito-Martin, A. The times they are AI-changing: AI-powered advances in the application of extracellular vesicles to liquid biopsy in breast cancer. Extracell. Vesicles. Circ. Nucleic. Acids. 2025, 6, 128-40. http://dx.doi.org/10.20517/evcna.2024.51