INTRODUCTION
Lower-grade gliomas (LGGs), comprising grade I and II gliomas according to the World Health Organization (WHO) classification, constitute approximately 7.6% of all brain tumors and 31.8% of gliomas[1]. Within five years, recurrence occurs in about 52%-62% of patients[2-4]. Among these recurrences, an estimated 17%-32% progress to high-grade gliomas (HGGs, WHO grade III-IV)[5-7]. The poorer prognosis of recurrent LGGs stems primarily from this malignant transformation. The primary treatment for LGG is surgical resection followed by radiotherapy and chemotherapy; nevertheless, the prognosis remains unfavorable, with an average survival of seven years[8,9]. A substantial gap persists between clinical management and our understanding of LGG molecular pathology. Investigating the molecular pathology of LGG is therefore essential to delay or halt progression to HGG, prevent recurrence, and improve clinical outcomes. Advances in bioinformatics and machine learning offer promising means to decipher molecular pathology and predict prognosis in oncology research[10,11].
Glycosylation is one of the most prevalent and best-characterized post-translational modifications, likely more frequent than phosphorylation, and is involved in numerous cellular processes[12]. Aberrant glycosylation is common in cancer and is a hallmark not only of cancer cells but also of the tumor microenvironment[13]. Among the glycosylation types, asparagine-linked (N-linked) protein glycosylation is of particular importance. N-glycan biosynthesis (NGB) involves multiple steps spanning different cellular compartments. After transcription and translation, proteins are translocated to the endoplasmic reticulum (ER) lumen, where they acquire specific glycan chains, typically a tetradecasaccharide. Further modifications occur in the ER lumen and Golgi apparatus before the proteins reach their final destinations[13]. Dysregulated NGB has been associated with various aspects of cancer progression, including cell adhesion, migration, and immune evasion[14-16]. However, the role of NGB in LGG pathogenesis and its clinical implications remain poorly understood.
In this study, we explored the prognostic significance of an NGB signature in LGG using integrative bioinformatics approaches. By analyzing large-scale transcriptomic data from The Cancer Genome Atlas (TCGA), we identified a prognostic NGB signature (pNGB) comprising genes associated with patient survival. Using machine learning algorithms, we constructed and validated pNGB-based survival models to improve prognostic prediction in LGG. Patients with higher risk scores according to the NGB-based survival model were characterized by enhanced proliferation and inflammation. Additionally, we investigated the association between NGB and clinical parameters, including treatment response, tumor recurrence, and the immune microenvironment. By integrating transcriptomic and clinical data from TCGA and the Chinese Glioma Genome Atlas (CGGA), we delineated the relationship between NGB dysregulation and LGG progression, providing insights into potential prognostic biomarkers and therapeutic targets. Our study thus sheds light on the interplay between NGB and LGG biology, offering new avenues for prognostic assessment and therapeutic intervention in this challenging disease.
MATERIALS AND METHODS
LGG transcriptome data acquisition and processing
The RNA sequencing data of the TCGA LGG study were obtained using the R package TCGAbiolinks[17]. Additional clinical information, such as survival data and tumor stage, was obtained from the Genomic Data Commons (GDC) Data Portal (https://gdc.cancer.gov/about-data/publications/pancanatlas). The transcriptome datasets of the Chinese Glioma Genome Atlas (CGGA) were downloaded from the CGGA website (http://www.cgga.org.cn) under the dataset IDs mRNAseq_325 and mRNAseq_693. To harmonize the three LGG datasets, each was log2(x + 1) transformed, checked for missing values, and quantile normalized. Hierarchical clustering was used to detect outliers, and no obvious outliers were identified. Common genome annotations and platforms were used to align the datasets so that the same genes were labeled consistently across all three datasets. Batch correction was performed with the normalizeBetweenArrays function of the R package limma. Ensembl gene IDs were converted to gene symbols with the R package biomaRt[18].
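To make the harmonization step concrete, here is a minimal sketch of the log2(x + 1) transform followed by quantile normalization in plain Python. It illustrates the idea under stated assumptions and is not the limma implementation: the matrix layout (genes as rows, samples as columns) and the function names `log2p1` and `quantile_normalize` are hypothetical, and ties are broken by row order rather than averaged.

```python
import math


def log2p1(matrix):
    """Apply log2(x + 1) to every value of a genes x samples matrix (list of rows)."""
    return [[math.log2(x + 1.0) for x in row] for row in matrix]


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Each sample is sorted, the mean of the k-th smallest values across samples
    forms a reference distribution, and each value is replaced by the reference
    value at its rank. Ties are broken by row order (a simplification).
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    # Row indices of each sample ordered from smallest to largest value.
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all samples.
    reference = [
        sum(columns[s][orders[s][k]] for s in range(n_samples)) / n_samples
        for k in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        for k, g in enumerate(orders[s]):
            normalized[g][s] = reference[k]
    return normalized
```

For example, `quantile_normalize(log2p1(counts))` returns a matrix whose columns all share one value distribution, which is what makes samples from different cohorts comparable before downstream scoring.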
Single sample gene set enrichment analysis
Single-sample gene set enrichment analysis (ssGSEA) was performed with the R package GSVA using the parameters method = “ssgsea”, kcdf = “Gaussian”, and abs.ranking = TRUE[19]. We chose the ssGSEA method because it quantifies gene set enrichment on a per-sample basis, which allows pathway activation patterns to be identified within individual samples, as our analysis requires. The Gaussian option is appropriate for continuous expression data, such as log2(x + 1)-transformed RNA-seq data. Hallmark gene sets were chosen because they represent well-defined biological states and processes and are curated to minimize redundancy and enhance interpretability. Moreover, they include pathways critical to cancer biology, such as MYC targets, E2F targets, epithelial-mesenchymal transition, and various inflammation pathways, which are integral to understanding tumor behavior, progression, and response to therapy. Hallmark gene sets were downloaded from the GSEA website (https://www.gsea-msigdb.org/gsea/index.jsp).
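The per-sample scoring idea behind ssGSEA can be sketched as follows. This is a simplified, hypothetical re-implementation of the original ssGSEA running-sum score (genes ranked within one sample, rank weights raised to an exponent α = 0.25, the commonly used default, and the difference between the weighted in-set and unweighted out-of-set cumulative distributions summed over all positions). It is not the GSVA code and omits the between-sample normalization that GSVA applies by default; the function name `ssgsea_score` is an assumption.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict mapping gene name -> expression value (one sample).
    gene_set: iterable of gene names.
    alpha: exponent applied to rank weights (0.25 assumed, as commonly used).
    """
    members = set(gene_set)
    # Order genes from highest to lowest expression.
    ordered = sorted(expression, key=expression.__getitem__, reverse=True)
    n = len(ordered)
    is_hit = [g in members for g in ordered]
    n_hit = sum(is_hit)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap, but not cover, the measured genes")
    # Rank weight: the top-expressed gene has rank n, the lowest rank 1.
    weights = [(n - i) ** alpha for i in range(n)]
    hit_total = sum(w for w, hit in zip(weights, is_hit) if hit)
    p_hit = p_miss = score = 0.0
    for w, hit in zip(weights, is_hit):
        if hit:
            p_hit += w / hit_total          # weighted cumulative share of set genes
        else:
            p_miss += 1.0 / (n - n_hit)     # unweighted cumulative share of others
        score += p_hit - p_miss             # ssGSEA sums the running difference
    return score
```

A gene set concentrated among the most highly expressed genes of a sample yields a positive score, and one concentrated at the bottom yields a negative score, which is what allows per-sample comparison of hallmark activity between risk groups.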
Survival analysis
Cox proportional hazards analysis and Kaplan-Meier (KM) survival analysis were performed using the R package survival. High and low groups in the KM analysis were defined both by the median value and by the optimal cutpoint suggested by the surv_cutpoint function of the R package survminer, an outcome-oriented method that selects the cutpoint corresponding to the most significant association with survival. Because the two approaches gave comparable results, the median value was used as the final cutpoint. KM survival curves were visualized using the ggsurvplot function of the R package survminer.
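For readers less familiar with the KM estimator, the sketch below computes the product-limit survival curve and a median-based high/low split in plain Python. It is an illustration, not the R survival or survminer code: the function names are hypothetical, and samples whose score equals the median are assigned to the low group, which is one of several possible conventions.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each event time.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # S(t) is multiplied by the fraction of the risk set surviving t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # everyone observed at time t leaves the risk set
    return curve


def median_split(scores):
    """Label samples 'high' if above the median score, otherwise 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]
```

Censored subjects lower the size of the risk set without producing a drop in the curve, which is why the estimator can use incomplete follow-up.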
Cell infiltration estimation
The relative infiltration of immune cells in the LGG transcriptomes of the TCGA and CGGA datasets was quantified using the R package xCell[20].
Consensus clustering
The 22 survival-associated NGB genes were used to identify subclusters within the TCGA LGG cohort by consensus clustering, a resampling-based method. The optimal number of clusters was determined by jointly considering the consensus matrix, the cumulative distribution function (CDF) curve, the proportion of ambiguous clustering (PAC) score, and NbClust. The analysis was performed with the R package ConsensusClusterPlus[21]. Differentially expressed genes (DEGs) were identified with the R package limma[22].
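The PAC criterion used to choose k can be illustrated with a small sketch: a consensus matrix is built from repeated subsampled clusterings, and PAC is the fraction of sample pairs whose consensus index falls in an ambiguous interval. The interval (0.1, 0.9), the input format (one label list per resampling run, with None for samples not drawn), and the function names are assumptions for illustration; this is not the ConsensusClusterPlus implementation.

```python
def consensus_matrix(runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    runs: list of label lists (one per resampling run); None marks a sample
    that was not drawn in that run. Pairs are scored only over runs in which
    both samples were drawn.
    """
    n = len(runs[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            together = sampled = 0
            for labels in runs:
                if labels[i] is None or labels[j] is None:
                    continue
                sampled += 1
                together += labels[i] == labels[j]
            matrix[i][j] = together / sampled if sampled else 0.0
    return matrix


def pac_score(consensus, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering: share of pairs with lower < c < upper."""
    n = len(consensus)
    pairs = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for c in pairs if lower < c < upper)
    return ambiguous / len(pairs)
```

A lower PAC means that most pairs are either almost always or almost never clustered together, so the k with the lowest PAC is taken as the most stable partition.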
NGB-based machine learning survival model construction and validation
This approach was adapted from a previous study[23]. We integrated 10 machine learning algorithms into 22 algorithm combinations; the algorithms were random survival forest (RSF), elastic net (Enet), Lasso, Ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), generalized boosted regression modeling (GBM), and survival support vector machine (survival-SVM). The survival models were constructed and validated as follows: (1) the 22 NGB genes identified by survival analysis in the TCGA LGG cohort were selected as features; (2) the 22 algorithm combinations were applied to build survival models in the TCGA LGG cohort; (3) all models were evaluated in the TCGA LGG training set and two CGGA validation sets; and (4) Harrell's concordance index (C-index) and the area under the curve (AUC) at 1, 3, and 5 years were calculated for each model in the training and validation datasets. Based on the highest C-index and AUC values and the principle of Occam's razor, the Enet (α = 0.4) model was selected as the optimal model.
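Harrell's C-index, used above to rank the candidate models, can be sketched as the fraction of comparable patient pairs in which the patient who dies earlier has the higher predicted risk. This minimal version is an illustration under assumptions, not the implementation used in the study: it treats a pair as comparable only when the earlier time is an observed event and the two times differ (tied times are skipped), and it counts tied risk scores as 0.5.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for survival predictions.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. It is concordant when i, who failed
    first, also has the higher predicted risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the C-index > 0.7 threshold used in the Results marks models that rank patients substantially better than chance.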
Statistical analyses
All statistical analyses, including Spearman correlation analyses, were performed in R (version 4.1.2). ssGSEA scores were compared between conditions using the Wilcoxon test. P-values from multiple testing were adjusted using the false discovery rate (FDR). P-values or FDR values below 0.05 were considered statistically significant.
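The text does not name the FDR procedure. Assuming the Benjamini-Hochberg method (the procedure behind R's p.adjust with method = "BH"), the adjustment can be sketched as follows; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each p-value is scaled by m/rank, and the running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a bigger one.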
RESULTS
Identify a prognostic NGB signature in the TCGA LGG cohort
The overall study design is illustrated in Figure 1. We first established a prognostic NGB signature through survival analysis of the TCGA LGG transcriptome. Of the 50 NGB genes examined, 22 were significant in both Kaplan-Meier (KM) and Cox analyses and constituted the prognostic NGB (pNGB) signature [Figure 2A and B]. MGAT1 and TUSC3 had the highest and lowest hazard ratios (HRs), respectively [Figure 2B]. Using the pNGB genes, we performed consensus clustering to divide TCGA LGG tumors into k clusters; k = 2 was optimal based on the proportion of ambiguous clustering (PAC) statistic [Figure 2C and D]. The two resulting consensus clusters (C1 and C2) differed significantly in pNGB expression, with C2 displaying notably higher overall infiltration abundance than C1 [Figure 2E]. KM analysis further showed significantly poorer overall survival in cluster C2 than in C1 [Figure 2F]. Furthermore, most NGB pathway genes were associated with poor prognosis; the exceptions were MGAT4C and TUSC3, with MGAT4C showing a favorable prognosis and promoting N-glycan degradation. These results suggest a pivotal role for N-glycan biosynthesis in LGG progression [Figure 2G].
Construct and validate the NGB-based integrated machine learning survival models
Next, we aimed to develop a pNGB-based survival model to improve outcome prediction in LGG. The 22 pNGBs were subjected to a machine learning-based integrative procedure to construct 22 candidate survival models. The procedure was first applied to the TCGA LGG dataset, with training under a leave-one-out cross-validation (LOOCV) framework, and the C-index of each model was then calculated in the TCGA LGG training set and the two CGGA validation datasets [Figure 3A]. Notably, 15 of the 22 candidate models showed good discriminatory power (C-index > 0.7), most of them Enet-related [Figure 3A]. Further ROC analysis showed favorable performance of the Enet (α = 0.4) model, which ranked highly in both C-index and AUCs across the training and validation datasets [Figure 3B]. Because Enet performs variable selection by shrinking some coefficients to exactly zero, it yields parsimonious models consistent with Occam's razor and helps identify the most relevant features. The Enet (α = 0.4) survival model was therefore selected for further analysis owing to its simplicity and predictive accuracy. In this model, the optimal λ was determined via LOOCV and seven pNGBs were retained, with ALG6 having the highest coefficient, indicating its importance as a feature [Figure 3C and D]. The 1-, 3-, and 5-year AUCs in the TCGA training dataset were 0.852, 0.848, and 0.75, respectively [Figure 3E-G]. Comparable AUCs were observed in the CGGA325 and CGGA693 datasets, indicating that the survival model performs reliably across the training and validation sets [Figure 3E-G]. Additionally, patients in the high-risk group had significantly worse survival than those in the low-risk group in both the training and validation sets (P < 0.0001), confirming the model's robustness [Figure 3H-J].
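As a worked statement of what the Enet (α = 0.4) model optimizes, the block below assumes a glmnet-style penalized Cox formulation; the text does not name the fitting package, so this is a sketch of the standard form rather than the exact code used. Here n is the number of patients, δ_i the event indicator, R(t_i) the risk set at time t_i, and x_i the vector of pNGB expression values. Setting α = 1 gives the Lasso and α = 0 gives Ridge; with α = 0.4 the L1 term is what drives some coefficients to exactly zero, leaving seven of the 22 genes.

```latex
\hat{\beta} = \arg\min_{\beta}\left\{ -\frac{2}{n}\sum_{i:\,\delta_i = 1}\left[ x_i^{\top}\beta - \log\sum_{j \in R(t_i)} e^{x_j^{\top}\beta} \right] + \lambda\left[ \frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1 \right] \right\}

\alpha = 0.4 \;\Rightarrow\; \text{penalty} = \lambda\left[ 0.3\,\lVert\beta\rVert_2^2 + 0.4\,\lVert\beta\rVert_1 \right],
\qquad
\text{risk score}_k = \sum_{g=1}^{p} \hat{\beta}_g \, x_{kg}
```

Patients are then split into high- and low-risk groups at the median of this linear predictor, consistent with the median cutpoint described in the survival analysis methods.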
NGB high-risk patients were enriched in cell proliferation and inflammation
To further characterize the NGB-related molecular signature, we performed DEG analysis between high- and low-risk LGG patients across the three datasets. A total of 349 and 399 DEGs were upregulated and downregulated in high-risk patients, respectively [Figure 4A and B]. Upregulated DEGs were enriched in immune response pathways, such as Th1, Th2, and Th17 cell differentiation, antigen processing and presentation, and ECM-receptor interaction, whereas downregulated DEGs were mainly associated with normal neuroactive processes [Figure 4C and D]. Comparison of hallmark pathway activity between the high- and low-risk groups further revealed higher levels of proliferation, inflammatory response, and epithelial-mesenchymal transition in high-risk patients [Figure 4E]. These results underscore the association of NGB with cancer cell proliferation and an inflammatory tumor microenvironment.
NGB risk score is associated with first treatment response, tumor recurrence and immune microenvironment remodeling
We further investigated the association of the NGB risk score with other clinical and immune features of LGG. Patients with a treatment response or stable disease had significantly lower risk scores than those with progressive disease (P < 0.0001, Figure 5A), suggesting that NGB may play an important role in first-treatment outcomes. Moreover, in both the TCGA and CGGA datasets, the NGB risk score was consistently higher in recurrent tumors than in primary tumors [Figure 5B-D]. Finally, we analyzed the NGB risk score together with the immune microenvironment. Notably, the NGB risk score was consistently positively correlated with the infiltration of macrophages, DCs, monocytes, mast cells, CD4+ memory T cells, γδ T cells (Tgd), Th2 cells, and B cells, with regulatory T cells (Tregs) as the exception [Figure 5E]. These results imply that NGB may affect LGG progression in multiple ways.
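The correlations reported here are Spearman correlations, as stated in the statistical methods. As a sketch of what that statistic computes, Spearman's ρ is the Pearson correlation of the two variables' ranks, with tied values given their average rank. The function names below are hypothetical, and this is not R's cor or cor.test implementation.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because it works on ranks, ρ captures any monotone relationship between risk score and cell abundance, not only linear ones, and is insensitive to the scale of xCell scores.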
DISCUSSION
Our study offers a comprehensive investigation of the prognostic significance and functional implications of NGB in LGG, a topic that, to our knowledge, has not previously been explored. N-glycosylation influences cancer by activating oncogenic signaling pathways, including Wnt/β-catenin, Hippo, PI3K/Akt, JAK/STAT, TGF-β/Smad, and Notch. Evidence suggests that abnormal modifications of cell surface proteins, such as transmembrane proteins and growth factor receptors, promote tumor growth, invasion, and metastasis through the activation of these signaling cascades[24]. Through integrated analyses of three transcriptomic datasets from the TCGA and CGGA LGG cohorts, we delineated the prognostic role of NGB in LGG, established an NGB-based prognostic model, and characterized the molecular and tumor microenvironment heterogeneity between high- and low-risk groups. The prognostic relevance of NGB in LGG is highlighted by the identification of a 22-gene signature (pNGB) significantly associated with patient survival. Among the pNGB genes, MGAT1, which had the highest hazard ratio, has been reported to be enriched in glioblastoma and to promote glioma cell proliferation by upregulating glucose transport[25]. In contrast, TUSC3 sensitizes glioblastoma to temozolomide through epigenetic reprogramming and reverses the effects of miR-UL112-3p on GBM progression via the AKT signaling pathway[26,27]. Importantly, our study also revealed that higher-risk patients exhibited enhanced glycolysis, further supporting a role for NGB in the metabolic alterations associated with LGG pathogenesis. Furthermore, other pNGB genes, such as MGAT4B, ALG3, and DDOST, have been reported to play significant roles in shaping the immunosuppressive microenvironment[28-30].
Our study also revealed a correlation between the NGB risk score and tumor microenvironment remodeling, particularly associations with inflammatory phenotypes (M1 macrophages and cDCs) and immune regulatory phenotypes (Th2 cells and Tregs). Notably, Tregs were negatively correlated with the NGB risk score, which may reflect their potential role as immune inhibitory components of the tumor microenvironment and warrants further investigation. Together, these findings suggest that NGB is closely related to LGG metabolism and the immune microenvironment and may serve as a metabolic checkpoint in both molecular pathology and clinical practice.
Building on these prognostic insights, we developed and validated machine learning-based survival models integrating the pNGB signature, which demonstrated robust discriminatory power across multiple datasets. Notably, the pNGB signature achieved high accuracy and sensitivity across various machine learning models, supporting the notion that NGB is a critical signature in LGG prognosis. Moreover, the parsimonious Enet model is consistent with Occam's razor, combining simplicity with predictive accuracy. The superior performance of the Enet survival model underscores its potential clinical utility in predicting LGG patient outcomes.
To further elucidate the molecular underpinnings of NGB in LGG, we compared molecular, tumor microenvironment, and clinical features between high- and low-risk LGG patients. Functional analyses revealed enrichment of pathways related to immune response, cell proliferation, and inflammatory signaling in high-risk patients. Specifically, TNF, IL6, TGF-β, and IFN signaling were all upregulated in the high-risk group. Previous studies have shown that inflammatory signaling, such as IFN-γ and TGF-β signaling, is associated with unfavorable prognosis in LGG and with specific mutation profiles[31,32]. Moreover, we observed associations between the inflammatory response and epithelial-mesenchymal transition (EMT), hypoxia, and angiogenesis. Modification of key proteins, such as E-cadherin, N-cadherin, and epithelial cell adhesion molecule (EpCAM), plays crucial roles in the transition of normal liver cells to mesenchymal cells, contributing to cancer metastasis[33,34]. Additionally, lower risk scores were associated with better first-treatment response, whereas higher scores were linked to tumor recurrence. The NGB risk score was also positively correlated with the infiltration of specific immune cell populations, including macrophages, DCs, and memory T cells, suggesting a potential role for NGB in shaping the LGG immune landscape. Both macrophages and DCs are known to be reprogrammed within the glioma microenvironment, where they suppress the immune response and facilitate tumor progression[35,36]. These findings highlight a complex interplay between NGB dysregulation and tumor microenvironment dynamics, pointing to potential therapeutic targets and avenues for further investigation.
To extend the clinical translational value of our study, we systematically searched DrugBank, ApexBio, DGIdb, HMDB, and Tocris and identified several candidate compounds targeting the relevant genes; details are listed in Supplementary Table 1. Among these, asparagine showed the most credible therapeutic potential, targeting RPN2, DDOST, STT3A, DPAGT1, and MAN2A1. Acarbose and beta-D-glucose also showed some therapeutic potential [Supplementary Table 1].
However, our study has some limitations. First, although we identified an NGB signature associated with prognosis, in vitro and in vivo functional validation is needed to elucidate the mechanistic link between NGB dysregulation and LGG progression. Second, the cellular sources of NGB were not investigated in the current study, which will require approaches such as single-cell RNA sequencing or experimental studies. Finally, the NGB-based model requires validation in larger and more diverse patient cohorts to ensure generalizability.
In conclusion, our study provides comprehensive insights into the prognostic significance and functional implications of NGB in LGG. By integrating transcriptomic and clinical data with machine learning approaches, we have shed light on the interplay between N-glycan biosynthesis, tumor biology, and clinical outcomes, paving the way for further research and therapeutic interventions in LGG management.
DECLARATIONS
Acknowledgments
We extend our appreciation to all the researchers who generously shared the data utilized in this study. We also express our gratitude to TCGA and CGGA for their invaluable contribution in providing the data to the public.
Authors' contributions
Designed the study, wrote and revised the manuscript: Zhu Y, Qian C, Zhang Y
Ran the bioinformatics analyses: Zhu Y, Geng L
Provided clinical specimens and data, technical support, and conceptual advice: Bo F, Xu Y, Wei J
Availability of data and materials
Publicly available datasets were used in this study. The TCGA LGG transcriptome data can be accessed from the GDC Data Portal (https://gdc.cancer.gov/about-data/publications/pancanatlas). The transcriptome datasets from the Chinese Glioma Genome Atlas (CGGA) study were obtained from the CGGA website (http://www.cgga.org.cn) under the dataset IDs mRNAseq_325 and mRNAseq_693.
Financial support and sponsorship
The present study was supported by the General Project of Nanjing Medical Science and Technology Development (Grant No. YKK22142) and the Youth Talent Project of Nanjing Brain Hospital (Grant No. 23-25-2R7).
Conflicts of interest
All authors declared that there are no conflicts of interest.
Ethical approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Copyright
© The Author(s) 2024.
Supplementary Materials