Figure 4. Statistical Validation of the Private Clinical Dataset. This figure summarizes the demographic, diagnostic, and laboratory characteristics of the clinical records curated for domain-specific fine-tuning, supporting the dataset's clinical validity and representativeness. (A) The age distribution of SpA patients peaks in the 30–39 age group, consistent with the known epidemiology of disease onset[2]. (B) The gender ratio shows a male predominance (62% vs. 38%), reflecting the established higher prevalence of ankylosing spondylitis in males[2]. (C) The disease composition of the fine-tuning dataset is intentionally diverse: it is dominated by SpA subtypes but also includes key differential diagnoses such as rheumatoid arthritis, a case mix that mirrors the diagnostic challenge of a typical rheumatology clinic. (D) The positivity rates of key diagnostic biomarkers, human leukocyte antigen B27 (HLA-B27) at 88% and elevated C-reactive protein (CRP) at 65%, are consistent with reference values for SpA patient cohorts[28]. Collectively, these analyses indicate that the private dataset is a faithful representation of the real-world clinical scenarios SpAD-LLM is designed to address. SpA: Spondyloarthritis; SpAD-LLM: Spondyloarthritis Diagnosis Large Language Model.
