Predicting Iron Deficiencies Using Routine Complete Blood Cell Count Parameters: A Machine Learning Approach and Evaluation
Negrini, Davide;Pighi, Laura;Salvagno, Gian Luca;Lippi, Giuseppe
2026-01-01
Abstract
Background/Objectives: Iron deficiency remains a prevalent condition whose diagnosis requires specific laboratory tests. This study aimed to evaluate whether routine complete blood cell count (CBC) parameters can be used within a machine learning framework to predict low ferritin and low transferrin saturation, taken as biochemical markers of altered iron status, potentially supporting more targeted laboratory test utilization.

Methods: In this single-center retrospective outpatient study, we analyzed 32,437 records from subjects undergoing both CBC and iron metabolism testing between 2023 and 2026. Low ferritin and low transferrin saturation were defined using sex-specific thresholds. Low ferritin was present in 14,344 subjects (44.2%), whereas low transferrin saturation was present in 7791 subjects (24.0%). After data cleaning and exclusion of incomplete records, demographic variables and CBC indices were tested as potential predictors. The dataset was split into training and test sets with stratified sampling. Multiple supervised machine learning models, including logistic regression, decision tree, random forest, XGBoost, support vector machine, k-nearest neighbors, and Naive Bayes, were trained. Hyperparameter tuning and model selection were performed using repeated stratified 10-fold cross-validation, optimizing the area under the curve (AUC). Model performance was assessed by AUC, sensitivity, and specificity, and validated on an independent test set.

Results: All models showed predictive capability for low ferritin and low transferrin saturation using CBC parameters alone. Ensemble methods, especially random forest and XGBoost, reached the best performance (AUC values of 0.80–0.87 for ferritin and 0.85–0.96 for transferrin saturation). Sensitivity and specificity were balanced, supporting clinical screening applicability. Performance was consistent across cross-validation and was confirmed in the test set. Prediction of low transferrin saturation was slightly more accurate than prediction of low ferritin. Feature importance analysis identified mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), and red blood cell distribution width (RDW) as key predictors.

Conclusions: CBC-based machine learning models may help identify subjects with low ferritin or low transferrin saturation, supporting subsequent targeted assessment of iron status.
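The evaluation steps named in the Methods (a stratified train/test split, then AUC, sensitivity, and specificity) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 probability cutoff, the split fraction, and the toy labels and scores are all assumptions made for illustration, and only the Python standard library is used in place of a machine learning package.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test so each class keeps the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def auc(labels, scores):
    """AUC as the chance that a random positive scores above a random negative.

    Ties count as 0.5. The pairwise loop is quadratic, which is fine for a toy
    example but not for tens of thousands of records.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, cutoff=0.5):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = low ferritin, 0 = normal; scores are made-up
# model probabilities, not values from the study.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

train_idx, test_idx = stratified_split(labels, test_fraction=0.5, seed=42)
toy_auc = auc(labels, scores)
toy_sens, toy_spec = sensitivity_specificity(labels, scores, cutoff=0.5)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.3f} specificity={toy_spec:.3f}")
```

In practice the cutoff would be chosen on the training data to balance sensitivity and specificity. The repeated stratified 10-fold cross-validation described in the abstract would repeat this kind of split and scoring many times within the training set.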



