Machine learning algorithms based on proteomic data mining accurately predicting the recurrence of hepatitis B-related hepatocellular carcinoma
Targher, Giovanni (Writing – Review & Editing)
2022-01-01
Abstract
Background and Aim: Over 10% of hepatocellular carcinoma (HCC) cases recur each year, even after surgical resection. The causes of recurrence and effective means of preventing it remain poorly understood. Predicting HCC recurrence requires diagnostic markers with high sensitivity and specificity. This study aims to identify new key proteins for HCC recurrence and to build machine learning models for predicting HCC recurrence.

Methods: The proteomics data analyzed in this study were obtained from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database. We identified differentially expressed proteins between cases with and without HCC recurrence. Survival analysis, Cox regression analysis, and the area under the ROC curve (AUROC > 0.7) were used to screen for the most significant differential proteins. Predictive models for HCC recurrence were developed using four machine learning algorithms.

Results: A total of 690 differentially expressed proteins were identified between 50 relapsed and 77 non-relapsed patients with hepatitis B-related HCC. Seven of these proteins had an AUROC > 0.7 for 5-year survival in HCC: BAHCC1, ESF1, RAP1GAP, RUFY1, SCAMP3, STK3, and TMEM230. Among the machine learning algorithms, random forest showed the highest AUROC for identifying HCC recurrence (0.991, 95% CI 0.962-0.999), followed by the support vector machine (0.893, 95% CI 0.824-0.956), logistic regression (0.774, 95% CI 0.672-0.868), and the multi-layer perceptron (0.571, 95% CI 0.459-0.682).

Conclusions: Our study identifies seven novel proteins for predicting HCC recurrence and finds the random forest algorithm to be the most suitable predictive model for HCC recurrence.
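The abstract does not say how the AUROC values and their 95% confidence intervals were computed, so the following is only an illustrative sketch of AUROC-based screening at the 0.7 threshold, not the authors' pipeline. It assumes a rank-based (Mann-Whitney) AUROC, a percentile bootstrap for the interval, and a direction-agnostic screen. The protein names `PROT_A` and `PROT_B` and all abundance values are made up for illustration.

```python
import random


def auroc(scores, labels):
    """AUROC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC, resampling cases
    with replacement (assumed method; the abstract does not specify one)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def screen_proteins(abundance, labels, threshold=0.7):
    """Keep proteins whose single-marker AUROC exceeds the threshold."""
    kept = {}
    for name, values in abundance.items():
        a = auroc(values, labels)
        a = max(a, 1 - a)  # assumption: lower abundance may also mark recurrence
        if a > threshold:
            kept[name] = a
    return kept


# Toy data: 1 = recurrence, 0 = no recurrence (values are invented).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
abundance = {
    "PROT_A": [2.1, 1.8, 2.5, 1.2, 1.0, 0.9, 1.5, 0.7],  # hypothetical marker
    "PROT_B": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # uninformative marker
}
kept = screen_proteins(abundance, labels)
ci_low, ci_high = bootstrap_ci(abundance["PROT_A"], labels, n_boot=200)
print(kept, (ci_low, ci_high))
```

The same `auroc` and `bootstrap_ci` helpers could be applied to the predicted probabilities of any classifier to compare models on held-out cases, which is the kind of comparison the Results summarize.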