ARGO, AUTOMATIC RECORD GENERATOR IN ONCOLOGY: MULTICENTRIC VALIDATION OF A NEW TOOL FOR AUTOMATIC CONVERSION OF “REAL-LIFE” HEMOLYMPHOPATHOLOGY REPORTS IN STANDARDIZED ECRF
F. M. Quaglia
2021-01-01
Abstract
Background: The unstructured nature of medical data from Real-World (RW) patients and researchers' limited access to integrated systems restrain the use of RW information for clinical and translational research. Natural Language Processing (NLP) might help transpose unstructured reports into standardized electronic case report forms (eCRFs). We aimed to design a tool that captures pathological features directly from hemo-lymphopathology reports and automatically records them in eCRFs.

Method: Using Optical Character Recognition and NLP techniques, we built a tool named ARGO (Automatic Record Generator for Oncology) and measured its efficiency in recognizing unstructured information from paper-based diagnostic reports of diffuse large B-cell lymphomas (DLBCL), follicular lymphomas (FL), and mantle cell lymphomas (MCL). ARGO was programmed to match data with standard diagnostic criteria, automatically assign the diagnosis according to the International Classification of Diseases, 10th Revision (ICD10), and populate eCRFs on the REDCap platform. A selection of 239 reports (n. 106 DLBCL, n. 79 FL, and n. 54 MCL) from the Pathology Unit at the IRCCS – Istituto Tumori “Giovanni Paolo II” of Bari and 93 external reports (n. 49 DLBCL, n. 24 FL, and n. 20 MCL) from six other Italian centers were used to assess ARGO performance in terms of accuracy (A), precision (P), recall (R) and F1-score (F1).

Results: We successfully converted 326 of the 332 (98.2%) paper-based reports into structured eCRFs incorporating information about diagnosis, tissue of origin of samples (lymph-node, extra-nodal, medullary, and peripheral blood), immunohistochemistry expression of major molecular markers (MYC, BCL2, BCL6, CD10, CD20, Cyclin D1, and the quantitative assessment of the Ki-67/MIB1 proliferation index) and DLBCL cell-of-origin subtype [Hans et al., Blood, 2007]. Overall, ARGO showed high performance (nearly 90% for A, P, R and F1 in 7 of the 8 data fields analyzed across the internal and external series of reports) in capturing report identification number, biopsy date, specimen type, diagnosis, and additional molecular features (Figure 1A-H).

Conclusions: We developed and validated an easy-to-use tool that converts RW paper-based diagnostic reports of major lymphoma subtypes into structured eCRFs. ARGO is cheap, feasible, and easily transferable into daily practice to generate REDCap-based eCRFs for clinical and translational research purposes.
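
The abstract reports accuracy (A), precision (P), recall (R) and F1-score (F1) for each data field but does not state how they were computed. As a minimal sketch, assuming the standard definitions based on true/false positive and negative counts for a single field, the four metrics could be derived as follows. The `FieldCounts` type, the `field_metrics` function and the example counts are hypothetical illustrations, not ARGO's code or results.

```python
from dataclasses import dataclass


@dataclass
class FieldCounts:
    """Hypothetical per-field confusion counts (not taken from the study)."""
    tp: int  # field value extracted and correct
    fp: int  # field value extracted but wrong
    fn: int  # field value present in the report but missed
    tn: int  # field absent in the report and correctly left empty


def field_metrics(c: FieldCounts) -> dict:
    """Return A, P, R and F1 using the standard definitions."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"A": accuracy, "P": precision, "R": recall, "F1": f1}


if __name__ == "__main__":
    # Illustrative counts only, chosen to make the arithmetic easy to follow.
    example = FieldCounts(tp=90, fp=5, fn=5, tn=0)
    for name, value in field_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Computing the metrics per field, rather than once over all fields pooled, matches the way the abstract reports performance separately for each of the eight data fields.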



