Evaluating the Accuracy of AI-Generated Text Detection in Scientific Writing
Lippi, Giuseppe; Mattiuzzi, Camilla
2025-01-01
Abstract
Background. The rapid advancement of artificial intelligence (AI) tools, particularly in natural language processing, is transforming scientific writing by improving efficiency, consistency and accessibility, especially for non-native English speakers and early-career researchers. This study aimed to evaluate the effectiveness of Compilatio, a widely used plagiarism detection software, in identifying AI-generated scientific content. Materials and Methods. Four commonly used and freely available AI tools [ChatGPT, Gemini, Perplexity, and Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking (STORM)] were prompted to generate introductory texts on the burden of diabetes. Each output was copied into a Word document, uploaded to Compilatio and analyzed, yielding an integrity score, a similarity index, and an estimated likelihood of AI-generated content. Results. Integrity scores varied substantially, ranging from 32% (STORM) to 100% (Gemini), while similarity indices remained consistently low (0-6%), indicating minimal direct text overlap with existing sources. The estimated likelihood of AI authorship also varied, with STORM yielding the lowest detection rate (27%) and Gemini the highest (100%). Conclusion. These findings highlight the distinct textual characteristics produced by different AI models and demonstrate the overall effectiveness of Compilatio in identifying AI-generated content from three of the four tools. However, the limited performance observed with STORM-generated text underscores the need for more sophisticated and adaptable detection systems to uphold academic integrity in the evolving landscape of AI-supported scientific writing.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



