Evaluating the Accuracy of AI-Generated Text Detection in Scientific Writing

Lippi, Giuseppe; Mattiuzzi, Camilla
2025-01-01

Abstract

Background. The rapid advancement of artificial intelligence (AI) tools, particularly in natural language processing, is transforming scientific writing by improving efficiency, consistency and accessibility, especially for non-native English speakers and early-career researchers. This study aimed to evaluate the effectiveness of Compilatio, a widely used plagiarism detection software, in identifying AI-generated scientific content. Materials and Methods. Four commonly used and freely available AI tools [ChatGPT, Gemini, Perplexity, and Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking (STORM)] were prompted to generate introductory texts on the burden of diabetes. Each output was copied into a Word document and uploaded to Compilatio, which reported an integrity score, a similarity index, and the likelihood of AI-generated content. Results. Integrity scores varied substantially, ranging from 32% (STORM) to 100% (Gemini), while similarity indices remained consistently low (0-6%), indicating minimal direct text overlap with existing sources. The likelihood of AI authorship also varied, with STORM yielding the lowest detection rate (27%) and Gemini the highest (100%). Conclusion. These findings highlight the distinct textual characteristics produced by different AI models and demonstrate the overall effectiveness of Compilatio in identifying AI-generated content from three of the four tools. However, the limited performance observed with STORM-generated text underscores the need for more sophisticated and adaptable detection systems to uphold academic integrity in the evolving landscape of AI-supported scientific writing.
Artificial Intelligence; Detection; Scientific Writing

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1172347