GPT-3 vs. Delta. Applying Stylometry to Large Language Models

Simone Rebora
2023-01-01

Abstract

This paper tests the ability of large language models to deceive stylometric approaches to authorship attribution. A corpus of ten English authors is used as a reference point, while GPT-3 is asked to generate texts that imitate their style. After defining a baseline for the efficiency of stylometric methods on human-written texts, a series of analyses is performed on the artificially generated texts. Results show the inability of GPT-3 to deceive stylometry and allow a quantitative analysis of its distinctive linguistic features. Preliminary results are also presented for ChatGPT, indicating the efficiency of stylometry in detecting its authorial fingerprint.
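The Delta named in the title is Burrows' Delta, the standard stylometric distance the paper pits against GPT-3. As a rough illustration of how such a comparison works, here is a minimal sketch of Burrows' Delta as commonly defined: z-score the relative frequencies of the most frequent words (MFW) across the corpus, then measure the mean absolute difference of z-scores between texts. The function name, toy texts, and parameter choices are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
import math

def burrows_delta(texts, target, n_mfw=5):
    """Minimal Burrows' Delta sketch (not the paper's exact pipeline):
    z-score the relative frequencies of the corpus-wide most frequent
    words, then return the mean absolute z-score difference between the
    target text and every other text."""
    tokenized = {name: t.lower().split() for name, t in texts.items()}
    # most frequent words across the whole corpus
    corpus_counts = Counter(w for toks in tokenized.values() for w in toks)
    mfw = [w for w, _ in corpus_counts.most_common(n_mfw)]
    # relative frequency of each MFW in each text
    freqs = {name: [toks.count(w) / len(toks) for w in mfw]
             for name, toks in tokenized.items()}
    names = list(freqs)
    # per-word mean and (population) standard deviation across texts
    means = [sum(freqs[n][i] for n in names) / len(names)
             for i in range(len(mfw))]
    stds = [math.sqrt(sum((freqs[n][i] - means[i]) ** 2 for n in names)
                      / len(names)) or 1.0  # guard against zero variance
            for i in range(len(mfw))]
    z = {n: [(freqs[n][i] - means[i]) / stds[i] for i in range(len(mfw))]
         for n in names}
    # Delta = mean absolute difference of z-scores vs. the target
    return {n: sum(abs(a - b) for a, b in zip(z[target], z[n])) / len(mfw)
            for n in names if n != target}

# Toy corpus (invented for illustration): two texts by "author_A",
# one stylistic outlier, and an unknown text to attribute.
texts = {
    "author_A": "the cat sat on the mat and the dog sat too",
    "author_A2": "the dog sat on the rug and the cat sat near",
    "author_B": "stars shimmer while distant galaxies drift through endless night skies",
    "unknown": "the bird sat on the fence and the cat sat",
}
delta = burrows_delta(texts, target="unknown")
print(sorted(delta, key=delta.get))  # candidate authors, nearest first
```

Smaller Delta means closer style; in this toy example the function-word profile of "unknown" matches the "author_A" texts far more closely than "author_B". The paper's experiments apply the same logic at corpus scale to decide whether GPT-3's imitations land near their target authors.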
Year: 2023
ISBN: 978-88-942535-7-3
Keywords: Stylometry, Authorship attribution, Large language models, GPT-3, ChatGPT


Use this identifier to cite or link to this item: https://hdl.handle.net/11562/1115882