Comparing ChatGPT to Human Raters and Sentiment Analysis Tools for German Children’s Literature

Simone Rebora; Gerhard Lauer
2023-01-01

Abstract

In this paper, we apply the ChatGPT Large Language Model (gpt-3.5-turbo) to the 4books dataset, a German-language collection of children's and young adult novels comprising a total of 22,860 sentences annotated for valence by 80 human raters. We test whether ChatGPT can (a) match the behaviour of human raters and/or (b) outperform state-of-the-art sentiment analysis tools. Results show that, while inter-rater agreement with human readers is low (regardless of the inclusion or exclusion of context), efficiency scores are comparable to those of the most advanced sentiment analysis tools.
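The abstract's central measure is inter-rater agreement between ChatGPT and human annotators on sentence-level valence. As a minimal sketch of how such agreement can be quantified, the snippet below computes Cohen's kappa (chance-corrected agreement between two raters) on hypothetical valence labels; the rater names and label values are illustrative only, not the paper's data or its specific agreement metric.

```python
# Minimal sketch: Cohen's kappa for chance-corrected agreement between two
# raters on sentence-level valence labels. All rating data below is
# hypothetical; the paper's own agreement figures come from the 4books
# annotations and may use a different coefficient.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(labels_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a | freq_b)
    return (observed - expected) / (1 - expected)

# Hypothetical valence judgments for six sentences (neg/neu/pos).
rater_a = ["pos", "pos", "neg", "neu", "pos", "neg"]
rater_b = ["pos", "neg", "neg", "neu", "pos", "pos"]
kappa = cohen_kappa(rater_a, rater_b)
```

On this toy data, observed agreement is 4/6 while chance agreement is about 0.39, giving a kappa of roughly 0.45; low agreement, as reported in the abstract, would correspond to kappa values well below this.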
Large Language Models, ChatGPT, 4books dataset, sentiment analysis, inter-rater agreement

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1115888