Explicit song lyrics detection with subword-enriched word embeddings

Rospocher, M
2021-01-01

Abstract

In this paper, we investigate the problem of automatically detecting explicit song lyrics, i.e., determining whether the lyrics of a given song could be offensive or unsuitable for children. The problem can be framed as a binary classification task, and in this work we propose to tackle it with the FASTTEXT classifier, an efficient linear classification model that relies on a distinctive distributional text representation: by exploiting subword information when building word embeddings, it can handle words not seen at training time. We assess the performance of the FASTTEXT classifier and its word representations on a lyrics dataset of over 800K songs, annotated with explicit-content labels, that we assembled from publicly available resources. The evaluation shows that the FASTTEXT classifier is effective for explicit lyrics detection, substantially outperforming a reference approach for the task, and that the subword information contributes to this result. (C) 2020 Elsevier Ltd. All rights reserved.
2021
Word embeddings
Text classification
Explicit content detection
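
The abstract's point about subword information can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes the usual subword scheme of wrapping a word in `<` and `>` boundary markers and splitting it into character n-grams of length 3 to 6, then averaging the n-gram vectors. The names `char_ngrams` and `SubwordEmbedder`, the random embedding table, and the CRC32 bucketing are all hypothetical stand-ins chosen for illustration; a trained model would learn the table and use its own hash function.

```python
import random
import zlib


def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, wrapped in boundary markers."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(marked) - n + 1)
    ]


class SubwordEmbedder:
    """Toy embedder: a word vector is the mean of its n-gram vectors."""

    def __init__(self, dim=8, buckets=1000, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.buckets = buckets
        # Assumption: a random table stands in for trained n-gram embeddings.
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(buckets)
        ]

    def vector(self, word):
        """Build a vector for any word, including one never seen in training."""
        grams = char_ngrams(word)
        # Assumption: CRC32 is used only because it is deterministic here.
        rows = [
            self.table[zlib.crc32(g.encode("utf-8")) % self.buckets]
            for g in grams
        ]
        return [sum(column) / len(rows) for column in zip(*rows)]


embedder = SubwordEmbedder()
shared = set(char_ngrams("explicit")) & set(char_ngrams("explicitly"))
print(len(shared), "n-grams shared between 'explicit' and 'explicitly'")
print(len(embedder.vector("explicitly")), "dimensions for an unseen word")
```

Because the two words share many n-grams, an unseen variant such as a misspelling or inflection still receives a vector close to that of a known word, which is the property the abstract credits for coping with words absent from the training data.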

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1031060
Citations
  • PubMed Central: n/a
  • Scopus: 11
  • Web of Science (ISI): 7