Dataset for explicit lyrics detection

Marco Rospocher
2021-01-01

Abstract

This annotated dataset consists of 807,707 English-language song lyrics, each tagged to indicate whether the lyrics contain explicit content, i.e., content unsuitable for children. The dataset, built from Spotify and LyricWiki content, was developed to support the training and evaluation of automatic tools for detecting explicit song lyrics. The construction of the dataset is described in the following associated publication (cf. Section 4.1): Marco Rospocher. Explicit song lyrics detection with subword-enriched word embeddings. Expert Systems with Applications, Volume 163, January 2021, 113749. DOI: 10.1016/j.eswa.2020.113749
2021
machine learning, explicit content, text classification
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1059799