Dataset for explicit lyrics detection
Marco Rospocher
2021-01-01
Abstract
This annotated dataset consists of 807,707 English-language song lyrics, each tagged to indicate whether it contains explicit content, i.e., content unsuitable for children. The dataset, built from Spotify and LyricWiki content, was developed to support the training and evaluation of automatic tools for detecting explicit song lyrics. The construction of the dataset is described in Section 4.1 of the associated publication: Marco Rospocher. Explicit song lyrics detection with subword-enriched word embeddings. Expert Systems with Applications, Volume 163, January 2021, 113749. DOI: 10.1016/j.eswa.2020.113749
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.