
EMOPIA

The EMOPIA (pronounced ‘yee-mò-pi-uh’) dataset is a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, built to facilitate research on music emotion tasks. The dataset contains 1,087 music clips from 387 songs, with clip-level emotion labels annotated by four dedicated annotators. Because multiple clips can come from the same song, the data can also be used for song-level analysis.

For details of the methodology used to build the dataset, please refer to our paper.

Example of the dataset

               Low Valence    High Valence
High Arousal       Q2              Q1
Low Arousal        Q3              Q4
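The quadrant labels follow Russell's circumplex model. As a minimal sketch, the mapping between quadrant tags and binary valence/arousal values could be expressed as follows (names here are illustrative, not an actual EMOPIA API):

```python
# Mapping from Russell's quadrants to binary valence/arousal labels,
# matching the layout above. Illustrative sketch only.
QUADRANT_TO_VA = {
    "Q1": {"valence": "high", "arousal": "high"},
    "Q2": {"valence": "low",  "arousal": "high"},
    "Q3": {"valence": "low",  "arousal": "low"},
    "Q4": {"valence": "high", "arousal": "low"},
}

def quadrant_of(valence_high: bool, arousal_high: bool) -> str:
    """Return the Russell quadrant for binary valence/arousal values."""
    if arousal_high:
        return "Q1" if valence_high else "Q2"
    return "Q4" if valence_high else "Q3"
```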

Number of clips

The following table shows, for each quadrant of Russell’s valence-arousal emotion space, the number of clips in EMOPIA and their average length.

Quadrant # clips Avg. length (in sec / #tokens)
Q1 250 31.9 / 1,065
Q2 265 35.6 / 1,368
Q3 253 40.6 / 771
Q4 310 38.2 / 729
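Per-quadrant statistics like those above can be recomputed from clip metadata. A hedged sketch, assuming a simple list of (quadrant, length-in-seconds) pairs rather than EMOPIA's actual file layout:

```python
from collections import defaultdict

def summarize(clips):
    """Given (quadrant, length_sec) pairs, return a dict mapping each
    quadrant to (clip count, mean length). Field names are assumptions,
    not EMOPIA's actual schema."""
    totals = defaultdict(lambda: [0, 0.0])
    for quadrant, length in clips:
        totals[quadrant][0] += 1
        totals[quadrant][1] += length
    return {q: (n, t / n) for q, (n, t) in totals.items()}
```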

Pipeline of data collection

Fig.1

Dataset Analysis

Fig.2 Violin plots of the distribution in (a) note density, (b) length, and (c) velocity for clips from different classes.
Fig.3 Histogram of the keys (left / right: major / minor keys) for clips from different emotion classes.
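Note density and velocity, as plotted in Fig.2, can be derived directly from a clip's notes. A minimal sketch operating on (start, end, velocity) tuples such as those a MIDI parser like pretty_midi exposes; this is not EMOPIA's actual analysis script:

```python
def note_stats(notes, clip_seconds):
    """Compute note density (notes per second) and mean velocity from a
    list of (start, end, velocity) tuples. Sketch under the assumption
    that the clip length is known; EMOPIA's analysis may differ."""
    density = len(notes) / clip_seconds
    mean_velocity = sum(v for _, _, v in notes) / len(notes)
    return density, mean_velocity
```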

Emotion Classification

Fig.4 Symbolic-domain and audio-domain representations.
Fig.5 Results of emotion classification.
Fig.6 Inference example on Sakamoto’s “Merry Christmas Mr. Lawrence”.

For the classification code, please refer to SeungHeon’s repository.

The pre-trained model weights are also in the repository.

Conditional Generation

Fig.7 Compound word with emotion token
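Fig.7 illustrates conditioning generation by attaching an emotion token to the compound-word sequence. A hedged sketch of the idea, prepending a quadrant token to a stream of event token IDs (the token IDs below are made up; see the paper for the actual vocabulary):

```python
# Hypothetical emotion-token IDs; the real vocabulary is defined by the
# compound-word tokenizer described in the paper.
EMOTION_TOKENS = {"Q1": 1, "Q2": 2, "Q3": 3, "Q4": 4}

def condition_sequence(quadrant, events):
    """Prepend the emotion token for `quadrant` to a list of event token
    IDs, so the generator is conditioned on the target emotion."""
    return [EMOTION_TOKENS[quadrant]] + list(events)
```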


Q1 (High valence, high arousal)

Baseline
Transformer w/o pre-training
Transformer w/ pre-training

Q2 (Low valence, high arousal)

Baseline
Transformer w/o pre-training
Transformer w/ pre-training

Q3 (Low valence, low arousal)

Baseline
Transformer w/o pre-training
Transformer w/ pre-training

Q4 (High valence, low arousal)

Baseline
Transformer w/o pre-training
Transformer w/ pre-training

Authors and Affiliations

Cite this dataset

@inproceedings{EMOPIA,
  author = {Hung, Hsiao-Tzu and Ching, Joann and Doh, Seungheon and Kim, Nabin and Nam, Juhan and Yang, Yi-Hsuan},
  title = {{EMOPIA}: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation},
  booktitle = {Proc. Int. Society for Music Information Retrieval Conf.},
  year = {2021}
}