The emotion cube

There are many ways to model emotions, and some of them are collected in the EmotionML vocabularies.
Two approaches are especially popular with engineers and non-psychologists:

  • discrete categories like anger, sadness, fear or joy, often associated with an intensity
  • continuous dimensions like valence/pleasure, arousal or dominance

The emotion cube maps the emotional categories to a three-dimensional space.
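
As a rough illustration of such a mapping, the sketch below places four categories at corners of a cube spanned by valence, arousal and dominance. The coordinates are assumptions chosen for illustration (only their signs are meant to be plausible); they are not taken from EmotionML or from any particular emotion cube, and `nearest_category` is a hypothetical helper.

```python
# Illustrative only: the coordinates are assumed corner positions, not measured values.
EMOTION_CUBE = {
    # category: (valence, arousal, dominance), each in [-1, 1]
    "anger": (-1.0, 1.0, 1.0),
    "fear": (-1.0, 1.0, -1.0),
    "sadness": (-1.0, -1.0, -1.0),
    "joy": (1.0, 1.0, 1.0),
}


def nearest_category(valence, arousal, dominance):
    """Return the category whose assumed cube position is closest (Euclidean)."""
    point = (valence, arousal, dominance)

    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(EMOTION_CUBE[name], point))

    return min(EMOTION_CUBE, key=squared_distance)


# A point with negative valence, high arousal and low dominance.
print(nearest_category(-0.6, 0.8, -0.4))
```

Going in this direction (from dimensions back to a category) is only one possible use; the same table can be read the other way to turn a category into a point in the space.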

Nkululeko: how to augment the training set

To do data augmentation with Nkululeko, you can use the augment interface.
In the DATA section of your configuration file, you specify the name of the output file that will list the augmented files, like so:

[DATA]
augment = my_augmentations.csv

and then call the interface:

python -m nkululeko.augment --config myconfig.ini

Currently, Nkululeko simply uses the augmentations that are specified as a demo in the audiomentations documentation, i.e.:

self.audioment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])

These manipulations are applied randomly to your training set.
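
To make "applied randomly" more concrete: each transform in the Compose above carries its own probability p and is applied or skipped independently for every sample. The sketch below mimics that behaviour using only the standard library. `apply_randomly` and the two toy transforms are hypothetical stand-ins for illustration, not audiomentations or Nkululeko code.

```python
import random


def apply_randomly(samples, transforms, rng):
    """Apply each (function, p) pair in order, each with its own probability p."""
    applied = []
    for func, p in transforms:
        if rng.random() < p:
            samples = func(samples)
            applied.append(func.__name__)
    return samples, applied


# Toy stand-ins for real audio transforms (hypothetical, for illustration only).
def add_offset(samples):
    return [s + 0.01 for s in samples]


def reverse(samples):
    return samples[::-1]


rng = random.Random(0)  # fixed seed so the run is repeatable
signal = [0.0, 0.1, 0.2]
augmented, applied = apply_randomly(signal, [(add_offset, 0.5), (reverse, 0.5)], rng)
print(applied, augmented)
```

With p=0.5 for every transform, a given file may receive any combination of the manipulations, including none at all.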

You should find the augmented files in the storage folder inside the result folder of your experiment, where you can listen to them.

Once your augmentations have been processed, you can add them to the training in a new experiment:

[DATA]
databases = ['original data', 'augment']
augment = my_augmentations.csv
augment.type = csv
augment.absolute_path = True
augment.split_strategy = train
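
Before starting the new experiment, it can help to check that the DATA section reads the way you expect. The following is a minimal sketch using Python's standard configparser; it is not how Nkululeko itself necessarily parses the file, and 'original data' is simply the placeholder from the snippet above for your own database name.

```python
import ast
import configparser

CONFIG_TEXT = """
[DATA]
databases = ['original data', 'augment']
augment = my_augmentations.csv
augment.type = csv
augment.absolute_path = True
augment.split_strategy = train
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
data = config["DATA"]

# The list of databases is stored as a Python-style literal string.
databases = ast.literal_eval(data["databases"])

# Report which databases already have an entry in the DATA section.
for name in databases:
    if name in data:
        print(f"{name}: file={data[name]}, type={data.get(name + '.type')}")
    else:
        print(f"{name}: no entry in [DATA] yet")
```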

Supervised vs. unsupervised

Supervised vs. unsupervised refers to whether your training data is annotated (or labeled) with respect to your task. An example: if you want to build a machine learner for human age estimation based on speech, you might give an algorithm a lot of examples of human speech annotated with the age of the person. This would be your training data, and the approach would be supervised (by the age annotations). With unsupervised learning, you would simply give an algorithm a lot of human speech data and ask it to cluster the data based on differences, in the hope that the resulting clusters coincide with age.
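
The difference can be sketched in a few lines of plain Python. All numbers below are invented toy data (one hypothetical acoustic feature value per speaker), and the two functions are deliberately simple stand-ins for real learners: a least-squares line for the supervised case and a two-cluster k-means for the unsupervised case.

```python
# Toy data, invented for illustration: one acoustic feature value per speaker.
features = [110.0, 120.0, 125.0, 210.0, 220.0, 230.0]
ages = [62, 55, 58, 24, 21, 19]  # annotations, only used in the supervised case


def fit_line(xs, ys):
    """Supervised: least-squares fit of age = slope * feature + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def two_means(xs, iterations=10):
    """Unsupervised: split values into two clusters without using any labels.

    Assumes xs contains at least two distinct values.
    """
    centers = [min(xs), max(xs)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            # Index 1 if x is closer to the second center, otherwise index 0.
            index = int(abs(x - centers[0]) > abs(x - centers[1]))
            groups[index].append(x)
        centers = [sum(g) / len(g) for g in groups]
    return groups


slope, intercept = fit_line(features, ages)  # uses the age labels
clusters = two_means(features)               # never sees the age labels
print(slope, intercept)
print(clusters)
```

Whether the two clusters actually correspond to younger and older speakers is something the unsupervised method cannot tell you; you would need at least some annotated data to check.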

Nkululeko exercise

-> Nkululeko: install the Berlin Emodb

This database provides examples of two kinds of labels:

  • emotion and gender labels as categorical data, for classification
  • age labels as numerical data, for regression