All posts by felix

The emotion cube

There is a multitude of ways to model emotions, and some of them are collected in the EmotionML vocabularies.
Really popular with engineers and non-psychologists are two approaches:

  • discrete categories like anger, sadness, fear or joy, often associated with an intensity
  • continuous dimensions like valence/pleasure, arousal or dominance

The emotion cube maps the emotional categories to a three-dimensional space:

Nkululeko: how to augment the training set

To do data augmentation with Nkululeko, you can use the augment or the aug_train interface.
The difference is that the former only augments samples, whereas the latter augments the training set of a configuration and then immediately performs the training, including the augmented files.

In the AUGMENT section of your configuration file, you specify the method(s) and the name of the output file list:

  • traditional: the classic augmentation, e.g. by cropping data or adding a bit of noise. We use the audiomentations package for this.
  • random_splice: a special method introduced in this paper that randomly splices and re-connects the audio samples.

[AUGMENT]
# select the samples to augment: either train, test, or all
sample_selection = train
# select the method(s)
augment = ['traditional', 'random_splice']
# file name to store the augmented data (can then be added to training)
result = augmented.csv

and then call the interface:

python -m nkululeko.augment --config myconfig.ini

or

python -m nkululeko.aug_train --config myconfig.ini

if you want to run a training in the same run.
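Before calling either interface, it can help to check that the AUGMENT section parses the way you expect. The following is a minimal sketch using only the Python standard library; it is not how Nkululeko itself reads the file, and the variable names are just for illustration:

```python
import ast
import configparser

# The [AUGMENT] section from above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
[AUGMENT]
sample_selection = train
augment = ['traditional', 'random_splice']
result = augmented.csv
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
section = config["AUGMENT"]

# List-valued options are written like Python literals,
# so ast.literal_eval turns them into real lists.
methods = ast.literal_eval(section["augment"])
sample_selection = section.get("sample_selection", "train")
result_file = section["result"]

print(methods, sample_selection, result_file)
```

To check your own file, replace `read_string(CONFIG_TEXT)` with `read("myconfig.ini")`.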

Currently, apart from random-splicing, Nkululeko simply uses the audiomentations module, i.e.:

[AUGMENT]
augment = ['traditional']
augmentations = Compose([
AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.05),
Shift(p=0.5),
BandPassFilter(min_center_freq=100.0, max_center_freq=6000),])

These manipulations are applied randomly to your training set.
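To give an idea of what such a manipulation does to a signal, here is a rough pure-Python sketch of additive Gaussian noise. It only mimics the idea behind AddGaussianNoise and is not the audiomentations implementation; the function name, the toy sine signal and the fixed seed are assumptions made for this example:

```python
import math
import random


def add_gaussian_noise(signal, min_amplitude=0.001, max_amplitude=0.05, rng=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    The noise standard deviation is drawn uniformly from
    [min_amplitude, max_amplitude], once per call.
    """
    rng = rng or random.Random()
    amplitude = rng.uniform(min_amplitude, max_amplitude)
    return [x + rng.gauss(0.0, amplitude) for x in signal]


# Toy input: 100 ms of a 440 Hz sine at 16 kHz (made-up test signal).
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
noisy = add_gaussian_noise(clean, rng=random.Random(0))
```

The augmented signal has the same length as the original, so the label of the sample still applies to it.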

With respect to the random_splice method, you can adjust two parameters:

  • p_reverse: probability of some samples to be in reverse order (default: 0.3)
  • top_db: top dB level for silence to be recognized (default: 12)

This configuration, for example, would distort the samples much more than the default:

[AUGMENT]
augment = ['random_splice']
p_reverse = .8
top_db = 6
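The effect of p_reverse can be illustrated with a heavily simplified sketch. It cuts the signal into equal-length segments instead of detecting silences (so there is no counterpart to top_db), and the function name and the n_segments parameter are hypothetical, not part of Nkululeko:

```python
import random


def random_splice(signal, n_segments=4, p_reverse=0.3, rng=None):
    """Cut `signal` into segments, shuffle them, and reverse some of them."""
    rng = rng or random.Random()
    # Assumption: equal-length cuts; a real implementation would cut at silences.
    step = max(1, len(signal) // n_segments)
    segments = [signal[i:i + step] for i in range(0, len(signal), step)]
    rng.shuffle(segments)
    spliced = []
    for segment in segments:
        # Each segment is played backwards with probability p_reverse.
        if rng.random() < p_reverse:
            segment = segment[::-1]
        spliced.extend(segment)
    return spliced


original = list(range(12))
augmented = random_splice(original, p_reverse=0.8, rng=random.Random(1))
```

All original sample values survive, only their order changes; a higher p_reverse means more segments end up reversed.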

You should find the augmented files in the storage folder of the result folder of your experiment and could listen to them there.

Once your augmentations have been processed, you can add them to the training in a new experiment:

[DATA]
databases = ['original data', 'augment']
augment = my_augmentations.csv
augment.type = csv
augment.split_strategy = train

Supervised vs. unsupervised

Supervised vs. unsupervised refers to whether your training data is annotated (or labeled) with respect to your task. An example: if you want to build a machine learner for human age estimation based on speech, you might give an algorithm a lot of examples of human speech annotated with the age of the speaker. This would be your training data, and the approach would be supervised (by the age annotations). With unsupervised learning, you would simply give an algorithm a lot of human speech data and might ask it to cluster the data based on differences, hoping that the resulting clusters coincide with age.
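The difference can be sketched with a toy example. The numbers are made up, and the two "learners" (a nearest-centroid classifier and a two-cluster k-means on one feature) are merely simple stand-ins chosen for this illustration:

```python
import statistics

# Made-up toy data: one feature (e.g. mean pitch in Hz) per speaker.
pitch = [110.0, 120.0, 115.0, 210.0, 220.0, 205.0]
age_group = ["adult", "adult", "adult", "child", "child", "child"]  # annotations


# Supervised: the labels guide the training (here: one centroid per label).
def fit_centroids(values, labels):
    return {
        label: statistics.mean(v for v, lab in zip(values, labels) if lab == label)
        for label in set(labels)
    }


def predict(centroids, value):
    return min(centroids, key=lambda label: abs(centroids[label] - value))


centroids = fit_centroids(pitch, age_group)


# Unsupervised: no labels, only a two-cluster 1-D k-means on the raw values.
def two_means(values, iterations=10):
    low, high = min(values), max(values)
    for _ in range(iterations):
        assign = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        low = statistics.mean(v for v, a in zip(values, assign) if a == 0)
        high = statistics.mean(v for v, a in zip(values, assign) if a == 1)
    return assign


clusters = two_means(pitch)
```

The supervised model can name the class of a new value, whereas the clustering only returns anonymous groups; whether those groups correspond to age is something you have to check afterwards.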

Nkululeko exercise

-> Nkululeko: install the Berlin Emodb

This database contains examples of two kinds of labels:

  • emotion and gender labels as categorical data, for classification
  • age labels as numerical data, for regression

Nkululeko: show feature importance

Since version 0.40, Nkululeko can show the X best-performing acoustic features according to some model.

There is a new section called EXPL (short for exploration), and you could state

[EXPL]
feature_distributions = True
model = ['tree', 'gmm', 'svm']
permutation = True
max_feats = 5
sample_selection = test

in your config file, and then run the exploration module like this:

python -m nkululeko.explore --config my_config.ini

The following values can be set in the config:

  • feature_distributions: turn this analysis on.
  • sample_selection: which sample set/split to use: all, train or test.
  • model: which models to use to estimate feature importance. Any model from the MODEL section can be used; if several are stated, the mean result is used.
  • max_feats: the n best features are shown.
  • permutation: use feature permutation to determine the best features. Make sure to test the models before.

The resulting list will then appear in the result folder and a barplot image in the image folder.
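The idea behind the permutation option can be sketched in a few lines: shuffle one feature column at a time and measure how much the model's accuracy drops. This is a generic illustration with made-up data and a hand-written stand-in model, not Nkululeko's code:

```python
import random


def accuracy(model, rows, labels):
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in rows]
            rng.shuffle(column)
            permuted = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Made-up toy data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[float(i % 2), data_rng.random()] for i in range(40)]
labels = [int(row[0] > 0.5) for row in rows]


def model(row):
    """Stand-in for an already trained classifier."""
    return int(row[0] > 0.5)


importances = permutation_importance(model, rows, labels)
```

Shuffling the noise feature does not change the accuracy at all, while shuffling the informative feature costs roughly half of it, so the first feature would be ranked on top.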

Afterwards you could inspect single features as described here

Nkululeko: how to plot distributions of feature values

As shown in this post, with Nkululeko you can select only specific features from your feature sets by specifying them in the [FEATS] section:

[FEATS]
features = ['JitterPCA', 'meanF0Hz', 'hld_sylRate']

What you can also do is plot them per category (only for classification), by specifying in the EXPL section whether you would like that for all samples or only for the test or train samples:

[EXPL]
# turn it on
feature_distributions = True 
# use only training samples
sample_selection = train 
# only plot the 5 most important features 
max_feats = 5  
# also plot this feature, irrespective of its importance
plot_features = ['f3_median']
# turn on output of all statistical tests
print_stats = True

You would have to call nkululeko with the explore interface:

python -m nkululeko.explore --config <myConfig.ini>

The image file is in the image folder and should look similar to this:

Stating in the title:

  • the number of samples
  • The outcome of a Kruskal-Wallis test (if more than two categories)
  • The outcome of the most significant difference according to a t-test (you see all outcomes in the debug output if print_stats=True)

or, like this (for the additional feature).
You can turn gender specifics off by specifying

[EXPL]
ignore_gender = True
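For readers who wonder what the Kruskal-Wallis test mentioned above computes: it ranks all feature values across the categories and compares the rank sums per category. Below is a minimal sketch of the H statistic without the correction for ties; the three groups are made-up feature values, and the sketch makes no claim about which implementation Nkululeko uses:

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (without the correction for ties)."""
    pooled = sorted(value for group in groups for value in group)
    # Average 1-based rank for each distinct value, so tied values share a rank.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n_total = len(pooled)
    rank_term = sum(
        sum(ranks[v] for v in group) ** 2 / len(group) for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


# Made-up feature values for three categories.
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

The larger H is, the less plausible it is that all categories share the same distribution; the p-value follows from a chi-squared distribution with (number of categories - 1) degrees of freedom. In practice you would rather use a library routine such as scipy.stats.kruskal.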

Nkululeko: how to predict many samples

There are three ways to predict a number of samples:

  1. If you want to save the predictions of an experiment for later use, you can do so by stating in the EXP section

    [EXP]
    save_test = ./my_saved_test_predictions.csv

    The output format is CSV (comma-separated values).

  2. Alternatively, you can test an existing database against the best model you trained before, by stating the databases as tests in the DATA section:

    [DATA]
    tests = ['my_testdb']
    my_testdb = /mypath/my_testdb
    ...

    and then calling Nkululeko's test module

    python -m nkululeko.test --config myconfig.ini --outfile myresults.csv

  3. Run the demo module simply for a set of files:

    python -m nkululeko.demo --config myconfig.ini --list my_filelist.txt

How to normalize features

"Normalizing" or scaling feature values means shifting them to a common range, or to a distribution with the same mean and standard deviation (also called z-transformation).
You would do that for several reasons:

  • Artificial neural nets handle small numbers best, so all values should be in the range [-1, 1].
  • Speakers have individual ways of speaking, which you are not interested in if you want to learn a general task, e.g. emotion or age. So you would normalize the values for each speaker individually. Of course, in most applications this is not possible because you don't yet have samples of your test speakers.
  • You might want to normalize by sex, because women typically have a higher pitch. Another way out is to use only relative values instead of absolute ones.

Mind that you shouldn't use your test set for normalization, as it should only be used for testing and is supposed to be unknown. That's why you should compute your normalization parameters on the training set; you can then use them to normalize/scale the test set.
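As a minimal sketch of this rule, the following z-transformation estimates mean and standard deviation on the training values only and then applies them to both sets. The function names and the numbers are made up for illustration:

```python
import statistics


def fit_scaler(train_values):
    """Estimate the normalization parameters on the training set only."""
    # Population standard deviation; using the sample version is an equally valid choice.
    return statistics.mean(train_values), statistics.pstdev(train_values)


def z_transform(values, mean, std):
    return [(v - mean) / std for v in values]


train = [100.0, 120.0, 140.0, 160.0]
test = [130.0, 200.0]

mean, std = fit_scaler(train)
train_scaled = z_transform(train, mean, std)
test_scaled = z_transform(test, mean, std)  # reuses the training parameters
```

The scaled training values have zero mean and unit standard deviation; the test values generally do not, which is expected because the test set played no part in estimating the parameters.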

Nkululeko exercise

  • try different scaling methods for the acoustic features with nkululeko

Augmenting data

Often (kind of always) there is a lack of training data for supervised learning.

One way to tackle this is representation learning, which can be done in a self-supervised fashion.

Another approach is to multiply your labeled training data by adding slightly altered versions of it that do not change the information that is the aim of the detection, for example by adding noise to the data or clipping it. This is called augmentation, and here is a post on how to do this with nkululeko.

A third way is to synthesize data based on the labeled training data, for example with GANs, VAEs or with rule-based simulation. Here one can distinguish whether only a parameterized form of the samples (i.e. the features) or whole audio files are generated.

Sometimes only samples for a rare class are needed; in this case, techniques like random oversampling (ROS), the Synthetic Minority Oversampling Technique (SMOTE) or Adaptive Synthetic sampling (ADASYN) can be used.
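Random oversampling is the simplest of these techniques: samples of the rare classes are duplicated at random until every class is as frequent as the largest one. A minimal sketch with made-up file names and labels (the function name is hypothetical):

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of the rarer classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


files = ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"]
emotions = ["neutral", "neutral", "neutral", "neutral", "anger"]
balanced_files, balanced_emotions = random_oversample(files, emotions)
```

As with normalization, this should only be applied to the training set. SMOTE and ADASYN go one step further and interpolate new feature vectors instead of copying existing ones.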

Nkululeko exercise

  • Here is a post how to do this with nkululeko. Try it out

Nkululeko

This is the entry post for Nkululeko: a framework to do machine learning experiments on audio data based on configuration files.

Here's an overview of the tutorials: