
Nkululeko: ensemble learners with late fusion

Since version 0.88.0, nkululeko can combine the results of several experiments and report on the combined outcome, using the ensemble module.

For example, you might want to know whether the combination of expert features and learned embeddings works better than either of them alone. You could then run:

python -m nkululeko.ensemble \
--method majority_voting \
tests/exp_emodb_praat_xgb.ini \
tests/exp_emodb_ast_xgb.ini \
tests/exp_emodb_wav2vec_xgb.ini

and would then get the result of a majority vote over the three classifiers based on Praat, AST and Wav2vec2 features.

The available methods are majority_voting, mean, max, sum, max_class, uncertainty_threshold, uncertainty_weighted and confidence_weighted (a stand-alone sketch of the first three follows the list):

  • majority_voting: for classification: predict the category that most classifiers agree on (the mode).
  • mean: for classification: compute the arithmetic mean of the probabilities from all predictors for each label and use the highest probability to infer the label.
  • max: for classification: use the maximum probability from all predictors for each label and use the highest probability to infer the label.
  • sum: for classification: use the sum of the probabilities from all predictors for each label and use the highest probability to infer the label.
  • max_class: for classification: compare the highest probabilities of all models across classes (instead of within the same class, as in max) and return the highest probability and its class.
  • uncertainty_threshold: for classification: predict the class with the lowest uncertainty if it is below a threshold (defaults to 1.0, meaning no threshold); otherwise compute the mean uncertainty per class over all models and predict the class with the lowest value.
  • uncertainty_weighted: for classification: weight each class with the inverse of its uncertainty (1/uncertainty), normalize the weights per model, then multiply each model's class probabilities by their normalized weights and use the maximum to infer the label.
  • confidence_weighted: weighted ensemble based on confidence (1 - uncertainty), normalized over all samples per model. Like uncertainty_weighted, but using confidence instead of the inverse uncertainty as weights.
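
To illustrate how these fusion rules combine model outputs, independent of nkululeko's internals, here is a minimal Python sketch of majority_voting, mean and max over the probability outputs of three hypothetical classifiers (all numbers and label names are made up):

import numpy as np

labels = np.array(["angry", "happy", "neutral"])
# hypothetical probabilities: 3 models x 2 samples x 3 classes
probs = np.array([
    [[0.60, 0.30, 0.10], [0.20, 0.50, 0.30]],  # model 1
    [[0.50, 0.40, 0.10], [0.10, 0.45, 0.45]],  # model 2
    [[0.20, 0.50, 0.30], [0.30, 0.40, 0.30]],  # model 3
])

# majority_voting: each model votes for its top class,
# the most frequent vote wins
votes = probs.argmax(axis=2)  # shape: (models, samples)
majority = [np.bincount(v, minlength=len(labels)).argmax() for v in votes.T]
print("majority_voting:", labels[majority])

# mean: average the probabilities per class over all models,
# then pick the class with the highest mean
print("mean:", labels[probs.mean(axis=0).argmax(axis=1)])

# max: take the per-class maximum over all models instead of the mean
print("max:", labels[probs.max(axis=0).argmax(axis=1)])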

Nkululeko: export acoustic features

Since version 0.85.0, nkululeko exports the acoustic features for the test and the train (aka dev) set to the project store.

If you specify the store_format:

[FEATS]
store_format = csv

they will be exported as CSV (comma-separated values) files, otherwise as PKL files (readable by Python's pickle module).
That is, after execution of any nkululeko module that computes features, your store should contain the two files:

  • feats_test.csv
  • feats_train.csv

If you specified scaling for the features:

[FEATS]
scale = standard # or speaker

you will have two additional files with features:

  • feats_test_scaled.csv
  • feats_train_scaled.csv

In contrast to the other feature stores, these contain the exact features that are used for training or feature-importance exploration, so they might be combined from different feature types and selected via the features value. An example:

[FEATS]
type = ['praat', 'os']
features = ['speechrate_nsyll_dur', 'F0semitoneFrom27.5Hz_sma3nz_amean']
scale = standard
store_format = csv

results in the following feats_test.csv:

file,start,end,speechrate_nsyll_dur,F0semitoneFrom27.5Hz_sma3nz_amean
./data/emodb/emodb/wav/11b03Wb.wav,0 days,0 days 00:00:05.213500,4.028004219813945,34.42206
./data/emodb/emodb/wav/16b10Td.wav,0 days,0 days 00:00:03.934187500,3.0501850763340586,31.227554

....
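
Since the exported store is a plain CSV file with a segmented index (file, start, end), you can load it for your own analyses, for example with pandas; a minimal sketch (the file name follows the convention above):

import pandas as pd

# the first three columns form the segmented index: file, start, end
feats = pd.read_csv("feats_test.csv", index_col=["file", "start", "end"])
print(feats.shape)       # (number of samples, number of features)
print(feats.describe())  # per-feature statistics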

How to use train, dev and test splits with Nkululeko

Usually in machine learning, you train your predictor on a train set, tune meta-parameters on a dev (development or validation) set and evaluate on a test set.
With nkululeko, there is currently no dedicated test set, as only two sets can be specified: a train and an evaluation set.
A work-around is to use the test module to evaluate your best model on a held-out test set at the end of your experiments.
All you need to do is specify the name of the test data in your [DATA] section, like so (let's call it myconf.ini):

[EXP]
save = True
....
[DATA]
databases =  ['my_train-dev_data']
... 
tests = ['my_test_data']
my_test_data = ./data/my_test_data/
my_test_data.split_strategy = test
...

You can then run the experiment module with your config:

python -m nkululeko.nkululeko --config myconf.ini

and then, after optimization (of predictors, feature sets and meta-parameters), use the test module:

python -m nkululeko.test --config myconf.ini

The results will appear in the same place as all other results, but the file names carry test and the name of the test database as a suffix.

If you need to compare several predictors and feature sets, you can use the nkuluflag module.
All you need to do is pass a parameter (named --mod) when calling the nkuluflag module from your main script, to tell it to use the test module:

cmd = 'python -m nkululeko.nkuluflag --config myconf.ini  --mod test '
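
Such a main script could, for example, loop over feature sets and classifiers with subprocess; a sketch, assuming the nkuluflag options --feat and --model (check your nkululeko version for the exact option names and value formats):

import subprocess

# hypothetical grid of feature sets and classifiers to compare
for feat in ["os", "praat"]:
    for model in ["xgb", "svm"]:
        subprocess.run(
            ["python", "-m", "nkululeko.nkuluflag",
             "--config", "myconf.ini",
             "--feat", feat,
             "--model", model,
             "--mod", "test"],
            check=True,
        )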

Nkululeko: how to bin/discretize your feature values

Since version 0.77.8, nkululeko can convert all feature values into the three discrete classes low, mid and high.

Simply state

[FEATS]
type = ['praat']
scale = bins
store_format = csv

in your config (here using Praat features).
With the store format set to csv, you can inspect the train and test features in the store folder.

The binning is done based on the 33rd and 66th percentiles of the training feature values.
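
The underlying operation is ordinary percentile binning; a minimal sketch of the idea with numpy (not nkululeko's actual code):

import numpy as np

def bin_feature(train_values, values):
    # thresholds from the 33rd and 66th percentiles of the training values
    low, high = np.percentile(train_values, [33, 66])
    return np.where(values < low, "low",
                    np.where(values < high, "mid", "high"))

train = np.random.randn(100)
test = np.random.randn(20)
print(bin_feature(train, test))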

Nkululeko: compare several databases

Since version 0.77.7, nkululeko includes a new interface named multidb, which lets you compare several databases.

You state their names in the [EXP] section; they will then be processed one after the other and against each other, and the results are stored in a file called heatmap.png in the experiment folder.

Mind: YOU NEED TO OMIT THE PROJECT NAME!

Here is an example of such an ini file:

[EXP]
root = ./experiments/emodbs/
#  DON'T give it a name, 
# this will be the combination 
# of the two databases: 
# traindb_vs_testdb
epochs = 1
databases = ['emodb', 'polish']
[DATA]
root_folders = ./experiments/emodbs/data_roots.ini
target = emotion
labels = ['neutral', 'happy', 'sad', 'angry']
[FEATS]
type = ['os']
[MODEL]
type = xgb

You can (but don't have to) state the specific dataset values in an external file, like above.
data_roots.ini:

[DATA]
emodb = ./data/emodb/emodb
emodb.split_strategy = specified
emodb.test_tables = ['emotion.categories.test.gold_standard']
emodb.train_tables = ['emotion.categories.train.gold_standard']
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish = ./data/polish_emo
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.split_strategy = speaker_split
polish.test_size = 30

Call it with:

python -m nkululeko.multidb --config my_conf.ini

The default behavior is that each database is used as a whole when it serves as test or training database. If you would rather have the predefined splits used, you can add a flag for this:

[EXP]
use_splits = True

Here's a result with two databases:

and this is the same experiment, but with augmentations:

In order to add augmentation, simply add an [AUGMENT] section:

[EXP]
root = ./experiments/emodbs/augmented/
epochs = 1
databases = ['emodb', 'polish']
[DATA]
...
[AUGMENT]
augment = ['traditional', 'random_splice']
[FEATS]
...

In order to add an additional training database to all experiments, you can use:

[CROSSDB]
train_extra = [meta, emodb]

to add the two databases to all training data sets; meta and emodb should then be declared in the root_folders file.
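
Conceptually, multidb runs one experiment per ordered pair of databases and collects the results in a matrix, which is then rendered as the heatmap; a rough sketch of that logic (run_experiment is a hypothetical stand-in for a complete nkululeko run):

import itertools
import pandas as pd

databases = ["emodb", "polish"]

def run_experiment(train_db, test_db):
    # hypothetical stand-in: train on train_db, evaluate on test_db
    # and return a result measure such as UAR
    return 0.0  # placeholder

results = pd.DataFrame(index=databases, columns=databases, dtype=float)
for train_db, test_db in itertools.product(databases, repeat=2):
    results.loc[train_db, test_db] = run_experiment(train_db, test_db)
print(results)  # rows: training database, columns: test database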

Nkululeko: generate a LaTeX/PDF report

Since version 0.66.3, nkululeko can automatically generate a report document, formatted in LaTeX and compiled as a PDF file, basically as a compilation of the images that are generated.
There is a dedicated REPORT section in the config file for this; here is an example:

[REPORT]
# should the report be shown in the terminal at the end?
show = False 
# should a latex/pdf file be printed? if so, state the filename
latex = emodb_report
# name of the experiment author (default "anon")
author = Felix
# title of the report (default "report")
title = EmoDB

NOTE:
with each run of a nkululeko module in the same experiment environment, details will be added to the report.
So a typical use would be to first run the general modules and then more specialized ones:

# first run a segmentation 
python -m nkululeko.segment --config myconf.ini 
# then rename the data-file in the config.ini and
# run some data exploration
python -m nkululeko.explore --config myconf.ini 
# then run a machine learning experiment
python -m nkululeko.nkululeko --config myconf.ini 

Each run will add some content to the report.

Nkululeko: segmenting a database

Segmenting a database means splitting the audio samples of a database into smaller segments or chunks. With speech data, this is usually done on the basis of VAD (voice activity detection), i.e. the pauses between speech in the audio samples are used as segment borders.

The reason for segmenting could be to label the data with something that does not last over the whole sample, e.g. emotional state.
Another motivation to segment audio data might be that the acoustic features are targeted at a specific stretch of audio, e.g. 3-5 seconds long.

Within nkululeko this would be done with the segment module, which is currently based on the silero software.

You simply call your experiment configuration with the segment module, and the train set, the test set or both will be segmented.
The advantage is that you can apply any filters that make sense on your data beforehand; for example, with the android corpus, only the reading-task samples are segmented.
You can select them like so:

[DATA]
filter = [['task', 'reading']]

and then call the segment module:

python -m nkululeko.segment --config my_conf.ini

The output is a new database file in CSV format.

If you want, you can specify whether only the training split, the test split, or both should be segmented, as well as the string that is appended to the name of the resulting CSV file (per default, the name consists of the database names):

[SEGMENT]
# name postfix
target = _segmented
# which model to use
method = silero
# which split: train, test or all (both)
sample_selection = all
# the minimum length of remaining samples (in seconds)
min_length = 2
# the maximum length of segments; longer ones are cut (in seconds)
max_length = 10
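
To get a feeling for what the underlying silero VAD does, here is a stand-alone sketch using the silero-vad model from torch.hub (independent of nkululeko; the file name and the 16 kHz sampling rate are just for the example):

import torch

# load the silero VAD model and its helper functions
model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("some_sample.wav", sampling_rate=16000)
# speech stretches as sample indices; these would become the new segments
segments = get_speech_timestamps(wav, model, sampling_rate=16000)
for seg in segments:
    print(seg["start"] / 16000, seg["end"] / 16000)  # start/end in seconds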

Nkululeko: check your dataset

Within nkululeko, since version 0.53.0, you can perform automatic data checks, which means that some of your data might be filtered out if it doesn't fulfill certain requirements.

Currently two checks are implemented:

[DATA]
# check the filesize of all samples in train and test splits, in bytes
check_size = 1000
# check if the files contain speech with voice activity detection (VAD)
check_vad = True

The VAD check uses silero VAD.
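
In essence, the size check drops all samples whose audio file is smaller than the given number of bytes; a minimal sketch of such a filter (not nkululeko's actual code):

import os

samples = ["a.wav", "b.wav", "c.wav"]  # hypothetical file list
min_bytes = 1000  # as in check_size above
kept = [f for f in samples
        if os.path.exists(f) and os.path.getsize(f) >= min_bytes]
print(kept)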

Nkululeko: how to visualize your data distribution

If you just want to see how your data is distributed over the target with nkululeko, you can do a value_counts plot with the explore module.

In your config, you would specify it like this:

[EXPL]
# all samples, or only test or train split?
sample_selection = all 
# activate the plot
value_counts = [['age'], ['gender'], ['duration'], ['duration', 'age']] 

and then, run this with the explore module:

python -m nkululeko.explore --config myconfig.ini

The results, for a data set with target=depression, look similar to this for all samples:

and like this for the speakers (if there is a speaker annotation):

If you prefer a kernel density estimation over a histogram, you can select this with

[EXPL]
dist_type = kde

which, for duration, would result in:

Nkululeko distinguishes between categorical and continuous properties; this would be the output for gender:

You can show the distribution of two sample properties at once by using a scatter plot:

In addition, this module will automatically plot the distribution of samples per speaker, per gender (if annotated).
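
If you want to reproduce such plots outside of nkululeko, the histogram/KDE distinction maps directly onto seaborn; a sketch for a hypothetical duration column (the gamma distribution is only used to fake some plausible durations):

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# hypothetical sample durations in seconds
df = pd.DataFrame({"duration": np.random.gamma(2.0, 2.0, 500)})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
sns.histplot(df, x="duration", ax=axes[0])  # the default histogram
sns.kdeplot(df, x="duration", ax=axes[1])   # dist_type = kde
plt.tight_layout()
plt.savefig("duration_dist.png")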

Nkululeko: how to augment the training set

To do data augmentation with Nkululeko, you can use the augment or the aug_train interface.
The difference is that the former only augments samples, whereas the latter augments the training set of a configuration and then immediately performs the training, including the augmented files.

In the AUGMENT section of your configuration file, you specify the method(s) and the name of the file list for the output:

  • traditional: the classic augmentation, e.g. cropping data or adding a bit of noise. We use the audiomentations package for this.
  • random_splice: a special method, introduced in this paper, that randomly splices and re-connects the audio samples.

[AUGMENT]
# select the samples to augment: either train, test, or all
sample_selection = train
# select the method(s)
augment = ['traditional', 'random_splice']
# file name to store the augmented data (can then be added to training)
result = augmented.csv

and then call the interface:

python -m nkululeko.augment --config myconfig.ini

or

python -m nkululeko.aug_train --config myconfig.ini

if you want to run a training in the same run.

Currently, apart from random-splicing, Nkululeko simply uses the audiomentations module, i.e.:

[AUGMENT]
augment = ['traditional']
augmentations = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.05),
    Shift(p=0.5),
    BandPassFilter(min_center_freq=100.0, max_center_freq=6000),
])

These manipulations are applied randomly to your training set.
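
To hear what such a pipeline does to a single file, you can run audiomentations directly; a sketch with the same transforms (soundfile for the audio I/O is an assumption, any loader that yields float arrays works):

import soundfile as sf
from audiomentations import AddGaussianNoise, BandPassFilter, Compose, Shift

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.05),
    Shift(p=0.5),
    BandPassFilter(min_center_freq=100.0, max_center_freq=6000),
])

samples, sr = sf.read("some_sample.wav", dtype="float32")
augmented = augment(samples=samples, sample_rate=sr)
sf.write("some_sample_augmented.wav", augmented, sr)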

With respect to the random_splicing method, you can adjust two parameters:

  • p_reverse: probability of some samples to be in reverse order (default: 0.3)
  • top_db: top dB level for silence to be recognized (default: 12)

This configuration, for example, would distort the samples much more than the default:

[AUGMENT]
augment = ['random_splice']
p_reverse = .8
top_db = 6
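
The idea behind random splicing can be sketched with librosa: detect the non-silent stretches (controlled by top_db), shuffle them, reverse some of them with probability p_reverse, and concatenate the result (a rough illustration of the method, not the paper's exact implementation):

import librosa
import numpy as np

y, sr = librosa.load("some_sample.wav", sr=16000)

# non-silent intervals; a lower top_db means stricter silence detection
intervals = librosa.effects.split(y, top_db=6)
chunks = [y[start:end] for start, end in intervals]

rng = np.random.default_rng()
rng.shuffle(chunks)
# reverse some chunks with probability p_reverse
chunks = [c[::-1] if rng.random() < 0.8 else c for c in chunks]
spliced = np.concatenate(chunks)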

You should find the augmented files in the store folder inside your experiment's result folder, and can listen to them there.

Once your augmentations have been processed, you can add them to the training data in a new experiment:

[DATA]
databases = ['original data', 'augment']
augment = my_augmentations.csv
augment.type = csv
augment.split_strategy = train