
Nkululeko: how to use train/dev/test splits

Supervised machine learning operates as follows: during the training phase, a learning algorithm is adapted to a training dataset, producing a trained model, which is then used to make predictions on a test set during the inference phase.

One potential issue with this approach is that, for sufficiently complex models, they may simply memorise all items in the training set rather than learning a generalised distinction based on an underlying process, such as emotional expression or speaker age. This means that while the model performs well on the training data, it fails to generalise to new data—a phenomenon known as overfitting.

To mitigate this, the model's hyperparameters are optimised using a held-out evaluation set that is not used during training. One particularly important hyperparameter is the number of epochs—that is, the number of times the entire training set is processed. Typically, to prevent overfitting, training is halted when performance on the evaluation set begins to decline, a technique known as early stopping. The model that performs best on the evaluation data is then selected.

However, this approach introduces a new problem: the model may now have overfitted (and most likely has) to the evaluation data. This is why a third dataset is necessary for final testing, one that has not been used at any stage of model development.

The evaluation set is often referred to as the dev set (short for development set). Consequently, Nkululeko now provides support for three distinct data splits: train, dev, and test.

Here is an example of how you would do this with emoDB (the distribution has no predefined splits for train, dev and test):

[EXP]
root = ./experiments/emodb_3split/
name = results
epochs = 100
traindevtest = True
[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
labels = ["neutral", "sadness", "happiness"]
target = emotion
[FEATS]
type = ['os']
[MODEL]
type = mlp
layers = {'l1':100, 'l2':16}
patience = 10
[PLOT]
best_model = True
epoch_progression = True

You trigger the handling of three splits with

traindevtest = True

and the rest happens automatically in this case. The results are then shown for:

the best model based on the dev set:

the best model for the dev set, but evaluated on the test set:

and the last model, evaluated on the dev set:

In this case, you can see that the 62nd epoch performed like the 52nd on the dev set. However, this best model, when evaluated on the test set, drops by more than 20% average recall, which is a more stable estimate of the general performance of this model (this is only a toy example with 4 speakers in the training set, and 2 each in the dev and test sets).
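To run such an experiment, you call the main nkululeko module with your configuration file as usual (the file name here is just an assumption):

python -m nkululeko.nkululeko --config exp_emodb_3split.ini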

Nkululeko: predict speaker id

Since version 0.93.0, nkululeko interfaces the pyannote segmentation package (as an alternative to silero).

There are two modules that you can use for this:

  • SEGMENT
  • PREDICT

The (huge) difference is that the SEGMENT module looks at each file in the input data and searches for speakers per file (which can be a single large file), while the PREDICT module concatenates all input data and looks for different speakers across the whole database.

In any case, it is best to run it on a GPU, as it will be very slow on a CPU (and there is no progress bar).

Segment module

If you specify the method in the [SEGMENT] section and the hf_token (needed for the pyannote model) in the [MODEL] section,

[SEGMENT]
method = pyannote
segment_target = _segmented
sample_selection = all
[MODEL]
hf_token = <my hugging face token>

your resulting segmentations will have a predicted speaker id attached. Be aware that this is really slow on CPU, so best run it on a GPU and declare so in the [MODEL] section:

[MODEL]
hf_token = <my hugging face token>
device = gpu # or cuda:0

As a result, a new plot would appear in the image folder: the distribution of speakers that were found, e.g. like this:
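The segmentation is started with the SEGMENT module as usual (my_conf.ini being a placeholder for your configuration file):

python -m nkululeko.segment --config my_conf.ini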

Predict module

Simply select speaker as the prediction target:

[PREDICT]
targets = ["speaker"]

Generally, the PREDICT module is described here.
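It is called like the other nkululeko modules (again, my_conf.ini is just a placeholder for your configuration file):

python -m nkululeko.predict --config my_conf.ini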

Nkululeko: how to finetune a transformer model

Since version 0.85.0, nkululeko lets you finetune a transformer model with huggingface (and even publish it there if you like).

If you would like to have your model published, set:

[MODEL]
push_to_hub = True

Finetuning in this context means training the (pre-trained) transformer layers with your new training data and labels, as opposed to only using the last layer as embeddings.

The only thing you need to do is to set your MODEL type to finetune:

[FEATS]
type = []
[MODEL]
type = finetune

The acoustic features can/should be empty, because the transformer model starts with CNN layers to model the acoustics frame-wise. The frames are then pooled by the model for the whole utterance (maximum duration: the first 8 seconds, the rest is ignored).

The default base model is the one from facebook, but you can specify a different one like this:

[MODEL]
type = finetune
pretrained_model = microsoft/wavlm-base
max_duration = 10.5

The parameter max_duration is also optional (default=8) and means the maximum duration of your samples / segments (in seconds) that will be used, starting from 0. The rest is disregarded.

You can use the usual deep learning parameters:

[MODEL]
learning_rate = .001
batch_size = 16
device = cuda:3
measure = mse
loss = mse

but all of them have defaults.

The loss function is fixed to

  • weighted cross entropy for classification
  • concordance correlation coefficient for regression

The resulting best model and the huggingface logs (which can be read by tensorboard) are stored in the project folder.
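Since the logs are in tensorboard format, you can inspect the training progression by pointing tensorboard at that folder, for example (the exact path depends on your root and name settings and is only an assumption here):

tensorboard --logdir ./experiments/my_finetune_exp/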

Nkululeko: how to tweak the target variable for database comparison

Sometimes you want to compare two different databases that share a similar target variable, say one related to likability, but with different scaling, say one asked on a scale from 1 to 10 and the other using a Likert scale from 1 to 7.

With nkululeko you can re-name labels, normalize the target values, and even invert the polarity, for each database.

In the following example there are two databases:

[DATA]
databases = ['db1', 'db2']
db1.split_strategy = test
db1.scale = True
db2.colnames = {'non-attractive':'likability'}
db2.split_strategy = train
db2.scale = True
db2.reverse = True
db2.reverse.max = 10
target = likability
bins = [-1000, .2, 1000]
labels = ['less likable', 'more likable']

The first database, db1, already has a likability label and just needs to be standard-normalized; the second one, db2, has a related label non-attractive, which needs to be renamed, inverted (based on a hypothetical maximum value of 10) and normalized.
Then, db1 can be used as test data and db2 as training data.

Nkululeko: compare several databases

Since version 0.77.7, nkululeko provides a new interface named multidb which lets you compare several databases.

You can state their names in the [EXP] section and they will then be processed one after another and against each other; the results are stored in a file called heatmap.png in the experiment folder.

Mind that you need to omit the project name!

Here is an example of such an ini file:

[EXP]
root = ./experiments/emodbs/
#  DON'T give it a name, 
# this will be the combination 
# of the two databases: 
# traindb_vs_testdb
epochs = 1
databases = ['emodb', 'polish']
[DATA]
root_folders = ./experiments/emodbs/data_roots.ini
target = emotion
labels = ['neutral', 'happy', 'sad', 'angry']
[FEATS]
type = ['os']
[MODEL]
type = xgb

You can (but don't have to) state the specific dataset values in an external file like above.
data_roots.ini:

[DATA]
emodb = ./data/emodb/emodb
emodb.split_strategy = specified
emodb.test_tables = ['emotion.categories.test.gold_standard']
emodb.train_tables = ['emotion.categories.train.gold_standard']
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish = ./data/polish_emo
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.split_strategy = speaker_split
polish.test_size = 30

Call it with:

python -m nkululeko.multidb --config my_conf.ini

The default behavior is that each database is used as a whole, whether as test or training database. If you would rather have the splits used, you can add a flag for this:

[EXP]
use_splits = True

Here's a result with two databases:

and this is the same experiment, but with augmentations:

In order to add augmentation, simply add an [AUGMENT] section:

[EXP]
root = ./experiments/emodbs/augmented/
epochs = 1
databases = ['emodb', 'polish']
[DATA]
--
[AUGMENT]
augment = ['traditional', 'random_splice']
[FEATS]
...

In order to add an additional training database to all experiments, you can use:

[CROSSDB]
train_extra = [meta, emodb]

to add two databases to all training data sets, where meta and emodb should then be declared in the root_folders file.
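The corresponding entries in the root_folders file could, for example, look like this (the paths are hypothetical and need to be adapted to your setup):

[DATA]
meta = ./data/meta
emodb = ./data/emodb/emodb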

Nkululeko: oversample the training set

Sometimes, with categorically labeled data, the number of samples per class is very unevenly distributed, misleading the model into thinking that the overwhelming majority class is more important than the others.
In this case, two techniques might help: class weighting assigns a higher weight to samples from minority classes, and oversampling "invents" new samples for the minority classes.
With nkululeko since version 0.70.0, you can oversample the training set with different algorithms implemented by the imbalanced-learn (imblearn) package.
You simply state the method in the FEATS section like so:

[FEATS]
...
balancing = adasyn # either ros, smote or adasyn

Three methods are available:

  • ros: simply repeat random samples from the minority classes
  • smote: "invent" new minority samples by making small changes to existing ones
  • adasyn: similar to smote, but resulting in uneven class distributions

Nkululeko: re-name data column names

With nkululeko since version 0.68.1, you can re-name data fields (columns in your data table) by setting the following in your ini-file:

[DATA]
databases = ['mydata']
mydata.colnames = {'Participant ID':'speaker', 'sex':'gender', 'Age': 'age'}

which means that, before further processing, the Participant ID column in your database mydata will be treated as the speaker label, and so on.

Nkululeko: automatically stratify your split sets

With nkululeko since version 0.68.0, the selection of test/dev vs. train samples can be done automatically in a stratified manner, i.e. trying to find splits that are age or gender balanced.
An example for such a configuration is this:

[DATA]
# the name of the database
databases = ['emodb']
# the location of the data
emodb = ./data/emodb/emodb
# set the split strategy to "balanced"
emodb.split_strategy = balanced
# set a percentage value for your test split
emodb.test_size = 20
# stratify variables with weights for importance
balance = {'emotion':2, 'age':1, 'gender':1}
# all stratification variables need to be categorical, 
# so we need to state the number of bins for "age" 
age_bins = 2
# a value for how much importance to give for the ideal group sizes
size_diff_weight = 1
# the target value of the experiment
target = emotion

Nkululeko will always keep the speakers disjoint, i.e. the resulting splits will contain different speakers.
With the example above, the algorithm will try to balance emotion, gender and (binned) age distributions across the splits.

Nkululeko: inspect your data with Spotlight

With nkululeko since version 0.67.0, the spotlight software is directly integrated as part of the EXPLORE module.

You can simply run your data filters, augmentations, machine learning experiments, segmentations and model predictions as usual, and then call the spotlight software by adding to your configuration file:

[EXPL]
sample_selection = all # or train or test 
spotlight = True 

and running the EXPLORE module

python -m nkululeko.explore --config myconfig.ini

Note that you might need to install an extra package:

pip install renumics-spotlight

A new web browser window should open as an interface to spotlight:

Torchaudio

If you use modules, feature extractors or models that use torchaudio with Nkululeko, e.g. the Resampler or the Squim model, you need to install the nightly version.

pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu