Nkululeko: how to align databases

Sometimes you might want to combine databases that are similar but don't label exactly the same phenomena.

Take, for example, stress and emotion: you may not have enough data labeled for stress, but there are many emotion databases that label anger and happiness. One approach is to treat angry samples as stressed and happy or neutral samples as non-stressed.

Taking the usual emodb as an example, and the famous SUSAS corpus as a database sampling stressed voices, you can do this like so:

[DATA]
databases = ['emodb', 'susas']

emodb = ./data/emodb/emodb
# indicate where the target values are
emodb.target_tables = ["emotion"]
# rename emotion to stress
emodb.colnames = {"emotion": "stress"}
# only use angry, neutral and happy samples
emodb.filter = [["stress", ["anger", "neutral", "happiness"]]]
# map them to stress
emodb.mapping = {"anger": "stress",  "neutral": "no stress", "happiness": "no stress"}
# and put everything to the training
emodb.split_strategy = train

susas = ./data/susas/
# map the ternary stress labels to binary ones
susas.mapping = {"0,1": "no stress", "2": "stress"}
susas.split_strategy = speaker_split

target = stress
labels = ["stress", "no stress"]

So SUSAS will be split into train and test sets, while the training set is strengthened by the whole of emodb. This usually makes more sense if a third database is available for evaluation, because in-domain machine learning works better in most cases than adding out-of-domain data (as we do here with emodb).
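If such a third database is at hand, a minimal sketch might look like this (the database name stressdb and its path are hypothetical placeholders; emodb would be configured as above):

[DATA]
databases = ['emodb', 'susas', 'stressdb']
# susas could now go entirely to the training set as well
susas.split_strategy = train
stressdb = ./data/stressdb/
# reserve the third database exclusively for evaluation
stressdb.split_strategy = test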

Nkululeko: using uncertainty

With nkululeko since version 0.94, (aleatoric) uncertainty, i.e. the confidence of the model, is explicitly visualized. You simply find a corresponding plot in the image folder after running an experiment.

The plot shows the distribution of true vs. false predictions with respect to uncertainty; in this case this worked out quite well, because less uncertain predictions are usually correct.

The approach is described in our paper "Uncertainty-Based Ensemble Learning For Speech Classification".

You can use this to tweak your results by specifying an uncertainty threshold, i.e. you refuse to predict samples whose uncertainty is above that threshold:

[PLOT]
uncertainty_threshold = 0.4

You will then additionally get a confusion plot that only takes the selected samples into account.

This might feel like cheating, but especially in critical use cases it might be better to deliver no prediction than a wrong one.

Nkululeko: feature scaling

As described in this previous post, feature scaling can be quite important in machine learning.

With nkululeko since version 0.97 you have a multitude of scaling methods at hand.

You simply state in the config:

[FEATS]
scale = xxx

For xxx, you can specify one of the following scaling methods (a complete example follows the list):

  • standard: z-transformation (mean of 0 and std of 1) based on the training set
  • robust: scales using statistics that are robust to outliers (removes the median and scales by the interquartile range)
  • speaker: like standard, but based on individual speaker sets (also for the test set)
  • bins: convert feature values into 0, .5 and 1 (for low, mid and high)
  • minmax: rescales the data set such that all feature values are in the range [0, 1]
  • maxabs: scales each feature by its maximum absolute value, so that all values end up in the range [-1, 1]; the data is not shifted or centered
  • normalizer: scales each sample (row) individually to have unit norm (e.g., L2 norm)
  • powertransformer: applies a power transformation to each feature to make the data more Gaussian-like, in order to stabilize variance and minimize skewness
  • quantiletransformer: applies a non-linear transformation such that the probability density function of each feature is mapped to a uniform or Gaussian distribution (range [0, 1])
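For example, a minimal complete [FEATS] section might look like this (assuming openSMILE features via type = ['os'], which is just one possible choice):

[FEATS]
type = ['os']
# the robust scaler is less sensitive to outliers than the standard one
scale = robust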