Category Archives: nkululeko

Nkululeko: how to predict topics for your texts

Since version 1.0.1, nkululeko integrates a text classification model. It is a so-called zero-shot model, which means you can define the categories you would like to have predicted yourself.

A prerequisite for this is that your data is transcribed, i.e. that there is a text column in your data.
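As an illustration, a minimal data CSV with such a text column could look like this (paths, transcriptions and labels are made up):

file,text,emotion
./wav/sample_001.wav,He will meet him there in the parking lot.,anger
./wav/sample_002.wav,She will hand it in on Wednesday.,happiness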

Here is an example ini file showing how to use this on a transcribed version of emodb:

[EXP]
root = ./examples/results
name = emodb_textclassifier
[DATA]
databases = ['emodb']
emodb = ./examples/results//exp_emodb_translate/results/all_predicted.csv
emodb.type = csv
emodb.split_strategy = random
labels = ['anger', 'happiness']
target = emotion
[FEATS]
type = ['os']
store_format = csv
[MODEL]
type = svm
[PREDICT]
targets = ['textclassification']
textclassifier.candidates = ["sadness", "anger", "neutral", "happiness", "fear", "disgust", "boredom"]
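
Assuming the ini file above is saved as exp_textclassifier.ini (the file name is just an example), you would then run the predict module on it:

python -m nkululeko.predict --config exp_textclassifier.ini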

The output is one file with all columns and one with only the predicted emotions (from text):

file,start,end,classification_winner,sadness,anger,neutral,happiness,fear,disgust,boredom
./data/emodb/emodb/wav/12a01Fb.wav,0 days,0 days 00:00:01.863625,neutral,0.11576763540506363,0.1414959877729416,0.3593694567680359,0.05933323875069618,0.08951663225889206,0.12100014835596085,0.11351688951253891
./data/emodb/emodb/wav/12a01Wc.wav,0 days,0 days 00:00:02.358812500,neutral,0.12048673629760742,0.1446247100830078,0.25808465480804443,0.04279503598809242,0.0794658437371254,0.25803136825561523,0.09651164710521698

It makes sense that almost all predicted labels are neutral, because emodb was designed to have emotionally neutral linguistic content.

Following the winner class, the output lists the scores for all candidate classes.

Nkululeko: how to compare classifiers, features and databases using multiple runs

Since version 0.98, nkululeko offers functionality to compare the outcomes of several runs across experiments.

Say you would like to know whether the difference between using acoustic (opensmile) features and linguistic embeddings (bert) as input for some classifier is significant. You could then use the outcomes of several runs of an MLP (multi layer perceptron) as tests that represent all possible runs (disclaimer: as far as I know, this approach is disputed by some statisticians).

You would set up your experiment like this:

[EXP]
...
runs = 10
epochs = 100
[FEATS]
type = ['bert']
#type = ['os']
#type = ['os', 'bert']
[MODEL]
type = mlp
...
patience = 5
[EXPL]
# turn on extensive statistical output
print_stats = True
[PLOT]
runs_compare = features

and run this three times, each time changing the feature type that is being used (bert, os, or the combination of both), so that in the end your results folder contains three different run_results text files.
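
For example, with three config files that differ only in the [FEATS] type (the file names are just an illustration), the three experiments would be run like this:

python -m nkululeko.nkululeko --config exp_bert.ini
python -m nkululeko.nkululeko --config exp_os.ini
python -m nkululeko.nkululeko --config exp_os_bert.ini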

Using this, nkululeko prints a plot that compares the three feature sets; here's an example (using only 5 runs):

The title states the overall significance of all differences, as well as the largest one from the pair-wise comparisons. If your run number is larger than 30, t-tests will be used instead of Mann-Whitney tests.

Nkululeko tutorial: Voice of Wellness workshop

Context

In September 2025, we held the Voice of Wellness workshop.

In this post I walk through the nkululeko experiments I use for the tutorials there.

Prepare the Database

I use the Androids corpus (paper here).

The first thing you should probably do is check the data formats and re-sample if necessary:

[RESAMPLE]
# which of the data splits to re-sample: train, test or all (both)
sample_selection = all
replace = True
target = data_resampled.csv
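
With this section in the config, the re-sampling is done by the resample module:

python -m nkululeko.resample --config data/androids/exp.ini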

Explore

Check the database distributions:

python -m nkululeko.explore --config data/androids/exp.ini

Transcribe and translate

Transcribe. Note: this should be done on a GPU.

Translate: no GPU is required, as this uses a Google service.
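
Both steps can be done via the predict module; as a sketch (the target names 'text' and 'translation' are assumptions here, check the nkululeko documentation for the exact names in your version):

[PREDICT]
# first transcription (GPU recommended), then translation
targets = ['text', 'translation']

python -m nkululeko.predict --config data/androids/exp.ini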

Segment

The Androids database samples are sometimes quite long.
It makes sense to check whether approaches work better on shorter speech segments.

python -m nkululeko.segment --config data/androids/exp.ini
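
How the segmentation behaves can be controlled in a [SEGMENT] section; a minimal sketch (the option names here are assumptions, please check the documentation):

[SEGMENT]
# which of the data splits to segment: train, test or all
sample_selection = all
# segmentation method
method = silero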

Filter the data

[DATA]
data.limit_samples_per_speaker = 8
data.filter = [['task', 'interview']]
check_size = 1000

Define splits

Either use pre-defined folds:

[MODEL]
logo=5

or, randomly define splits, but stratify them:

[DATA]
data.split_strategy = balanced
data.balance = {'depression':2, 'age':1, 'gender':1}
data.age_bins = 2

Add additional training data

More details here

[DATA]
databases = ['data', 'emodb']
data.split_strategy = speaker_split
# add German emotional data
emodb = ./data/emodb/emodb
# rename emotion to depression
emodb.colnames = {"emotion": "depression"}
# only use neutral and sad samples
emodb.filter = [["depression", ["neutral", "sadness"]]]
# map them to depression
emodb.mapping = {"neutral": "control", "sadness": "depressed"}
# and put everything to the training
emodb.split_strategy = train
target = depression
labels = ['depressed', 'control']

Nkululeko: how to align databases

Sometimes you might want to combine databases that are similar but don't cover exactly the same phenomena.

Take, for example, stress and emotion: you don't have enough data labelled for stress, but there are many emotion databases that label anger and happiness. You might then try the approach of using angry samples as stressed and happy or neutral ones as non-stressed.

Taking the usual emodb as an example, and the famous Susas as a database sampling stressed voices, you can do this as follows:

[DATA]
databases = ['emodb', 'susas']

emodb = ./data/emodb/emodb
# indicate where the target values are
emodb.target_tables = ["emotion"]
# rename emotion to stress
emodb.colnames = {"emotion": "stress"}
# only use angry, neutral and happy samples
emodb.filter = [["stress", ["anger", "neutral", "happiness"]]]
# map them to stress
emodb.mapping = {"anger": "stress",  "neutral": "no stress", "happiness": "no stress"}
# and put everything to the training
emodb.split_strategy = train

susas = data/susas/
# map ternary stress labels to binary
susas.mapping = {'0,1':'no stress', '2':'stress'}
susas.split_strategy = speaker_split

target = stress
labels = ["stress", "no stress"]

So Susas will be split into train and test, but the training set will be strengthened by the whole of emodb. This actually makes more sense if a third database is available for evaluation, because in-domain machine learning in most cases works better than adding out-of-domain data (as we do here with emodb).
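
To run this experiment, you would call the standard nkululeko module on the config, e.g. (the file name is just an example):

python -m nkululeko.nkululeko --config exp_stress.ini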

Nkululeko: ensemble learners with late fusion

Since version 0.88.0, nkululeko lets you combine experiment results and report on the outcome by using the ensemble module.

For example, you might want to know whether the combination of expert features and learned embeddings works better than either of them alone. You could then do:

python -m nkululeko.ensemble \
--method max_class \
tests/exp_emodb_praat_xgb.ini \
tests/exp_emodb_ast_xgb.ini \
tests/exp_emodb_wav2vec_xgb.ini

(all in one line)
and would then get the fused result of the three experiments for Praat, AST and Wav2vec2 features.

The available methods are majority_voting, mean, max, sum, max_class, uncertainty_threshold, uncertainty_weighted and confidence_weighted:

  • majority_voting: For classification: predict the category that most classifiers agree on (the mode).
  • mean: For classification: compute the arithmetic mean of the probabilities from all predictors for each label and use the highest probability to infer the label.
  • max: For classification: use the maximum of the probabilities from all predictors for each label and use the highest probability to infer the label.
  • sum: For classification: use the sum of the probabilities from all predictors for each label and use the highest probability to infer the label.
  • max_class: For classification: compare the highest probabilities of all models across classes (instead of within the same class, as in the max method) and return the class with the highest probability.
  • uncertainty_threshold: For classification: predict the class with the lowest uncertainty if it is below a threshold (default 1.0, meaning no threshold); otherwise compute the mean uncertainty over all models per class and predict the class with the lowest.
  • uncertainty_weighted: For classification: weigh each class with the inverse of its uncertainty (1/uncertainty), normalize the weights per model, then multiply each model's class probabilities with their normalized weights and use the maximum to infer the label.
  • confidence_weighted: Weighted ensemble based on confidence (1-uncertainty), normalized over all samples per model. Like the previous method, but using confidence instead of the inverse of uncertainty as weights.
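
To fuse with a different method, you simply change the --method argument, e.g.:

python -m nkululeko.ensemble \
--method uncertainty_weighted \
tests/exp_emodb_praat_xgb.ini \
tests/exp_emodb_ast_xgb.ini \
tests/exp_emodb_wav2vec_xgb.ini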

Nkululeko: export acoustic features

Since version 0.85.0, nkululeko exports the acoustic features for the test and the train (aka dev) set to the project store.

If you specify the store_format:

[FEATS]
store_format = csv

they will be exported to CSV (comma separated value) files, otherwise to PKL files (readable by the python pickle module).
I.e., after execution of any nkululeko module that computes features, your store should contain the two files:

  • feats_test.csv
  • feats_train.csv

If you specified scaling for the features:

[FEATS]
scale = standard # or speaker

you will have two additional files with features:

  • feats_test_scaled.csv
  • feats_train_scaled.csv

In contrast to the other feature stores, these contain the exact features that are used for training or feature importance exploration, so they might be combined from different feature types and selected via the features value. An example:

[FEATS]
type = ['praat', 'os']
features = ['speechrate_nsyll_dur', 'F0semitoneFrom27.5Hz_sma3nz_amean']
scale = standard
store_format = csv

results in the following feats_test.csv:

file,start,end,speechrate_nsyll_dur,F0semitoneFrom27.5Hz_sma3nz_amean
./data/emodb/emodb/wav/11b03Wb.wav,0 days,0 days 00:00:05.213500,4.028004219813945,34.42206
./data/emodb/emodb/wav/16b10Td.wav,0 days,0 days 00:00:03.934187500,3.0501850763340586,31.227554

....

How to use train, dev and test splits with Nkululeko

Usually in machine learning, you train your predictor on a train set, tune meta-parameters on a dev set (development or validation set) and evaluate on a test set.
With nkululeko, there is currently no explicit test set, as there are only two sets that can be specified: a train and an evaluation set.
A work-around is to use the test module to evaluate your best model on a hold-out test set at the end of your experiments.
All you need to do is specify the name of the test data in your [DATA] section, like so (let's call the config myconf.ini):

[EXP]
save = True
....
[DATA]
databases =  ['my_train-dev_data']
... 
tests = ['my_test_data']
my_test_data = ./data/my_test_data/
my_test_data.split_strategy = test
...

You can then run the experiment module with your config:

python -m nkululeko.nkululeko --config myconf.ini

and then, after optimization (of predictors, feature sets and meta-parameters), use the test module:

python -m nkululeko.test --config myconf.ini

The results will appear in the same place as all other results, but the files are named with test and the test database as a suffix.

If you need to compare several predictors and feature sets, you can use the nkuluflag module.
All you need to do is, when calling the nkuluflag module in your main script, pass a parameter (named --mod) to tell it to use the test module:

cmd = 'python -m nkululeko.nkuluflag --config myconf.ini  --mod test '
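
For example, to evaluate several configurations (say, one per feature set; the file names are just an illustration) on the hold-out test data in one go, a small shell loop suffices:

for conf in exp_os.ini exp_praat.ini exp_bert.ini; do
    python -m nkululeko.nkuluflag --config "$conf" --mod test
done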

Nkululeko: how to bin/discretize your feature values

Since version 0.77.8, nkululeko gives you the possibility to convert all feature values into the discrete classes low, mid and high.

Simply state

[FEATS]
type = ['praat']
scale = bins
store_format = csv

in your config to use binned Praat features.
With the store format set to csv, you will be able to look at the train and test features in the store folder.

The binning is based on the 33rd and 66th percentiles of the training feature values.

Nkululeko: compare several databases

Since version 0.77.7, nkululeko provides a new interface named multidb which lets you compare several databases.

You can state their names in the [EXP] section and they will then be processed one after the other and against each other; the results are stored in a file called heatmap.png in the experiment folder.

Mind that YOU NEED TO OMIT THE PROJECT NAME!

Here is an example of such an ini file:

[EXP]
root = ./experiments/emodbs/
#  DON'T give it a name, 
# this will be the combination 
# of the two databases: 
# traindb_vs_testdb
epochs = 1
databases = ['emodb', 'polish']
[DATA]
root_folders = ./experiments/emodbs/data_roots.ini
target = emotion
labels = ['neutral', 'happy', 'sad', 'angry']
[FEATS]
type = ['os']
[MODEL]
type = xgb

You can (but don't have to) state the specific dataset values in an external file, as done above.
data_roots.ini:

[DATA]
emodb = ./data/emodb/emodb
emodb.split_strategy = specified
emodb.test_tables = ['emotion.categories.test.gold_standard']
emodb.train_tables = ['emotion.categories.train.gold_standard']
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish = ./data/polish_emo
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.split_strategy = speaker_split
polish.test_size = 30

With respect to the mapping, you can also specify super categories by giving a list as a source category. Here's an example:

emodb.mapping = {'anger, sadness':'negative', 'happiness': 'positive'}
labels = ['negative', 'positive']

Call it with:

python -m nkululeko.multidb --config my_conf.ini

The default behavior is that each database is used as a whole when it acts as the test or training database. If you would rather have the splits used, you can add a flag for this:

[EXP]
use_splits = True

Here's a result with two databases:

and this is the same experiment, but with augmentations:

In order to add augmentation, simply add an [AUGMENT] section:

[EXP]
root = ./experiments/emodbs/augmented/
epochs = 1
databases = ['emodb', 'polish']
[DATA]
--
[AUGMENT]
augment = ['traditional', 'random_splice']
[FEATS]
...

In order to add additional training databases to all experiments, you can use:

[CROSSDB]
train_extra = [meta, emodb]

to add these two databases to all training data sets; meta and emodb should then be declared in the root_folders file.
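
For example (the path for meta is hypothetical; mappings and split strategies can be added per database as shown above):

[DATA]
emodb = ./data/emodb/emodb
meta = ./data/meta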

Nkululeko: generate a latex/pdf report

Since version 0.66.3, nkululeko can automatically generate a report document formatted in LaTeX and compiled as a PDF file, basically as a compilation of the images that are generated.
There is a dedicated REPORT section in the config file for this; here is an example:

[REPORT]
# should the report be shown in the terminal at the end?
show = False 
# should a latex/pdf file be printed? if so, state the filename
latex = emodb_report
# name of the experiment author (default "anon")
author = Felix
# title of the report (default "report")
title = EmoDB

NOTE:
with each run of a nkululeko module in the same experiment environment, details will be added to the report.
So a typical use would be to first run the general module and then more specialized ones:

# first run a segmentation 
python -m nkululeko.segment --config myconf.ini 
# then rename the data-file in the config.ini and
# run some data exploration
python -m nkululeko.explore --config myconf.ini 
# then run a machine learning experiment
python -m nkululeko.nkululeko --config myconf.ini 

Each run will add some content to the report.