Nkululeko tutorial: voice of wellness workshop

Context

In September 2025, we held the Voice of wellness workshop.

In this post I walk through the nkululeko experiments I used for the tutorials there.

Prepare the Database

I use the Androids corpus (paper here).

The first thing you should probably do is check the data formats and re-sample if necessary.

[RESAMPLE]
# which of the data splits to re-sample: train, test or all (both)
sample_selection = all
replace = True
target = data_resampled.csv
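
Before re-sampling, it can help to see which sampling rates are actually in the data. The following is a minimal sketch using only the Python standard library, not something nkululeko does for you in this form. The function name `sample_rates` is made up for illustration, and the `wave` module only reads uncompressed PCM WAV files, so other formats would need a different reader.

```python
import wave
from collections import Counter
from pathlib import Path


def sample_rates(root):
    """Count how many WAV files below `root` use each sampling rate."""
    rates = Counter()
    for path in sorted(Path(root).rglob("*.wav")):
        # wave.open only understands uncompressed PCM WAV files
        with wave.open(str(path), "rb") as wav:
            rates[wav.getframerate()] += 1
    return rates
```

Calling `sample_rates("data/androids")` (the path is an assumption, adjust it to where your audio lives) returns a counter such as rate to number of files. If more than one rate shows up, re-sampling is probably worth it.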

Explore

Check the database distributions

python -m nkululeko.explore --config data/androids/exp.ini

Transcribe and translate

Transcribe. Note: this should be done on a GPU.

Translate. No GPU is required, as it uses a Google service.

Segment

Samples in the Androids database are sometimes quite long.
It makes sense to check whether approaches work better on shorter speech segments.

python -m nkululeko.segment --config data/androids/exp.ini

Filter the data

[DATA]
data.limit_samples_per_speaker = 8
data.filter = [['task', 'interview']]
check_size = 1000
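
To make the effect of the filter and the per-speaker limit concrete, here is a toy sketch in plain Python. It is only my illustration of the idea, not nkululeko's implementation: I assume the filter is applied first and that the first samples of each speaker are kept, while nkululeko may well select them differently. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict


def filter_and_limit(rows, column, value, max_per_speaker):
    """Keep rows where row[column] == value, at most max_per_speaker per speaker."""
    kept = []
    seen = defaultdict(int)  # number of kept samples per speaker
    for row in rows:
        if row[column] != value:
            continue  # dropped by the filter
        if seen[row["speaker"]] >= max_per_speaker:
            continue  # this speaker already has enough samples
        seen[row["speaker"]] += 1
        kept.append(row)
    return kept


# toy data, file and speaker names are made up
rows = [
    {"file": "s1_a.wav", "speaker": "s1", "task": "interview"},
    {"file": "s1_b.wav", "speaker": "s1", "task": "reading"},
    {"file": "s1_c.wav", "speaker": "s1", "task": "interview"},
    {"file": "s2_a.wav", "speaker": "s2", "task": "interview"},
]
kept = filter_and_limit(rows, "task", "interview", max_per_speaker=1)
print([row["file"] for row in kept])  # ['s1_a.wav', 's2_a.wav']
```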

Define splits

Either use pre-defined folds:

[MODEL]
logo=5
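
As I read it, logo stands for leave-one-group-out: the speakers are put into a number of groups, and each group serves as the test set once. A small sketch of that idea in plain Python follows. The round-robin assignment of speakers to groups is my assumption for illustration and not necessarily how nkululeko forms the groups.

```python
def logo_folds(speakers, n_groups):
    """Yield (train_speakers, test_speakers) pairs for leave-one-group-out."""
    unique = sorted(set(speakers))
    # assumption: speakers are assigned to groups round-robin
    groups = [unique[i::n_groups] for i in range(n_groups)]
    for held_out in groups:
        test = set(held_out)
        train = [s for s in unique if s not in test]
        yield train, sorted(test)


# toy speaker ids, made up for the example
speakers = ["s1", "s2", "s3", "s4", "s5", "s6"]
for train, test in logo_folds(speakers, n_groups=3):
    print("test:", test, "train:", train)
```

Because the folds are defined over speakers and not over samples, no speaker ever appears in both the training and the test part of a fold.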

Or define the splits randomly, but stratify them:

[DATA]
data.split_strategy = balanced
data.balance = {'depression':2, 'age':1, 'gender':1}
data.age_bins = 2
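
The values in these config entries are Python literals, so they can be inspected with the standard library. The sketch below reads the section with `configparser` and `ast.literal_eval` and turns the balance numbers into shares, assuming they act as relative weights (that is my reading, check the nkululeko documentation to be sure).

```python
import ast
import configparser

CONFIG = """
[DATA]
data.split_strategy = balanced
data.balance = {'depression':2, 'age':1, 'gender':1}
data.age_bins = 2
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# the dictionary is stored as text, so evaluate it as a Python literal
balance = ast.literal_eval(parser["DATA"]["data.balance"])

# assumption: the numbers are relative weights of the variables
total = sum(balance.values())
shares = {name: weight / total for name, weight in balance.items()}
print(shares)  # {'depression': 0.5, 'age': 0.25, 'gender': 0.25}
```

Under that reading, matching the depression distribution across the splits counts twice as much as matching age or gender.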

Add additional training data

More details here

[DATA]
databases = ['data', 'emodb']
data.split_strategy = speaker_split
# add German emotional data
emodb = ./data/emodb/emodb
# rename emotion to depression
emodb.colnames = {"emotion": "depression"}
# only use neutral and sad samples
emodb.filter = [["depression", ["neutral", "sadness"]]]
# map them to depression
emodb.mapping = {"neutral": "control", "sadness": "depressed"}
# and put everything into the training split
emodb.split_strategy = train
target = depression
labels = ['depressed', 'control']
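
What the colnames, filter and mapping entries do to the emodb labels can be mimicked in a few lines of plain Python. This is only a sketch of the idea with made-up rows and a hypothetical helper name, not the code nkululeko runs.

```python
def relabel(rows, old_col, new_col, keep, mapping):
    """Rename a label column, keep only selected labels, and map them to new names."""
    out = []
    for row in rows:
        label = row[old_col]
        if label not in keep:
            continue  # e.g. anger or joy samples are dropped
        new_row = {k: v for k, v in row.items() if k != old_col}
        new_row[new_col] = mapping[label]
        out.append(new_row)
    return out


# toy rows, file names are made up
emodb = [
    {"file": "a.wav", "emotion": "neutral"},
    {"file": "b.wav", "emotion": "anger"},
    {"file": "c.wav", "emotion": "sadness"},
]
relabeled = relabel(
    emodb,
    old_col="emotion",
    new_col="depression",
    keep=["neutral", "sadness"],
    mapping={"neutral": "control", "sadness": "depressed"},
)
print(relabeled)
```

After this step the additional data carries the same label names as the target database, which is what allows it to be added to the training split.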