
How to test a trained model on a new test set with Nkululeko

Sometimes you might want to test your already trained model(s) on a new dataset, e.g. because the training took a lot of resources.
This is possible if you stored your models during training, i.e. the original experiment was configured like this:

databases = ['emodb']
save = True

In a new config file for your experiment that uses a different test set, you set:

databases = ['emodb', 'polish']
trains = ['emodb']
tests = ['polish']
strategy = cross_data....
only_test = True

In the example above, emodb has been used as the training database, and polish in a second experiment later as a test database.

How to compare several MLP layer layouts with each other

Some days ago I showed how you can run several experiments in one go.
Obviously, this can be used to compare several ANN layer architectures as an alternative to the approach discussed in this (much earlier) post.

There is an example configuration shipped with Nkululeko, and you can simply specify your layer layouts per experiment like this:

classifiers = [
    {'--model': 'mlp',
    '--layers': '\"{\'l1\':16,\'l2\':4}\"'},
    {'--model': 'mlp',
    '--layers': '\"{\'l1\':64,\'l2\':16}\"'},
    {'--model': 'mlp',
    '--layers': '\"{\'l1\':128,\'l2\':32}\"',
    '--learning_rate': '.0001',
    '--drop': '.3',},
    {'--model': 'xgb'},
    {'--model': 'svm'},
]

I.e., in this example three MLP classifiers are specified with the following architectures:

  • (hidden) layer 1 with 16 neurons, and (hidden) layer 2 with 4 neurons
  • one layer with 64 and one with 16 neurons
  • and a third one with
    • one layer with 128 and a second one with 32 neurons,
    • learning rate of .0001 and
    • dropout probability of 30%

and, for comparison:

  • an XGB classifier
  • and an SVM classifier

Both only need to be trained for one epoch because there are no weights to be adapted.
The MLP classifiers are trained with the number of epochs that is specified in the skeleton config file.
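
The escaping of the layer layouts in the list above looks odd at first, so here is a minimal sketch of what happens to such a value on its way through the command line. It assumes a POSIX shell (mimicked with shlex) and uses ast.literal_eval to turn the string into a dictionary; how Nkululeko itself parses the value is not shown in this post, so treat that last step as an assumption.

```python
import ast
import shlex

# One classifier entry, written exactly as in the list above.
classifier = {'--model': 'mlp',
              '--layers': '\"{\'l1\':16,\'l2\':4}\"'}

# Build the command-line fragment by joining flags and values with spaces.
fragment = ' '.join(f'{key} {value}' for key, value in classifier.items())
print(fragment)

# shlex.split mimics how a POSIX shell strips the outer double quotes,
# so the layer layout arrives as one single argument.
tokens = shlex.split(fragment)

# Assumption: the receiving side evaluates the string as a Python literal.
layers = ast.literal_eval(tokens[tokens.index('--layers') + 1])
print(layers)
```

Without the escaped double quotes, the shell would strip the single quotes and the value would no longer be a valid dictionary literal.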

How to run multiple experiments in one go with Nkululeko

Sometimes you will want to run several experiments without having to start them manually one after the other, e.g. if you want to run them overnight.
This post shows you one way to do this.
The necessary Python files are part of the Nkululeko distribution.

You need three files:

The value parser

First, I created a Python file that accepts Nkululeko ini file values as command-line arguments:

# imports
import sys
import constants
import numpy as np
import experiment as exp
import configparser
from util import Util
import argparse
import os.path

def main():

# use the argparse package to parse arguments:
    parser = argparse.ArgumentParser(description='Call the nkululeko framework.')
    parser.add_argument('--data', help='The databases', nargs='*')
    parser.add_argument('--label', nargs='*', help='The labels for the target')
    parser.add_argument('--tuning_params', nargs='*', help='parameters to be tuned')
    parser.add_argument('--model', default='xgb', help='The model type', required=True)
    parser.add_argument('--feat', default='os', help='The feature type')
    parser.add_argument('--set', help='The opensmile set')
    parser.add_argument('--with_os', help='To add os features')
    parser.add_argument('--target', help='The target designation')

    args = parser.parse_args()

# Use a prepared config file with values that are stable across experiments:
    config_file = './exp.ini'
    util = Util()
    # test if config is there
    if not os.path.isfile(config_file):
        util.error(f'no such file {config_file}')

    config = configparser.ConfigParser()
    config.read(config_file)

# fill the config file
    if args.data is not None:
        databases = []
        for t in args.data:
            databases.append(t)
        print(f'got databases: {databases}')
        config['DATA']['databases'] = str(databases)
    if args.label is not None:
        labels = []
        for l in args.label:
            labels.append(l)
        print(f'got labels: {labels}')
        config['DATA']['labels'] = str(labels)
    if args.tuning_params is not None:
        tuning_params = []
        for tp in args.tuning_params:
            tuning_params.append(tp)
        config['MODEL']['tuning_params'] = str(tuning_params)
    if args.target is not None:
        config['DATA']['target'] = args.target
    if args.model is not None:
        config['MODEL']['type'] = args.model
    if args.feat is not None:
        config['FEATS']['type'] = args.feat
    if args.with_os is not None:
        config['FEATS']['with_os'] = args.with_os
    if args.set is not None:
        config['FEATS']['set'] = args.set
    name = config['EXP']['name']
    util = Util()
    util.debug(f'running {name}, Nkululeko version {constants.VERSION}')

# Now run the experiment
    # init the experiment
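    expr = exp.Experiment(config)
    # load the data
    expr.load_datasets()
    # split into train and test
    expr.fill_train_and_tests()
    # extract features
    expr.extract_feats()
    # initialize a run manager
    expr.init_runmanager()
    # run the experiment
    reports = expr.run()
    result = reports[-1].result.test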
    # report result
    util.debug(f'result for {expr.get_name()} is {result}')

if __name__ == "__main__":
    main()
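
The core idea of the value parser, overriding values of a prepared ini file with command-line arguments, can be tried in isolation. The following is a minimal sketch that needs no Nkululeko installation. BASE_INI, build_parser and apply_overrides are names made up for this illustration, and only a subset of the arguments is covered.

```python
import argparse
import configparser

# Stand-in for the prepared ini file with values that stay constant.
BASE_INI = """
[DATA]
databases = ['mydata']
target = mytarget
[FEATS]
type = os
[MODEL]
type = xgb
"""


def build_parser():
    """Create a parser for the values that may change per experiment."""
    parser = argparse.ArgumentParser(description='Override ini values.')
    parser.add_argument('--data', nargs='*', help='The databases')
    parser.add_argument('--model', help='The model type')
    parser.add_argument('--feat', help='The feature type')
    return parser


def apply_overrides(config, args):
    """Copy every argument that was given into the matching ini section."""
    if args.data is not None:
        # lists are stored as their Python literal, e.g. "['a', 'b']"
        config['DATA']['databases'] = str(list(args.data))
    if args.model is not None:
        config['MODEL']['type'] = args.model
    if args.feat is not None:
        config['FEATS']['type'] = args.feat
    return config


config = configparser.ConfigParser()
config.read_string(BASE_INI)
# Simulate a call like: --data emodb polish --model svm
args = build_parser().parse_args(['--data', 'emodb', 'polish', '--model', 'svm'])
apply_overrides(config, args)
for section in config.sections():
    print(section, dict(config[section]))
```

Values that are not given on the command line, like the feature type here, keep what the ini file says.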

The configuration file

A Nkululeko config file with the values that stay constant for all experiments (to be adapted to your needs and paths):

[EXP]
root = ./
name = exp
runs = 1
epochs = 1
[DATA]
root_folders = ../data_roots.ini
databases = ['mydata']
target = mytarget
labels = ['label1', 'label2']
[FEATS]
wav2vec.model = xxx/wav2vec2-large-robust-ft-swbd-300h
xbow.model = xxx/openXBOW/
trill.model = xxx/trill_model
mld.model = xxx/mld/src
scale = standard
[MODEL]
C_val = .001
loso = True

The script to specify and run all experiments

Lastly, you need a script to specify and start the experiments. Here's an example that combines two classifiers and eight feature sets:

import os

classifiers = [
    {'--model': 'xgb'},
    {'--model': 'svm'},
]

features = [
    {'--feat': 'os'},
    {'--feat': 'os',
    '--set': 'ComParE_2016'},
    {'--feat': 'mld'},
    {'--feat': 'mld',
    '--with_os': 'True'},
    {'--feat': 'xbow'},
    {'--feat': 'xbow',
    '--with_os': 'True'},
    {'--feat': 'trill'},
    {'--feat': 'wav2vec'},
]
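for c in classifiers:
    for f in features:
        # append the file name of your value parser script after 'python '
        cmd = 'python '
        for item in c:
            cmd += f'{item} {c[item]} '
        for item in f:
            cmd += f'{item} {f[item]} '
        # show and run the command
        print(cmd)
        os.system(cmd)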
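
If you want to check which commands such a script would start before you let it run for a night, you can build the command strings first and only print them. This is a minimal sketch using itertools.product; the script name my_value_parser.py and the function build_commands are made up for this illustration, and the feature list is shortened.

```python
import itertools

# Hypothetical file name; use the name of your own value parser script.
SCRIPT = 'my_value_parser.py'

classifiers = [
    {'--model': 'xgb'},
    {'--model': 'svm'},
]
features = [
    {'--feat': 'os'},
    {'--feat': 'os', '--set': 'ComParE_2016'},
    {'--feat': 'mld', '--with_os': 'True'},
]


def build_commands(script, *option_lists):
    """Return one command string per combination of the given option dicts."""
    commands = []
    for combo in itertools.product(*option_lists):
        parts = ['python', script]
        for options in combo:
            for flag, value in options.items():
                parts += [flag, value]
        commands.append(' '.join(parts))
    return commands


commands = build_commands(SCRIPT, classifiers, features)
for cmd in commands:
    # replace print with os.system(cmd) to actually start the experiment
    print(cmd)
```

Because build_commands accepts any number of option lists, a third dimension (e.g. several targets) would only be one more argument.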

How to do cross validation with Nkululeko

Only for linear classifiers like XGB, SVM, SGR and SVR you have the possibility to disregard training and development splits and do a cross validation, i.e. validate one data set in a circular manner against itself.

The basic idea is that you take part of the data and evaluate against the rest, and in the next round take another part and so forth, until all data has been evaluated. Because the speaker identity is so strong in speech, this is usually done in a speaker exclusive manner, known under the term "leave one speaker out" (LOSO).

If you have too many speakers and/or each speaker has only one sample, you might want to split your speakers into groups and do a "leave one speaker group out" strategy (LOGO).

A related approach is known under the name k-fold cross validation, where k usually equals 10.
When you only have one sample per speaker, this might make more sense.
So, how would you do that with Nkululeko?
First, you would define a training and development split for your data anyway, because Nkululeko is expecting it if there is only one database. You might set that to random, it's not used anyway:

mydata.split_strategy = random 

Then in your config file, you specify in the MODEL section either:

loso = True 

which means that a leave one speaker out strategy will be used to evaluate the experiment. Obviously, the speaker must then be known in the data.
Or you do

logo = 10 

to assign 10 groups to your speakers and then evaluate each group against all others.

Or you do

k_fold_cross = 5 

for instance to disregard speaker information and simply evaluate 5 times a fifth of the data against the rest.
We use stratified sets, i.e. the algorithm tries to balance the class data within each set.
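
To make the difference between LOSO and LOGO a bit more tangible, here is a minimal sketch in plain Python. It is not how Nkululeko implements the splits; the function names are made up, and assigning speakers to groups round robin is just an assumption for this illustration.

```python
from collections import defaultdict


def leave_one_out(labels):
    """Yield (held_out, train_indices, test_indices) for each distinct label."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    for held_out in sorted(by_label):
        test = by_label[held_out]
        train = [i for i, label in enumerate(labels) if label != held_out]
        yield held_out, train, test


def assign_speaker_groups(speakers, n_groups):
    """Map each speaker to one of n_groups (round robin, an assumption)."""
    unique = sorted(set(speakers))
    return {speaker: i % n_groups for i, speaker in enumerate(unique)}


# one entry per sample: the speaker of that sample
speakers = ['s1', 's1', 's2', 's3', 's3', 's3']

# LOSO: every speaker is the test set once
folds = list(leave_one_out(speakers))
for held_out, train, test in folds:
    print('LOSO', held_out, train, test)

# LOGO: group the speakers first, then leave one group out per round
groups = assign_speaker_groups(speakers, 2)
logo_folds = list(leave_one_out([groups[s] for s in speakers]))
for held_out, train, test in logo_folds:
    print('LOGO', held_out, train, test)
```

In both cases no speaker ever appears in the training and the test part of the same round, which is the whole point of doing it speaker exclusive.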

Import speech data to nkululeko

Often you simply start an experiment with some audio data that you got from somewhere in no special format. Often the labels are encoded in the filenames.
If so, this Python script can help to convert the audio to a Nkululeko readable format and generate a CSV (comma separated values) file.

import os
from audeer import list_file_names
from os.path import basename

# folder with the original audio files (in wav format)
root = './orig_wav/'
# output folder, empty at the beginning
out_dir = './audio/'
# name of the output file list
out_file = 'data.csv'

# get a list of wav files
files = list_file_names(root, filetype='wav', basenames=True, recursive=True)
# write the list header (assumed columns, change to your data)
with open(out_file, 'a') as the_file:
    the_file.write('file,label\n')
# for each file
for file in files:
    # get the file name without path
    fn = basename(file)
    # convert to 16kHz sampling rate and mono channel
    os.system(f'sox {root+file} -r 16000 -c 1 {out_dir+fn}')
    # extract the annotation label from the file name (change this to your needs)
    label = fn[0]
    # lastly: add file to list (same assumed columns as the header)
    with open(out_file, 'a') as the_file:
        the_file.write(f'{out_dir+fn},{label}\n')

The resulting data list can then be read by Nkululeko in the config file (using randomly 30 % of the data as development set):

my_data = /some_path/data.csv
my_data.type = csv
my_data.split_strategy = random
my_data.testsplit = 30
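
If your file names might contain commas or other special characters, writing the list with Python's csv module is a bit safer than assembling the lines by hand. This is a minimal sketch of that variant without the audio conversion; the column names file and label and the helper functions are assumptions for this illustration, and the label is again taken from the first character of the file name.

```python
import csv
import io
from os.path import basename


def build_rows(file_names, out_dir, label_from_name):
    """Create one row per audio file with its new path and its label."""
    rows = []
    for name in file_names:
        fn = basename(name)
        rows.append({'file': out_dir + fn, 'label': label_from_name(fn)})
    return rows


def write_csv(rows, handle):
    """Write header and rows; the csv module quotes values where needed."""
    writer = csv.DictWriter(handle, fieldnames=['file', 'label'])
    writer.writeheader()
    writer.writerows(rows)


# the second name contains a comma on purpose
rows = build_rows(['sub/a_001.wav', 'h, 2.wav'], './audio/', lambda fn: fn[0])
buffer = io.StringIO()  # use open('data.csv', 'w', newline='') for a real file
write_csv(rows, buffer)
print(buffer.getvalue())
```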

How to limit a dataset with Nkululeko

In some cases you don't want to use the whole dataset for training or test, but filter it in some way. Several possibilities are demonstrated here.
Some are valid per database:

databases = ['d1']
# force a specific feature to be present, e.g. gender labels (when not all data has gender values)
d1.required = gender

# limit the number of samples per speaker
d1.max_samples_per_speaker = 20

# only use the first 10000 samples
d1.limit = 10000

Others are valid for the whole experiment, i.e. all databases:

# specify a minimum duration for test samples (in seconds)
min_dur_test = 3.5

# use only samples where gender is female
sex = female
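
What these filters mean for the data can be sketched in a few lines of plain Python. This only illustrates the semantics described in the comments above, not Nkululeko's implementation; the function names are made up, and the order in which the filters are applied is an assumption.

```python
from collections import Counter


def require(samples, field):
    """Keep only samples that have a value for the given field."""
    return [s for s in samples if s.get(field) is not None]


def limit_per_speaker(samples, max_per_speaker):
    """Keep at most max_per_speaker samples of each speaker, in order."""
    seen = Counter()
    kept = []
    for sample in samples:
        if seen[sample['speaker']] < max_per_speaker:
            kept.append(sample)
            seen[sample['speaker']] += 1
    return kept


samples = [
    {'file': 'a.wav', 'speaker': 's1', 'gender': 'female'},
    {'file': 'b.wav', 'speaker': 's1', 'gender': 'female'},
    {'file': 'c.wav', 'speaker': 's1', 'gender': None},
    {'file': 'd.wav', 'speaker': 's2', 'gender': 'male'},
]

filtered = require(samples, 'gender')       # like d1.required = gender
filtered = limit_per_speaker(filtered, 1)   # like d1.max_samples_per_speaker = 1
filtered = filtered[:10000]                 # like d1.limit = 10000
print([s['file'] for s in filtered])
```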

Specifying database disk location with Nkululeko

Since version 0.13.0, Nkululeko lets you define all root folders for your databases in one single place.
This is very handy if you work in parallel on several computers, e.g. a development and a deployment environment.

In the [DATA] section of your ini file, you specify the path to the local data root folder file like this:

root_folders = data_roots.ini
databases = ['dataset_1']

and then within the data_roots.ini file (you can actually call it what you want), you declare the folders to your databases like this:

dataset_1 = /mypath/d1/
dataset_1.files_tables = ['files']
dataset_2 = ./d2

You can add all the dataset options that you need to this file:

emodb = /mypath/d1/
emodb.split_strategy = speaker_split
emodb.testsplit = 40
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'fear':'fright.', 'neutral':'neutral'}
dataset_2 = ./d2
dataset_2.files_tables = ['files_test', 'files_train']

If you define those fields in your experiment ini file as well, the values from the experiment ini file take precedence.
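
The precedence rule can be pictured as a simple lookup with a fallback. Here is a minimal sketch with Python's configparser; it is not Nkululeko's own code, the function lookup is made up for this illustration, and it assumes that the entries of both files live in a [DATA] section.

```python
import configparser

# Stand-in for data_roots.ini (assumption: entries are in a [DATA] section).
ROOTS_INI = """
[DATA]
emodb = /mypath/d1/
emodb.split_strategy = speaker_split
emodb.testsplit = 40
"""

# Stand-in for the experiment ini file, which overrides one value.
EXPERIMENT_INI = """
[DATA]
root_folders = data_roots.ini
databases = ['emodb']
emodb.testsplit = 50
"""


def lookup(key, experiment, roots, section='DATA'):
    """Return the experiment value if set, else fall back to the roots file."""
    if experiment.has_option(section, key):
        return experiment.get(section, key)
    return roots.get(section, key, fallback=None)


roots = configparser.ConfigParser()
roots.read_string(ROOTS_INI)
experiment = configparser.ConfigParser()
experiment.read_string(EXPERIMENT_INI)

for key in ['emodb', 'emodb.split_strategy', 'emodb.testsplit']:
    print(key, '=', lookup(key, experiment, roots))
```

Here the test split comes from the experiment file, while the path and the split strategy are taken from the shared roots file.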

Nkululeko: How to import a database

Nkululeko is a tool to ease machine learning on speech databases.
This tutorial should help you to import databases.
There are two formats supported:
1) csv (comma separated values)
2) audformat

CSV format

The easiest is CSV: you simply create a table with the following information:

  • file: the path to the audio file
  • speaker: a speaker identifier
  • sex: the biological sex (has quite an influence on the voice, so sometimes submodeling makes sense)
  • task: the speaker characteristic that you want to explore, e.g. age or emotion; the column is named after it

and then fill it with values of your database.
So a file for emotion might look like this:

file, speaker, sex, emotion
<path to>/s12343.wav, s1, female, happy

You can then specify the data in your initialization file like this:

databases = ['my_db']
my_db.type = csv
my_db = <path to>/my_data_file.csv
target = emotion

You cannot specify split tables with this format; instead, you would have to split the file into several databases.
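
Before you start an experiment, it can save some time to check that your CSV file really has the columns listed above. This is a minimal sketch with Python's csv module; the function missing_columns is made up for this illustration, and whether Nkululeko strictly needs all of these columns for every experiment is not something this post states.

```python
import csv
import io

# the columns from the list above; the target column is added per call
REQUIRED = ('file', 'speaker', 'sex')


def missing_columns(csv_text, target):
    """Return the expected column names that the CSV header lacks."""
    # skipinitialspace handles headers written like "file, speaker, sex"
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    header = reader.fieldnames or []
    return [column for column in (*REQUIRED, target) if column not in header]


example = (
    'file, speaker, sex, emotion\n'
    '<path to>/s12343.wav, s1, female, happy\n'
)

print(missing_columns(example, 'emotion'))
print(missing_columns(example, 'age'))
```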


audformat

audformat allows for many use cases, so the specification might be more complex.
In the easiest case you have a database with two tables, one called files that contains the speaker information (id and sex) and one named after your task (aka target), for example age or emotion.
That's the case for our demo example, the Berlin EmoDB, and so you can include it simply with:

databases = ['emodb']
emodb = /<path to>/emodb/
target = emotion

But if there are more tables and they have special names, you can specify them like this:

databases = ['msp']
# path to data
msp = /<path to>/msppodcast/
# tables with speaker information
msp.files_tables =  ['files.test-1', 'files.train']
# tables with task labels
msp.target_tables =  ['emotion.test-1', 'emotion.train']
# train and evaluation splits will be provided
msp.split_strategy = specified
# here are the test/evaluation split tables
msp.test_tables = ['emotion.test-1']
# here are the training tables
msp.train_tables = ['emotion.train']
target = emotion