Nkululeko: How to import a database

Nkululeko is a tool to ease machine learning on speech databases.
This tutorial should help you to import databases.
There are two formats supported:
1) csv (comma separated values)
2) audformat

CSV format

The easiest format is CSV. You simply create a table with the following columns:

  • file: the path to the audio file
  • task: the speaker characteristic that you want to explore, e.g. age or emotion, or both

and then fill it with values of your database. Optionally, your data can contain any amount of additional information in further columns. Some naming conventions are pre-defined:

  • speaker: speaker id, a string being unique for samples from one speaker
  • gender: biological sex
  • age: an integer between 0 and 100 denoting the age in years.

So a file for emotion might look like this:

file, speaker, gender, emotion
<path to>/s12343.wav, s1, female, happy
...
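
If your labels already live in Python, a table like this can be written with the standard csv module. This is only a minimal sketch: the function name write_label_table and the example rows are made up for illustration, and the column names simply follow the conventions listed above.

```python
import csv

def write_label_table(rows, out_path, columns=("file", "speaker", "gender", "emotion")):
    """Write a list of dicts as a CSV table with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

# hypothetical example rows, not taken from a real database
rows = [
    {"file": "audio/s12343.wav", "speaker": "s1", "gender": "female", "emotion": "happy"},
    {"file": "audio/s12344.wav", "speaker": "s2", "gender": "male", "emotion": "sad"},
]

if __name__ == "__main__":
    write_label_table(rows, "my_data_file.csv")
```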

You can then specify the data in your initialization file like this:

[DATA]
databases = ['my_db']
my_db.type = csv
my_db = <path to>/my_data_file.csv
my_db.absolute_path = False 
...
target = emotion

You should set the flag absolute_path depending on whether

  • the file paths start from the location of where you run Nkululeko (or start from root: /), then True
  • or they start from the location where the data resides, then False

(If in doubt, just try it out: with the wrong setting you should get an error message that the audio files don't exist.)
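
Instead of waiting for the error message, you could also test beforehand from where the file paths resolve. The following is a rough sketch and not part of Nkululeko: the helper suggest_absolute_path is hypothetical, and it assumes that "the location where the data resides" means the folder that contains the CSV file.

```python
import csv
from pathlib import Path

def suggest_absolute_path(csv_file, run_dir="."):
    """Suggest a value for absolute_path by testing where the audio files resolve.

    Returns True if all files are found from run_dir (or are absolute paths),
    False if all are found relative to the folder of the CSV file, else None.
    """
    csv_file = Path(csv_file)
    with open(csv_file, newline="", encoding="utf-8") as f:
        # skipinitialspace tolerates a blank after each comma
        reader = csv.DictReader(f, skipinitialspace=True)
        files = [row["file"].strip() for row in reader]
    if not files:
        return None
    # joining with an absolute path yields that absolute path unchanged
    if all((Path(run_dir) / name).is_file() for name in files):
        return True
    if all((csv_file.parent / name).is_file() for name in files):
        return False
    return None
```

A result of None means the files were found in neither place, so the paths in the table probably need to be fixed first.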

You cannot specify split tables with this format. Instead, split the file into several files and declare each one as a separate database.
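
One way to do such a split without having the same speaker in train and test is to split by speaker id. The sketch below is my own illustration (the function split_by_speaker, the default fraction and the seed are assumptions, not Nkululeko code); it expects the rows as dicts with a speaker key, e.g. as read by csv.DictReader.

```python
import random

def split_by_speaker(rows, test_fraction=0.2, seed=42):
    """Split rows (dicts with a 'speaker' key) so that no speaker is in both parts."""
    speakers = sorted({row["speaker"] for row in rows})
    random.Random(seed).shuffle(speakers)
    # at least one speaker goes to the test part
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [r for r in rows if r["speaker"] not in test_speakers]
    test = [r for r in rows if r["speaker"] in test_speakers]
    return train, test
```

Each part can then be written to its own CSV file and listed as its own database in the DATA section.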

There is an example on how to import the ravdess database here.

And this would be an example ini file to use it:

[EXP]
root = ./tests/results/
name = exp_ravdess
runs = 1
epochs = 1
save = True
[DATA]
databases = ['train', 'test', 'dev']
train = ../nkululeko/data/ravdess/ravdess_train.csv
train.type = csv
train.absolute_path = False
train.split_strategy = train
dev = ../nkululeko/data/ravdess/ravdess_dev.csv
dev.type = csv
dev.absolute_path = False
dev.split_strategy = train
test = ../nkululeko/data/ravdess/ravdess_test.csv
test.type = csv
test.absolute_path = False
test.split_strategy = test
target = emotion
labels = ['angry', 'happy', 'neutral', 'sad']
[FEATS]
type = ['os']
scale = standard
[MODEL]
type = xgb

I.e. the splits train and dev get concatenated to a common train set.
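
If you want to double check such a DATA section before starting an experiment, it can be read with Python's configparser. This is a small sketch under my own assumptions (the helper check_databases is hypothetical and says nothing about how Nkululeko parses the file internally); it only verifies that every listed database has a path and reports its split strategy.

```python
import ast
import configparser

def check_databases(ini_text):
    """Return {database: split_strategy} and fail if a listed database has no path."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    data = config["DATA"]
    # the list is written like a Python literal, so literal_eval can parse it
    databases = ast.literal_eval(data["databases"])
    strategies = {}
    for name in databases:
        if name not in data:
            raise KeyError(f"no path given for database '{name}'")
        strategies[name] = data.get(f"{name}.split_strategy", "not set")
    return strategies

example = """
[DATA]
databases = ['train', 'test', 'dev']
train = ../nkululeko/data/ravdess/ravdess_train.csv
train.split_strategy = train
dev = ../nkululeko/data/ravdess/ravdess_dev.csv
dev.split_strategy = train
test = ../nkululeko/data/ravdess/ravdess_test.csv
test.split_strategy = test
target = emotion
"""
print(check_databases(example))
```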

audformat

audformat allows for many use cases, so the specification might be more complex.
In the easiest case you have a database with two tables, one called files that contains the speaker information (id and sex) and one named after your task (aka target), for example age or emotion.
That's the case for our demo example, the Berlin EmoDB, and so you can include it simply with:

[DATA]
databases = ['emodb']
emodb = /<path to>/emodb/
target = emotion
...

But if there are more tables and they have special names, you can specify them like this:

[DATA]
databases = ['msp']
# path to data
msp = /<path to>/msppodcast/
# tables with speaker information
msp.files_tables =  ['files.test-1', 'files.train']
# tables with task labels
msp.target_tables =  ['emotion.test-1', 'emotion.train']
# train and evaluation splits will be provided
msp.split_strategy = specified
# here are the test/evaluation split tables
msp.test_tables = ['emotion.test-1']
# here are the training tables
msp.train_tables = ['emotion.train']
target = emotion

Nkululeko: classifying continuous variables

Nkululeko supports classification and regression.
Classification means predicting a class (or category) from data, regression predicting a continuous value, as for example the speaker age in years.

If you want to use classification with a continuous variable, you first need to bin it, which means that you put the values into pre-defined bins. To stay with our age example, you'd declare everyone above 50 years as old and all others as young.

This post shows you how to do that with Nkululeko by setting up your .ini file.

You set up the experiment as classification type:

[EXP]
...
type = classification

But declare the data to be continuous:

[DATA]
...
type = continuous
labels = ['u40', '40ies', '50ies', '60ies', 'ü70']
bins  = [-1000,  40, 50, 60, 70, 1000]

Then the data will be binned according to the specified bins and labeled accordingly.
You need (number of labels) + 1 values for the bins, as each bin is given by a lower and an upper limit. It makes sense to set the outermost lower and upper limits to extreme values, as you don't know what the classifier will predict.
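
To make the relation between bins and labels concrete, here is a minimal sketch of such a binning in plain Python. The function bin_value is only an illustration, and the rule that a value lying exactly on an edge belongs to the upper bin is my assumption, not a statement about Nkululeko's implementation.

```python
from bisect import bisect_right

labels = ['u40', '40ies', '50ies', '60ies', 'ü70']
bins = [-1000, 40, 50, 60, 70, 1000]

def bin_value(value, bins, labels):
    """Map a continuous value to the label of the bin it falls into."""
    if len(bins) != len(labels) + 1:
        raise ValueError("need (number of labels) + 1 bin edges")
    if not bins[0] <= value < bins[-1]:
        raise ValueError(f"{value} lies outside the outermost bin edges")
    # assumption: a value equal to an edge belongs to the upper bin
    return labels[bisect_right(bins, value) - 1]

print(bin_value(45, bins, labels))  # 40ies
```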

How to soft-label a database with Nkululeko

Soft-labeling means annotating data with labels that were predicted by a machine classifier.
As they were not evaluated by a human, you might call them "soft".

Two steps are necessary:
1) save a test/evaluation set as a new database
2) load this new database in a new experiment as training data

Within nkululeko, you would do it like this:

step 1: save a new database

You simply specify a name in the EXP section to save the test predictions like this:

[EXP]
...
save_test = ./my_test_predictions.csv

You need model saving to be turned on, because Nkululeko will look for the best performing model:

[MODEL]
...
save = True

This will store the new database in a file called my_test_predictions.csv in the folder from which Python was called.

step 2: load as training data

Here is an example configuration showing how to load this data as additional training data (in this case in addition to emodb):

[EXP]
root = ./tests/
name = exp_add_learned
runs = 1
epochs = 1
[DATA]
strategy = cross_data
databases = ['emodb', 'learned', 'test_db']
trains = ['emodb', 'learned']
tests = ['test_db']
emodb = /path-to-emodb/
emodb.split_strategy = speaker_split
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
test_db = /path to test database/
test_db.mapping = <if any mapping to the target categories is needed>
learned = ./my_test_predictions.csv
learned.type = csv
target = emotion
labels = ['angry', 'happy', 'neutral', 'sad']
[FEATS]
type = os
[MODEL]
type = xgb
save = True
[PLOT]

Nkululeko: try out / demo a trained model

This is another Nkululeko post that shows you how to demo a model that you trained before.

First you need to train a model, e.g. on emodb as shown here.
In the ini file you MUST set the parameters

[EXP]
...
save = True

in the general section and

[MODEL]
...
save = True

in the MODEL section of the configuration file.
Background: we need both the experiment as well as all model files to be saved on disk.

Here is an example of how to call the demo mode:

python -m nkululeko.demo --config exp_emodb.ini

if your config file is called exp_emodb.ini.
The best performing model will be used automatically.
This will start recording for 3 seconds from your microphone.
If you specify

python -m nkululeko.demo --config exp_emodb.ini --file test.wav

the file test.wav will be predicted. It needs to have a 16 kHz sampling rate and a single (mono) channel.
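
To check whether a file meets these requirements before you hand it to the demo, you could inspect its header with Python's wave module. This is just a sketch: the helper is_16k_mono is hypothetical, and the wave module only reads uncompressed PCM WAV files.

```python
import wave

def is_16k_mono(path):
    """Return True if a PCM WAV file has one channel and a 16 kHz sampling rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() == 16000
```
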
If you specify

python -m nkululeko.demo --config exp_emodb.ini --list my_list.txt --folder data/ravdess/ --outfile my_results.csv

the file my_list.txt will be read. It is expected to contain one file to be predicted per line, e.g.:

tests/a.wav
tests/b.wav

The optional argument --folder can be used to specify a parent folder for the input files.

The optional argument --outfile can be used to save the results in a CSV table.

If the file given with --list is in CSV format (e.g. --list my_file.csv), it will be interpreted as audformat, but only the index will be used.

E.g. with

file,emotion
tests/a.wav,happy
tests/b.wav,angry

only the file column will be used.
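
A plain text list like my_list.txt can be generated from a folder of audio files with a few lines of Python. This is a sketch under my own assumptions: the function write_file_list is hypothetical, and it writes the paths relative to the given folder, so that the same folder could be passed with --folder.

```python
from pathlib import Path

def write_file_list(folder, list_file, pattern="*.wav"):
    """Write one audio file path per line, relative to folder, and return the paths."""
    folder = Path(folder)
    # search recursively and sort, so the list is reproducible
    names = sorted(p.relative_to(folder).as_posix() for p in folder.rglob(pattern))
    Path(list_file).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names
```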

How to set up wav2vec embedding for nkululeko

Since version 0.10, nkululeko supports Facebook's wav2vec 2.0 embeddings as acoustic features.
This post shows you how to set this up.

set up nkululeko

In your nkululeko configuration (*.ini) file, set the feature extractor to wav2vec2 and state the path to the model like this:

[FEATS]
type = ['wav2vec2']
wav2vec.model = /my path to the huggingface model/

Alternatively you can state the huggingface model name directly:

[FEATS]
type = ['wav2vec2-base-960h']

Out of the box, the last hidden layer is used as embeddings. But the original wav2vec2 model consists of 7 CNN layers followed by up to 24 transformer layers. If you would like to use an earlier layer than the last one, you can simply count down:

[FEATS]
type = wav2vec2
wav2vec.layer = 12

This would use the 12th layer of a 24 layer model and only the CNN layers of a 12 layer model.

Nkululeko: perform cross database experiments

This is one of a series of posts about how to use nkululeko.
If you're unfamiliar with nkululeko, you might want to start here.

This post is about cross database experiments, i.e. training a classifier on one database and testing it on another, something that happens quite often in real life situations.

In this post I will only talk about the config file; the python file can be re-used.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = cross_data

Next, the DATA section might look like this:

[DATA]
# declare which databases to use
databases = ['emodb', 'polish']
# specify the location of the data
emodb = <path to the database>
polish = <path to the database>
# we split one database as specified
emodb.split_strategy = specified
# here is the test, the training set is disregarded
emodb.test_tables = ['emotion.categories.test.gold_standard']
# the whole of the polish database is used for train
polish.split_strategy = train
# the target label for the experiment
target = emotion
# we need to unify the labels of both databases
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
# and these are the labels we want to distinguish
labels = ['angry', 'happy', 'neutral', 'sad']
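
To illustrate what the mapping and labels entries are meant to achieve, here is a small sketch in plain Python. It is not Nkululeko's code: the function unify_labels is hypothetical, and dropping samples whose label is not mapped is an assumption of this sketch.

```python
emodb_mapping = {'anger': 'angry', 'happiness': 'happy', 'sadness': 'sad', 'neutral': 'neutral'}
polish_mapping = {'anger': 'angry', 'joy': 'happy', 'sadness': 'sad', 'neutral': 'neutral'}
labels = ['angry', 'happy', 'neutral', 'sad']

def unify_labels(samples, mapping, labels):
    """Rename labels via mapping and keep only samples whose new label is wanted.

    samples is a list of (file, label) pairs.
    """
    unified = []
    for file, label in samples:
        new_label = mapping.get(label)  # None if the label is not mapped
        if new_label in labels:
            unified.append((file, new_label))
    return unified

# made-up samples: 'fear' is not mapped and therefore dropped
print(unify_labels([('a.wav', 'joy'), ('b.wav', 'fear')], polish_mapping, labels))
```

After this step both databases use the same four category names, so a model trained on one can be tested on the other.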

The features section, better explained in this post:

[FEATS]
type = os

The classifiers section, better explained in this post:

[MODEL]
type = xgb

Again, you might want to plot the final distribution of categories per train and test set:

[PLOT]
value_counts = True

Nkululeko: comparing classifiers and features

This is one of a series of posts about how to use nkululeko.

Although Nkululeko is meant as a programming library, many experiments can be done simply by adapting the configuration file of the experiment. If you're unfamiliar with nkululeko, you might want to start here.

This post is about machine classification (as opposed to regression problems) and gives an introduction on how to combine different feature sets with different classifiers.

In this post I will only talk about the config file; the python file can be re-used.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = exp_A
# needed only for neural net classifiers
#epochs = 100
# needed only for classifiers with random initialization
# runs = 3 

The DATA section deals with the data sets:

[DATA]
# list all the databases you will be using
databases = ['emodb']
# state the path to the audformat root folder
emodb = /home/felix/data/audb/emodb
# split train and test based on different random speakers
emodb.split_strategy = speaker_split
# state the percentage of test speakers (in this case 4 speakers, as emodb only has 10 speakers)
emodb.testsplit = 40
# for a subsequent run you might want to skip the speaker selection, as it requires extracting the features again for each run
# emodb.split_strategy = reuse # uncomment the other strategy then
# the target label that should be classified
target = emotion
# the categories for this label
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']

The next section deals with the features that should be used by the classifier.

[FEATS]
# the type of features to use
type = ['os']

The following alternatives are currently implemented (only os and trill are open source):

  • type = os # opensmile features
  • type = mld # mid level descriptors, to be published
  • type = trill # TRILL features, requires keras to be installed
  • type = spectra # log mel spectra, for convolutional ANNs

Next comes the MODEL section which deals with the classifier:

[MODEL]
# the main thing to specify is the kind of classifier:
type = xgb

Choices are:

  • type = xgb # XGBoost algorithm, based on classification trees
  • type = svm # Support Vector Machines, a classifier based on decision planes
  • type = mlp # Multi-Layer-Perceptron, needs a layer-layout to be specified, e.g. layers = {'l1':64}

And finally, the PLOT section specifies possible additional visualizations (a confusion matrix is always plotted):

[PLOT]
tsne = True

A t-SNE plot can be useful to estimate if the selected features separate the categories at all.

Setting up a base nkululeko experiment

This is one of a series of posts on how to use nkululeko and deals with setting up the "hello world" of nkululeko: performing classification on the Berlin EmoDB emotional database.

Typically nkululeko experiments are defined by two files:

  • a python file that is called by the interpreter
  • an initialization file that is interpreted by the nkululeko framework

First we'll take a look at the python file:

# my_experiment.py
# Demonstration code to use the Nkululeko framework

import sys
sys.path.append("TO BE ADAPTED/nkululeko/src")
import configparser # to read the ini file
import experiment as exp # central nkululeko class
from util import Util # mainly for logging

def main(config_file):
    # load one configuration per experiment
    config = configparser.ConfigParser()
    config.read(config_file) # read in the ini file, the experiment is defined there
    util = Util() # init the logging and global stuff

    # create a new experiment
    expr = exp.Experiment(config)
    util.debug(f'running {expr.name}')

    # load the data sets (specified in ini file)
    expr.load_datasets()

    # split into train and test sets
    expr.fill_train_and_tests()
    util.debug(f'train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}')

    # extract features
    expr.extract_feats()
    util.debug(f'train feats shape : {expr.feats_train.df.shape}, test feats shape:{expr.feats_test.df.shape}')

    # initialize a run manager and run the experiment
    expr.init_runmanager()
    expr.run()
    print('DONE')

if __name__ == "__main__":
    main('PATH TO INI FILE/exp_emodb.ini') 
    # main(sys.argv[1]) # alternatively read it from command line

And this would be a minimal nkululeko configuration file (tested with version 0.8):

[EXP]
root = ./emodb/
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = TO BE ADAPTED/emodb
emodb.split_strategy = speaker_split
emodb.testsplit = 40
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
[FEATS]
type = os
[MODEL]
type = svm

I hope the names of the entries are self-explanatory. Here's the link to the config file description.

Nkululeko: meta parameter optimization

With linear classifiers that are derived from sklearn, you can simply state your variants for a meta parameter in the ini file:

[MODEL]
type = svm
tuning_params = ['C']
scoring = recall_macro
C = [10, 1, 0.1, 0.01, 0.001, 0.0001]

This will iterate the C parameter of the SVM classifier by the stated values and choose the best performing model.
You can have several "tuning_params" and then a grid search (combining everything with each other) will be performed.

Here's an example for XGB classifier:

[MODEL]
type = xgb
tuning_params = ['subsample', 'n_estimators', 'max_depth']
subsample = [.5, .7]
n_estimators = [50, 80, 200]
max_depth = [1, 6]
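
To get a feeling for how many models such a grid search has to train, you can enumerate the combinations yourself. The sketch below only illustrates the idea of combining everything with each other using the values of the XGB example; the function expand_grid is made up and does not show how Nkululeko performs the search.

```python
from itertools import product

grid = {
    'subsample': [.5, .7],
    'n_estimators': [50, 80, 200],
    'max_depth': [1, 6],
}

def expand_grid(grid):
    """Return every combination of the listed values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]

combinations = expand_grid(grid)
print(len(combinations))  # 2 * 3 * 2 = 12
```

Every additional tuning parameter multiplies the number of combinations, so the lists are best kept short.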

Here's one idea of how to find the optimal values for 2 layers of an MLP net with nkululeko:

  • store your meta-parameters in arrays
  • loop over them and initialize an experiment each time
  • keep the experiment name but change your parameters and the plot name
  • this way you can re-use your extracted features and do not clutter your hard disk.

Here's some Python code to illustrate this idea:

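# assumes the nkululeko src folder is on sys.path, as in the base experiment script
import configparser # to read the ini file
import experiment as exp # central nkululeko class
from util import Util # mainly for logging

def main(config_file):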
    # load one configuration per experiment
    config = configparser.ConfigParser()
    config.read(config_file)
    util = Util()
    l1s = [32, 64, 128]
    l2s = [16, 32, 64]
    for l1 in l1s:
        for l2 in l2s:
            # create a new experiment
            expr = exp.Experiment(config)

            plotname = f'{util.get_exp_name()}_{l1}_{l2}'
            util.set_config_val('PLOT', 'name', plotname)

            print(f'running {expr.name} with layers {l1} and {l2}')

            layers = {'l1':l1, 'l2':l2}
            util.set_config_val('MODEL', 'layers', layers)

            # load the data
            expr.load_datasets()

            # split into train and test
            expr.fill_train_and_tests()
            util.debug(f'train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}')

            # extract features
            expr.extract_feats()
            util.debug(f'train feats shape : {expr.feats_train.df.shape}, test feats shape:{expr.feats_test.df.shape}')

            # initialize a run manager
            expr.init_runmanager()

            # run the experiment
            expr.run()

    print('DONE')

Keep in mind though that meta parameter optimization as done here is in itself a learning problem. It is usually not feasible to systematically try out all combinations of possible values, and thus some kind of stochastic approach is preferable.
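
One simple stochastic approach is random search: instead of looping over all combinations, you draw a fixed number of random ones and keep the best. The following is a minimal sketch with made-up names; in particular toy_evaluate is only a stand-in for running an experiment and reading its result, which you would have to replace with your own code.

```python
import random

def random_search(space, evaluate, n_trials=5, seed=0):
    """Try n_trials random combinations and return the best (score, params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

space = {'l1': [32, 64, 128], 'l2': [16, 32, 64]}

def toy_evaluate(params):
    # stand-in for a real experiment: pretends that l1=64 and l2=32 are optimal
    return -abs(params['l1'] - 64) - abs(params['l2'] - 32)

print(random_search(space, toy_evaluate, n_trials=5))
```

With only nine combinations, as in this example, a full loop is of course still cheap. Random search starts to pay off once the number of parameters and values grows.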