Nkululeko: perform cross database experiments

This is one of a series of posts about how to use nkululeko.
If you're unfamiliar with nkululeko, you might want to start here.

This post is about cross database experiments, i.e. training a classifier on one database and testing it on another, something that happens quite often in real-life situations.

In this post I will only talk about the config file; the python file can be re-used.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = cross_data

Next, the DATA section might look like this:

[DATA]
# declare which databases to use
databases = ['emodb', 'polish']
# specify the location of the data
emodb = <path to the database>
polish = <path to the database>
# we split one database as specified
emodb.split_strategy = specified
# this table is the test set, the training set is disregarded
emodb.test.tables = ['emotion.categories.test.gold_standard']
# the whole of the polish database is used for training
polish.split_strategy = train
# the target label for the experiment
target = emotion
# we need to unify the labels of both databases
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
# and these are the labels we want to distinguish
labels = ['angry', 'happy', 'neutral', 'sad']
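To make the effect of the two mapping entries concrete, here is a minimal Python sketch. It is not nkululeko's own code: the function `unify_labels` and the raw label lists are hypothetical, made up for illustration, and the sketch assumes that samples whose label is missing from the mapping or from `labels` are simply dropped.

```python
# Mapping dictionaries and label list copied from the DATA section above.
EMODB_MAPPING = {'anger': 'angry', 'happiness': 'happy', 'sadness': 'sad', 'neutral': 'neutral'}
POLISH_MAPPING = {'anger': 'angry', 'joy': 'happy', 'sadness': 'sad', 'neutral': 'neutral'}
LABELS = ['angry', 'happy', 'neutral', 'sad']


def unify_labels(raw_labels, mapping, labels):
    """Map database-specific labels to the shared set and drop everything else.

    Hypothetical helper: labels without a mapping entry become None and are
    filtered out because None is not in the list of target labels.
    """
    mapped = (mapping.get(label) for label in raw_labels)
    return [label for label in mapped if label in labels]


# Hypothetical raw labels, invented for this example only.
emodb_raw = ['anger', 'happiness', 'boredom', 'neutral']
polish_raw = ['joy', 'fear', 'sadness']

emodb_unified = unify_labels(emodb_raw, EMODB_MAPPING, LABELS)
polish_unified = unify_labels(polish_raw, POLISH_MAPPING, LABELS)

print(emodb_unified)   # ['angry', 'happy', 'neutral']
print(polish_unified)  # ['happy', 'sad']
```

After the mapping, both databases use the same label names, which is what allows a model trained on one to be evaluated on the other.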

The features section is explained in more detail in this post:

[FEATS]
type = os

The classifiers section is explained in more detail in this post:

[MODEL]
type = xgb

Again, you might want to plot the final distribution of categories per train and test set:

[PLOT]
value_counts = True
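Before starting a run, it can help to check that the assembled config file is consistent. The following is only a sketch using Python's standard `configparser`, not part of nkululeko: it assumes the DATA options sit under a `[DATA]` header, and the variable names (`strategies`, `missing_paths`) are my own. The database paths are left as the placeholders from above.

```python
import ast
import configparser

# The config from this post, with comments and the label mappings left out
# for brevity.
CONFIG_TEXT = """
[EXP]
root = ./experiment_1/
name = cross_data

[DATA]
databases = ['emodb', 'polish']
emodb = <path to the database>
polish = <path to the database>
emodb.split_strategy = specified
emodb.test.tables = ['emotion.categories.test.gold_standard']
polish.split_strategy = train
target = emotion
labels = ['angry', 'happy', 'neutral', 'sad']

[FEATS]
type = os

[MODEL]
type = xgb

[PLOT]
value_counts = True
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

data = config["DATA"]
# List-valued options are written as Python literals, so parse them safely.
databases = ast.literal_eval(data["databases"])

# Every declared database should have a path and a split strategy.
missing_paths = [name for name in databases if name not in data]
strategies = {name: data[f"{name}.split_strategy"] for name in databases}

print("databases:", databases)
print("split strategies:", strategies)
print("databases without a path:", missing_paths)
print("plot value counts:", config.getboolean("PLOT", "value_counts"))
```

For a cross database setup you would expect one database to provide only test data and the other only training data, which is what the printed split strategies let you confirm at a glance.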

speechsurfer
