This is one of a series of posts about how to use nkululeko.
If you're unfamiliar with nkululeko, you might want to start here.
This post is about cross-database experiments, i.e. training a classifier on one database and testing it on another, something that happens quite often in real-life situations.
In this post I will only talk about the config file; the Python file can be re-used.
I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:
[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = cross_data
Next, the DATA section might look like this:
[DATA]
# declare which databases to use
databases = ['emodb', 'polish']
# specify the location of the data
emodb = <path to the database>
polish = <path to the database>
# we split one database as specified
emodb.split_strategy = specified
# this table is used as the test set; the training set of emodb is disregarded
emodb.test.tables = ['emotion.categories.test.gold_standard']
# the whole of the polish database is used for training
polish.split_strategy = train
# the target label for the experiment
target = emotion
# we need to unify the labels of both databases
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
# and these are the labels we want to distinguish
labels = ['angry', 'happy', 'neutral', 'sad']
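To make the effect of the two mappings more concrete, here is a minimal Python sketch of the idea. It is not how nkululeko implements this internally; the raw label lists and the helper `unify` are made up for illustration, and I assume that samples whose mapped label is not listed in `labels` are simply left out.

```python
# Hypothetical raw labels, as they might appear in each database
emodb_raw = ["anger", "happiness", "sadness", "neutral", "boredom"]
polish_raw = ["anger", "joy", "sadness", "neutral", "fear"]

# The mappings and target labels from the DATA section above
emodb_mapping = {"anger": "angry", "happiness": "happy", "sadness": "sad", "neutral": "neutral"}
polish_mapping = {"anger": "angry", "joy": "happy", "sadness": "sad", "neutral": "neutral"}
labels = ["angry", "happy", "neutral", "sad"]


def unify(raw_labels, mapping, wanted):
    """Rename raw labels via mapping and keep only those in wanted (hypothetical helper)."""
    mapped = (mapping.get(label) for label in raw_labels)
    return [label for label in mapped if label in wanted]


emodb_unified = unify(emodb_raw, emodb_mapping, labels)
polish_unified = unify(polish_raw, polish_mapping, labels)

print(emodb_unified)
print(polish_unified)
```

After this step both databases use the same four category names, so a model trained on one can be evaluated on the other.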
The features section is better explained in this post:
[FEATS]
type = os
The classifiers section is better explained in this post:
[MODEL]
type = xgb
Again, you might want to plot the final distribution of categories per train and test set:
[PLOT]
value_counts = True
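If you prefer to generate the config file from a script, the sections above can be put together with Python's standard configparser module. This is only a sketch under some assumptions: the file name `exp_cross.ini` is made up, the database paths are left as placeholders, and reading the list values back with `ast.literal_eval` merely illustrates that they are written as Python literals.

```python
import ast
import configparser
import tempfile
from pathlib import Path

config = configparser.ConfigParser()
config["EXP"] = {"root": "./experiment_1/", "name": "cross_data"}
config["DATA"] = {
    "databases": "['emodb', 'polish']",
    "emodb": "<path to the database>",   # placeholder, fill in your own path
    "polish": "<path to the database>",  # placeholder, fill in your own path
    "emodb.split_strategy": "specified",
    "emodb.test.tables": "['emotion.categories.test.gold_standard']",
    "polish.split_strategy": "train",
    "target": "emotion",
    "emodb.mapping": "{'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}",
    "polish.mapping": "{'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}",
    "labels": "['angry', 'happy', 'neutral', 'sad']",
}
config["FEATS"] = {"type": "os"}
config["MODEL"] = {"type": "xgb"}
config["PLOT"] = {"value_counts": "True"}

# Write the file to a temporary folder (hypothetical file name)
ini_path = Path(tempfile.mkdtemp()) / "exp_cross.ini"
with open(ini_path, "w") as f:
    config.write(f)

# Read it back to check that all sections and values survived
check = configparser.ConfigParser()
check.read(ini_path)
sections = check.sections()
parsed_labels = ast.literal_eval(check["DATA"]["labels"])
print(sections)
print(parsed_labels)
```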