Nkululeko: perform cross database experiments

This is one of a series of posts about how to use nkululeko.
If you're unfamilar with nkululelo, you might want to start here.

This post is about cross database experiments, i.e. training a classifier on one database and test it on another, something that happens quite often with real life situations.

In this post I will only talk about the config file, the python file can be re-used.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = cross_data

Next, the DATA section might look like this

# declare which databases to use
databases = ['emodb', 'polish']
# specify the location of the data
emodb = <path to the database>
polish = <path to the database>
# we split one database as specified
emodb.split_strategy = specified
# here is the test, the training set is disregarded
emodb.test.tables = ['emotion.categories.test.gold_standard']
# the whole of the polish database is used for train
polish.split_strategy = train
# the target label for the experiment
target = emotion
# we need to unify the labels of both databases
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
# and these are the labels we want to distinguidh
labels = ['angry', 'happy', 'neutral', 'sad']

The features section, better explained in this post

[FEATS]
type = os

The classifiers section, better explained in this post

[MODEL]
type = xgb

Again, you might want to plot the final distribution of categories per train and test set:

[PLOT]
value_counts = True

Nkululeko: comparing classifiers and features

This is one of a series of posts about how to use nkululeko.

Although Nkululeko is meant as a programming library, many experiments can be done simply by adapting the configuration file of the experiment. If you're unfamilar with nkululelo, you might want to start here.

This post is about maschine classification (as opposed to regression problems) and an introduction how to combine different features sets with different classifiers.

In this post I will only talk about the config file, the python file can be re-used.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = exp_A
# needed only for neural net classifiers
#epochs = 100
# needed only for classifiers with random initialization
# runs = 3 

The DATA section deals with the data sets:

[DATA]
# list all the databases  you will be using
databases = ['emodb']
# state the path to the audformat root folder
emodb = /home/felix/data/audb/emodb
# split train and test based on different random speakers
emodb.split_strategy = speaker_split
# state the percentage of test speakers (in this case 4 speakers, as emodb only has 10 speakers)
emodb.testsplit = 40
# for a subsequent run you might want to skip the speaker selection as it requires to extract features for each run
# emodb.split_strategy = reuse # uncomment the other strategy then
# the target label that should be classified
target = emotion
# the categories for this label
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']

The next secton deals with the features that should be used by the classifier.

[FEATS]
# the type of features to use
type = ['os']

The following altenatives are currently implemented (only os and trill are opensource):

  • type = os # opensmile features
  • type = mld # mid level descriptors, to be published
  • type = trill # TRILL features requires keras to be installed
  • type = spectra # log mel spectra, for convolutional ANNs

Next comes the MODEL section which deals with the classifier:

[MODEL]
# the main thing to sepecify is the kind of classifier:
type = xgb

Choices are:

  • type = xgb # XG-boost algorithm, based on classification trees
  • type = svm # Support Vector Machines, a classifier based on decision planes
  • type = mlp # Multi-Layer-Perceptron, needs a layer-layout to be specified, e.g. layers = {'l1':64}

And finally, the PLOT section specifies possible additional visualizations (a confusion matrix is always plotted)

[PLOT]
tsne = True

A t-SNE plot can be useful to estimate if the selected features seperate the categories at all.

Setting up a base nkululeko experiment

This is one of a series of posts on how to use nkululeko and deals with setting up the "hello world" of nkululeko: performing classification on the berlin emodb emotional datbase.

Typically nkululeko experiments are defined by two files:

  • a python file that is called by the interpreter
  • an initialization file that is interpreted by the nkululeko framework

First we'll take a look at the python file:

# my_experiment.py
# Demonstration code to use the Nkululeko framework

import sys
sys.path.append("TO BE ADAPTED/nkululeko/src")
import configparser # to read the ini file
import experiment as exp # central nkululeko class
from util import Util # mainly for logging

def main(config_file):
    # load one configuration per experiment
    config = configparser.ConfigParser()
    config.read(config_file) # read in the ini file, the experiment is defined there
    util = Util() # init the logging and global stuff

    # create a new experiment
    expr = exp.Experiment(config)
    util.debug(f'running {expr.name}')

    # load the data sets (specified in ini file)
    expr.load_datasets()

    # split into train and test sets
    expr.fill_train_and_tests()
    util.debug(f'train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}')

    # extract features
    expr.extract_feats()
    util.debug(f'train feats shape : {expr.feats_train.df.shape}, test feats shape:{expr.feats_test.df.shape}')

# initialize a run manager and run the experiment
    expr.init_runmanager()
    expr.run()
    print('DONE')

if __name__ == "__main__":
    main('PATH TO INI FILE/exp_emodb.ini') 
    # main(sys.argv[1]) # alternatively read it from command line

and this would be a minimal nkululeko configuration file (tested with version 0.8)

[EXP]
root = ./emodb/
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = TO BE ADAPTED/emodb
emodb.split_strategy = speaker_split
emodb.testsplit = 40
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
[FEATS]
type = os
[MODEL]
type = svm

I hope the names of the entries are self-explanatory, here's the link to the config file description