
Setting up a base nkululeko experiment

This is one of a series of posts on how to use nkululeko. It deals with setting up the "hello world" of nkululeko: performing classification on the Berlin EmoDB emotional database.

Typically nkululeko experiments are defined by two files:

  • a python file that is called by the interpreter
  • an initialization file that is interpreted by the nkululeko framework

First we'll take a look at the python file:

# my_experiment.py
# Demonstration code to use the Nkululeko framework

import sys
sys.path.append("TO BE ADAPTED/nkululeko/src")
import configparser # to read the ini file
import experiment as exp # central nkululeko class
from util import Util # mainly for logging

def main(config_file):
    # load one configuration per experiment
    config = configparser.ConfigParser()
    config.read(config_file) # read in the ini file, the experiment is defined there
    util = Util() # init the logging and global stuff

    # create a new experiment
    expr = exp.Experiment(config)
    util.debug(f'running {expr.name}')

    # load the data sets (specified in ini file)
    expr.load_datasets()

    # split into train and test sets
    expr.fill_train_and_tests()
    util.debug(f'train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}')

    # extract features
    expr.extract_feats()
    util.debug(f'train feats shape : {expr.feats_train.df.shape}, test feats shape:{expr.feats_test.df.shape}')

    # initialize a run manager and run the experiment
    expr.init_runmanager()
    expr.run()
    print('DONE')

if __name__ == "__main__":
    main('PATH TO INI FILE/exp_emodb.ini') 
    # main(sys.argv[1]) # alternatively read it from command line
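
The commented-out `sys.argv[1]` variant above can be made a bit more comfortable with `argparse` from the standard library. This is only a sketch: the `--config` option name and its default value are my own choices for illustration and are not part of nkululeko.

```python
import argparse


def parse_args(argv=None):
    """Read the path of the ini file from the command line."""
    parser = argparse.ArgumentParser(description="Run one nkululeko experiment")
    # '--config' and its default are hypothetical names chosen for this sketch
    parser.add_argument("--config", default="exp_emodb.ini",
                        help="path to the ini file that defines the experiment")
    return parser.parse_args(argv)


# passing an explicit list makes the function easy to try out without a shell
args = parse_args(["--config", "my_exp.ini"])
print(args.config)
```

In `my_experiment.py` the last line would then become something like `main(parse_args().config)`.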

This would be a minimal nkululeko configuration file (tested with version 0.8):

[EXP]
root = ./emodb/
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = TO BE ADAPTED/emodb
emodb.split_strategy = speaker_split
emodb.testsplit = 40
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
[FEATS]
type = os
[MODEL]
type = svm

I hope the names of the entries are self-explanatory; if not, the config file description explains them.
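
To get a feeling for what the Python side sees when it reads such a file, here is a small sketch that uses only the standard library. All values arrive as plain strings, so list-like entries such as `databases` need an explicit conversion. I use `ast.literal_eval` for that here, which is an assumption for this example and not necessarily what nkululeko does internally.

```python
import ast
import configparser

# the minimal configuration from above, inlined so the example is self-contained
INI_TEXT = """
[EXP]
root = ./emodb/
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = TO BE ADAPTED/emodb
emodb.split_strategy = speaker_split
emodb.testsplit = 40
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
[FEATS]
type = os
[MODEL]
type = svm
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# scalar values: plain strings, or converted with the getint helper
exp_name = config["EXP"]["name"]
test_split = config["DATA"].getint("emodb.testsplit")

# list-like values are strings too and have to be converted explicitly
# (assumption for this sketch: ast.literal_eval)
databases = ast.literal_eval(config["DATA"]["databases"])
labels = ast.literal_eval(config["DATA"]["labels"])

print(exp_name, test_split, databases, len(labels))
```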

Nkululeko: meta parameter optimization

With classifiers that are derived from sklearn, you can simply state the variants for a meta parameter in the ini file:

[MODEL]
type = svm
tuning_params = ['C']
scoring = recall_macro
C = [10, 1, 0.1, 0.01, 0.001, 0.0001]

This will iterate the C parameter of the SVM classifier over the stated values and choose the best-performing model.
You can have several "tuning_params", and then a grid search (combining every value with every other) will be performed.

Here's an example for the XGB classifier:

[MODEL]
type = xgb
tuning_params = ['subsample', 'n_estimators', 'max_depth']
subsample = [.5, .7]
n_estimators = [50, 80, 200]
max_depth = [1, 6]
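
To see how quickly such a grid grows, the combinations for the XGB example can be enumerated with `itertools.product`. This only illustrates what "combining everything with each other" means; it is not nkululeko's own search code.

```python
import itertools

# values copied from the XGB example above
grid = {
    "subsample": [0.5, 0.7],
    "n_estimators": [50, 80, 200],
    "max_depth": [1, 6],
}

names = list(grid)
# one dictionary per point in the grid
combinations = [
    dict(zip(names, values)) for values in itertools.product(*grid.values())
]

print(len(combinations))  # 2 * 3 * 2 = 12 models to train
print(combinations[0])
```

Every additional tuning parameter multiplies the number of models, so it pays to keep the value lists short.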

Here's one idea of how to find the optimal values for the two layers of an MLP net with nkululeko:

  • store your meta-parameters in arrays
  • loop over them and initialize an experiment each time
  • keep the experiment name but change your parameters and the plot name
  • this way you can re-use your extracted features and do not clutter your hard disk.

Here's some python code to illustrate this idea:

def main(config_file):
    # load one configuration per experiment
    config = configparser.ConfigParser()
    config.read(config_file)
    util = Util()
    l1s = [32, 64, 128]
    l2s = [16, 32, 64]
    for l1 in l1s:
        for l2 in l2s:
            # create a new experiment
            expr = exp.Experiment(config)

            plotname = f'{util.get_exp_name()}_{l1}_{l2}'
            util.set_config_val('PLOT', 'name', plotname)

            print(f'running {expr.name} with layers {l1} and {l2}')

            layers = {'l1':l1, 'l2':l2}
            util.set_config_val('MODEL', 'layers', layers)

            # load the data
            expr.load_datasets()

            # split into train and test
            expr.fill_train_and_tests()
            util.debug(f'train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}')

            # extract features
            expr.extract_feats()
            util.debug(f'train feats shape : {expr.feats_train.df.shape}, test feats shape:{expr.feats_test.df.shape}')

            # initialize a run manager
            expr.init_runmanager()

            # run the experiment
            expr.run()

    print('DONE')
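
If you want to try the bookkeeping part of this loop without nkululeko installed, the following sketch does the same with a plain `configparser` object: the experiment name stays fixed, while the plot name and the layers change in every iteration. Writing the layers as a string and reading them back with `ast.literal_eval` is my assumption for this example, as is the `mlp` model type; `util.set_config_val` may handle this differently.

```python
import ast
import configparser
import itertools

config = configparser.ConfigParser()
# stand-in for the ini file; 'mlp' as model type is an assumption of this sketch
config.read_dict({"EXP": {"name": "exp_emodb"}, "MODEL": {"type": "mlp"}, "PLOT": {}})

variants = []
for l1, l2 in itertools.product([32, 64, 128], [16, 32, 64]):
    exp_name = config["EXP"]["name"]  # never changes, so features can be re-used
    config["PLOT"]["name"] = f"{exp_name}_{l1}_{l2}"
    # configparser only stores strings, so the dictionary is serialized
    config["MODEL"]["layers"] = str({"l1": l1, "l2": l2})
    layers = ast.literal_eval(config["MODEL"]["layers"])
    variants.append((config["PLOT"]["name"], layers))

print(len(variants), variants[0])
```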

Keep in mind, though, that meta parameter optimization as done here is in itself a learning problem. It is usually not feasible to systematically try out all combinations of possible values, so some kind of stochastic approach is preferable.
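
One very simple stochastic approach is random search: instead of visiting every point in the grid, draw a fixed number of random combinations. Below is a minimal sketch with the standard library; `sample_configs` is a hypothetical helper I made up for this post, not a nkululeko function.

```python
import random


def sample_configs(space, n_trials, seed=0):
    """Draw n_trials distinct random combinations from the search space."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    names = list(space)
    # never ask for more distinct combinations than the space contains
    total = 1
    for values in space.values():
        total *= len(values)
    n_trials = min(n_trials, total)
    seen = set()
    while len(seen) < n_trials:
        seen.add(tuple(rng.choice(space[name]) for name in names))
    return [dict(zip(names, combo)) for combo in sorted(seen)]


# layer sizes taken from the MLP example above
space = {"l1": [32, 64, 128], "l2": [16, 32, 64]}
trials = sample_configs(space, n_trials=4)
for trial in trials:
    print(trial)
```

Each dictionary in `trials` could then replace one pass of the nested `for` loops from the MLP example, so only four of the nine combinations get trained.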