Nkululeko: perform cross database experiments

This is one of a series of posts about how to use nkululeko.
If you're unfamiliar with nkululeko, you might want to start here.

This post is about cross-database experiments, i.e. training a classifier on one database and testing it on another, something that happens quite often in real-life situations.

In this post I will only talk about the config file; the python file can be re-used.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = cross_data

Next, the DATA section is in this case more complex than usual:

[DATA]
# list all databases
databases = ['polish', 'emodb']
# strategy as opposed to train_test
strategy = cross_data
# state which databases to use for training
trains = ['emodb']
# state which databases to use as a test
tests = ['polish']
# what is the target label?
target = emotion
# what are the category names?
labels = ['neutral', 'happy', 'sad', 'angry', 'fright.']
# for each database:
# where is it?
polish = PATH/polish-emotional-speech
# map the databases categories to a common set 
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'fear':'fright.', 'neutral':'neutral'}
# plot the distribution of categories
polish.value_counts = True
# and for the second database:
emodb = PATH/emodb
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'fear':'fright.', 'neutral':'neutral'}
emodb.value_counts = True
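To make the role of the mapping entries a bit more concrete, here is a minimal sketch in plain Python of how such values could be read from an ini file and how a mapping relabels the categories of one database. This is not nkululeko's actual implementation: the use of ast.literal_eval and the helper name map_labels are my assumptions, for illustration only.

```python
import ast
import configparser

# a shortened version of the [DATA] section from above
CONFIG_TEXT = """
[DATA]
databases = ['polish', 'emodb']
labels = ['neutral', 'happy', 'sad', 'angry', 'fright.']
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'fear':'fright.', 'neutral':'neutral'}
"""

def map_labels(raw_labels, mapping):
    """Hypothetical helper: translate database-specific labels to the common set."""
    return [mapping[label] for label in raw_labels]

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# ini values are plain strings, so turn them into Python lists and dicts
databases = ast.literal_eval(config['DATA']['databases'])
labels = ast.literal_eval(config['DATA']['labels'])
mapping = ast.literal_eval(config['DATA']['polish.mapping'])

# labels as they might appear in the polish database
mapped = map_labels(['joy', 'anger', 'fear'], mapping)
print(databases)
print(mapped)
```

After the mapping, both databases speak the same "label language", which is what makes training on one and testing on the other possible in the first place.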

The features section, better explained in this post

[FEATS]
type = os

The classifiers section, better explained in this post

[MODEL]
type = xgb

Again, you might want to plot the final distribution of categories per train and test set:

[PLOT]
value_counts = True

Nkululeko: comparing classifiers and features

This is one of a series of posts about how to use nkululeko.

Although Nkululeko is meant as a programming library, many experiments can be done simply by adapting the configuration file of the experiment. If you're unfamiliar with nkululeko, you might want to start here.

This post is about machine classification (as opposed to regression problems) and an introduction to combining different feature sets with different classifiers.

In this post I will only talk about the config file; the python file can be re-used.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = exp_A
# needed only for neural net classifiers
#epochs = 100
# needed only for classifiers with random initialization
# runs = 3 

The DATA section deals with the data sets:

[DATA]
# list all the databases  you will be using
databases = ['emodb']
# state the path to the audformat root folder
emodb = /home/felix/data/audb/emodb
# split train and test based on different random speakers
emodb.split_strategy = speaker_split
# state the percentage of test speakers (in this case 4 speakers, as emodb only has 10 speakers)
emodb.testsplit = 40
# for a subsequent run you might want to skip the speaker selection as it requires to extract features for each run
# emodb.split_strategy = reuse # uncomment the other strategy then
# the target label that should be classified
target = emotion
# the categories for this label
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
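To illustrate what a speaker split with a test percentage of 40 amounts to, here is a small sketch in plain Python. It is only meant to show the idea, not how nkululeko does it internally: the function name speaker_split, the seed parameter and the speaker ids are made up for this example.

```python
import random

def speaker_split(speakers, test_percent, seed=0):
    """Hypothetical helper: randomly assign test_percent % of the speakers to the test set."""
    unique = sorted(set(speakers))
    n_test = round(len(unique) * test_percent / 100)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test = set(rng.sample(unique, n_test))
    train = set(unique) - test
    return train, test

# ten made-up speaker ids, standing in for the 10 emodb speakers
speakers = [f'spk{i:02d}' for i in range(1, 11)]
train_spkrs, test_spkrs = speaker_split(speakers, test_percent=40)

print(f'{len(train_spkrs)} train speakers, {len(test_spkrs)} test speakers')
# no speaker may appear in both sets
print(train_spkrs.isdisjoint(test_spkrs))
```

The important point is that the split is done on speakers, not on samples, so the classifier is never tested on a voice it has already heard during training.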

The next section deals with the features that should be used by the classifier.

[FEATS]
# the type of features to use
type = os

The following alternatives are currently implemented (only os and trill are open source):

  • type = os # opensmile features
  • type = mld # mid level descriptors, to be published
  • type = trill # TRILL features requires keras to be installed
  • type = spectra # log mel spectra, for convolutional ANNs

Next comes the MODEL section which deals with the classifier:

[MODEL]
# the main thing to specify is the kind of classifier:
type = xgb

Choices are:

  • type = xgb # XG-boost algorithm, based on classification trees
  • type = svm # Support Vector Machines, a classifier based on decision planes
  • type = mlp # Multi-Layer-Perceptron, needs a layer-layout to be specified, e.g. layers = {'l1':64}

And finally, the PLOT section specifies possible additional visualizations (a confusion matrix is always plotted)

[PLOT]
tsne = True

A t-SNE plot can be useful to estimate if the selected features separate the categories at all.

Setting up a base nkululeko experiment

This is one of a series of posts on how to use nkululeko and deals with setting up the "hello world" of nkululeko: performing classification on the Berlin emodb emotional database.

Typically nkululeko experiments are defined by two files:

  • a python file that is called by the interpreter
  • an initialization file that is interpreted by the nkululeko framework

First we'll take a look at the python file:

# my_experiment.py
# Demonstration code to use the Nkululeko framework

import sys
sys.path.append("TO BE ADAPTED/nkululeko/src")
import configparser # to read the ini file
import experiment as exp # central nkululeko class
from util import Util # mainly for logging

def main(config_file):
    # load one configuration per experiment
    config = configparser.ConfigParser()
    config.read(config_file) # read in the ini file, the experiment is defined there
    util = Util() # init the logging and global stuff

    # create a new experiment
    expr = exp.Experiment(config)
    util.debug(f'running {expr.name}')

    # load the data sets (specified in ini file)
    expr.load_datasets()

    # split into train and test sets
    expr.fill_train_and_tests()
    util.debug(f'train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}')

    # extract features
    expr.extract_feats()
    util.debug(f'train feats shape : {expr.feats_train.df.shape}, test feats shape:{expr.feats_test.df.shape}')

    # initialize a run manager and run the experiment
    expr.init_runmanager()
    expr.run()
    print('DONE')

if __name__ == "__main__":
    main('PATH TO INI FILE/exp_emodb.ini') 
    # main(sys.argv[1]) # alternatively read it from command line

and this would be a minimal nkululeko configuration file (tested with version 0.8)

[EXP]
root = ./emodb/
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = TO BE ADAPTED/emodb
emodb.split_strategy = speaker_split
emodb.testsplit = 40
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
[FEATS]
type = os
[MODEL]
type = svm

I hope the names of the entries are self-explanatory, here's the link to the config file description

Nkululeko: meta parameter optimization

Here's one idea how to find the optimal values for 2 layers of an MLP net with nkululeko:

  • store your meta-parameters in arrays
  • loop over them and initialize an experiment each time
  • keep the experiment name but change your parameters and the plot name
  • this way you can re-use your extracted features and do not get your hard disk cluttered.

Here's some python code to illustrate this idea:

def main(config_file):
    # load one configuration per experiment
    config = configparser.ConfigParser()
    config.read(config_file)
    util = Util()
    l1s = [32, 64, 128]
    l2s = [16, 32, 64]
    for l1 in l1s:
        for l2 in l2s:
            # create a new experiment
            expr = exp.Experiment(config)

            plotname = f'{util.get_exp_name()}_{l1}_{l2}'
            util.set_config_val('PLOT', 'name', plotname)

            print(f'running {expr.name} with layers {l1} and {l2}')

            layers = {'l1':l1, 'l2':l2}
            util.set_config_val('MODEL', 'layers', layers)

            # load the data
            expr.load_datasets()

            # split into train and test
            expr.fill_train_and_tests()
            util.debug(f'train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}')

            # extract features
            expr.extract_feats()
            util.debug(f'train feats shape : {expr.feats_train.df.shape}, test feats shape:{expr.feats_test.df.shape}')

            # initialize a run manager
            expr.init_runmanager()

            # run the experiment
            expr.run()

    print('DONE')

Keep in mind, though, that meta-parameter optimization as done here is in itself a learning problem. Systematically trying out all combinations quickly becomes infeasible as the number of parameters and candidate values grows, so some kind of stochastic approach is usually preferable.
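One simple stochastic approach is a random search: instead of running every combination, you only run a randomly drawn subset. Here is a minimal sketch in plain Python, independent of nkululeko. The names random_search and dummy_evaluate are hypothetical, and dummy_evaluate only returns a made-up score where a real setup would run an experiment and return e.g. its UAR.

```python
import itertools
import random

def random_search(param_grid, evaluate, n_trials, seed=0):
    """Evaluate n_trials randomly drawn parameter combinations, return the best one."""
    names = list(param_grid)
    combinations = list(itertools.product(*param_grid.values()))
    rng = random.Random(seed)
    sampled = rng.sample(combinations, min(n_trials, len(combinations)))
    best_params, best_score = None, float('-inf')
    for values in sampled:
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def dummy_evaluate(params):
    """Placeholder for a real experiment run, returns a made-up score."""
    return -abs(params['l1'] - 64) - abs(params['l2'] - 32)

grid = {'l1': [32, 64, 128], 'l2': [16, 32, 64]}
# only 4 of the 9 possible combinations are tried
best, score = random_search(grid, dummy_evaluate, n_trials=4)
print(best, score)
```

With two parameters and three values each this hardly saves anything, but the number of combinations grows multiplicatively with every parameter you add, while the number of trials stays under your control.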

How to set up your first nkululeko project

Nkululeko is a framework to build machine learning models that recognize speaker characteristics on a very high level of abstraction (i.e. starting without programming experience).

This post is meant to help you with setting up your first experiment, based on the Berlin Emodb.

1) Set up python

It's written in python, so first you have to set up a Python environment.

2) Get a database

Load the Berlin emodb database to some location on your hard drive, as discussed in this post. I will refer to the location as "emodb root" from now on.

3) Download nkululeko

Navigate with a browser to the nkululeko github page and click on the "code" button, download the zip or (better) clone with your git software (step 1).

Unpack (if zip file) to some location on your hard disk that I will call "nkululeko root" from now on.

4) Install the required python packages

Inside the virtual environment that you created!

Navigate with a shell to the nkululeko root and install the python packages needed by nkululeko with

pip install -r requirements.txt

5) Adapt the ini file

Use your favourite editor, e.g. Visual Studio Code, and open the nkululeko root. If you use Visual Studio Code, set the path to the environment as the python interpreter path and store this (nkululeko root and python environment path) as a workspace configuration, so next time you can simply open the workspace and you're set up.

Open the exp_emodb.ini file and put your nkululeko root as the root value, for me this looks like this:

root = /home/felix/data/research/nkululeko/

Put the emodb root folder as the emodb value, for me this looks like this

emodb = /home/felix/data/audb/emodb

An overview on all nkululeko options should be here

6) Run the experiment

Inside a shell (or from within VSC), start the process with

python my_experiment.py exp_emodb.ini

7) Inspect the results

If all goes well, the program should start by extracting opensmile features and, once it is done, you should be able to inspect the results in the folder named like the experiment: exp_emodb.
There should be a subfolder named images with a confusion matrix and a subfolder named results for the textual results.

What to do next?

You might be interested in the hello world of nkululeko.

Get all information from emodb

When you load the Berlin emodb, as has been done in numerous postings of this blog, you will by default only get information on file name, speaker id, text id and emotion.

But there is more information contained in the audformat file, and this post shows you how to access it.

If not already somewhere on your computer, start by downloading the emodb:

import os
import audformat

if not os.path.isdir('./emodb/'):
    !wget -c https://tubcloud.tu-berlin.de/s/LzPWz83Fjneb6SP/download
    !mv download emodb_audformat.zip
    !unzip emodb_audformat.zip
    !rm emodb_audformat.zip

This code will then load the database, prepare a single dataframe with all information and store it to disk for later use:

# load the database to memory
root = './emodb/'
db = audformat.Database.load(root)
# map the file paths to the audio
db.map_files(lambda x: os.path.join(root, x))
# access speaker gender and age, and transcription, from the speaker dictionaries
df = db.tables['files'].get(map={'speaker': ['speaker', 'gender', 'age'], 'transcription': ['transcription']})
# copy the emotion label from the emotion dataframe to the files dataframe
df['emotion'] = db.tables['emotion'].df['emotion']
# add a column with the word count
df['wordcount'] = df['transcription'].apply(lambda row: len(row.split()))
# make sure the target folder exists, then store to disk for later use
os.makedirs('store', exist_ok=True)
df.to_pickle('store/emodb.pkl')

df.head(1)

Machine learning experiment framework

Currently I'm working on (yet another) framework for machine learning, i.e. a python coded set of classes that can be used to run machine learning experiments in a flexible but reusable way.

I'm not sure where this is heading yet, but a first runnable version exists, if interested check it out at my github account, I'll update news there.

The general idea looks something like this:

Predict emodb emotions with a Multi Layer Perceptron ANN

This post shows you how to classify emotions with a Multi Layer Perceptron (MLP) artificial neural net based on the torch framework (a different very famous ANN framework would be Keras).

Here's a complete jupyter notebook for your convenience.

We start with some imports, you need to install these packages, e.g. with pip, before you run this code:

import audformat
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import os
import opensmile
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import recall_score

Then we need to download and prepare our sample dataset, the Berlin emodb:

# get and unpack the Berlin Emodb emotional database if not already there
if not os.path.isdir('./emodb/'):
    !wget -c https://tubcloud.tu-berlin.de/s/LzPWz83Fjneb6SP/download
    !mv download emodb_audformat.zip
    !unzip emodb_audformat.zip
    !rm emodb_audformat.zip
# prepare the dataframe
db = audformat.Database.load('./emodb')
root = './emodb/'
db.map_files(lambda x: os.path.join(root, x))    
df_emotion = db.tables['emotion'].df
df = db.tables['files'].df
# copy the emotion label from the emotion dataframe to the files dataframe
df['emotion'] = df_emotion['emotion']

As neural nets can only deal with numbers, we need to encode the target emotion labels with numbers:

# Encode the emotion words as numbers and use this as target 
target = 'enc_emo'
encoder = LabelEncoder()
encoder.fit(df['emotion'])
df[target] = encoder.transform(df['emotion'])
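If you wonder what the encoder does: in essence, it sorts the distinct class names and replaces each label by the position of its class in that sorted list. Here is a minimal sketch of this idea in plain Python. The helper name fit_label_encoder and the example labels are made up for illustration; the real LabelEncoder of course offers more than this.

```python
def fit_label_encoder(labels):
    """Return the sorted class names and a lookup from class name to integer code."""
    classes = sorted(set(labels))
    lookup = {name: code for code, name in enumerate(classes)}
    return classes, lookup

emotions = ['happiness', 'anger', 'neutral', 'anger', 'sadness']
classes, lookup = fit_label_encoder(emotions)

# encode the words as numbers ...
encoded = [lookup[e] for e in emotions]
# ... and decode the numbers back to words
decoded = [classes[c] for c in encoded]

print(classes)  # ['anger', 'happiness', 'neutral', 'sadness']
print(encoded)  # [1, 0, 2, 0, 3]
```

Keep the list of classes around, as you need it later to translate the predictions of the net back into emotion names.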

Now the dataframe should look like this:

df.head()

To ensure that we learn about emotions and not speaker idiosyncrasies, we need speaker-disjoint training and development sets:

# define fixed speaker-disjoint train and test sets
train_spkrs = df.speaker.unique()[5:]
test_spkrs = df.speaker.unique()[:5]
df_train = df[df.speaker.isin(train_spkrs)]
df_test = df[df.speaker.isin(test_spkrs)]

print(f'#train samples: {df_train.shape[0]}, #test samples: {df_test.shape[0]}')
#train samples: 292, #test samples: 243

Next, we need to extract some acoustic features:

# extract (or get) GeMAPS features
if os.path.isfile('feats_train.pkl'):
    feats_train = pd.read_pickle('feats_train.pkl')
    feats_test = pd.read_pickle('feats_test.pkl')
else:
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.GeMAPSv01b,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    feats_train = smile.process_files(df_train.index)
    feats_test = smile.process_files(df_test.index)
    feats_train.to_pickle('feats_train.pkl')
    feats_test.to_pickle('feats_test.pkl')

Because neural nets are sensitive to large numbers, we need to scale all features to a mean of 0 and a standard deviation of 1:

# Perform a standard scaling / z-transformation on the features (mean=0, std=1)
scaler = StandardScaler()
scaler.fit(feats_train)
feats_train_norm = pd.DataFrame(scaler.transform(feats_train))
feats_test_norm = pd.DataFrame(scaler.transform(feats_test))
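Note that the scaler is fitted on the training features only and then applied to both sets, so no information from the test set leaks into the training. For a single feature, the z-transformation could be sketched in plain Python like this. The helper names fit_scaler and transform and the numbers are made up for illustration.

```python
import statistics

def fit_scaler(column):
    """Estimate mean and (population) standard deviation from the training values."""
    mean = statistics.mean(column)
    # fall back to 1.0 for constant features to avoid a division by zero
    std = statistics.pstdev(column) or 1.0
    return mean, std

def transform(column, mean, std):
    """Apply the z-transformation with previously estimated statistics."""
    return [(x - mean) / std for x in column]

train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 6.0]

# the statistics come from the training values only ...
mean, std = fit_scaler(train)
# ... but are applied to both sets
train_norm = transform(train, mean, std)
test_norm = transform(test, mean, std)

print(train_norm)
print(test_norm)
```

As a consequence, only the training set has exactly mean 0 and standard deviation 1 afterwards; the test set will usually be close to that, but not identical.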

Next we define two torch dataloaders, one for the training and one for the dev set:

def get_loader(df_x, df_y):
    data = []
    for i in range(len(df_x)):
        # use positional access, as the label dataframe is indexed by file name
        data.append([df_x.values[i], df_y[target].iloc[i]])
    return torch.utils.data.DataLoader(data, shuffle=True, batch_size=8)
trainloader = get_loader(feats_train_norm, df_train)
testloader = get_loader(feats_test_norm, df_test)

We can then define the model, in this example with one hidden layer of 16 neurons:

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Sequential(
            torch.nn.Linear(feats_train_norm.shape[1], 16),
            torch.nn.ReLU(),
            torch.nn.Linear(16, len(encoder.classes_))
        )
    def forward(self, x):
        # x: (batch_size, num_features); squeeze only has an effect if dim 1 has size 1
        x = x.squeeze(dim=1)
        return self.linear(x)

We define two functions to train and evaluate the model:

def train_epoch(model, loader, device, optimizer, criterion):
    model.train()
    losses = []
    for features, labels in loader:
        logits = model(features.to(device))
        loss = criterion(logits, labels.to(device))
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (np.asarray(losses)).mean()

def evaluate_model(model, loader, device, encoder):
    logits = torch.zeros(len(loader.dataset), len(encoder.classes_))
    targets = torch.zeros(len(loader.dataset))
    model.eval()
    with torch.no_grad():
        for index, (features, labels) in enumerate(loader):
            start_index = index * loader.batch_size
            end_index = (index + 1) * loader.batch_size
            if end_index > len(loader.dataset):
                end_index = len(loader.dataset)
            logits[start_index:end_index, :] = model(features.to(device))
            targets[start_index:end_index] = labels

    predictions = logits.argmax(dim=1)
    uar = recall_score(targets.numpy(), predictions.numpy(), average='macro')
    return uar, targets, predictions
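The measure used in the evaluation, unweighted average recall (UAR), is the mean of the recalls per class, so each class counts the same no matter how many samples it has. To make this concrete, here is a minimal sketch in plain Python. The function name and the toy labels are made up, and this simplified version only averages over classes that occur in the true labels.

```python
from collections import defaultdict

def unweighted_average_recall(truths, predictions):
    """Mean of the per-class recalls, over all classes found in truths."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(truths, predictions):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# toy example with an unbalanced class distribution
truths = [0, 0, 0, 0, 1, 1]
preds = [0, 0, 0, 1, 1, 0]

uar = unweighted_average_recall(truths, preds)
accuracy = sum(t == p for t, p in zip(truths, preds)) / len(truths)
print(uar)       # 0.625, i.e. the mean of 3/4 and 1/2
print(accuracy)  # 4 of 6 correct
```

In this toy example the accuracy is higher than the UAR, because the larger class is recognized better than the smaller one. This is why UAR is a popular measure for databases with unbalanced classes.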

Next we initialize the model and set the loss function (criterion) and optimizer:

device = 'cpu'
model = MLP().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
epoch_num = 250
uars_train = []
uars_dev = []
losses = []

We can then do the training loop over the epochs:

for epoch in range(0, epoch_num):
    loss = train_epoch(model, trainloader, device, optimizer, criterion)
    losses.append(loss)
    acc_train = evaluate_model(model, trainloader, device, encoder)[0]
    uars_train.append(acc_train)
    acc_dev, truths, preds = evaluate_model(model, testloader, device, encoder)
    uars_dev.append(acc_dev)
# scale the losses so they fit on the picture
losses = np.asarray(losses)/2

Next we might want to take a look at how the net performed with respect to unweighted average recall (UAR):

plt.figure(dpi=200)
plt.plot(uars_train, 'green', label='train set') 
plt.plot(uars_dev, 'red', label='dev set')
plt.plot(losses, 'grey', label='losses/2')
plt.xlabel('epochs')
plt.ylabel('UAR')
plt.legend()
plt.show()

And perhaps see the resulting confusion matrix:

from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(truths, preds,  normalize = 'true')
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=encoder.classes_).plot(cmap='gray')

Make a t-SNE plot

This post shows you how to generate a t-distributed stochastic neighbor embedding (t-SNE) plot with the opensmile features extracted from emodb data (which is explained in more detail in a previous blog post).

A t-SNE plot is a very useful visualization, as it condenses your feature space into two dimensions (so it can be plotted) and then uses colors to represent the class membership. This means, if you can identify clusters of same colored dots in your data cloud, the features are able to separate the classes.

We need the following imports:

import audformat
from sklearn.manifold import TSNE
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
import opensmile

First, you download and prepare emodb:

# get and unpack the berlin Emodb emotional database
!wget -c https://tubcloud.tu-berlin.de/s/LzPWz83Fjneb6SP/download
!mv download emodb_audformat.zip
!unzip emodb_audformat.zip
!rm emodb_audformat.zip
# prepare the dataframe
db = audformat.Database.load('./emodb')
root = './emodb/'
db.map_files(lambda x: os.path.join(root, x))
df = db.tables['emotion'].df

Then, you extract the GeMAPS features:

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,
    feature_level=opensmile.FeatureLevel.Functionals,
)
feats_df = smile.process_files(df.index)

And finally, you generate the t-SNE plot with the sklearn library like this:

# Plot a TSNE
def plotTsne(feats, labels, perplexity=30, learning_rate=200):
    model = TSNE(n_components=2, random_state=0, perplexity=perplexity, learning_rate=learning_rate)
    tsne_data = model.fit_transform(feats)
    tsne_data_labs = np.vstack((tsne_data.T, labels)).T
    tsne_df = pd.DataFrame(data=tsne_data_labs, columns=('Dim_1', 'Dim_2', 'label'))
    # note: recent seaborn versions call this parameter height (formerly size)
    sns.FacetGrid(tsne_df, hue='label', height=6).map(plt.scatter, 'Dim_1', 'Dim_2').add_legend()
    plt.show()
plotTsne(feats_df, df['emotion'], 30, 200)

It seems that these features are useful to distinguish at least the category anger from the rest.

You might want to fiddle around with the two main parameters of the algorithm: perplexity and learning-rate.

A python class to predict your emotions

This is a post to introduce you to the idea of encapsulating functionality with object-oriented programming.

We simply put the emotional classification of speech that was demonstrated in this post in a python class like this:

import opensmile
import os
import audformat
from sklearn import svm
import sounddevice as sd
import soundfile as sf
from scipy.io.wavfile import write

class EmoRec():
    root = './emodb/'
    clf = None
    filename = 'emorec.wav'
    sr = 16000
    def __init__(self):
        self.smile = opensmile.Smile(
            feature_set=opensmile.FeatureSet.GeMAPSv01b,
            feature_level=opensmile.FeatureLevel.Functionals,
        )
        if not os.path.isdir(self.root):
            self.download_emodb()
        db = audformat.Database.load(self.root)
        db.map_files(lambda x: os.path.join(self.root, x))
        self.df_emo = db.tables['emotion'].df
        self.df_files = db.tables['files'].df
        if not self.clf:
            self.train_model()

    def download_emodb(self):
        os.system('wget -c https://tubcloud.tu-berlin.de/s/LzPWz83Fjneb6SP/download')
        os.system('mv download emodb_audformat.zip')
        os.system('unzip emodb_audformat.zip')
        os.system('rm emodb_audformat.zip')

    def train_model(self):
        print('training a model...')
        df_feats = self.smile.process_files(self.df_emo.index)
        train_labels = self.df_emo.emotion
        train_feats =  df_feats
        self.clf = svm.SVC(kernel='linear', C=.001)
        self.clf.fit(train_feats, train_labels)
        print('done')

    def classify(self, wavefile):
        test_feats = self.smile.process_file(wavefile)
        return self.clf.predict(test_feats)

    def classify_from_micro(self, seconds):
        self.record(seconds)
        return self.classify(self.filename)[0]

    def record(self, seconds):
        data = sd.rec(int(seconds * self.sr), samplerate=self.sr, channels=1)
        sd.wait()  
        write(self.filename, self.sr, data)

def main():
    test = EmoRec()
    print(test.classify_from_micro(3))

if __name__ == "__main__":
    main()

To try this, you could store the above in a file called, for example, 'emorec.py' and then, in a jupyter notebook, call the constructor

import emorec
emoRec = emorec.EmoRec()

and use the functionality

result = emoRec.classify_from_micro(3)
print(f'emodb thinks your emotion is {result}')