Category Archives: tutorial

Nkululeko: classifying continuous variables

26. January 2022 felix Leave a comment

Nkululeko supports classification and regression.
Classification means predicting a class (or category) from data, regression predicting a continuous value, as for example the speaker age in years.

If you want to use classification with continuous variables, you need to first bin it, which means that you put the values into pre-defined bins. To stay with our age example, you'd declare everyone above 50 years as old and all other as young.

This post shows you how to do that with Nkululeko by setting up your .ini file.

You set up the experiment as classification type:

[EXP]
...
type = classification

But declare the data to be continuous:

[DATA]
...
type = continuous
labels = ['u40', '40ies', '50ies', '60ies', 'ü70']
bins  = [-1000,  40, 50, 60, 70, 1000]

Then the data will be binned according to the sepecified bins and labeled accordingly.
You need (number of labels) + 1 values for the bins, as they are given lower and upper limit. It makes sense to set the lower and upper absolute limits extreme as you don't know what the classifier will predict.

nkululeko, tutorial

How to soft-label a database with Nkululeko

24. January 2022 felix Leave a comment

Soft-labeling means to annotate data with labels that were predicited by a machine classifier.
As they were not evaluated by a human, you might call them "soft".

Two steps are necessary:
1) save a test/evaluation set as a new database
2) load this new database in a new experiment as training data

Within nkululeko, you would do it like this:

step 1: save a new database

You simply specifify a name in the EXP section to save the test predictions like this:

[EXP]
...
save_test = ./my_test_predictions.csv

You need Model save to be turned on, because it will look for the best performing model:

[MODEL]
...
save = True

This will store the new database to a file called my_test_predictions.csv in the folder where the python was called.

step 2: load as training data

Here is an example configuration how to load this data as additional training (in this case in addition to emodb):

[EXP]
root = ./tests/
name = exp_add_learned
runs = 1
epochs = 1
[DATA]
strategy = cross_data
databases = ['emodb', 'learned', 'test_db']
trains = ['emodb', 'learned']
tests = ['test_db']
emodb = /path-to-emodb/
emodb.split_strategy = speaker_split
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
test_db = /path to test database/
test_db.mapping = <if any mapping to the target categories is needed>
learned = ./my_test_predictions.csv
learned.type = csv
target = emotion
labels = ['angry', 'happy', 'neutral', 'sad']
[FEATS]
type = os
[MODEL]
type = xgb
save = True
[PLOT]

nkululeko, tutorial

Nkululeko: try out / demo a trained model

24. January 2022 felix Leave a comment

This is another Nkululeko post that shows you how to demo model that you trained before.

First you need to train a model, e.g. on emodb as shown here
In the ini file you MUST set the parameters

[EXP]
...
save = True

in the general section and

[MODEL]
...
save = True

in the MODEL section of the configuration file.
Background: we need both the experiment as well as all model files to be saved on disk.

Here is an example script then how to call the demo mode:

python -m nkululeko.demo --config exp_emodb.ini

, if your config file is called exp_emob.ini
Automatically the best performing model will be used.
This will start recording for 3 seconds from your microphone.
if you specify

python -m nkululeko.demo --config exp_emodb.ini --file test.wav

the file test.wav will be predicted, it needs to be in 16 kHz sampling rate and mono channel.
If you specify

python -m nkululeko.demo --config exp_emodb.ini --list my_list.txt --folder data/ravdess/ --outfile my_results.csv

The file my_list.txt will be read and it is expected to contain one file to be predicted per line, e.g.:

tests/a.wav
tests/b.wav

The optional argument --folder can be used to specify a parent folder for the input files.

The optional argument --outfile can be used to save the results in a CSV table.

If --list my_file.csv is in CSV format, audformat will be interpreted, but only the index will be used.

E.g. with

file,emotion
tests/a.wav,happy
tests/b.wav,angry

only the file column will be used.

nkululeko, tutorial

How to set up wav2vec embedding for nkululeko

3. December 2021 felix Leave a comment

Since version 0.10, nkululeko supports facebook's wav2vec 2.0 embeddings as acoustic features.
This post shows you how to set this up.

set up nkululeko

in your nkululeko configuration (*.ini) file, set the feature extractor as wav2vec2 and denote the path to the model like this:

[FEATS]
type = ['wav2vec2']
wav2vec.model = /my path to the huggingface model/

Alternatively you can state the huggingface model name directly:

[FEATS]
type = ['wav2vec2-base-960h']

Out of the box, as embeddings the last hidden layer is used. But the original wav2vec2 model consists of 7 CNN layers followed by up to 24 transformer layers. if you like to use an earlier layer than the last one, you can simply count down-

[FEATS]
type = wav2vec2
wav2vec.layer = 12

This would use the 12th layer of a 24 layer model and only the4 CNN layers of a 12 layer model.

nkululeko, tutorial

Nkululeko: perform cross database experiments

5. October 2021 felix Leave a comment

This is one of a series of posts about how to use nkululeko.
If you're unfamilar with nkululelo, you might want to start here.

This post is about cross database experiments, i.e. training a classifier on one database and test it on another, something that happens quite often with real life situations.

In this post I will only talk about the config file, the python file can be re-used.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = cross_data

Next, the DATA section might look like this

# declare which databases to use
databases = ['emodb', 'polish']
# specify the location of the data
emodb = <path to the database>
polish = <path to the database>
# we split one database as specified
emodb.split_strategy = specified
# here is the test, the training set is disregarded
emodb.test.tables = ['emotion.categories.test.gold_standard']
# the whole of the polish database is used for train
polish.split_strategy = train
# the target label for the experiment
target = emotion
# we need to unify the labels of both databases
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
# and these are the labels we want to distinguidh
labels = ['angry', 'happy', 'neutral', 'sad']

The features section, better explained in this post

[FEATS]
type = os

The classifiers section, better explained in this post

[MODEL]
type = xgb

Again, you might want to plot the final distribution of categories per train and test set:

[PLOT]
value_counts = True

nkululeko, tutorial

Nkululeko: comparing classifiers and features

5. October 2021 felix Leave a comment

This is one of a series of posts about how to use nkululeko.

If you're unfamilar with nkululelo, you might want to start here.

This post is about machine classification (as opposed to regression problems) and an introduction how to combine different features sets with different classifiers.

I'll walk you through the sections of the config file (all options here):
The first section deals with general setup:

[EXP]
# root is the base directory for the experiment relative to the python call
root = ./experiment_1/
# mainly a name for the top folder to store results (inside root)
name = exp_A
# needed only for neural net classifiers
#epochs = 100
# needed only for classifiers with random initialization
# runs = 3

The DATA section deals with the data sets:

[DATA]
# list all the databases  you will be using
databases = ['emodb']

# state the path to the audformat root folder
emodb = /home/felix/data/audb/emodb

# split train and test based on different random speakers
emodb.split_strategy = speaker_split

# state the percentage of test speakers (in this case 4 speakers, as emodb only has 10 speakers)
emodb.testsplit = 40

# the target label that should be classified
target = emotion

# the categories you are interested in
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']

The next section deals with the features that should be used by the classifier.

[FEATS]
# the type of features to use
type = ['os']

The choice of currently implemented features is described here:

You can combine feature sets, i.e. they will be concatenated column-wise, e.g. like so:

[FEATS]
# the type of features to use
type = ['os', 'praat']

Next comes the MODEL section which deals with the classifier:

[MODEL]
# the main thing to sepecify is the kind of classifier:
type = xgb

Choices are described here

Models can not be combined, but you can do ensemble learning

nkululeko, tutorial

Setting up a base nkululeko experiment

5. October 2021 felix 1 Comment

This is one of a series of posts on how to use nkululeko and deals with setting up the "hello world" of nkululeko: performing classification on the berlin emodb emotional datbase.

Nkululeko experiments are defined by one file:

an initialization file that is interpreted by the nkululeko framework

This would be a minimal nkululeko configuration file (tested with version 0.8)

[EXP]
root = ./results
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = TO BE ADAPTED/emodb
emodb.split_strategy = speaker_split
emodb.testsplit = 40
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
[FEATS]
type = os
[MODEL]
type = svm

I hope the names of the entries are self-explanatory, here's the link to the config file description

code, tutorial

How to set up your first nkululeko project

30. August 2021 felix Leave a comment

Nkululeko is a framework to build machine learning models that recognize speaker characteristics on a very high level of abstraction (i.e. starting without programming experience).

This post is meant to help you with setting up your first experiment, based on the Berlin Emodb.

1) Set up python

It's written in python so first you have to set up a Python environment

2) Get a database

Load the Berlin emodb database to some location on you harddrive, as discussed in this post. I will refer to the location as "emodb root" from now on.

3) Install nkululeko

Inside your virtual environment, run

pip install nkululeko

This should install nkululeko and all required modules.
It takes a long time and a lot of space, when done intially.

5) Adapt the ini file

Use your favourite editor, e.g. visual studio code and edit the file that defines your experiment. You might start with this demo sample.
You can find more templates to start here and an overview on all the options you can set here

Put the emodb root folder as the emodb value, for me this looks like this

emodb = /home/felix/data/audb/emodb

An overview on all nkululeko options should be here

6) Run the experiment

Inside a shell type (or use VSC) and start the process with

python -m nkululeko.nkululeko --config exp_emodb.ini

7) Inspect the results

If all goes well, the program should start by extracting opensmile features, and, if you're done, you should be able to inspect the results in the folder named like the experiment: exp_emodb.
There should be a subfolder with a confusion matrix named images` and a subfolder for the textual results named `results.

What to do next?

You might be interested in the hello world of nkululeko

code, tutorial

Get all information from emodb

10. August 2021 felix Leave a comment

When you load the Berlin emodb as has been done in numerous postings of this blog, you will get per default only information on file name, speaker id, text id and emotion.

But there is more information contained in the audformat file and this posts shows you how to access it.

If not already somewhere on your computer, start by downloading the emodb:

if not os.path.isdir('./emodb/'):
    !wget -c https://zenodo.org/records/16273947/files/emodb_2.0.zip
    !unzip emodb_2.0.zip
    !rm emodb_2.0.zip

This code will then load the database, prepare a single dataframe with all information and store it to disk for later use:

# load the database to memory
root = './emodb/'
db = audformat.Database.load(root)
# map the file pathes to the audio
db.map_files(lambda x: os.path.join(root, x))   
# access speaker gender and age, and transcription, from the speaker dictionaries
df = db.tables['files'].get(map={'speaker': ['speaker', 'gender', 'age'], 'transcription': ['transcription']})
# copy the emotion label from the the emotion dataframe to the files dataframe
df['emotion'] = db.tables['emotion'].df['emotion']
# add a column with the word count
df['wordcount'] = df['transcription'].apply (lambda row: len(row.split()))
# store to disk for later use
df.to_pickle('store/emodb.pkl')

df.head(1)

code, tutorial

Predict emodb emotions with a Multi Layer Perceptron ANN

7. July 2021 felix Leave a comment

This post shows you how to classify emotions with a Multi Layer Perceptron (MLP) artificial neural net based on the torch framework (a different very famous ANN framework would be Keras).

Here's a complete jupyter notebook for your convenience.

We start with some imports, you need to install these packages, e.g. with pip, before you run this code:

import audformat
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import os
import opensmile
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import recall_score

Then we need to download and prepare our sample dataset, the Berlin emodb:

# get and unpack the Berlin Emodb emotional database if not already there
if not os.path.isdir('./emodb/'):
    !wget -c https://tubcloud.tu-berlin.de/s/8Td8kf8NXpD9aKM/download
    !mv download emodb_audformat.zip
    !unzip emodb_audformat.zip
    !rm emodb_audformat.zip
# prepare the dataframe
db = audformat.Database.load('./emodb')
root = './emodb/'
db.map_files(lambda x: os.path.join(root, x))    
df_emotion = db.tables['emotion'].df
df = db.tables['files'].df
# copy the emotion label from the the emotion dataframe to the files dataframe
df['emotion'] = df_emotion['emotion']

As neural nets can only deal with numbers, we need to encode the target emotion labels with numbers:

# Encode the emotion words as numbers and use this as target 
target = 'enc_emo'
encoder = LabelEncoder()
encoder.fit(df['emotion'])
df[target] = encoder.transform(df['emotion'])

Now the dataframe should look like this:

df.head()

To ensure that we learn about emotions and not speaker idiosyncrasies we need to have speaker disjunct training and development sets:

# define fixed speaker disjunct train and test sets
train_spkrs = df.speaker.unique()[5:]
test_spkrs = df.speaker.unique()[:5]
df_train = df[df.speaker.isin(train_spkrs)]
df_test = df[df.speaker.isin(test_spkrs)]

print(f'#train samples: {df_train.shape[0]}, #test samples: {df_test.shape[0]}')

#train samples: 292, #test samples: 243

Next, we need to extract some acoustic features:

# extract (or get) GeMAPS features
if os.path.isfile('feats_train.pkl'):
    feats_train = pd.read_pickle('feats_train.pkl')
    feats_test = pd.read_pickle('feats_test.pkl')
else:
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.GeMAPSv01b,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    feats_train = smile.process_files(df_train.index)
    feats_test = smile.process_files(df_test.index)
    feats_train.to_pickle('feats_train.pkl')
    feats_test.to_pickle('feats_test.pkl')

Because neural nets are sensitive to large numbers, we need to scale all features with a mean of 0 and stddev of 1:

# Perform a standard scaling / z-transformation on the features (mean=0, std=1)
scaler = StandardScaler()
scaler.fit(feats_train)
feats_train_norm = pd.DataFrame(scaler.transform(feats_train))
feats_test_norm = pd.DataFrame(scaler.transform(feats_test))

Next we define two torch dataloaders, one for the training and one for the dev set:

def get_loader(df_x, df_y):
    data=[]
    for i in range(len(df_x)):
       data.append([df_x.values[i], df_y[target][i]])
    return torch.utils.data.DataLoader(data, shuffle=True, batch_size=8)
trainloader = get_loader(feats_train_norm, df_train)
testloader = get_loader(feats_test_norm, df_test)

We can then define the model, in this example with one hidden layer of 16 neurons:

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Sequential(
            torch.nn.Linear(feats_train_norm.shape[1], 16),
            torch.nn.ReLU(),
            torch.nn.Linear(16, len(encoder.classes_))
        )
    def forward(self, x):
        # x: (batch_size, channels, samples)
        x = x.squeeze(dim=1)
        return self.linear(x)

We define two functions to train and evaluate the model:

def train_epoch(model, loader, device, optimizer, criterion):
    model.train()
    losses = []
    for features, labels in loader:
        logits = model(features.to(device))
        loss = criterion(logits, labels.to(device))
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (np.asarray(losses)).mean()

def evaluate_model(model, loader, device, encoder):
    logits = torch.zeros(len(loader.dataset), len(encoder.classes_))
    targets = torch.zeros(len(loader.dataset))
    model.eval()
    with torch.no_grad():
        for index, (features, labels) in enumerate(loader):
            start_index = index * loader.batch_size
            end_index = (index + 1) * loader.batch_size
            if end_index > len(loader.dataset):
                end_index = len(loader.dataset)
            logits[start_index:end_index, :] = model(features.to(device))
            targets[start_index:end_index] = labels

    predictions = logits.argmax(dim=1)
    uar = recall_score(targets.numpy(), predictions.numpy(), average='macro')
    return uar, targets, predictions

Next we initialize the model and set the loss function (criterion) and optimizer:

device = 'cpu'
model = MLP().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
epoch_num = 250
uars_train = []
uars_dev = []
losses = []

We can then do the training loop over the epochs:

for epoch in range(0, epoch_num):
    loss = train_epoch(model, trainloader, device, optimizer, criterion)
    losses.append(loss)
    acc_train = evaluate_model(model, trainloader, device, encoder)[0]
    uars_train.append(acc_train)
    acc_dev, truths, preds = evaluate_model(model, testloader, device, encoder)
    uars_dev.append(acc_dev)
# scale the losses so they fit on the picture
losses = np.asarray(losses)/2

Next we might want to take a look at how the net performed with respect to unweighted average recall (UAR):

plt.figure(dpi=200)
plt.plot(uars_train, 'green', label='train set') 
plt.plot(uars_dev, 'red', label='dev set')
plt.plot(losses, 'grey', label='losses/2')
plt.xlabel('eopchs')
plt.ylabel('UAR')
plt.legend()
plt.show()

And perhaps see the resulting confusion matrix:

from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(truths, preds,  normalize = 'true')
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=encoder.classes_).plot(cmap='gray')

speechsurfer

Category Archives: tutorial

Nkululeko: classifying continuous variables

How to soft-label a database with Nkululeko

step 1: save a new database

step 2: load as training data

Nkululeko: try out / demo a trained model

How to set up wav2vec embedding for nkululeko

set up nkululeko

Nkululeko: perform cross database experiments

Nkululeko: comparing classifiers and features

Setting up a base nkululeko experiment

How to set up your first nkululeko project

1) Set up python

2) Get a database

3) Install nkululeko

5) Adapt the ini file

6) Run the experiment

7) Inspect the results

What to do next?

Get all information from emodb

Predict emodb emotions with a Multi Layer Perceptron ANN

blog around speech technology