All posts by felix

How to use selected features from Praat with Nkululeko

27. June 2022 felix

If you want to use acoustic parameters extracted by the wonderful Praat software with nkululeko, you state

[FEATS]
type=['praat']

in the feature section of your config file.
If you would like to use only some of the features that are extracted by David R. Feinberg's Praat scripts, you can look at the output and select some of them in the FEATS section, e.g.

type = ['praat']
praat.features = ['speechrate(nsyll / dur)']

You can do the same with opensmile features:

type = ['os']
os.features = ['F0semitoneFrom27.5Hz_sma3nz_amean']

or even combine them

type = ['praat', 'os']
praat.features = ['speechrate(nsyll / dur)']
os.features = ['F0semitoneFrom27.5Hz_sma3nz_amean']

This is actually the same as

type = ['praat', 'os']
features = ['speechrate(nsyll / dur)', 'F0semitoneFrom27.5Hz_sma3nz_amean']

If you want to combine all of the opensmile eGeMAPS features with selected Praat features, you would use:

type = ['praat', 'os']
praat.features = ['speechrate(nsyll / dur)']

It is interesting to see how many emotions of Berlin EmoDB still get recognized with only mean F0 and jitter as features.

What kind of features are there, you might ask yourself?
Here's a list:
'duration', 'meanF0Hz', 'stdevF0Hz', 'HNR', 'localJitter',
'localabsoluteJitter', 'rapJitter', 'ppq5Jitter', 'ddpJitter',
'localShimmer', 'localdbShimmer', 'apq3Shimmer', 'apq5Shimmer',
'apq11Shimmer', 'ddaShimmer', 'f1_mean', 'f2_mean', 'f3_mean',
'f4_mean', 'f1_median', 'f2_median', 'f3_median', 'f4_median',
'JitterPCA', 'ShimmerPCA', 'pF', 'fdisp', 'avgFormant', 'mff',
'fitch_vtl', 'delta_f', 'vtl_delta_f'
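
With names like these, a typo in the config is easy to make. As a minimal sketch (the helper `check_features` is hypothetical and not part of Nkululeko), the names you plan to select can be checked against the list above before starting a run. Note that the list may not be exhaustive for every version: the 'speechrate(nsyll / dur)' feature used earlier does not appear in it.

```python
from difflib import get_close_matches

# Feature names copied from the list above.
PRAAT_FEATURES = [
    'duration', 'meanF0Hz', 'stdevF0Hz', 'HNR', 'localJitter',
    'localabsoluteJitter', 'rapJitter', 'ppq5Jitter', 'ddpJitter',
    'localShimmer', 'localdbShimmer', 'apq3Shimmer', 'apq5Shimmer',
    'apq11Shimmer', 'ddaShimmer', 'f1_mean', 'f2_mean', 'f3_mean',
    'f4_mean', 'f1_median', 'f2_median', 'f3_median', 'f4_median',
    'JitterPCA', 'ShimmerPCA', 'pF', 'fdisp', 'avgFormant', 'mff',
    'fitch_vtl', 'delta_f', 'vtl_delta_f',
]


def check_features(selected, available=PRAAT_FEATURES):
    """Return {unknown_name: [similar known names]} for every selected
    name that is not in the list of available names (hypothetical helper)."""
    return {
        name: get_close_matches(name, available, n=3, cutoff=0.6)
        for name in selected
        if name not in available
    }


# 'meanF0' is a deliberate typo for 'meanF0Hz'
unknown = check_features(['meanF0Hz', 'localJitter', 'meanF0'])
for name, hints in unknown.items():
    print(f'unknown feature {name!r}, did you mean one of {hints}?')
```

An empty result means that all selected names appear in the list.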

nkululeko, tutorial

How to test a trained model on a new test set with Nkululeko

7. June 2022 felix

Sometimes you might want to test your already trained model(s) on a new dataset, e.g. because the training took a lot of resources.
If you stored your models during the training, this is possible.

[DATA]
databases = ['emodb']
....
[MODEL]
save = True

In a new config file for your experiment that uses a different test set, you set

[DATA]
databases = ['emodb', 'polish']
trains = ['emodb']
tests = ['polish']
strategy = cross_data
....
[MODEL]
only_test = True

In the example above, emodb has been used as the training database, and polish in a second experiment later as a test database.
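
As a rough sketch, the second config can also be derived from the first one with Python's standard `configparser`. The function name `make_test_config` is made up for this illustration, and only the keys shown above are set. Anything elided in the post (such as the entries that describe the new database) still has to be added by hand.

```python
import ast
import configparser
import io


def make_test_config(train_config_text, test_db):
    """Derive a test-only config from a training config (hypothetical helper).

    Only the keys shown in the post are touched: databases, trains, tests,
    strategy and only_test.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(train_config_text)
    # the list of databases is stored as a Python-style list literal
    train_dbs = ast.literal_eval(cfg['DATA']['databases'])
    cfg['DATA']['databases'] = str(train_dbs + [test_db])
    cfg['DATA']['trains'] = str(train_dbs)
    cfg['DATA']['tests'] = str([test_db])
    cfg['DATA']['strategy'] = 'cross_data'
    cfg['MODEL']['only_test'] = 'True'
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


train_config = """
[DATA]
databases = ['emodb']
[MODEL]
save = True
"""
print(make_test_config(train_config, 'polish'))
```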

nkululeko, tutorial

How to compare several MLP layer layouts with each other

11. April 2022 felix

Some days ago I showed how you can run several experiments in one go.
Obviously this can be used to compare several ANN layer architectures, as an alternative to the approach discussed in this (much earlier) post.

There is an example configuration shipped with Nkululeko, and you can simply specify your layer layouts per experiment like this:

classifiers = [
    {'--model': 'mlp',
    '--layers': '\"{\'l1\':16,\'l2\':4}\"'},
    {'--model': 'mlp',
    '--layers': '\"{\'l1\':64,\'l2\':16}\"'},
    {'--model': 'mlp',
    '--layers': '\"{\'l1\':128,\'l2\':32}\"',
    '--learning_rate': '.0001',
    '--drop': '.3',},
    {'--model': 'xgb',
    '--epochs':1},
    {'--model': 'svm',
    '--epochs':1},
]

I.e., in this example three MLP classifiers are specified with the following architectures:

- (hidden) layer 1 with 16 neurons, and (hidden) layer 2 with 4 neurons
- one layer with 64 and one with 16 neurons
- a third one with
  - one layer with 128 and a second one with 32 neurons,
  - a learning rate of .0001 and
  - a dropout probability of 30%

and, for comparison:

- an XGB classifier
- an SVM classifier

Both only need to be trained for one epoch because there are no weights to be adapted.
The MLP classifiers are trained with the number of epochs that is specified in the skeleton config file.
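
The escaping in the `--layers` values is hard to read. The following sketch only shows what such a value contains once Python has parsed the string literal: a dictionary literal wrapped in double quotes. How Nkululeko itself consumes the value is not shown in this post, so the helper `decode_layers` is purely illustrative.

```python
import ast

# one entry copied from the classifiers list above
classifier = {
    '--model': 'mlp',
    '--layers': '\"{\'l1\':16,\'l2\':4}\"',
}


def decode_layers(value):
    """Strip the surrounding double quotes and parse the dictionary literal
    (illustrative helper, not Nkululeko code)."""
    return ast.literal_eval(value.strip('"'))


layers = decode_layers(classifier['--layers'])
for name, size in layers.items():
    print(f'layer {name}: {size} neurons')
```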

nkululeko, tutorial

How to run multiple experiments in one go with Nkululeko

28. March 2022 felix

Since version 0.98.0, nkululeko has a new module called flags that allows you to run several classifier / feature combinations in one go.

Here is an example of how to run this with any database:

[MODEL]
layers = [1024, 16]
patience = 5
[FLAGS]
models = ['xgb', 'svm', 'mlp']
features = ['os', 'praat']

You run this with the python command

python -m nkululeko.flags --config exp.ini

and would then get the following output (in this case for emodb)

=== SUMMARY OF 6 EXPERIMENTS ===
Experiment 1: {'models': 'xgb', 'features': 'os'}
  Result: 0.571226718262931
Experiment 2: {'models': 'xgb', 'features': 'praat'}
  Result: 0.571226718262931
Experiment 3: {'models': 'svm', 'features': 'os'}
  Result: 0.3257565615802607
Experiment 4: {'models': 'svm', 'features': 'praat'}
  Result: 0.3257565615802607
Experiment 5: {'models': 'mlp', 'features': 'os'}
  Result: 0.36177991225837364
Experiment 6: {'models': 'mlp', 'features': 'praat'}
  Result: 0.3472090337439332

=== BEST CONFIGURATION ===
Best Result: 0.571226718262931
Best Parameters:
  models: xgb
  features: os

To use these parameters, set in your config file:
[MODEL]
type = xgb
[FEATS]
type = ['os']

Flags experiments time: 29.04 seconds (0.48 minutes)
DONE
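
The six experiments in the summary correspond to all combinations of the three models and the two feature sets from the FLAGS section. As a small sketch of that idea (not the actual implementation of the flags module), the same combinations can be enumerated with `itertools.product`:

```python
from itertools import product

# values copied from the [FLAGS] section above
flags = {
    'models': ['xgb', 'svm', 'mlp'],
    'features': ['os', 'praat'],
}

# one dictionary per experiment: every model with every feature set
experiments = [
    dict(zip(flags.keys(), combination))
    for combination in product(*flags.values())
]

for number, experiment in enumerate(experiments, start=1):
    print(f'Experiment {number}: {experiment}')
```

Each additional key in the dictionary multiplies the number of experiments by the number of values listed for it.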

nkululeko, tutorial

How to combine predictions per speaker with Nkululeko

24. March 2022 felix

Sometimes you might want to check how a model performs when the predictions of a single speaker are combined. In nkululeko there are two functions supported: mode (selecting the majority value) and mean (taking the arithmetic mean).

Simply specify in the PLOT section:

[PLOT]
combine_per_speaker = mean # or mode
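
To illustrate what the two options mean, here is a minimal sketch in plain Python. It is not Nkululeko's implementation, and the speaker names and predictions are invented for the example. The mode suits categorical predictions, the mean suits numeric ones.

```python
from collections import defaultdict
from statistics import mean, mode


def combine_per_speaker(speakers, predictions, how='mode'):
    """Combine the predictions of each speaker into a single value
    (illustrative sketch; `how` is either 'mode' or 'mean')."""
    if how not in ('mode', 'mean'):
        raise ValueError("how must be 'mode' or 'mean'")
    grouped = defaultdict(list)
    for speaker, prediction in zip(speakers, predictions):
        grouped[speaker].append(prediction)
    combine = mode if how == 'mode' else mean
    return {speaker: combine(values) for speaker, values in grouped.items()}


# invented example data: five samples from two speakers
speakers = ['a', 'a', 'a', 'b', 'b']
print(combine_per_speaker(speakers, ['happy', 'sad', 'happy', 'sad', 'sad']))
print(combine_per_speaker(speakers, [1.0, 2.0, 3.0, 4.0, 6.0], how='mean'))
```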

nkululeko, tutorial

How to do cross validation with Nkululeko

23. March 2022 felix

Only for models like XGB, SVM, SGR and SVR do you have the possibility to disregard training and development splits and do a cross validation, i.e. validate one data set in a circular manner against itself.

The basic idea is that you hold out part of the data, train on the rest and evaluate on the held-out part, then in the next round hold out another part and so forth, until all data has been evaluated. Because the speaker identity is so strong in speech, this is usually done in a speaker-exclusive manner, known as "leave one speaker out" (LOSO).

If you have too many speakers and/or each speaker has really only one sample, you might want to split your speakers into groups and use a "leave one speaker group out" strategy (LOGO).

A related approach is known as k-fold cross validation, where k usually equals 10.
When you only have one sample per speaker, this might make more sense.
So, how would you do that with Nkululeko?
First, you would define a training and development split for your data anyway, because Nkululeko expects one if there is only one database. You might set it to random, as it is not used anyway:

[DATA]
mydata.split_strategy = random

Then in your config file, you specify in the MODEL section either:

[MODEL] 
logo = 10

to assign your speakers to 10 groups and then evaluate each group against all others.
If you want to do a leave-one-speaker-out experiment (LOSO), simply set logo to the number of your speakers.

If there already is a fold column in your data, this will be used, otherwise Nkululeko will randomly assign folds to speakers.

Or you do

[MODEL] 
k_fold_cross = 5

for instance to disregard speaker information and simply evaluate 5 times a fifth of the data against the rest.
We use stratified sets, i.e. the algorithm tries to balance the class data within each set.
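
The leave-one-speaker-group-out idea can be sketched in a few lines of plain Python. This is only an illustration under the assumption that speakers are assigned to groups at random; the function names are hypothetical and Nkululeko's own code may work differently. Setting the number of groups to the number of speakers gives LOSO.

```python
import random


def assign_speaker_groups(speakers, n_groups, seed=0):
    """Randomly assign each distinct speaker to one of n_groups groups."""
    unique = sorted(set(speakers))
    random.Random(seed).shuffle(unique)
    return {speaker: i % n_groups for i, speaker in enumerate(unique)}


def logo_splits(speakers, n_groups, seed=0):
    """Yield (train_indices, test_indices), one pair per speaker group.

    `speakers` holds the speaker id of each sample, so all samples of a
    speaker always end up on the same side of a split.
    """
    groups = assign_speaker_groups(speakers, n_groups, seed)
    for group in range(n_groups):
        test = [i for i, s in enumerate(speakers) if groups[s] == group]
        train = [i for i, s in enumerate(speakers) if groups[s] != group]
        yield train, test


# invented example: six speakers with two samples each, three groups
sample_speakers = ['s1', 's1', 's2', 's2', 's3', 's3',
                   's4', 's4', 's5', 's5', 's6', 's6']
for train, test in logo_splits(sample_speakers, n_groups=3):
    print('test on speakers', sorted({sample_speakers[i] for i in test}))
```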

nkululeko, tutorial

Import speech data to nkululeko

18. March 2022 felix

Often you simply start an experiment with some audio data that you got from somewhere in no special format. Often the labels are encoded in the filenames.
If so, this Python script can help to convert the audio to a format that Nkululeko can read and to generate a CSV (comma separated values) file.

import os
import subprocess
from os.path import basename

from audeer import list_file_names

# folder with the original audio files (in wav format)
root = './orig_wav/'
# output folder, empty at the beginning
out_dir = './audio/'
# name of the output file list
out_file = 'data.csv'

# make sure the output folder exists
os.makedirs(out_dir, exist_ok=True)
# get a list of wav files (paths relative to root)
files = list_file_names(root, filetype='wav', basenames=True, recursive=True)
# open with 'w' so that a second run does not append to an old list
with open(out_file, 'w') as the_file:
    # write the list header (change to your data)
    the_file.write('file,type\n')
    # for each file
    for file in files:
        # get the file name without path
        fn = basename(file)
        # convert to 16kHz sampling rate and mono channel
        # (passing a list of arguments also works for paths with spaces)
        subprocess.run(
            ['sox', root + file, '-r', '16000', '-c', '1', out_dir + fn],
            check=True,
        )
        # extract the annotation label from the file name (change this to your needs)
        label = fn[0]
        # lastly: add file to list
        the_file.write(f'{out_dir+fn},{label}\n')

The resulting data list can then be read by Nkululeko via the config file (randomly using 30% of the data as the development set):

[DATA]
my_data = /some_path/data.csv
my_data.type = csv
my_data.split_strategy = random
my_data.testsplit = 30
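
Before running an experiment, it can be worth checking that the labels in the generated list look as expected. The following sketch counts the labels in a file with the 'file,type' header written by the script above. The function name and the example rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column='type'):
    """Count how often each label occurs in an open CSV file
    (hypothetical helper; the column name matches the header above)."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# invented example content; for a real list use: open('data.csv', newline='')
example = io.StringIO(
    'file,type\n'
    './audio/a01.wav,a\n'
    './audio/a02.wav,a\n'
    './audio/h01.wav,h\n'
)
for label, count in count_labels(example).items():
    print(f'{label}: {count} files')
```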

General

Heterophone Homographs in German

2. March 2022 felix

A list of heterophone homographs in the German language

Heroin
modern
Rentier

General

syntAct

1. March 2022 felix

I wrote a new project to generate a whole database of synthesized speech with emotion simulation called syntact.

nkululeko, tutorial

Adding dropout to MLP models with Nkululeko

25. February 2022 felix

Since version 0.15.0, dropout within MLP (multi layer perceptron) models is supported by Nkululeko. Here's an example:

[MODEL]
type = mlp
layers = [128, 16]
drop = .5

meaning that after each hidden layer a dropout probability of 50 percent is applied.

Since version 0.97.3, you can also assign individual drop-out rates per layer:

[MODEL]
type = mlp
layers = [128, 16]
drop = [.5, .2]
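
The relation between the two forms can be sketched as follows: a single value stands for the same rate after every hidden layer, while a list gives one rate per layer. The helper `expand_drop` is hypothetical and only illustrates this reading of the config; how Nkululeko handles the value internally is not shown in this post.

```python
def expand_drop(layers, drop):
    """Return one dropout rate per hidden layer (illustrative helper).

    `drop` is either a single number, applied after every layer,
    or a list with one rate per entry in `layers`.
    """
    if isinstance(drop, (int, float)):
        rates = [float(drop)] * len(layers)
    else:
        rates = [float(rate) for rate in drop]
    if len(rates) != len(layers):
        raise ValueError('need exactly one dropout rate per layer')
    if any(not 0 <= rate < 1 for rate in rates):
        raise ValueError('dropout rates must be in the range [0, 1)')
    return rates


for size, rate in zip([128, 16], expand_drop([128, 16], [.5, .2])):
    print(f'layer with {size} neurons, dropout probability {rate}')
```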
