Nkululeko: predict speaker id

With nkululeko since version 0.93.0, the pyannote segmentation package is interfaced (as an alternative to silero).

There are two modules that you can use for this:

  • SEGMENT
  • PREDICT

The (huge) difference is that the SEGMENT module looks at each file in the input data and searches for speakers per file (which can be just one large file), while the PREDICT module concatenates all input data and searches for different speakers across the whole database.

In any case, it is best to run this on a GPU, as it will be very slow on a CPU (and there is no progress bar).

Segment module

If you specify the method in the [SEGMENT] section and the hf_token (needed for the pyannote model) in the [MODEL] section:

[SEGMENT]
method = pyannote
segment_target = _segmented
sample_selection = all
[MODEL]
hf_token = <my hugging face token>

your resulting segmentations will have a predicted speaker id attached. Be aware that this is really slow on CPU, so it is best to run on GPU and declare so in the [MODEL] section:

[MODEL]
hf_token = <my hugging face token>
device = gpu # or cuda:0
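
You can then run the SEGMENT module as usual (the configuration file name myconfig.ini is just a placeholder):

python -m nkululeko.segment --config myconfig.ini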

As a result, a new plot appears in the image folder: the distribution of the speakers that were found, e.g. like this:

Predict module

Simply select speaker as the prediction target:

[PREDICT]
targets = ["speaker"]

Generally, the PREDICT module is described here.
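
It is called analogously to the other modules (again, myconfig.ini being a placeholder for your configuration file):

python -m nkululeko.predict --config myconfig.ini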

Nkululeko: how to finetune a transformer model

With nkululeko since version 0.85.0, you can finetune a transformer model with Hugging Face (and even publish it there if you like).

If you would like to have your model published, set:

[MODEL]
push_to_hub = True

Finetuning in this context means training the (pre-trained) transformer layers with your new training data labels, as opposed to only using the last layer as embeddings.

The only thing you need to do is to set your MODEL type to finetune:

[FEATS]
type = []
[MODEL]
type = finetune

The acoustic features can/should be empty, because the transformer model starts with CNN layers to model the acoustics frame-wise. The frames are then pooled by the model over the whole utterance (at most the first 8 seconds are used; the rest is ignored).

The default base model is a wav2vec 2.0 model from facebook, but you can specify a different one like this:

[MODEL]
type = finetune
pretrained_model = microsoft/wavlm-base
max_duration = 10.5

The parameter max_duration is also optional (default: 8) and specifies the maximum duration (in seconds) of your samples / segments that will be used, starting from 0; the rest is disregarded.

You can use the usual deep learning parameters:

[MODEL]
learning_rate = .001
batch_size = 16
device = cuda:3
measure = mse

but all of them have defaults.

The loss function is fixed to:

  • weighted cross entropy for classification
  • concordance correlation coefficient for regression

The resulting best model and the huggingface logs (which can be read by tensorboard) are stored in the project folder.
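
For reference, here is a minimal end-to-end configuration sketch for a finetuning experiment, assembled from the snippets above and the other examples on this blog; the database, paths and parameter values are purely illustrative:

[EXP]
root = ./experiments/finetune/
name = finetune_emodb
epochs = 10
[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
target = emotion
[FEATS]
type = []
[MODEL]
type = finetune
pretrained_model = microsoft/wavlm-base
batch_size = 16
device = cuda:0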

Nkululeko: how to tweak the target variable for database comparison

Sometimes you want to compare two different databases that share a similar target variable, say, related to likability, but with different scaling: say, one was asked on a scale from 1 to 10 and the other used a Likert scale from 1 to 7.

With nkululeko you can re-name labels, normalize the target values, and even invert the polarity, for each database.

In the following example there are two databases:

[DATA]
databases = ['db1', 'db2']
db1.split_strategy = test
db1.scale = True
db2.colnames = {'non-attractive':'likability'}
db2.split_strategy = train
db2.scale = True
db2.reverse = True
db2.reverse.max = 10
target = likability
bins = [-1000, .2, 1000]
labels = ['less likable', 'more likable']

The first database, db1, already has a likability label and just needs to be standard-normalized; the second one, db2, has a related label non-attractive, which needs to be renamed, inverted (based on a hypothetical maximum value of 10) and normalized.
Then, db1 can be used as test data and db2 as training.
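
Such an experiment is then run with the main nkululeko module as usual (assuming the configuration is stored in a file my_config.ini):

python -m nkululeko.nkululeko --config my_config.ini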

Nkululeko: compare several databases

With nkululeko since version 0.77.7 there is a new interface named multidb which lets you compare several databases.

You can state their names in the [EXP] section; they will then be processed one after the other and against each other, and the results are stored in a file called heatmap.png in the experiment folder.

Mind that YOU NEED TO OMIT THE PROJECT NAME!

Here is an example of such an ini file:

[EXP]
root = ./experiments/emodbs/
#  DON'T give it a name, 
# this will be the combination 
# of the two databases: 
# traindb_vs_testdb
epochs = 1
databases = ['emodb', 'polish']
[DATA]
root_folders = ./experiments/emodbs/data_roots.ini
target = emotion
labels = ['neutral', 'happy', 'sad', 'angry']
[FEATS]
type = ['os']
[MODEL]
type = xgb

You can (but don't have to) state the specific dataset values in an external file, as above.
data_roots.ini:

[DATA]
emodb = ./data/emodb/emodb
emodb.split_strategy = specified
emodb.test_tables = ['emotion.categories.test.gold_standard']
emodb.train_tables = ['emotion.categories.train.gold_standard']
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish = ./data/polish_emo
polish.mapping = {'anger':'angry', 'joy':'happy', 'sadness':'sad', 'neutral':'neutral'}
polish.split_strategy = speaker_split
polish.test_size = 30

Call it with:

python -m nkululeko.multidb --config my_conf.ini

The default behavior is that each database is used as a whole when serving as test or training database. If you would rather have the splits used, you can add a flag for this:

[EXP]
use_splits = True

Here's a result with two databases:

and this is the same experiment, but with augmentations:

In order to add augmentation, simply add an [AUGMENT] section:

[EXP]
root = ./experiments/emodbs/augmented/
epochs = 1
databases = ['emodb', 'polish']
[DATA]
--
[AUGMENT]
augment = ['traditional', 'random_splice']
[FEATS]
...

In order to add an additional training database to all experiments, you can use:

[CROSSDB]
train_extra = [meta, emodb]

to add two databases to all training data sets, where meta and emodb should then be declared in the root_folders file.

Nkululeko: oversample the training set

Sometimes, with categorically labeled data, the number of samples per class is very unevenly distributed, misleading the model to think that the overwhelming majority class is more important than the others.
In this case, two techniques might help: class weighting assigns a higher weight to samples from minority classes, and oversampling "invents" new samples for the minority classes.
With nkululeko since version 0.70.0, you can oversample the training set with different algorithms implemented by the imbalanced-learn (imblearn) package.
You simply state the method in the FEATS section like so:

[FEATS]
...
balancing = adasyn # either ros, smote or adasyn

Three methods are available:

  • ros: simply repeats random samples from the minority classes
  • smote: "invents" new minority samples via small changes to existing ones
  • adasyn: similar to smote, but resulting in uneven class distributions
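
For example, a complete [FEATS] section with balancing could look like this (using the opensmile features that appear in the other examples on this blog):

[FEATS]
type = ['os']
balancing = smote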

Nkululeko: re-name data column names

With nkululeko since version 0.68.1, you can re-name data fields (columns in your data table) by setting the following in your ini-file:

[DATA]
databases = ['mydata']
mydata.colnames = {'Participant ID':'speaker', 'sex':'gender', 'Age': 'age'}

which means that, before further processing, the Participant ID field in your database mydata will be treated as the speaker label, and so on.

Nkululeko: automatically stratify your split sets

With nkululeko since version 0.68.0, the selection of test/dev vs. train samples can be done automatically in a stratified manner, i.e. trying to find splits that are age or gender balanced.
An example for such a configuration is this:

[DATA]
# the name of the database
databases = ['emodb']
# the location of the data
emodb = ./data/emodb/emodb
# set the split strategy to "balanced"
emodb.split_strategy = balanced
# set a percentage value for your test split
emodb.test_size = 20
# stratify variables with weights for importance
balance = {'emotion':2, 'age':1, 'gender':1}
# all stratification variables need to be categorical, 
# so we need to state the number of bins for "age" 
age_bins = 2
# a value for how much importance to give for the ideal group sizes
size_diff_weight = 1
# the target value of the experiment
target = emotion

Nkululeko will always keep the speaker variable disjoint, i.e. the resulting splits will contain different speakers.
With the example above, the algorithm will try to balance emotion, gender and (binned) age distributions across the splits.

Nkululeko: inspect your data with Spotlight

With nkululeko since version 0.67.0, the spotlight software is directly integrated as part of the EXPLORE module.

You can simply run your data filters, augmentations, machine learning experiments, segmentations and model predictions as usual, and then call the spotlight software by adding to your configuration file:

[EXPL]
sample_selection = all # or train or test 
spotlight = True 

and running the EXPLORE module:

python -m nkululeko.explore --config myconfig.ini

Note that you might need to install an extra package:

pip install renumics-spotlight

A new web browser window should open as an interface to spotlight:

Torchaudio

If you use modules, feature extractors or models that use torchaudio with Nkululeko, e.g. the Resampler or the Squim model, you need to install the nightly version:

pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Nkululeko: get some statistics on correlation and effect size

With nkululeko since version 0.64.0, some statistics are printed as part of the plots' titles.
With the explore module, you can plot correlations between the target (e.g. emotion or age) and other variables in the database, e.g. gender or duration, or anything you might have predicted with the predict module.
You need to differentiate whether your variables are categorical/nominal (strings) or real-valued (numbers).
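
For example, such distribution plots can be requested via the value_counts parameter of the [EXPL] section (the variable names here are illustrative and must exist in, or have been predicted for, your database):

[EXPL]
value_counts = [['gender'], ['duration']]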

If you plot the distribution of two categories, the Chi2 statistic is used to estimate whether the correlation is significant. A p-value is given in the title, e.g. in this plot:

If you plot the distribution of two real-valued variables, the correlation will be estimated by Pearson's correlation coefficient:

If the target is categorical and the variable real-valued, we use Cohen's d to print out the maximal effect size of any category combination with respect to the variable:

If the target is also real-valued, it will be binned (made categorical) by default:

If you want to prevent that, you can set a value in the configuration:

[EXPL]
bin_reals = False

and you get a different plot: