Category Archives: tutorial

Nkululeko: segmenting a database

Segmenting a database means to split the audio samples of a database into smaller segments or chunks. With speech data this is usually done on the basis of VAD, aka voice activity detection, meaning that the pauses between speech in the audio samples are used as segment borders.

The reason for segmenting could be to label the data with something that would not last over the whole sample, e.g. emotional state.
Another motivation to segment audio data might be that the acoustic features are targeted at a specific stretch of audio, e.g. 3-5 seconds long.

Within nkululeko this would be done with the segment module, which is currently based on the silero software.

You simply call your experiment configuration with the segment module, and the train, test set or both will be segmented.
The advantage is, that you can use all filters on your data that might make sense beforehand, for example with the android corpus, only the reading task samples are not segmented.
You can select them like so:

[DATA]
filter = [['task', 'reading']]

and then call the segment module:

python -m nkululeko.segment --config my_conf.ini

The output is a new database file in CSV format.

If you want, you can specify if only the training, or test split, or both should be segmented, as well as the string that is added to the name of the resulting csv file (the name per default consists of the database names):

[SEGMENT]
# name postfix
target = _segmented
# which model to use, can be pyannote or silero
method = silero
# which split: train, test or all (both)
sample_selection = all
# the minimum lenght of rest-samples (in seconds)
min_length = 2
# the maximum length of segments, longer ones are cut here.  (in seconds)
max_length = 10

If method = pyannote, you better work on a GPU and you need to state a huggingface ID:

[SEGMENT]
method = pyannote
[MODEL]
hf_token = <my huggingface token>
device = gpu

Nkululeko: check your dataset

Within nkululeko, since version 0.53.0, you can perform automatic data checks, which means that some of your data might be filtered out if it doesn't fulfill certain requirements.

Currently two checks are implemented:

[DATA]
# check the filesize of all samples in train and test splits, in bytes
 check_size = 1000
# check if the files contain speech with voice activity detection (VAD)
 check_vad = True

VAD is using silero VAD

Nkululeko: how to visualize your data distribution

If you just want to see how your data distributes on the target with nkululeko, you can do a value_counts plot with the explore module

In your config, you would specify like this:

[EXPL]
# all samples, or only test or train split?
sample_selection = all 
# activate the plot
value_counts = [['age'], ['gender'], ['duration'], ['duration', 'age']] 

and then, run this with the explore module:

python -m nkululeko.explore --config myconfig.ini

The results, for a data set with target=depression, looks similar to this for all samples:


and this for the speakers (if there is a speaker annotation)

If you prefer a kernel density estimation over a histogram, you can select this with

[EXPL]
dist_type = kde

which would result for duration to:

Nkululeko distinguishes between categorical and continuous properties, this would be the output for gender

You can show the distribution of two sample properties at once, by using a scatter plot:

In addition, this module will automatically plot the distribution of samples per speaker, per gender (if annotated):

Nkululeko: how to augment the training set

To do data augmentation with Nkululeko, you can use the augment or the aug_train interface.
The difference is that the former only augments samples, whereas the latter augments the training set of a configuration and then immediately performs the training, including the augmented files.

In the AUGMENT section of your configuration file, you specify the method and name of the output list of file

  • traditional: is the classic augmentation, e.g. by cropping data or adding a bit of noise. We use the audiomentations package for this
  • random-splice: is a special method introduced in this paper that randomly splices and re-connects the audio samples
[AUGMENT]
# select the samples to augment: either train, test, or all
sample_selection = train
# select the method(s)
augment = ['traditional', 'random_splice']
# file name to store the augmented data (can then be added to training)
result = augmented.csv

and then call the interface:

python -m nkululeko.augment --config myconfig.ini

or

python -m nkululeko.aug_train--config myconfig.ini

if you want to run a training in the same run.

Currently, apart from random-splicing, Nkululeko simply uses the audiomentations module, i.e.:

[AUGMENT]
augment = ['traditional']
augmentations = Compose([
AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.05),
Shift(p=0.5),
BandPassFilter(min_center_freq=100.0, max_center_freq=6000),])

These manipulations are applied randomly to your training set.

With respect to the random_splicing method, you can adjust two parameters:

  • p_reverse: probability of some samples to be in reverse order (default: 0.3)
  • top_db: top dB level for silence to be recognized (default: 12)

This configuration, for example, would distort the samples much more than the default:

[AUGMENT]
augment = ['random_splice']
p_reverse = .8
top_db = 6

You should find the augmented files in the storage folder of the result folder of your experiment and could listen to them there.

Once you augmentations have been processed, you can add them to the training in a new experiment:

[DATA]
databases = ['original data', 'augment']
augment = my_augmentations.csv
augment.type = csv
augment.split_strategy = train

Nkululeko: show feature importance

Since version 0.40, Nkululeko can now show the best performing X acoustic features according to some model.

There is a new section call EXPL (short for exploration), and you could state

[EXPL]
feature_distributions = True
model = ['tree', 'gmm', 'svm']
permutation = True
max_feats = 5
sample_selection = test

in your config file, and then run the exploration module like this:

python -m nkululeko.explore --config my_config.ini

The following values can be set in the config:

  • feature_distributions: use this analysis.
  • sample_selection: Which sample set/split to use, either all, train or test
  • model: Which models to use to estimate feature importance. Can be all models from the MODEL section, If several are stated, the mean result is used.
  • max_feats: The best n feats are shown
  • permutation: use feature permutation to determine the best features. Make sure to test the models before.

The resulting list will then appear in the result folder and a barplot image in the image folder.

Afterwards you could inspect single features as described here

Nkululeko: how to plot distributions of feature values

As shown in this post, with Nkululeko you can select only specific features from your features sets by specifying them in the [FEAT] section:

[FEATS]
features = ['JitterPCA', 'meanF0Hz', 'hld_sylRate']

What you can also do, is plotting them per category (only for classification), by specifying in the PLOT section if you would like that for all samples or only test or train samples:

[EXPL]
# turn it on
feature_distributions = True 
# use only training samples
sample_selection = train 
# only plot the 5 most important features 
max_feats = 5  
# also plot this feature, irrespective of their importance
plot_features = ['f3_median']
# turn on output of all statistical tests
print_stats = True

You would have to call nkululeko with the explore interface:

python -m nkululeko.explore --config <myConfig.ini>

The image file is in the image folder and should look similar to this:

Stating in the title:

  • the number of samples
  • The outcome of a Kruskal-Wallis test (if more than two catgories)
  • The outcome of the most significant difference according to a t-test (you see all outcomes in the debug output if print_stats=True)

or, like this (for the additional feature).
You can turn gender specifics off by specifying

[EXPL]
ignore_gender = True

Nkululeko: how to predict many samples

There are three ways to predict a number of samples:

  1. If you want to save the predictions of an experiment for later use, you can do so by stating in the EXP section

    [EXP]
    save_test = ./my_saved_test_predictions.csv

    The output format is CSV, comma seperated values.

  2. Alternatively, you can test an existing database against the best model you trained before, by stating the databases as tests in the DATA section:

    [DATA]
    tests = ['my_testdb']
    my_testdb = /mypath/my_testdb
    ...

    and then calling Nkululeko's test module

    python -m nkululeko.test --config mycoonfg.ini --outfile myresults.csv
  3. Run the demo module simply for a set of files:

    python -m nkululeko.demo --config mycoonfg.ini --list my_filelist.txt

Transformation architectures

Generally a difference for machine learners can be made by the nature of input and output.


source

One to one

Typically an application would be to classify the main motive of a picture (e.g. cat or dog) or the emotional category that is displayed in an audio recording. Key is, that the input is represented by a single vector of values of fixed length.

One to many

Many to one

Sequence to sequence

Many to many

How to import acoustic features with the Nkululeko software

With nkululeko you usually compute acoustic features with some software, but you can also simply import them from a text file.

You simply use the import as feature extraction type, and specify the file locations (can of course be only one) like this:

[FEATS]
type = ['import']
import_file = ['tests/results/exp_emodb_praat/store/emodb_praat_test.csv', 'tests/results/exp_emodb_praat/store/emodb_praat_train.csv']

The import file(s) must be in CSV (comma separated values) format and contain either a file index (a column with file names) or an audformat segmented index.

The index must match with the one from your data, else you will get the error:

Imported features for data set XXX not found!

You can combine with other features, e.g.

[FEATS]
type = ['import', 'os']

and select features, just like with any other feature sets.