How to normalize features

15. December 2022 felix

"Normalizing" or scaling feature values means shifting them to a common range, or to a distribution with the same mean and standard deviation (the latter is also called z-transformation).
You would do that for several reasons:

Artificial neural nets handle small numbers best, so all feature values should lie in the range [-1, 1].
Speakers have individual ways of speaking, which you are not interested in if you want to learn a general task, e.g. emotion or age detection. So you would speaker-normalize the values, i.e. normalize them for each speaker individually. Of course, this is not possible in most applications, because you don't have samples of your test speakers in advance.
You might want to normalize across the sexes, because women typically have a higher pitch than men. An alternative is to use only relative values instead of absolute ones.
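The first two points can be sketched in plain Python. This is only a minimal illustration, not taken from any toolkit: the function names `scale_to_range` and `normalize_per_speaker`, the speaker ids and the pitch values are all made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def scale_to_range(values, low=-1.0, high=1.0):
    """Linearly map values so that the minimum becomes low and the maximum high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # constant feature: map everything to the middle of the range
        return [(low + high) / 2 for _ in values]
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]


def normalize_per_speaker(values, speakers):
    """z-transform each value with the mean and std of its own speaker."""
    by_speaker = defaultdict(list)
    for value, speaker in zip(values, speakers):
        by_speaker[speaker].append(value)
    # fall back to a std of 1.0 if a speaker has constant values
    stats = {s: (mean(v), pstdev(v) or 1.0) for s, v in by_speaker.items()}
    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, speakers)]


# hypothetical mean pitch values (Hz) for two speakers
pitch = [110.0, 120.0, 130.0, 200.0, 220.0, 240.0]
speakers = ["m1", "m1", "m1", "f1", "f1", "f1"]

scaled = scale_to_range(pitch)                         # all values in [-1, 1]
per_speaker = normalize_per_speaker(pitch, speakers)   # speaker offsets removed
```

After the speaker-wise z-transformation, the lowest sample of each speaker ends up at the same value, so the absolute pitch difference between the two speakers is gone.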

Mind that you shouldn't use your test set for normalization: it should only be used for testing and is supposed to be unknown. That's why you should compute your normalization parameters on the training set; you can then use them to normalize/scale the test set.
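As a minimal sketch of this rule (the helper names `fit_normalizer` and `apply_normalizer` and the toy values are assumptions for illustration), the parameters are estimated on the training values only and then reused unchanged for the test values:

```python
from statistics import mean, pstdev


def fit_normalizer(train_values):
    """Estimate mean and standard deviation on the training set only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # avoid division by zero
    return mu, sigma


def apply_normalizer(values, mu, sigma):
    """z-transform values with previously estimated parameters."""
    return [(v - mu) / sigma for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy feature values
test = [2.0, 6.0]

mu, sigma = fit_normalizer(train)               # parameters come from train only
train_norm = apply_normalizer(train, mu, sigma)
test_norm = apply_normalizer(test, mu, sigma)   # test is only transformed
```

Note that the normalized test values do not need to have zero mean or stay inside the range seen in training; that is expected.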

course_content

Augmenting data

7. December 2022 felix

Often (almost always, in fact) there is a lack of training data for supervised learning.

One way to tackle this is representation learning, which can be done in a self-supervised fashion.

Another approach is to multiply your labeled training data by adding slightly altered versions of it that do not change the information that is the aim of the detection, for example by adding noise to the data or clipping it. This is called augmentation, and here is a post on how to do this with nkululeko.
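To illustrate the idea of adding noise, here is a minimal sketch in plain Python. It is not how nkululeko does it; the function `add_noise`, the sine "recording" and the label are made up for the example. The point is that each altered copy keeps the label of the original.

```python
import math
import random


def add_noise(signal, snr_db, rng=None):
    """Return a copy of signal with white Gaussian noise at the given SNR in dB."""
    rng = rng or random.Random()
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, noise_std) for s in signal]


# toy "audio": one second of a 440 Hz sine at 16 kHz
sr = 16000
clean = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
label = "happy"  # hypothetical label of the original sample

rng = random.Random(42)
# three noisy copies at decreasing SNR, each with the original label
augmented = [(add_noise(clean, snr_db, rng), label) for snr_db in (30, 20, 10)]
```

The augmented copies should only be added to the training data, not to the dev or test data.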

A third way is to synthesize data based on the labeled training data, for example with GANs, VAEs or rule-based simulation. One can distinguish whether only a parameterized form of the samples (i.e. the features) or whole audio files are generated.

Sometimes only samples for a rare class are needed. In this case, techniques like random oversampling (ROS), the Synthetic Minority Oversampling Technique (SMOTE) or Adaptive Synthetic sampling (ADASYN) can be used.
Here is a post on how to do this with nkululeko.
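Of these techniques, random oversampling is the simplest, and a rough sketch fits in a few lines. The function name `random_oversample` and the toy data are assumptions for illustration; SMOTE and ADASYN go further and interpolate new samples instead of just duplicating existing ones.

```python
import random
from collections import Counter


def random_oversample(samples, labels, rng=None):
    """Duplicate randomly chosen samples of rare classes until all classes are equally frequent."""
    rng = rng or random.Random()
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# toy data: feature vectors with an under-represented "angry" class
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
y = ["neutral", "neutral", "neutral", "neutral", "angry", "angry"]

X_bal, y_bal = random_oversample(X, y, random.Random(0))
```

As with augmentation, you would balance only the training data this way.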

course_content, nkululeko

Nkululeko

1. December 2022 felix

This is the entry post for Nkululeko: a framework to do machine learning experiments on audio data based on configuration files.

Here's an overview of the tutorials:

Allgemein, course_content

Meta parameter tuning

1. December 2022 felix

The parameters that configure machine learning algorithms are called meta parameters (or hyperparameters), in contrast to the "normal" parameters that are learned during training.

But as they obviously also influence the quality of your predictions, these parameters must be optimized as well.

Examples are

the C parameter for SVM
the subsample ratio for XGB
the number of layers and neurons for a neural net

The naive approach is simply to try them all.
How to do this with Nkululeko is described here.
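Independent of Nkululeko, trying them all (a grid search) can be sketched like this. The grid values are arbitrary, and `toy_evaluate` is a hypothetical stand-in for "train on the train data and return a score on the dev data":

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Hypothetical score; a real version would train a model and test it on dev."""
    return -abs(params["layers"] - 2) - abs(params["neurons"] - 64) / 64


grid = {"layers": [1, 2, 3], "neurons": [16, 64, 256]}
best_params, best_score = grid_search(grid, toy_evaluate)
```

The number of runs is the product of the list lengths (here 3 x 3 = 9), which is why this gets expensive quickly.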

But in general, because the search space for the optimal configuration is usually unbounded, it's better to try a stochastic approach or a genetic one.
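A simple stochastic approach is random search: draw a fixed number of configurations at random and keep the best. The sketch below makes some assumptions for illustration: the parameter ranges are arbitrary, and `toy_evaluate` again only stands in for training a model and scoring it on the dev data.

```python
import math
import random


def random_search(sample_params, evaluate, n_trials, rng=None):
    """Evaluate n_trials randomly drawn configurations and keep the best one."""
    rng = rng or random.Random()
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def sample_params(rng):
    """Draw one configuration; the ranges are illustrative assumptions."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform draw
        "neurons": rng.randint(8, 1024),
    }


def toy_evaluate(params):
    """Hypothetical score; a real version would train a model and test it on dev."""
    lr_term = abs(math.log10(params["learning_rate"]) + 3)
    return -lr_term - abs(params["neurons"] - 128) / 128


best_params, best_score = random_search(
    sample_params, toy_evaluate, n_trials=50, rng=random.Random(1)
)
```

Unlike the grid, the budget (`n_trials`) is fixed in advance and does not grow with the number of meta parameters.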

Allgemein

How to split your data

1. December 2022 felix

In supervised machine learning, you usually need three kinds of data sets:

train data: to teach the model the relation between data and labels
dev data: (short for development) to tune meta parameters of your model, e.g. number of neurons, batch size or learning rate.
test data: to evaluate your model ONCE at the end to check on generalization

Of course all this is to prevent overfitting on your train and/or dev data.

If you've used your test data for a while, you might need to find a new set, as chances are high that you overfitted on your test set during your experiments.

So what's a good split?

Some rules apply:

Train and dev can be from the same set, but the test set is ideally from a different database.
If you don't have much data, a 60/20/20 % or 80/10/10 % split (train/dev/test) is common.
If you have masses of data, use only as much dev and test data as is needed to cover your population.
If you have really little data, use x-fold cross validation for train and dev; the test set should still be kept separate.
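A minimal sketch of such a split and of x-fold cross validation, in plain Python. The helper names and the default fractions are assumptions for illustration, and the sketch ignores speakers, so for speech data you would additionally make sure that no speaker ends up in more than one set.

```python
import random


def split_train_dev_test(items, dev_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and cut them into disjoint train, dev and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    n_dev = round(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


def k_fold(items, k):
    """Yield (train, dev) pairs; every item is in the dev part exactly once."""
    items = list(items)
    for i in range(k):
        dev = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, dev


samples = list(range(100))  # toy sample ids
train, dev, test = split_train_dev_test(samples)

# with little data: cross validation on everything except the test set
folds = list(k_fold(train + dev, k=5))
```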

Nkululeko exercise

Edit the demo configuration

1)
Set/keep emotion as the target, os as the FEAT type and xgb as the MODEL type.

Use emodb as both test and train set, but try out all split methods:

specified
speaker split
random
loso
logo
5_fold_cross_validation

Which works best and why?
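To make the difference between a random and a speaker-based split more concrete, here is a sketch of how folds can be built when one speaker is held out at a time (assuming that this is what loso stands for). It is not Nkululeko's implementation; the function name and the speaker ids are made up.

```python
def leave_one_speaker_out(speakers):
    """Yield (held_out_speaker, train_idx, test_idx), holding out one speaker per fold."""
    for held_out in sorted(set(speakers)):
        train_idx = [i for i, s in enumerate(speakers) if s != held_out]
        test_idx = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train_idx, test_idx


# hypothetical speaker id per sample
speakers = ["s1", "s1", "s2", "s2", "s2", "s3"]
loso_folds = list(leave_one_speaker_out(speakers))
```

In every fold, the test speaker never appears in the training data, which a purely random split does not guarantee.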

2)
Set the following options:

[EXP]
epochs = 200
[MODEL] 
type = mlp
layers = {'l1':1024, 'l2':64} 
save = True
[PLOT]
epoch_progression = True
best_model = True

Run the experiment.
Find the epoch progression plot and see at which epoch overfitting starts.
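If you want to check your reading of the plot numerically, one common heuristic is to look for the epoch with the lowest dev loss after which the loss does not improve any more. The sketch below assumes you have the dev loss per epoch as a list; the function name, the `patience` value and the loss values are made up and are not the output of a real run.

```python
def overfitting_start(dev_losses, patience=3):
    """Return the epoch with the lowest dev loss once it has not improved
    for `patience` epochs, or None if that never happens."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None


# made-up loss curves: train keeps falling, dev turns around
train_loss = [1.0, 0.7, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
dev_loss = [1.1, 0.8, 0.6, 0.55, 0.6, 0.7, 0.8, 0.9]

epoch = overfitting_start(dev_loss)  # epochs are counted from 0 here
```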

speechsurfer

blog around speech technology