Category Archives: course_content

Supervised vs. unsupervised

4. March 2023 felix

Supervised vs. unsupervised refers to whether your training data is annotated (or labeled) with respect to your task. An example: if you want to build a machine learner that estimates human age from speech, you might give an algorithm many examples of human speech annotated with the age of the speaker. This would be your training data, and the approach would be supervised (by the age annotations). With unsupervised learning, you would simply give an algorithm a lot of human speech data and ask it to cluster the data based on differences, hoping that the resulting clusters coincide with age.
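The difference can be sketched in a few lines of plain Python. The data is invented for illustration (one feature per sample, here called pitch, plus an age annotation), and both learners are deliberately tiny stand-ins: a 1-nearest-neighbour predictor for the supervised case and a two-cluster 1-D k-means for the unsupervised case.

```python
from statistics import mean

# Toy data, invented for illustration: one feature value per speech sample
# (e.g. mean pitch in Hz) and the annotated age of the speaker in years.
pitch = [250.0, 240.0, 260.0, 120.0, 130.0, 110.0]
age = [8, 9, 7, 45, 50, 40]


def predict_age(x, train_x, train_y):
    """Supervised: return the age label of the closest training sample."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def kmeans_1d(values, iterations=10):
    """Unsupervised: split the values into two clusters, no labels used."""
    centers = [min(values), max(values)]  # simple deterministic start
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]


predicted = predict_age(245.0, pitch, age)  # uses the age annotations
cluster_ids = kmeans_1d(pitch)              # sees only the feature values
```

The supervised predictor answers in the unit of the annotation (years), while the clustering only returns anonymous cluster ids. Whether those ids line up with age groups is something you can hope for, but it is not guaranteed.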

Nkululeko exercise

-> Nkululeko: install the Berlin Emodb

This database contains examples of two kinds of labels:

- emotion and gender labels as categorical data, for classification
- age labels as numerical data, for regression

Allgemein, course_content

How to normalize features

15. December 2022 felix

"Normalizing" or scaling feature values means shifting them to a common range, or to a distribution with the same mean and standard deviation (the latter is also called z-transformation).
You would do that for several reasons:

- Artificial neural nets handle small numbers best, so all feature values should lie in the range [-1, 1].
- Speakers have their individual ways of speaking, which you are not interested in if you want to learn a general task, e.g. emotion or age. So you would speaker-normalize the values for each speaker individually. Of course, in most applications this is not possible because you don't yet have samples of your test speakers.
- You might want to normalize for sex, because women typically have a higher pitch. Another way out is to use only relative values instead of absolute ones.
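Speaker normalization, as mentioned in the list above, could look roughly like the following sketch. The function name `speaker_normalize` and the toy values are made up for illustration. It z-transforms each value with the statistics of its own speaker, so it assumes that speaker ids are known for all samples.

```python
from collections import defaultdict
from statistics import mean, pstdev


def speaker_normalize(values, speakers):
    """Z-transform each value with the mean and std of its own speaker."""
    by_speaker = defaultdict(list)
    for v, s in zip(values, speakers):
        by_speaker[s].append(v)
    stats = {}
    for s, vals in by_speaker.items():
        std = pstdev(vals)
        # Fall back to 1.0 to avoid dividing by zero for constant values.
        stats[s] = (mean(vals), std if std > 0 else 1.0)
    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, speakers)]


# Invented example: speaker "b" talks about 100 Hz higher than speaker "a".
pitch = [100.0, 120.0, 140.0, 200.0, 220.0, 240.0]
speakers = ["a", "a", "a", "b", "b", "b"]
normalized = speaker_normalize(pitch, speakers)
```

After normalization, both speakers occupy the same value range. Only the movement relative to each speaker's own average remains.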

Mind that you shouldn't use your test set for normalization, as it should really only be used for testing and is supposed to be unknown. That's why you should compute your normalization parameters on the training set; you can then use them to normalize/scale the test set.
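A minimal sketch of this rule in plain Python: the helper names `fit_scaler` and `transform` are made up for illustration, and the numbers are toy values. The mean and standard deviation come from the training set only and are then reused for the test set.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Compute the normalization parameters on the training data only."""
    std = pstdev(train_values)
    return mean(train_values), (std if std > 0 else 1.0)


def transform(values, mu, sigma):
    """Apply a z-transformation with previously computed parameters."""
    return [(v - mu) / sigma for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 6.0]

mu, sigma = fit_scaler(train)              # the test set is not touched here
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)   # reuses the training parameters
```

Note that scaled test values may fall outside the range seen in training (here the value 6.0), which is expected: the test data is simply not part of the statistics.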

course_content

Augmenting data

7. December 2022 felix

Often (kind of always) there is a lack of training data for supervised learning.

One way to tackle this is representation learning, which can be done in a self-supervised fashion.

Another approach is to multiply your labeled training data by adding slightly altered versions of it that do not change the information you aim to detect, for example by adding noise to the data or clipping it. This is called augmentation, and here is a post on how to do this with nkululeko.
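To make the idea of adding noise concrete, here is a small sketch in plain Python. It is not how nkululeko does it; the function `add_noise`, the signal-to-noise ratio of 20 dB and the synthetic tone that stands in for a speech recording are all assumptions for illustration.

```python
import math
import random


def add_noise(signal, snr_db, seed=0):
    """Return a copy of the signal with white Gaussian noise added.

    The noise level is chosen so that the signal-to-noise ratio is
    approximately snr_db decibels.
    """
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, noise_std) for s in signal]


# Stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# The augmented copy keeps the label of the original sample.
noisy = add_noise(clean, snr_db=20)
```

The augmented sample would be added to the training data with the same label as the original, since the noise is assumed not to change the emotion, age or whatever else is to be detected.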

A third way is to synthesize data based on the labeled training data, for example with GANs, VAEs or rule-based simulation. One can distinguish whether only a parameterized form of the samples (i.e. the features) or whole audio files are generated.

Sometimes only samples for a rare class are needed. In this case, techniques like random oversampling (ROS), the Synthetic Minority Oversampling Technique (SMOTE) or Adaptive Synthetic sampling (ADASYN) can be used.
Here is a post on how to do this with nkululeko.
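Of the techniques named above, random oversampling is the simplest and fits in a few lines. The following is a sketch in plain Python with invented data and a made-up function name `random_oversample`. SMOTE and ADASYN go further by interpolating new samples instead of duplicating existing ones, which is not shown here.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Invented example: "angry" is the rare class.
features = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["neutral", "neutral", "neutral", "neutral", "angry"]
balanced_x, balanced_y = random_oversample(features, labels)
```

As with normalization, this would be applied to the training data only, so that the test set keeps its natural class distribution.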

course_content, nkululeko

Nkululeko

1. December 2022 felix

This is the entry post for Nkululeko: a framework to do machine learning experiments on audio data based on configuration files.

Here's an overview of the tutorials:

Allgemein, course_content

Meta parameter tuning

1. December 2022 felix

The parameters that configure machine learning algorithms are called meta parameters (or hyperparameters), in contrast to the "normal" parameters that are learned during training.

But as they obviously also influence the quality of your predictions, these parameters must be learned as well.

Examples are:

- the C parameter for SVM
- the subsample ratio for XGB
- the number of layers and neurons for a neural net

The naive approach is simply to try them all; how to do this with Nkululeko is described here.

But in general, because the search space for the optimal configuration is usually unbounded, it is better to try a stochastic approach or a genetic one.
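The two strategies can be compared in a small sketch. Everything here is hypothetical: `evaluate` is a stand-in for "train a model with these meta parameters and return its score on the dev set", and the parameter names and value ranges are chosen only for illustration. A genetic search is not shown.

```python
import itertools
import random


def evaluate(c, n_layers):
    """Hypothetical stand-in for training a model and scoring it.

    The score is highest (0) at c = 1.0 and n_layers = 2.
    """
    return -((c - 1.0) ** 2) - (n_layers - 2) ** 2


# Naive approach: try every combination of a fixed grid.
grid = {"c": [0.01, 0.1, 1.0, 10.0], "n_layers": [1, 2, 3]}
grid_results = [
    ((c, n), evaluate(c, n))
    for c, n in itertools.product(grid["c"], grid["n_layers"])
]
best_grid, best_grid_score = max(grid_results, key=lambda r: r[1])

# Stochastic approach: spend the same budget on random configurations.
rng = random.Random(0)
random_results = []
for _ in range(len(grid_results)):
    c = 10 ** rng.uniform(-2, 1)  # log-uniform between 0.01 and 10
    n = rng.randint(1, 3)
    random_results.append(((c, n), evaluate(c, n)))
best_random, best_random_score = max(random_results, key=lambda r: r[1])
```

The grid can only ever find values that were listed in advance, and its size multiplies with every additional parameter. The random variant keeps the number of trials fixed and can hit values between the grid points.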

Allgemein, course_content, seminar

ML course: introduction

2. November 2022 felix

This is the first of a series of posts to support my lecture "speech processing with machine learning".
The focus is an introduction to related topics, mainly machine learning, as I teach phoneticians who already know a lot about speech.

This page is the landing page, which serves as a table of contents for the posts. I will try to introduce a meaningful order for the posts, but sequential reading is not required. As said, it's introductory anyway, and it's very easy to find much deeper posts on the net. E.g. here's a great list with pictures.

Links that are marked with (nkulu) are for posts that use Nkululeko as a hands-on exercise.

How does it work in general? -> learning from data
Supervised or not? (nkulu): Main distinctions for machine learning
- learning by example (Supervised)
- Unsupervised
  - clustering
  - representation learning/ Self-Supervised
  - learning by interaction -> Reinforcement Learning
Splits: test, train and dev (nkulu): How to learn what from data
Evaluation: Kinds of evaluation metrics
Meta parameter tuning: How to tune your predictor
Augmentation: Enhance generalization by adding altered training samples
Feature normalization/scaling: Shift the feature values to a common value range.
Kinds of machine learning: A taxonomy of buzzwords around artificial neural nets.
Different machine learners: Introducing the most common approaches to machine learning
Transformation architectures: Introducing the architectural differences of input/output processing

speechsurfer

blog around speech technology