Tag Archives: features

Feature scaling

Usually machine learning algorithms are not trained with raw data (aka end-to-end) but with features that model the entities of interest.
With respect to speech samples these features might be for example average pitch value over the whole utterance or length of utterance.

Now if the pitch value is given in Hz and the length in seconds, the pitch value will be in the range of [80, 300] and the length, say, in the range of [1.5, 6].
Machine learning approaches now would give higher consideration on the avr. pitch because the values are higher and differ by a larger amount, which is in the most cases not a good idea because it's a totally different feature.

A solution to this problem is to scale all values so that the features have a mean of 0 and standard deviation of 1.
This can be easily done with the preprocessing API from sklearn:

from sklearn import preprocessing
scaler = StandardScaler()
scaled_features = preprocessing.scaler.fit_transform(features)

Be aware that the use of the standard scaler only makes sense if the data follows a normal distribution.

How to extract formant tracks with Praat and python

This tutorial was adapted based on the examples from David R Feinberg

This tutorial assumes you started a Jupyter notebook . If you don't know what this is, here's a tutorial on how to set one up (first part)

First you should install the parselmouth package, which interfaces Praat with python:

!pip install -U praat-parselmouth

which you would then import:

import parselmouth 
from parselmouth import praat

You do need some audio input (wav header, 16 kHz sample rate)

testfile = '/home/felix/data/data/audio/testsatz.wav'

And would then read in the sound with parselmouth like this:

sound = parselmouth.Sound(testfile) 

Here's the code to extract the first three formant tracks, I guess it's more or less self-explanatory if you know Praat.

First, compute the occurrences of periodic instances in the signal:

f0min=75
f0max=300
pointProcess = praat.call(sound, "To PointProcess (periodic, cc)", f0min, f0max)

then, compute the formants:

formants = praat.call(sound, "To Formant (burg)", 0.0025, 5, 5000, 0.025, 50)

And finally assign formant values with times where they make sense (periodic instances)

numPoints = praat.call(pointProcess, "Get number of points")
f1_list = []
f2_list = []
f3_list = []
for point in range(0, numPoints):
    point += 1
    t = praat.call(pointProcess, "Get time from index", point)
    f1 = praat.call(formants, "Get value at time", 1, t, 'Hertz', 'Linear')
    f2 = praat.call(formants, "Get value at time", 2, t, 'Hertz', 'Linear')
    f3 = praat.call(formants, "Get value at time", 3, t, 'Hertz', 'Linear')
    f1_list.append(f1)
    f2_list.append(f2)
    f3_list.append(f3)