Category Archives: code

Plot two parameters for categories

This is an examle how to plot values for two parameters in on plot and builds upon the dta generated at this example.
So, from the features you extracted you would isolate two parameters from the dataframe:

x1 = df_feats.loc[:, 'F0semitoneFrom27.5Hz_sma3nz_amean']
x2 = df_feats.loc[:, 'F0semitoneFrom27.5Hz_sma3nz_stddevNorm']

You'd need matplotlib

import matplotlib.pyplot as plt

You would color the dots according to the emotion they have been labeled with. Because the plot function does not accept string values as color designators but only numbers, you'd first have to convert them, e.g. with the LabelEncoder:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
c_vals = le.fit_transform(df_emo.emotion.values)

and then you can simply do the plot:

plt.scatter(x1, x2, c=c_vals)

Feature scaling

Usually machine learning algorithms are not trained with raw data (aka end-to-end) but with features that model the entities of interest.
With respect to speech samples these features might be for example average pitch value over the whole utterance or length of utterance.

Now if the pitch value is given in Hz and the length in seconds, the pitch value will be in the range of [80, 300] and the length, say, in the range of [1.5, 6].
Machine learning approaches now would give higher consideration on the avr. pitch because the values are higher and differ by a larger amount, which is in the most cases not a good idea because it's a totally different feature.

A solution to this problem is to scale all values so that the features have a mean of 0 and standard deviation of 1.
This can be easily done with the preprocessing API from sklearn:

from sklearn import preprocessing
scaler = StandardScaler()
scaled_features = preprocessing.scaler.fit_transform(features)

Be aware that the use of the standard scaler only makes sense if the data follows a normal distribution.

Recording and transcribing a speech sample on Google colab“

Set up the recording method using java script:

# all imports
from IPython.display import Javascript
from google.colab import output
from base64 import b64decode

RECORD = """
const sleep  = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(
  await sleep(time)
  recorder.onstop = async ()=>{
    blob = new Blob(chunks)
    text = await b2text(blob)

def record(fn, sec):
  s = output.eval_js('record(%d)' % (sec*1000))
  b = b64decode(s.split(',')[1])
  with open(fn,'wb') as f:
  return fn

Record something:

 filename = 'felixtest.wav'
record(filename, 5)

Play it back:

import IPython

install Google speechbrain

!pip install speechbrain
import speechbrain as sb

Load the ASR nodel train on libri speech:

from speechbrain.pretrained import EncoderDecoderASR
asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-crdnn-rnnlm-librispeech", savedir="pretrained_model")

And get a transcript on your audio:

asr_model.transcribe_file(audio_file )

Record sound from microphone

This works if you got "PortAudio" on your system.

import audiofile as af
import sounddevice as sd

def record_audio(filename, seconds):
    fs = 16000
    print("recording {} ({}s) ...".format(filename, seconds))
    y = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
    y = y.T
    af.write(filename, y, fs)
    print("  ... saved to {}".format(filename))