How to adapt Speechalyzer/labeltool to your own labels/experiments

I wrote a Java tool to annotate/transcribe speech data and would like to show in this blog how to edit its configuration to adapt the layout of the GUI (Labeltool is the GUI of Speechalyzer).
If you start Labeltool without a running Speechalyzer server, it will show an error, but for this demonstration it can be ignored:

java -jar Labeltool.jar 

Your GUI might look like this or different, depending on the configuration. The configuration is a text file called labeltool.config that should reside in the same folder as Labeltool.jar. You can open it with a text editor of your choice:

In the upper section you can hide or show GUI panels by setting the switches to true or false.
You cannot switch off the Label panel, as it is the most basic part of Labeltool and is always shown. In the lower section you will find some button configurations:
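To give you an idea, the relevant entries might look something like this. Apart from categoryNames, which is discussed below, the key names here are invented placeholders, so check your own labeltool.config for the real ones:

# panel switches (placeholder names): true shows a panel, false hides it
showPlayerPanel=true
showTranscriptionPanel=false
# button configuration: the labels that get one button each
categoryNames=neutral,happy,sad,angry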

So the categoryNames field decides which button series is shown (I hope the rest is self-explanatory).
With the example config depicted above, Labeltool would look like this (if you close and re-open the GUI):

If you switch off the panel that contains the play button, you will not be able to play any sound (because this logic is behind the then-hidden button).

Plot two parameters for categories

This is an example of how to plot values for two parameters in one plot; it builds upon the data generated in this example.
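In case you don't have those features at hand: a dataframe like df_feats could have been extracted with the opensmile package, roughly like this (just a sketch; my_files stands in for your list of wav files):

import opensmile

# extract eGeMAPS functionals; this set contains the two F0 parameters used below
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
df_feats = smile.process_files(my_files)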
So, from the features you extracted you would isolate two parameters from the dataframe:

x1 = df_feats.loc[:, 'F0semitoneFrom27.5Hz_sma3nz_amean']
x2 = df_feats.loc[:, 'F0semitoneFrom27.5Hz_sma3nz_stddevNorm']

You'd need matplotlib:

import matplotlib.pyplot as plt

You would color the dots according to the emotion they have been labeled with. Because the plot function accepts only numbers as color designators, not strings, you first have to convert the labels, e.g. with scikit-learn's LabelEncoder:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
c_vals = le.fit_transform(df_emo.emotion.values)

and then you can simply do the plot:

plt.scatter(x1, x2, c=c_vals)
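If you want to know which color stands for which emotion, you can attach a legend; here's a minimal sketch (assuming matplotlib 3.1 or newer for legend_elements, and that every emotion occurs at least once in the data):

# keep the PathCollection that scatter returns
scatter = plt.scatter(x1, x2, c=c_vals)
# map the encoded colors back to the emotion names stored in the LabelEncoder
plt.legend(scatter.legend_elements()[0], le.classes_, title='emotion')
plt.xlabel('F0semitoneFrom27.5Hz_sma3nz_amean')
plt.ylabel('F0semitoneFrom27.5Hz_sma3nz_stddevNorm')
plt.show()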

How to create an audformat Database from a pandas Dataframe

This tutorial explains how to initialize an audformat database object from a data collection that's stored in a pandas dataframe.
You can also find an official example using emo db here.

First you would need the necessary imports:

import os                       # file operations
import pandas as pd             # work with tables
pd.set_option('display.max_rows', 10)

import audformat.define as define  # some definitions
import audformat.utils as utils    # util functions
import audformat
import pickle

We load a sample pandas dataframe from a speech collection labeled with age and gender.

df = pickle.load(open('../files/sample_df.pkl', 'rb'))
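In case you can't load such a pickle file: here's a minimal sketch of a dataframe with the structure the code below expects (paths and values are made up):

# a file-indexed table with age, gender and speaker annotations (made-up values)
df = pd.DataFrame(
    {
        'age': [31, 45],
        'gender': ['female', 'male'],
        'speaker': ['speaker_1', 'speaker_2'],
    },
    index=pd.Index(
        ['/my/example/path/audio/001.wav', '/my/example/path/audio/002.wav'],
        name='file',
    ),
)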

We can then construct an audformat Database object from this data like this:

# remove the absolute path to the audio samples 
root = '/my/example/path/'
files = [file.replace(root, '') for file in df.index.get_level_values('file')]

# start with a general description
db = audformat.Database(
    name='age-gender-sample',
    source='intern',
    usage='research',
    languages=['deu'],
    description='Short snippets annotated by '
                'speaker and speaker age and gender.',
)
# add audio format information
db.media['microphone'] = audformat.Media(
    type='audio', format='wav', channels=1, sampling_rate=16000,
)
# describe the age data
db.schemes['age'] = audformat.Scheme(
    dtype='int', minimum=0, maximum=100,
    description='Speaker age in years',
)
# describe the gender data
db.schemes['gender'] = audformat.Scheme(
    labels=['female', 'male'],
    description='Speaker sex',
)
# describe the speaker id data
db.schemes['speaker'] = audformat.Scheme(
    dtype='str',
    description='Name of the speaker',
)
# initialize a data table with an index that corresponds to the file names
db['files'] = audformat.Table(
    index=audformat.filewise_index(files),
    media_id='microphone',
)
# now add columns to the table for each data item of interest (age, gender and speaker id)
db['files']['age'] = audformat.Column(scheme_id='age')
db['files']['gender'] = audformat.Column(scheme_id='gender')
db['files']['speaker'] = audformat.Column(scheme_id='speaker')

and finally inspect the result:
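
(in a notebook, simply evaluating the database object prints its header)

db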


name: age-gender-sample
description: Short snippets annotated by speaker and speaker age and gender.
source: intern
usage: research
languages: [deu]
media:
  microphone: {type: audio, format: wav, channels: 1, sampling_rate: 16000}
schemes:
  age: {description: Speaker age in years, dtype: int, minimum: 0, maximum: 100}
  gender:
    description: Speaker sex
    dtype: str
    labels: [female, male]
  speaker: {description: Name of the speaker, dtype: str}
tables:
  files:
    type: filewise
    media_id: microphone
    columns:
      age: {scheme_id: age}
      gender: {scheme_id: gender}
      speaker: {scheme_id: speaker}

and perhaps, as a test, get the unique values of all speakers:
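
db['files']['speaker'].get().unique()

(Column.get() returns the annotations as a pandas series, so unique() comes from pandas)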


Important: note that the path to the audio files needs to be relative to where the database.yaml file resides and is not allowed to start with "./", so if you do
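
db.files[0]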


this should result in something like
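
'audio/001.wav'

(where 'audio/001.wav' just stands in for whatever your first file is called, relative to the folder that holds the database header)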