How to compare formant tracks extracted with opensmile vs. Praat

Note to install not parselmouth but the package praat-parselmouth:

!pip install praat-parselmouth

First, some imports

import pandas as pd
import parselmouth 
from parselmouth import praat
import opensmile
import audiofile

Then, a test file:

testfile = '/home/felix/data/data/audio/testsatz.wav'
signal, sampling_rate = audiofile.read(testfile)
print('length in seconds: {}'.format(len(signal)/sampling_rate))

Get the opensmile formant tracks by copying them from the official GeMAPS config file

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
result_df = smile.process_file(testfile)
centerformantfreqs = ['F1frequency_sma3nz', 'F2frequency_sma3nz', 'F3frequency_sma3nz']
formant_df = result_df[centerformantfreqs]

Get the Praat tracks (smile configuration computes every 10 msec with frame length 20 msec)

sound = parselmouth.Sound(testfile) 
formants = praat.call(sound, "To Formant (burg)", 0.01, 4, 5000, 0.02, 50)
f1_list = []
f2_list = []
f3_list = []
for i in range(2, formants.get_number_of_frames()+1):
    f1 = formants.get_value_at_time(1, formants.get_time_step()*i)
    f2 = formants.get_value_at_time(2, formants.get_time_step()*i)
    f3 = formants.get_value_at_time(3, formants.get_time_step()*i)
    f1_list.append(f1)
    f2_list.append(f2)
    f3_list.append(f3)

To be sure: compare the size of the output:

print('{}, {}'.format(result_df.shape[0], len(f1_list)))

combine and inspect the result:

formant_df['F1_praat'] = f1_list
formant_df['F2_praat'] = f2_list
formant_df['F3_praat'] = f3_list
formant_df.head()

4 thoughts on “How to compare formant tracks extracted with opensmile vs. Praat”

  1. Thank you for this useful tutorial.
    I checked a few vowels and I got surprised by how much the results by OpenSmile are unreliable (at least compared to Praat).

      1. Can you point me to a good tutorial on the smoothing approaches that will make OpenSmile perform closer to Praat? What feature sets might produce better results as opposed to GeMAPSv01b?

        Thanks!

        1. Sorry, i don’t think there are any tutorials to smoothing approaches for opensmile LLDs.
          Wrt second question: embeddings from transformer pre-trained networks like e.g. wav2vec2 usually work much better, when dealing with speech data

Leave a Reply to Payam Cancel reply

Your email address will not be published. Required fields are marked *