Here, segmenting means splitting a longer audio file at speech pauses.
This post shows you how to record, segment and then label a speech recording using the INA speech segmenter and Labeltool.
Record audio
First you need a recording. You can make one with your mobile phone, or with a microphone connected to your computer using, for example, Audacity.
I'd recommend recording at, or converting to, a sample rate of 16 kHz, as this is sufficient for speech recordings.
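To check which sample rate a recording actually has, a small sketch like the following may help. It uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file; the helper name `wav_info` is made up for this illustration.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # number of frames divided by frames per second gives the length
        duration = wav.getnframes() / rate
    return rate, channels, duration
```

Calling `wav_info('your_recording.wav')` (a placeholder file name) returns the sample rate first, so a value other than 16000 tells you the file still needs converting.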
Let's say you stored your recording in the file longer_test.wav inside a directory named utterances.
Segment the recording
We do the segmentation in a Python script.
You need a few packages installed: pandas, inaSpeechSegmenter and audformat.
# we start with the imports
import pandas as pd
from inaSpeechSegmenter import Segmenter
from inaSpeechSegmenter.export_funcs import seg2csv, seg2textgrid
from audformat.utils import to_filewise_index
from audformat import segmented_index
# we then use variables for our recording:
root = './utterances/'
media = 'longer_test.wav'
# the INA speech segmenter is very easy to use:
seg = Segmenter()
segmentation = seg(root+media)
# if curious, try:
print(segmentation)
# then collect the segments that were recognized as human, either female or male:
files, starts, ends = [], [], []
for entry in segmentation:
    kind = entry[0]
    start = entry[1]
    end = entry[2]
    if kind == 'female' or kind == 'male':
        print(f'{media}, {start}, {end}')
        files.append(media)
        starts.append(start)
        ends.append(end)
seg_index = segmented_index(files, starts, ends)
# this index can now be used by audformat to actually cut the audio file into segments:
df = pd.DataFrame(index = seg_index)
file_list = to_filewise_index(df , root, 'audio_out', progress_bar = True)
# the resulting list can be stored to disk:
file_list.to_csv('file_list.csv', header=False)
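The filtering step in the script can also be tried out without any audio. The sketch below works on a hand-written list in the same (label, start, end) layout; the example values are made up, and the `min_duration` parameter is an addition for illustration that is not part of the script above. It simply drops speech segments shorter than the given number of seconds.

```python
def speech_segments(segmentation, min_duration=0.0):
    """Keep (start, end) pairs labeled 'female' or 'male'
    that last at least min_duration seconds."""
    kept = []
    for label, start, end in segmentation:
        if label in ("female", "male") and (end - start) >= min_duration:
            kept.append((start, end))
    return kept


# made-up example in the (label, start, end) layout used above
example = [
    ("noEnergy", 0.0, 0.5),
    ("male", 0.5, 3.0),
    ("music", 3.0, 4.0),
    ("female", 4.0, 4.25),
]
print(speech_segments(example, min_duration=0.5))  # -> [(0.5, 3.0)]
```

With the default `min_duration=0.0` the function keeps every female or male segment, which matches what the loop in the script does.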
Label the recording
Labeling means adding metadata to the samples, for example emotional arousal.
There are hundreds of tools to do this; of course I use the one I programmed myself, Speechalyzer 😉
Here's a tutorial on how to set this up and how to adapt the tool.
If you are running Linux, you can then start Speechalyzer with the file list you created like this:
java -jar ~/research/Speechalyzer/Speechalyzer.jar -cf ~/research/Speechalyzer/res/speechalyzer.properties -fl file_list.csv
and then simply start the Labeltool to label the files.
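If the files you expect do not show up, it may help to check the file list first. The following is a minimal sketch, assuming that file_list.csv holds one audio path per line in its first column and that relative paths are meant relative to a base directory you pass in. The helper name `missing_files` and the `base_dir` parameter are hypothetical and not part of Speechalyzer or audformat.

```python
import csv
import os


def missing_files(list_path, base_dir="."):
    """Return the paths from the first CSV column that do not exist on disk."""
    missing = []
    with open(list_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row or not row[0].strip():
                continue  # skip empty lines
            path = row[0].strip()
            # assumption: relative paths are resolved against base_dir
            if not os.path.isabs(path):
                path = os.path.join(base_dir, path)
            if not os.path.exists(path):
                missing.append(path)
    return missing
```

An empty list returned by `missing_files('file_list.csv')` means that every listed segment file was found.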
Speechalyzer can then export the labels to a file which can be used by Nkululeko as a labeled speech database in CSV format.