Segmenting a database means to split the audio samples of a database into smaller segments or chunks. With speech data this is usually done on the basis of VAD, aka voice activity detection, meaning that the pauses between speech in the audio samples are used as segment borders.
The reason for segmenting could be to label the data with something that would not last over the whole sample, e.g. emotional state.
Another motivation to segment audio data might be that the acoustic features are targeted at a specific stretch of audio, e.g. 3-5 seconds long.
You simply call your experiment configuration with the segment module, and the train, test set or both will be segmented.
The advantage is, that you can use all filters on your data that might make sense beforehand, for example with the android corpus, only the reading task samples are not segmented.
You can select them like so:
filter = [['task', 'reading']]
and then call the segment module:
python -m nkululeko.segment --config my_conf.ini
The output is a new database file in CSV format.
If you want, you can specify if only the training, or test split, or both should be segmented, as well as the string that is added to the name of the resulting csv file (the name per default consists of the database names):
# name postfix
target = _segmented
# which model to use
method = silero
# which split: train, test or all (both)
sample_selection = all
# the minimum lenght of rest-samples (in seconds)
min_length = 2
# the maximum length of segments, longer ones are cut here. (in seconds)
max_length = 10