Nkululeko: check your dataset

Within nkululeko, since version 0.53.0, you can perform automatic data checks, which means that some of your data might be filtered out if it doesn't fulfill certain requirements.

Currently two checks are implemented:

[DATA]
# check the filesize of all samples in train and test splits, in bytes
 check_size = 1000
# check if the files contain speech with voice activity detection (VAD)
 check_vad = True

VAD is using silero VAD

Leave a Reply

Your email address will not be published. Required fields are marked *