Since version 0.53.0, nkululeko can run automatic data checks. Samples that don't fulfill certain requirements are filtered out of your data.
Currently, two checks are implemented:
[DATA]
# check the file size (in bytes) of all samples in the train and test splits
check_size = 1000
# check if the files contain speech with voice activity detection (VAD)
check_vad = True
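The following minimal Python sketch (not nkululeko's own code) shows the idea behind the size check. It reads the [DATA] section above with the standard-library `configparser` and drops samples whose file is smaller than `check_size` bytes. Treating the value as a minimum size is an assumption, and `read_checks` and `filter_by_size` are hypothetical helper names.

```python
import configparser
import os
from typing import Callable, List, Optional, Tuple

# The [DATA] section shown above, inlined for this example.
CONFIG_TEXT = """
[DATA]
# check the file size (in bytes) of all samples in the train and test splits
check_size = 1000
# check if the files contain speech with voice activity detection (VAD)
check_vad = True
"""


def read_checks(text: str) -> Tuple[Optional[int], bool]:
    """Hypothetical helper: return (check_size, check_vad) from a [DATA] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    data = parser["DATA"]
    size = data.getint("check_size", fallback=None)
    vad = data.getboolean("check_vad", fallback=False)
    return size, vad


def filter_by_size(
    paths: List[str],
    min_bytes: int,
    getsize: Callable[[str], int] = os.path.getsize,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split paths into (kept, dropped).

    Assumption: files smaller than min_bytes are dropped.
    """
    kept: List[str] = []
    dropped: List[str] = []
    for path in paths:
        if getsize(path) >= min_bytes:
            kept.append(path)
        else:
            dropped.append(path)
    return kept, dropped
```

The `getsize` parameter exists only so the logic can be tried without real audio files. By default it calls `os.path.getsize`.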
The VAD check uses Silero VAD: files in which no speech is detected are filtered out.
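A VAD check of this kind could look roughly like the sketch below. It loads Silero VAD through `torch.hub` as described in the Silero VAD README and keeps a file only if `get_speech_timestamps` finds at least one speech segment. This is not nkululeko's implementation. The 16 kHz sampling rate, the "at least one segment" criterion, and the helper names `make_silero_detector` and `filter_by_speech` are assumptions.

```python
from typing import Callable, List, Tuple


def make_silero_detector(sampling_rate: int = 16000) -> Callable[[str], bool]:
    """Hypothetical helper: build a has_speech(path) function backed by Silero VAD."""
    import torch  # imported lazily so filter_by_speech works without torch

    # Downloads the model on first use (requires network access).
    model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
    get_speech_timestamps, _, read_audio, _, _ = utils

    def has_speech(path: str) -> bool:
        wav = read_audio(path, sampling_rate=sampling_rate)
        # A list of {"start": ..., "end": ...} dicts; empty means no speech found.
        segments = get_speech_timestamps(wav, model, sampling_rate=sampling_rate)
        return len(segments) > 0

    return has_speech


def filter_by_speech(
    paths: List[str], has_speech: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split paths into (kept, dropped) by the VAD result."""
    kept: List[str] = []
    dropped: List[str] = []
    for path in paths:
        if has_speech(path):
            kept.append(path)
        else:
            dropped.append(path)
    return kept, dropped
```

With real files this would be called as `filter_by_speech(paths, make_silero_detector())`. Keeping the detector separate from the filter keeps the model download out of the filtering logic and lets the filter be tested with a stub.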