Since version 0.10, nkululeko supports Facebook's wav2vec 2.0 embeddings as acoustic features.
This post shows you how to set this up.
Set up nkululeko
In your nkululeko configuration (*.ini) file, set the feature extractor to wav2vec2 and specify the path to the model like this:
[FEATS]
type = ['wav2vec2']
wav2vec.model = /my path to the huggingface model/
Alternatively, you can state the Hugging Face model name directly:
[FEATS]
type = ['wav2vec2-base-960h']
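If you want to check that the type value is a well-formed list before starting an experiment, a small script can help. This is only a sketch using Python's standard library: the function name read_feature_types and the embedded config text are made up for illustration, and it does not necessarily mirror how nkululeko parses the file internally.

```python
import ast
import configparser

# Hypothetical config text mirroring the [FEATS] section shown above.
CONFIG_TEXT = """
[FEATS]
type = ['wav2vec2-base-960h']
"""


def read_feature_types(config_text):
    """Return the FEATS type entry as a Python list of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw_value = parser["FEATS"]["type"]
    # The value is written as a Python list literal, so evaluate it safely.
    return ast.literal_eval(raw_value)


print(read_feature_types(CONFIG_TEXT))
```

A value without the brackets and quotes would make ast.literal_eval raise an error here, which is a quick way to spot a malformed entry.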
By default, the last hidden layer is used as the embedding. The original wav2vec2 model, however, consists of 7 CNN layers followed by up to 24 transformer layers. If you would like to use an earlier layer than the last one, you can simply count down:
[FEATS]
type = ['wav2vec2']
wav2vec.layer = 12
This would use the 12th layer of a 24-layer model, and only the CNN layers of a 12-layer model.
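The counting rule can be written down as a little arithmetic. The following is a sketch of my reading of the description above, not nkululeko's actual code: the function name resolve_layer is hypothetical, and I am assuming that a value of 0 corresponds to the default (the last layer). An index of 0 in the result means that no transformer layer is applied, so only the CNN output is used.

```python
def resolve_layer(num_transformer_layers, count_down):
    """Map a count-down value to the index of the transformer layer used.

    Index 0 stands for the CNN output before any transformer layer;
    index num_transformer_layers stands for the last hidden layer.
    Assumption: count_down = 0 means "use the last layer".
    """
    if not 0 <= count_down <= num_transformer_layers:
        raise ValueError(
            f"count_down must be between 0 and {num_transformer_layers}"
        )
    return num_transformer_layers - count_down


# The two cases from the text: a 24-layer and a 12-layer model.
for layers in (24, 12):
    print(layers, "layer model ->", resolve_layer(layers, 12))
```

For the 24-layer model this gives layer 12, and for the 12-layer model it gives 0, that is, the CNN layers only.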