Tag Archives: feature extractor

How to set up wav2vec embedding for nkululeko

Since version 0.10, nkululeko supports facebook's wav2vec 2.0 embeddings as acoustic features.
This post shows you how to set this up.

set up nkululeko

in your nkululeko configuration (*.ini) file, set the feature extractor as wav2vec2 and denote the path to the model like this:

[FEATS]
type = ['wav2vec2']
wav2vec.model = /my path to the huggingface model/

Alternatively you can state the huggingface model name directly:

[FEATS]
type = ['wav2vec2-base-960h']

Out of the box, as embeddings the last hidden layer is used. But the original wav2vec2 model consists of 7 CNN layers followed by up to 24 transformer layers. if you like to use an earlier layer than the last one, you can simply count down-

[FEATS]
type = wav2vec2
wav2vec.layer = 12

This would use the 12th layer of a 24 layer model and only the4 CNN layers of a 12 layer model.