Nkululeko: how to investigate correlations of specific features

As shown in a previous post, nkululeko can be used to investigate correlations of specific features with a target variable.

Now nkululeko can also be used to check the correlation between two real-valued acoustic features.

With the key regplot you can specify two features and, optionally, a target variable (if omitted, the target from the ini file is used), like so:

[EXPL]
regplot = [['lld_mfcc3_sma3_median', 'lld_mfcc1_sma3_median'],
    ['lld_mfcc3_sma3_median', 'lld_F2frequency_sma3nz_median', 'age']]
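To see how such a multi-line regplot value is read, here is a minimal sketch using Python's standard configparser (which nkululeko's ini handling is based on); the ini_text string is just a stand-in for your configuration file:

```python
import ast
import configparser

# Stand-in for the [EXPL] section of a nkululeko ini file.
ini_text = """
[EXPL]
regplot = [['lld_mfcc3_sma3_median', 'lld_mfcc1_sma3_median'],
    ['lld_mfcc3_sma3_median', 'lld_F2frequency_sma3nz_median', 'age']]
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# The value is a Python-style nested list; literal_eval turns it into data.
pairs = ast.literal_eval(config["EXPL"]["regplot"])
for spec in pairs:
    features = spec[:2]
    target = spec[2] if len(spec) > 2 else "(target from ini file)"
    print(features, "->", target)
```

Note that continuation lines of an ini value must be indented, otherwise configparser treats them as a new key.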

The first feature pair is related to the emotion target (the default for this example data, emodb) and would produce this plot:

The second entry states age as the target; since age is a continuous variable, its values will be binned into groups for the plot.
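The exact binning nkululeko applies is internal, but the idea of grouping a continuous target can be sketched like this, assuming simple equal-width bins over some example age values:

```python
import numpy as np

# Illustrative only: nkululeko's internal binning may differ.
ages = np.array([21, 25, 34, 41, 48, 55, 62, 70])

# Split the continuous target into, e.g., four equal-width groups
# so it can be used to color/group points in a regression plot.
edges = np.linspace(ages.min(), ages.max(), num=5)  # 4 groups -> 5 edges
groups = np.digitize(ages, edges[1:-1])             # group index 0..3 per sample
print(groups)  # [0 0 1 1 2 2 3 3]
```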

How to set up wav2vec embedding for nkululeko

Since version 0.10, nkululeko supports Facebook's wav2vec 2.0 embeddings as acoustic features.
This post shows you how to set this up.

Set up nkululeko

In your nkululeko configuration (*.ini) file, set the feature extractor to wav2vec2 and specify the path to the model like this:

[FEATS]
type = ['wav2vec2']
wav2vec.model = /my path to the huggingface model/

Alternatively, you can state the Hugging Face model name directly:

[FEATS]
type = ['wav2vec2-base-960h']
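wav2vec 2.0 produces one hidden vector per audio frame, so to obtain a single feature vector per utterance these frame-wise vectors have to be pooled over time (averaging is the usual choice, and to my understanding what nkululeko does). A sketch with random stand-in values instead of real model output:

```python
import numpy as np

# Stand-in for a wav2vec2 output: one 768-dim hidden vector per audio
# frame (real values would come from the model's last hidden layer).
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((249, 768))  # ~5 s of audio at 20 ms hop

# Pool over the time axis to get one fixed-size embedding per utterance.
embedding = hidden_states.mean(axis=0)
print(embedding.shape)  # (768,)
```

The result is one 768-dimensional vector per utterance, which can then be fed to any of nkululeko's classifiers.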

Out of the box, the last hidden layer is used as the embedding. But the original wav2vec 2.0 model consists of 7 CNN layers followed by up to 24 transformer layers. If you want to use an earlier layer than the last one, you can simply count down from the top:

[FEATS]
type = ['wav2vec2']
wav2vec.layer = 12

This would use the 12th transformer layer of a 24-layer model, but only the CNN layers of a 12-layer model (counting 12 down from the last of 12 transformer layers leaves just the CNN output).
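The count-down arithmetic can be made explicit with a small hypothetical helper (this is only an illustration of the rule described above; nkululeko's actual implementation may differ in detail):

```python
# Hypothetical helper illustrating the count-down layer selection;
# not part of the nkululeko API.
def effective_layer(num_transformer_layers: int, layers_from_top: int) -> int:
    """Return the hidden-state index used: index 0 is the CNN
    (feature-encoder) output, index k the k-th transformer layer."""
    index = num_transformer_layers - layers_from_top
    if index < 0:
        raise ValueError("cannot count down past the CNN output")
    return index

print(effective_layer(24, 12))  # 12 -> 12th transformer layer of a 24-layer model
print(effective_layer(12, 12))  # 0  -> only the CNN layers of a 12-layer model
```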