How to set up wav2vec embedding for nkululeko

Since version 0.10, nkululeko supports Facebook's wav2vec 2.0 embeddings as acoustic features.
This post shows you how to set this up.

1) extend your environment

First, set up your normal nkululeko installation.
Then, within the activated environment, install the additional modules needed by the wav2vec model:

pip install -r requirements_wav2vec.txt
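To check that this step worked, a small Python sketch can read the requirements file and report which of the listed distributions are not installed in the active environment. It uses only the standard library (`importlib.metadata`). The parsing is deliberately simplified: it ignores version constraints and skips pip options and URL-only lines, so treat it as a quick sanity check, not a full requirements parser. The function names are hypothetical helpers, not part of nkululeko.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract distribution names from requirements-file lines (simplified)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "://" in line.split("@")[0]:  # skip URL/VCS-only requirements
            continue
        # The name ends at the first version, extras, or marker character
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_requirements(path):
    """Return the names listed in a requirements file that are not installed."""
    with open(path) as f:
        names = requirement_names(f)
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, running `print(missing_requirements("requirements_wav2vec.txt"))` from the directory that holds the file should print an empty list if everything was installed.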

2) download the pretrained model

Wav2vec 2.0 is an end-to-end speech-to-text architecture, but its pretrained models can also be used to extract embeddings (from the penultimate hidden layer) that represent the speech audio and serve as features for speaker classification.
Facebook has published several pretrained models, which are available from Hugging Face.
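To make "embeddings from the penultimate hidden layer" concrete: with the Hugging Face transformers library, calling a wav2vec 2.0 model with `output_hidden_states=True` returns `hidden_states`, a tuple with one tensor for the embedding output plus one per transformer layer, each shaped (batch, frames, hidden_size). The penultimate layer is therefore `hidden_states[-2]`. It still holds one vector per audio frame, so it has to be collapsed into one vector per utterance. The sketch below assumes mean pooling over time, which may differ from what nkululeko does internally, and uses plain nested lists (one utterance, no batch dimension) so it runs without downloading the model.

```python
def penultimate_mean_embedding(hidden_states):
    """Mean-pool the frames of the penultimate layer into one fixed-length vector.

    hidden_states: list of layers, each a list of frames, each a list of floats
    (a single utterance, i.e. no batch dimension). Mean pooling is an assumption.
    """
    if len(hidden_states) < 2:
        raise ValueError("need at least two layers to take the penultimate one")
    frames = hidden_states[-2]  # penultimate hidden layer: frames x hidden_size
    n_frames = len(frames)
    dim = len(frames[0])
    # Average each feature dimension over all frames
    return [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
```

Whatever the length of the recording, the result has `hidden_size` entries, which is what a classifier needs as a fixed-length feature vector.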

a) get git-lfs

First, you need git-lfs, the Git extension for downloading large files (the model we target here is about 1.5 GB).
On Ubuntu Linux, I run:

sudo apt install git-lfs

On other operating systems, look up how to install git-lfs for your platform.

b) download the model

Go to the Hugging Face repository of the model and copy its URL.

Clone the repository somewhere on your computer:

git-lfs clone https://huggingface.co/facebook/wav2vec2-large-robust-ft-swbd-300h
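If git-lfs is missing or misconfigured, the clone can succeed but leave tiny Git LFS pointer files in place of the real weights. Such a pointer is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below scans the cloned directory and lists files that still look like pointers; the function name and the 1 KB size cut-off are my own choices, not part of nkululeko or git-lfs.

```python
from pathlib import Path

# First line of a Git LFS pointer file
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir, max_pointer_size=1024):
    """Return files under model_dir that are still LFS pointer stubs."""
    root = Path(model_dir)
    stubs = []
    for path in root.rglob("*"):
        # Ignore directories and Git's own bookkeeping under .git/
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        # Pointer files are tiny, so large files are real content
        if path.stat().st_size > max_pointer_size:
            continue
        with path.open("rb") as f:
            if f.read(len(LFS_POINTER_HEADER)) == LFS_POINTER_HEADER:
                stubs.append(path)
    return sorted(stubs)
```

For example, `find_lfs_pointers("wav2vec2-large-robust-ft-swbd-300h")` should return an empty list after a complete download; if it lists files, run `git lfs pull` inside the repository to fetch the actual content.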

3) set up nkululeko

In your nkululeko configuration (*.ini) file, set the feature extractor type to wav2vec and give the path to the model like this:

[FEATS]
type = wav2vec
model = /my path to the huggingface model/
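If you prefer to set this from a script, here is a minimal sketch using Python's standard `configparser`. It only touches the `[FEATS]` section shown above and keeps any other sections already in the file. Two assumptions: that a Hugging Face model checkout contains a `config.json` (used as a sanity check), and that an absolute path is preferable so the experiment works from any directory. The helper name is hypothetical. Note that rewriting the file with `configparser` drops comments, so editing the .ini by hand is just as good.

```python
import configparser
from pathlib import Path


def set_wav2vec_feats(ini_path, model_dir):
    """Point the [FEATS] section of an nkululeko .ini file at a wav2vec model."""
    model_dir = Path(model_dir).resolve()  # absolute path, independent of the cwd
    # Assumption: a Hugging Face model checkout contains a config.json
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"no config.json found in {model_dir}")
    # interpolation=None so characters like '%' in paths are stored literally
    config = configparser.ConfigParser(interpolation=None)
    config.read(ini_path)  # keeps existing sections; a missing file is ignored
    if not config.has_section("FEATS"):
        config.add_section("FEATS")
    config["FEATS"]["type"] = "wav2vec"
    config["FEATS"]["model"] = str(model_dir)
    with open(ini_path, "w") as f:
        config.write(f)
```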

That's all!
