Since version 0.10, nkululeko supports Facebook's wav2vec 2.0 embeddings as acoustic features.
This post shows you how to set this up.
1) extend your environment
First, set up your normal nkululeko installation.
Then, within the activated environment, install the additional modules needed by the wav2vec model:
pip install -r requirements_wav2vec.txt
2) download the pretrained model
Wav2vec 2.0 is an end-to-end architecture for speech-to-text, but the pretrained models can also serve as embedding extractors: the penultimate hidden layer yields a representation of the speech audio that can be used as features, for example for speaker classification.
Facebook has published several pretrained models that can be downloaded from Hugging Face.
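To give an idea of what is computed here, this is a minimal sketch (not nkululeko's internal code) of how such a penultimate-layer embedding could be extracted with the transformers package; the file name example.wav and the averaging over time are just assumptions for the illustration:

import soundfile as sf
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# load a 16 kHz mono audio file (example.wav is a placeholder name)
audio, sampling_rate = sf.read("example.wav")

model_name = "facebook/wav2vec2-large-robust-ft-swbd-300h"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)

# normalize the raw waveform and run it through the model
inputs = feature_extractor(audio, sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(inputs.input_values, output_hidden_states=True)

# take the penultimate hidden layer and average over time
# to get one fixed-size embedding vector per utterance
embedding = outputs.hidden_states[-2].mean(dim=1).squeeze()
print(embedding.shape)  # 1024 dimensions for the large models

The resulting vector can then be used as an acoustic feature set, like any other feature extractor in nkululeko.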
a) get git-lfs
First, get git-lfs, the git extension that supports large file downloads (the model we target here is about 1.5 GB).
On Ubuntu Linux, I do
sudo apt install git-lfs
On other operating systems, google how to get git-lfs.
b) download the model
Go to the Hugging Face repository for the model and copy the URL.
Clone the repository somewhere on your computer:
git-lfs clone https://huggingface.co/facebook/wav2vec2-large-robust-ft-swbd-300h
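As an optional sanity check (not a required step), you can verify that the model loads from its local path, assuming the additional modules from step 1 include the transformers package:

from transformers import Wav2Vec2Model
# adjust the path to wherever you cloned the repository
model = Wav2Vec2Model.from_pretrained("./wav2vec2-large-robust-ft-swbd-300h")
print(model.config.hidden_size)  # should print 1024 for this large model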
3) set up nkululeko
In your nkululeko configuration (*.ini) file, set the feature extractor type to wav2vec and give the path to the model like this:
[FEATS]
type = wav2vec
model = /my path to the huggingface model/
That's all!