Nkululeko is a tool to ease machine learning on speech databases.
This tutorial should help you to import databases.
There are two formats supported:
1) CSV (comma-separated values)
CSV is the easier of the two: you simply create a table with the following information:
- file: the path to the audio file
- speaker: a speaker identifier
- gender: the biological sex (it has quite an influence on the voice, so sometimes submodeling makes sense)
- task: the speaker characteristic that you want to explore, e.g. age or emotion, or both
and then fill it with the values of your database.
So a file for emotion might look like this:
file, speaker, gender, emotion
<path to>/s12343.wav, s1, female, happy
...
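If your metadata already lives in a script, a table like the one above can be generated rather than typed by hand. The following is only a minimal sketch using Python's standard csv module; the rows and the function name table_as_text are made up for illustration and are not part of Nkululeko.

```python
import csv
import io

# Hypothetical rows: (file, speaker, gender, emotion); replace with your own data.
ROWS = [
    ("audio/s12343.wav", "s1", "female", "happy"),
    ("audio/s12344.wav", "s2", "male", "sad"),
]


def table_as_text(rows, target="emotion"):
    """Return the CSV table as text: one header line, then one line per audio file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "speaker", "gender", target])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the table; redirect or save the output as e.g. my_data_file.csv.
    print(table_as_text(ROWS), end="")
```

For an age experiment you would pass target="age" and put the age values in the last column.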
You can then specify the data in your initialization file like this:
[DATA]
databases = ['my_db']
my_db.type = csv
my_db = <path to>/my_data_file.csv
my_db.absolute_path = False
...
target = emotion
You should set the flag absolute_path depending on whether
- the file paths start from the location where you run Nkululeko (or start from root: /), then True
- or they start from the location where the data resides, then False
(if in doubt, just try it out: with the wrong setting you should get an error message that the audio files don't exist)
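Instead of trial and error, you can probe the first entry of the table yourself. The sketch below is a hypothetical helper, not something Nkululeko provides, and it assumes that "the location where the data resides" means the folder that contains the CSV file.

```python
import csv
import tempfile
from pathlib import Path


def guess_absolute_path(csv_file, run_dir="."):
    """Suggest a value for absolute_path, or None if the audio is not found."""
    csv_file = Path(csv_file)
    with open(csv_file, newline="") as handle:
        # skipinitialspace tolerates "file, speaker, ..." style tables
        first = next(csv.DictReader(handle, skipinitialspace=True), None)
    if first is None:
        return None  # empty table, nothing to check
    audio = Path(first["file"])
    if audio.is_absolute():
        return True if audio.exists() else None
    if (Path(run_dir) / audio).exists():
        return True  # path resolves from where the experiment is started
    if (csv_file.parent / audio).exists():
        return False  # path resolves from the folder holding the CSV (assumption)
    return None


def demo():
    """Build a throwaway layout and probe it from two start folders."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        data_dir.mkdir()
        (data_dir / "s1.wav").touch()
        table = data_dir / "my_data_file.csv"
        table.write_text(
            "file, speaker, gender, emotion\ns1.wav, s1, female, happy\n"
        )
        return (
            guess_absolute_path(table, run_dir=tmp),
            guess_absolute_path(table, run_dir=data_dir),
        )


if __name__ == "__main__":
    print(demo())
```

A result of None means the audio file was found in neither place, which usually points to a typo in the file column.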
You cannot specify split tables with this format; instead, you have to split the file into several databases, one per split.
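One way to do that split is by speaker, so that no speaker ends up in both training and test data. This is a sketch under my own assumptions: the function names are hypothetical, the choice of test speakers is up to you, and nothing here is taken from the Nkululeko code base.

```python
import csv


def split_by_speaker(rows, test_speakers):
    """Return (train_rows, test_rows); rows are dicts with a 'speaker' key."""
    train, test = [], []
    for row in rows:
        (test if row["speaker"] in test_speakers else train).append(row)
    return train, test


def split_file(source, train_path, test_path, test_speakers):
    """Read one CSV table and write two tables with the same header."""
    with open(source, newline="") as handle:
        reader = csv.DictReader(handle, skipinitialspace=True)
        fieldnames = reader.fieldnames
        rows = list(reader)
    train, test = split_by_speaker(rows, set(test_speakers))
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(part)
```

A call such as split_file("my_data_file.csv", "my_train.csv", "my_test.csv", ["s1"]) yields two files, and each of them can then be listed as its own database with its own split_strategy.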
There is an example of how to import the RAVDESS database here.
And this would be an example ini file to use it:
[EXP]
root = ./tests/results/
name = exp_ravdess
runs = 1
epochs = 1
save = True
[DATA]
databases = ['train', 'test', 'dev']
train = ../nkululeko/data/ravdess/ravdess_train.csv
train.type = csv
train.absolute_path = False
train.split_strategy = train
dev = ../nkululeko/data/ravdess/ravdess_dev.csv
dev.type = csv
dev.absolute_path = False
dev.split_strategy = train
test = ../nkululeko/data/ravdess/ravdess_test.csv
test.type = csv
test.absolute_path = False
test.split_strategy = test
target = emotion
labels = ['angry', 'happy', 'neutral', 'sad']
[FEATS]
type = ['os']
scale = standard
[MODEL]
type = xgb
That is, the train and dev splits both use split_strategy = train, so they get concatenated into a common training set.
2) audformat
audformat allows for many use cases, so the specification might be more complex.
In the easiest case you have a database with two tables: one called files that contains the speaker information (id and sex), and one named after your task (aka target), for example age or emotion.
That's the case for our demo example, the Berlin EmoDB, and so you can include it simply with:
[DATA]
databases = ['emodb']
emodb = /<path to>/emodb/
target = emotion
...
But if there are more tables and they have special names, you can specify them like this:
[DATA]
databases = ['msp']
# path to data
msp = /<path to>/msppodcast/
# tables with speaker information
msp.files_tables = ['files.test-1', 'files.train']
# tables with task labels
msp.target_tables = ['emotion.test-1', 'emotion.train']
# train and evaluation splits will be provided
msp.split_strategy = specified
# here are the test/evaluation split tables
msp.test_tables = ['emotion.test-1']
# here are the training tables
msp.train_tables = ['emotion.train']
target = emotion
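With several table lists in one section, it is easy to mistype a table name. The sketch below reads such a [DATA] section with Python's standard configparser and applies two sanity rules of my own: train and test tables should not overlap, and both should appear among the target tables. These rules and the function name check_split_tables are assumptions for illustration, not checks that Nkululeko is known to perform.

```python
import ast
import configparser

# Shortened copy of the configuration above, used as example input.
EXAMPLE = """
[DATA]
databases = ['msp']
msp = /<path to>/msppodcast/
msp.target_tables = ['emotion.test-1', 'emotion.train']
msp.split_strategy = specified
msp.test_tables = ['emotion.test-1']
msp.train_tables = ['emotion.train']
target = emotion
"""


def check_split_tables(ini_text, db):
    """Return a list of human-readable problems (empty if none were found)."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    data = config["DATA"]

    def tables(key):
        # Values are written as Python list literals, e.g. ['a', 'b'].
        return set(ast.literal_eval(data.get(f"{db}.{key}", fallback="[]")))

    train = tables("train_tables")
    test = tables("test_tables")
    target = tables("target_tables")
    problems = []
    overlap = train & test
    if overlap:
        problems.append(f"tables used for both train and test: {sorted(overlap)}")
    missing = (train | test) - target
    if missing:
        problems.append(f"tables not listed in target_tables: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    print(check_split_tables(EXAMPLE, "msp"))
```

An empty list means the two rules hold; it says nothing about whether the tables actually exist in the database.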