Soft labeling means annotating data with labels that were predicted by a machine classifier.
Because they were not verified by a human, you might call them "soft".
Two steps are necessary:
1) save a test/evaluation set as a new database
2) load this new database in a new experiment as training data
Within nkululeko, you would do it like this:
Step 1: save a new database
You simply specify a file name in the EXP section to save the test predictions, like this:
[EXP]
...
save_test = ./my_test_predictions.csv
You need model saving to be turned on, because nkululeko will look for the best performing model:
[MODEL]
...
save = True
This will store the new database in a file called my_test_predictions.csv in the folder from which Python was called.
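The experiment itself is run with the usual nkululeko call (typically python -m nkululeko.nkululeko --config your_config.ini). To sanity-check the result, you can load the saved predictions with pandas. This is a minimal sketch; the exact column names (for example a file path column and a predicted "emotion" column) depend on your nkululeko version and are an assumption here:

import pandas as pd

# Load the database saved via save_test (assumed to be a plain CSV file).
preds = pd.read_csv("./my_test_predictions.csv")

# Inspect which columns were written; the predicted target column
# (assumed here to be named "emotion") holds the soft labels.
print(preds.columns.tolist())
print(preds.head())

# Hypothetical check: distribution of the soft labels.
if "emotion" in preds.columns:
    print(preds["emotion"].value_counts())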
Step 2: load as training data
Here is an example configuration that shows how to load this data as additional training data (in this case, in addition to emodb):
[EXP]
root = ./tests/
name = exp_add_learned
runs = 1
epochs = 1
[DATA]
strategy = cross_data
databases = ['emodb', 'learned', 'test_db']
trains = ['emodb', 'learned']
tests = ['test_db']
emodb = /path-to-emodb/
emodb.split_strategy = speaker_split
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
test_db = /path to test database/
test_db.mapping = <if any mapping to the target categories is needed>
learned = ./my_test_predictions.csv
learned.type = csv
target = emotion
labels = ['angry', 'happy', 'neutral', 'sad']
[FEATS]
type = os
[MODEL]
type = xgb
save = True
[PLOT]
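Before starting the second experiment, it can be useful to verify that the soft labels actually fall into the target categories configured above, since any other label would need an extra mapping entry. A minimal sketch, again assuming the predictions file has an "emotion" column (that column name is an assumption):

import pandas as pd

# The target categories configured in the [DATA] section above.
labels = ["angry", "happy", "neutral", "sad"]

learned = pd.read_csv("./my_test_predictions.csv")

# Hypothetical column name; adapt it to what your predictions file contains.
soft = learned["emotion"]

# Predictions outside the configured labels would either be dropped or
# require an additional mapping (e.g. learned.mapping) in the config.
unexpected = sorted(set(soft.unique()) - set(labels))
print("labels found:", sorted(soft.unique()))
print("unexpected labels:", unexpected)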