Nkululeko: classifying continuous variables

Nkululeko supports classification and regression.
Classification means predicting a class (or category) from data, regression predicting a continuous value, as for example the speaker age in years.

If you want to use classification with continuous variables, you need to first bin it, which means that you put the values into pre-defined bins. To stay with our age example, you'd declare everyone above 50 years as old and all other as young.

This post shows you how to do that with Nkululeko by setting up your .ini file.

You set up the experiment as classification type:

[EXP]
...
type = classification

But declare the data to be continuous:

[DATA]
...
type = continuous
labels = ['u40', '40ies', '50ies', '60ies', 'ΓΌ70']
bins  = [-1000,  40, 50, 60, 70, 1000]

Then the data will be binned according to the sepecified bins and labeled accordingly.
You need (number of labels) + 1 values for the bins, as they are given lower and upper limit. It makes sense to set the lower and upper absolute limits extreme as you don't know what the classifier will predict.

Leave a Reply

Your email address will not be published. Required fields are marked *