Nkululeko: how to plot distributions of feature values

As shown in this post, Nkululeko lets you select only specific features from your feature sets by listing them in the [FEATS] section:

[FEATS]
features = ['JitterPCA', 'meanF0Hz', 'hld_sylRate']

You can also plot their distributions per category (classification only). In the [EXPL] section you specify whether to use all samples or only the train or test samples:

[EXPL]
# turn it on
feature_distributions = True 
# use only training samples
sample_selection = train 
# only plot the 5 most important features 
max_feats = 5  
# also plot this feature, irrespective of its importance
plot_features = ['f3_median']
# turn on output of all statistical tests
print_stats = True
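To make the interplay of max_feats and plot_features concrete, here is a minimal Python sketch of the selection the config comments describe: the top max_feats features by importance, plus any explicitly requested ones. The function name select_features_to_plot is hypothetical, the importance scores are assumed to be given, and this is not Nkululeko's actual code.

```python
def select_features_to_plot(importances, max_feats=5, plot_features=()):
    """Hypothetical helper: return the names of the features to plot.

    importances maps feature name -> importance score; how these scores
    are computed is outside the scope of this sketch.
    """
    # Rank features by importance, highest first, and keep the top max_feats.
    ranked = sorted(importances, key=importances.get, reverse=True)
    selected = ranked[:max_feats]
    # Add the explicitly requested features regardless of their rank,
    # without plotting any feature twice.
    for name in plot_features:
        if name not in selected:
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Made-up importance scores, for illustration only.
    scores = {"meanF0Hz": 0.4, "JitterPCA": 0.1, "hld_sylRate": 0.3, "f3_median": 0.05}
    print(select_features_to_plot(scores, max_feats=2, plot_features=["f3_median"]))
```

With max_feats = 2, the two highest-ranked features are kept and f3_median is appended even though it has the lowest score.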

Then call nkululeko with the explore interface:

python -m nkululeko.explore --config <myConfig.ini>

The plot is saved in the image folder and should look similar to this:

The title states:

  • the number of samples
  • the outcome of a Kruskal-Wallis test (if there are more than two categories)
  • the outcome of the most significant difference according to a t-test (all outcomes are shown in the debug output if print_stats = True)

Or like this (for the additional feature):
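The statistics reported in the plot titles can be approximated with SciPy. The following is a minimal sketch, not Nkululeko's implementation: it assumes scipy.stats.kruskal for the Kruskal-Wallis test and the default (Student's) scipy.stats.ttest_ind for the pairwise t-tests, and the function name describe_feature and the toy values are made up for illustration.

```python
from itertools import combinations

from scipy import stats


def describe_feature(values_by_class):
    """Hypothetical helper: summarise one feature's differences across categories.

    values_by_class maps a category label to a list of feature values.
    """
    groups = list(values_by_class.values())
    summary = {"n_samples": sum(len(g) for g in groups)}

    # Kruskal-Wallis H-test, only computed for more than two categories.
    if len(groups) > 2:
        h_stat, h_p = stats.kruskal(*groups)
        summary["kruskal"] = (float(h_stat), float(h_p))

    # Pairwise two-sample t-tests; keep the pair with the smallest p-value.
    best = None
    for a, b in combinations(values_by_class, 2):
        t_stat, t_p = stats.ttest_ind(values_by_class[a], values_by_class[b])
        if best is None or t_p < best[3]:
            best = (a, b, float(t_stat), float(t_p))
    summary["best_ttest"] = best
    return summary


if __name__ == "__main__":
    # Made-up toy values for a single feature (e.g. mean F0 in Hz).
    toy = {
        "angry": [220.0, 235.0, 241.0, 228.0, 250.0],
        "neutral": [180.0, 175.0, 190.0, 185.0, 178.0],
        "sad": [170.0, 165.0, 172.0, 168.0, 160.0],
    }
    print(describe_feature(toy))
```

Reporting only the pair with the smallest p-value keeps a title short; the full list of pairwise results is what print_stats = True is meant to expose.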
You can switch off the gender-specific plotting by specifying

[EXPL]
ignore_gender = True
