Nkululeko: get some statistics on correlation and effect size

Since version 0.64.0, Nkululeko prints some statistics as part of the plot titles.
With the explore module, you can plot correlations between the target (e.g. emotion or age) and other variables in the database, such as gender or duration, or anything you have predicted with the predict module.
You need to distinguish whether each of your variables is categorical/nominal (strings) or real-valued (numbers), because this determines which statistic is used.

If you plot the distribution of two categorical variables, the Chi2 (chi-squared) test is used to estimate whether the correlation is significant. The p-value is given in the title, as in this plot:

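To make the Chi2 case concrete, here is a minimal plain-Python sketch of a chi-squared test of independence between two categorical label sequences. It is an illustration under assumptions, not Nkululeko's code: the function names `chi2_sf` and `chi2_independence` and the toy emotion/gender labels are hypothetical. It computes the uncorrected Pearson statistic (no Yates continuity correction), which should agree with `scipy.stats.chi2_contingency(table, correction=False)`.

```python
import math
from collections import Counter


def chi2_sf(x: float, dof: int) -> float:
    """P(X >= x) for a chi-squared distribution with an integer number of dof."""
    if dof < 1:
        raise ValueError("dof must be >= 1 (each variable needs >= 2 categories)")
    y = x / 2.0
    # Start from the regularized upper incomplete gamma Q(a, y) at a = 1 or 1/2 ...
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... and step up with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def chi2_independence(xs, ys):
    """Chi-squared test of independence for two categorical label sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    observed = Counter(zip(xs, ys))                # contingency table: (x, y) -> count
    x_totals, y_totals = Counter(xs), Counter(ys)  # row and column sums
    stat = 0.0
    for a, n_a in x_totals.items():
        for b, n_b in y_totals.items():
            expected = n_a * n_b / n               # count expected under independence
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(x_totals) - 1) * (len(y_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Hypothetical toy data: emotion labels vs. speaker gender
emotion = ["angry"] * 10 + ["sad"] * 10
gender = ["male"] * 10 + ["female"] * 10
stat, dof, p = chi2_independence(emotion, gender)
print(f"chi2={stat:.2f}, dof={dof}, p={p:.2g}")
```

A small p-value (conventionally below 0.05) suggests the two label distributions are not independent. The gamma recurrence keeps the sketch dependency-free; for many categories (very large dof) `math.gamma` overflows, and a library routine is the better choice.
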
If you plot the distribution of two real-valued variables, the correlation is estimated with Pearson's correlation coefficient:

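Pearson's coefficient is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it directly; the function name `pearson_r` and the duration/age values are hypothetical illustrations, not taken from Nkululeko. On Python 3.10+ the standard library's `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r between two real-valued sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of co-deviations (covariance without the 1/n factor, which cancels out)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


# Hypothetical toy data: utterance duration (seconds) vs. speaker age (years)
durations = [1.2, 2.3, 2.9, 3.8, 4.1]
ages = [25, 31, 40, 44, 52]
print(f"r={pearson_r(durations, ages):.3f}")
```

The result lies between -1 (perfect negative linear relation) and 1 (perfect positive linear relation), with 0 meaning no linear relation.
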
If the target is categorical and the variable is real-valued, we use Cohen's d to print the maximal effect size of the variable over all combinations of two categories:

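Cohen's d compares two groups, so with more than two categories it has to be computed for every pair. The sketch below takes, for each pair of categories, the difference in means divided by the pooled standard deviation and keeps the pair with the largest absolute value. It assumes the common pooled-SD variant of Cohen's d; whether Nkululeko uses exactly this variant is an assumption, and the names `cohens_d` and `max_cohens_d` as well as the toy data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled = math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))
    return (mean(a) - mean(b)) / pooled


def max_cohens_d(labels, values):
    """Largest |d| over all pairs of categories; each category needs >= 2 values."""
    groups = {}
    for label, value in zip(labels, values):
        groups.setdefault(label, []).append(value)
    best = None
    for c1, c2 in combinations(sorted(groups), 2):
        d = abs(cohens_d(groups[c1], groups[c2]))
        if best is None or d > best[0]:
            best = (d, c1, c2)
    return best  # (effect size, category 1, category 2)


# Hypothetical toy data: emotion labels and utterance durations in seconds
emotions = ["angry"] * 3 + ["sad"] * 3 + ["neutral"] * 3
durations = [3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
print(max_cohens_d(emotions, durations))
```

By Cohen's rule of thumb, values around 0.2, 0.5 and 0.8 are read as small, medium and large effects.
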
If the target is also real-valued, it is binned (made categorical) by default:

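Binning maps each real value to one of a few categories. The sketch below uses equal-width bins between the minimum and maximum; the number of bins, the bin edges and the labels Nkululeko actually uses are not stated here, so `bin_values`, `n_bins=3` and the labels are hypothetical choices for illustration only.

```python
def bin_values(values, n_bins=3, labels=None):
    """Map real values to categorical labels using equal-width bins."""
    if labels is None:
        labels = [f"bin{i}" for i in range(n_bins)]
    if len(labels) != n_bins:
        raise ValueError("need exactly one label per bin")
    lo, hi = min(values), max(values)
    if hi == lo:  # constant input: everything falls into the first bin
        return [labels[0]] * len(values)
    width = (hi - lo) / n_bins
    # min(...) puts the maximum value into the last bin instead of an extra one
    return [labels[min(int((v - lo) / width), n_bins - 1)] for v in values]


# Hypothetical toy data: speaker ages turned into three age groups
ages = [20, 30, 40, 50, 60, 70]
print(bin_values(ages, labels=["low", "mid", "high"]))
```

Once binned, the target can be treated like any other categorical variable.
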
If you want to prevent the binning, you can set this value in the configuration:

[EXPL]
bin_reals = False

and you get a different plot:
