Nkululeko: generate a LaTeX/PDF report

Since version 0.66.3, nkululeko can automatically generate a report document that is formatted in LaTeX and compiled to a PDF file. The report is basically a compilation of the images that the modules generate.
There is a dedicated REPORT section in the config file for this; here is an example:

[REPORT]
# should the report be shown in the terminal at the end?
show = False
# should a latex/pdf file be printed? if so, state the filename
latex = emodb_report
# name of the experiment author (default "anon")
author = Felix
# title of the report (default "report")
title = EmoDB
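
To check which values such a section will yield before starting a run, it can be read with Python's standard configparser module. This is only a sketch and not nkululeko's own code: the helper name read_report_options is made up for illustration, the fallbacks for author and title are the defaults named in the comments above, and the fallbacks for show and latex are assumptions.

```python
import configparser

# Example config text; the option names are the ones from the REPORT example.
CONFIG_TEXT = """
[REPORT]
show = False
latex = emodb_report
author = Felix
title = EmoDB
"""


def read_report_options(config_text):
    """Return the REPORT options as a dict (hypothetical helper, not part of nkululeko)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        # assumption: without a "show" entry nothing is printed to the terminal
        "show": parser.getboolean("REPORT", "show", fallback=False),
        # assumption: without a "latex" entry no LaTeX/PDF file is written
        "latex": parser.get("REPORT", "latex", fallback=None),
        # defaults as stated in the config comments
        "author": parser.get("REPORT", "author", fallback="anon"),
        "title": parser.get("REPORT", "title", fallback="report"),
    }


options = read_report_options(CONFIG_TEXT)
print(options)
```

Passing an empty string instead of CONFIG_TEXT returns only the fallback values, which is a quick way to see what a config without a REPORT section would amount to under these assumptions.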

With each run of a nkululeko module in the same experiment environment, details are added to the report.
So a typical use would be to run several modules one after the other, for example:

# first run a segmentation 
python -m nkululeko.segment --config myconf.ini 
# then rename the data file in myconf.ini and
# run some data exploration
python -m nkululeko.explore --config myconf.ini 
# then run a machine learning experiment
python -m nkululeko.nkululeko --config myconf.ini 

Each of these runs adds its own content to the report.


If you use modules, feature extractors or models with Nkululeko that rely on torchaudio, for example the Resampler or the Squim model, you need to install the nightly version:

pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Nkululeko: get some statistics on correlation and effect size

Since version 0.64.0, nkululeko prints some statistics as part of the plot titles.
With the explore module, you can plot correlations between the target (e.g. emotion or age) and other variables that are in the database, e.g. gender or duration, or anything you have predicted with the predict module.
You need to distinguish whether each of your variables is categorical/nominal (strings) or real-valued (numbers).

If you plot the distribution of two categorical variables, the Chi2 statistic is used to estimate whether the correlation is significant. The resulting p-value is given in the plot title.
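
To make the Chi2 case more concrete, here is a minimal sketch of a Pearson chi-squared test of independence in plain Python. It is not necessarily the exact computation nkululeko performs (for instance, no continuity correction is applied), the function names are made up for illustration, and the gender and emotion labels are invented toy data.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """p-value: probability that a chi-squared variable with dof degrees of freedom exceeds x."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # even degrees of freedom: exp(-x/2) times a finite sum
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # odd degrees of freedom: erfc term plus a finite sum
    term, total = 1.0, 0.0
    for j in range((dof - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def chi2_test(labels_a, labels_b):
    """Chi-squared test of independence for two equally long lists of category labels."""
    n = len(labels_a)
    observed = Counter(zip(labels_a, labels_b))
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)
    statistic = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            # expected count if both variables were independent
            expected = row_total * col_total / n
            statistic += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, chi2_sf(statistic, dof)


# invented toy data: 20 samples with a gender and an emotion label each
gender = ["female"] * 10 + ["male"] * 10
emotion = ["angry"] * 8 + ["neutral"] * 2 + ["angry"] * 2 + ["neutral"] * 8

statistic, dof, p_value = chi2_test(gender, emotion)
print(f"chi2={statistic:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value means that the observed counts deviate strongly from what independent variables would produce.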

If you plot the distribution of two real-valued variables, the correlation is estimated with Pearson's correlation coefficient.
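
For reference, Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The following sketch computes it in plain Python; the function name and the age and duration values are invented for illustration and are not taken from nkululeko.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long number lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # sums of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# invented toy data: speaker age in years and sample duration in seconds
age = [20, 30, 40, 50, 60]
duration = [2.1, 2.9, 4.2, 4.8, 6.0]

r = pearson_r(age, duration)
print(f"Pearson r = {r:.3f}")
```

Values close to 1 or -1 indicate a strong linear relation, values close to 0 indicate none.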

If the target is categorical and the variable real-valued, Cohen's d is used: the maximal effect size found for any combination of two categories with respect to the variable is printed.
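
As an illustration of what "maximal effect size" means here, the sketch below computes Cohen's d (difference of means divided by the pooled standard deviation) for every pair of categories and keeps the largest one. This is an assumption about the general approach, not nkululeko's actual code; the function names and the emotion and duration values are invented.

```python
import itertools
import math
import statistics


def cohens_d(a, b):
    """Absolute Cohen's d of two samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    return abs(statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def max_effect_size(categories, values):
    """Return the pair of categories with the largest Cohen's d and that d."""
    groups = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    best_pair = max(
        itertools.combinations(sorted(groups), 2),
        key=lambda pair: cohens_d(groups[pair[0]], groups[pair[1]]),
    )
    return best_pair, cohens_d(groups[best_pair[0]], groups[best_pair[1]])


# invented toy data: emotion label and duration in seconds per sample
emotion = ["angry"] * 3 + ["neutral"] * 3 + ["sad"] * 3
duration = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

pair, d = max_effect_size(emotion, duration)
print(pair, round(d, 2))
```

By the usual rule of thumb, a d around 0.2 counts as a small effect, around 0.5 as medium and 0.8 or more as large.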

If the target is also real-valued, it is binned (made categorical) by default.
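
Binning simply means replacing each number by the label of the value range it falls into. The sketch below shows one way to do this with equal-frequency bins in plain Python. The number of bins, the labels and the function name are assumptions for illustration and may differ from what nkululeko does internally; the age values are invented.

```python
import bisect
import statistics


def bin_values(values, labels=("low", "middle", "high")):
    """Map each number to a label, using equal-frequency cut points (hypothetical helper)."""
    # one cut point fewer than there are labels, e.g. two cuts for three bins
    cuts = statistics.quantiles(values, n=len(labels))
    return [labels[bisect.bisect_right(cuts, value)] for value in values]


# invented toy data: speaker age in years
age = [18, 22, 25, 31, 38, 44, 52, 60, 71]

binned_age = bin_values(age)
print(list(zip(age, binned_age)))
```

Once the target is binned like this, it can be treated like any other categorical variable in the plots.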

If you want to prevent the binning, you can set this value in the configuration:

bin_reals = False

The target is then no longer binned, and you get a different plot.