Meta parameters are the specific configurations of a classifier, e.g. the number of layers in a neural network or the kernel function of a support vector machine.
Like everything else, they can (and should) be optimized for the best result, a process called meta parameter optimization, or tuning.
For meta parameter optimization (i.e. finding the optimal classifier configuration), nkululeko offers three possibilities.
Using nkululeko module with tuning params
This is meant for simple optimization of a few parameters via grid search (using sklearn methods under the hood).
With classifiers that are based on sklearn, you can simply state the variants for a meta parameter in the ini file:
[MODEL]
type = svm
tuning_params = ['C']
scoring = recall_macro
C = [10, 1, 0.1, 0.01, 0.001, 0.0001]
This will iterate over the stated values of the SVM's C parameter and choose the best performing model.
You can specify several tuning_params; a grid search (combining every value with every other) will then be performed.
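Under the hood this corresponds to sklearn's grid search. The following sketch shows roughly what such a C sweep does, using GridSearchCV directly on toy data; it illustrates the mechanism, not nkululeko's actual internals:

```python
# Hedged sketch: a C sweep with sklearn's GridSearchCV on toy data,
# roughly what the tuning_params mechanism does internally.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
param_grid = {"C": [10, 1, 0.1, 0.01, 0.001, 0.0001]}
search = GridSearchCV(SVC(), param_grid, scoring="recall_macro", cv=3)
search.fit(X, y)
print(search.best_params_)  # the best-performing C value from the list
```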
Here's an example for XGB classifier:
[MODEL]
type = xgb
tuning_params = ['subsample', 'n_estimators', 'max_depth']
subsample = [.5, .7]
n_estimators = [50, 80, 200]
max_depth = [1, 6]
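Note that the grid grows multiplicatively: the three parameter lists above yield 2 × 3 × 2 = 12 model fits (times the number of cross-validation folds). A quick way to count the combinations before launching a run:

```python
# Enumerate the combinations a grid search over the XGB example would try.
from itertools import product

subsample = [0.5, 0.7]
n_estimators = [50, 80, 200]
max_depth = [1, 6]

combos = list(product(subsample, n_estimators, max_depth))
print(len(combos))  # 12 combinations
```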
The optim module
There is a dedicated module for this: the optim module, which can also search a random parameter space.
Besides grid search, it supports the following search strategies:
- random: Random sampling from parameter distributions
- halving_grid: Successive halving with grid search
- halving_random: Successive halving with random search
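As a rough illustration of the halving_random strategy, here is a sketch using sklearn's (still experimental) HalvingRandomSearchCV on toy data; this is an assumption about the underlying mechanism, not nkululeko's code:

```python
# Hedged sketch of successive halving with random search, using
# sklearn's experimental HalvingRandomSearchCV on toy data.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=42)
param_distributions = {
    "C": [0.1, 1.0, 10.0, 100.0, 1000.0],
    "kernel": ["linear", "rbf", "poly"],
}
# successive halving: many candidates start on a small data budget,
# only the best performers survive to the larger budgets
search = HalvingRandomSearchCV(
    SVC(), param_distributions, cv=3, random_state=42
)
search.fit(X, y)
print(search.best_params_)
```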
Here is an example of a config file (with a dedicated OPTIM section):
[MODEL]
type = svm
[OPTIM]
model = svm
search_strategy = halving_random
n_iter = 15
cv_folds = 3
C_val = [0.1, 1.0, 10.0, 100.0, 1000.0]
kernel = ["linear", "rbf", "poly"]
gamma = ["scale", "auto", 0.001, 0.01, 0.1, 1.0]
metric = uar
You call it like this:
python -m nkululeko.optim --config myConfig.ini
Use nkululeko as a package in your own Python code
If you want to compare models or feature sets, you will need to run several experiments. This could be done manually, with a bash-script, or in Python using nkululeko as a package.
Here's one idea for finding the optimal values of the two layers of an MLP net with nkululeko:
- store your meta parameters in arrays
- loop over them and initialize a new experiment each time
- keep the experiment name but change your parameters and the plot name
- this way you can re-use your extracted features and keep your hard disk uncluttered
Here's some Python code to illustrate this idea:
import configparser

# nkululeko imports (module paths may differ between nkululeko versions)
from nkululeko import experiment as exp
from nkululeko.utils.util import Util


def main(config_file):
    # load one configuration per experiment
    config = configparser.ConfigParser()
    config.read(config_file)
    util = Util()
    l1s = [32, 64, 128]
    l2s = [16, 32, 64]
    for l1 in l1s:
        for l2 in l2s:
            # create a new experiment
            expr = exp.Experiment(config)
            plotname = f'{util.get_exp_name()}_{l1}_{l2}'
            util.set_config_val('PLOT', 'name', plotname)
            print(f'running {expr.name} with layers {l1} and {l2}')
            layers = {'l1': l1, 'l2': l2}
            util.set_config_val('MODEL', 'layers', layers)
            # load the data
            expr.load_datasets()
            # split into train and test
            expr.fill_train_and_tests()
            util.debug(f'train shape: {expr.df_train.shape}, test shape: {expr.df_test.shape}')
            # extract features
            expr.extract_feats()
            util.debug(f'train feats shape: {expr.feats_train.df.shape}, test feats shape: {expr.feats_test.df.shape}')
            # initialize a run manager
            expr.init_runmanager()
            # run the experiment
            expr.run()
    print('DONE')
Keep in mind, though, that meta parameter optimization as done here is itself a learning problem: you risk overfitting the meta parameters to your test set. It is usually not feasible to systematically try all combinations of possible values, so some kind of stochastic approach is preferable.
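For instance, instead of exhausting the full grid from the loop above, one could sample a fixed budget of random layer combinations (the candidate lists here are hypothetical):

```python
# Stochastic alternative: sample a fixed number of random (l1, l2)
# combinations instead of exhausting the full grid.
import random

random.seed(42)
l1s = [16, 32, 64, 128, 256]  # hypothetical candidate layer sizes
l2s = [8, 16, 32, 64, 128]
n_trials = 5  # budget: far fewer than the 25 grid points
sampled = [(random.choice(l1s), random.choice(l2s)) for _ in range(n_trials)]
print(sampled)  # run one experiment per sampled pair
```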