modnet.hyper_opt.fit_genetic module

class modnet.hyper_opt.fit_genetic.Individual(max_feat, num_classes, multi_label, loss='mae', targets=None, weights=None, **fit_params)

Bases: object

Class representing a set of hyperparameters for the genetic algorithm.

Parameters:

max_feat (int) – Maximum number of features.
num_classes (dict) – MODData num_classes parameter. Used for distinguishing between regression and classification.
multi_label (bool) – Whether the task is a multi-label classification problem.
loss (Union[str, Callable]) – The tf.keras loss to pass to compile(...): either the name of a built-in loss or a callable.
targets (List) – Optional (for joint learning only). A nested list of target names that defines the hierarchy of the output layers.
weights (Dict[str, float]) – Optional (for joint learning only). The relative loss weights to apply for each target.
fit_params – Any additional parameters to pass to MODNetModel.fit(...).

Return type:

Individual

crossover(partner)

Performs crossover between this individual and partner, and returns a ‘child’ whose hyperparameters are a mix of both parents' hyperparameters.

Parameters:

partner (Individual) – Partner individual.

Returns:

Child individual.

Return type:

Individual
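The crossover step can be pictured with the minimal sketch below. It assumes a uniform crossover, in which each hyperparameter of the child is copied from one parent or the other with equal probability, and it represents individuals as plain dictionaries. The hyperparameter names (lr, batch_size, n_feat) and the crossover helper are hypothetical and not taken from MODNet's implementation.

```python
import random
from typing import Any, Dict, Optional


def crossover(parent_a: Dict[str, Any], parent_b: Dict[str, Any],
              rng: Optional[random.Random] = None) -> Dict[str, Any]:
    """Uniform crossover (assumed scheme): each hyperparameter of the child
    is taken from one of the two parents with probability 0.5."""
    if parent_a.keys() != parent_b.keys():
        raise ValueError("parents must define the same hyperparameters")
    rng = rng or random.Random()
    return {name: parent_a[name] if rng.random() < 0.5 else parent_b[name]
            for name in parent_a}


# Hypothetical hyperparameter sets, for illustration only.
mom = {"lr": 1e-3, "batch_size": 64, "n_feat": 100}
dad = {"lr": 5e-3, "batch_size": 32, "n_feat": 300}
child = crossover(mom, dad, random.Random(0))
print(child)
```

Every value in child comes from one of the two parents, so crossover recombines existing hyperparameters without creating new ones; introducing new values is the job of mutation.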

mutation(prob_mut)

Performs mutation of the hyperparameters in order to maintain diversity in the population. The individual is modified in place.

Parameters:

prob_mut (float) – Probability of mutation, in [0, 1].

Returns:

None (in-place operation).

Return type:

None
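As a sketch of how an in-place mutation keeps the population diverse, the snippet below resamples each hyperparameter from a search space with probability prob_mut. The SEARCH_SPACE dictionary and the mutate helper are illustrative assumptions, not MODNet's actual search space or code.

```python
import random
from typing import Any, Dict, List

# Hypothetical search space: candidate values for each hyperparameter.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "lr": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    "batch_size": [16, 32, 64, 128],
    "n_feat": [50, 100, 200, 300],
}


def mutate(params: Dict[str, Any], prob_mut: float, rng: random.Random) -> None:
    """Resample each hyperparameter with probability prob_mut, in place."""
    if not 0.0 <= prob_mut <= 1.0:
        raise ValueError("prob_mut must lie in [0, 1]")
    for name, choices in SEARCH_SPACE.items():
        if rng.random() < prob_mut:
            params[name] = rng.choice(choices)


params = {"lr": 1e-3, "batch_size": 64, "n_feat": 100}
mutate(params, prob_mut=0.0, rng=random.Random(0))
print(params)  # unchanged: with prob_mut=0 no hyperparameter is resampled
```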

evaluate(train_data, val_data, fast=False)

Evaluates the validation loss and stores it internally in self.val_loss.

Parameters:

train_data (MODData) – Training MODData
val_data (MODData) – Validation MODData
fast (bool, optional) – Limited epochs for testing or debugging only. Defaults to False.

refit_model(data, n_models=10, n_jobs=1, fast=False)

Refits the inner model on the specified data.

Parameters:

data (MODData) – Training data.
fast (bool, optional) – Limited epochs for testing or debugging only. Defaults to False.
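The refit presumably trains several models that together form an ensemble, as the n_models parameter and the EnsembleMODNetModel returned by run suggest. The sketch below assumes that the ensemble prediction is the mean of its members' predictions, and it uses toy linear functions in place of trained MODNet models; make_member and ensemble_predict are hypothetical helpers.

```python
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def make_member(seed: int) -> Model:
    # Stand-in for "train one model with this seed": a linear model whose
    # slope varies slightly with the seed.
    slope = 2.0 + 0.1 * (seed - 1)
    return lambda x: slope * x


def ensemble_predict(models: Sequence[Model], x: float) -> float:
    """Mean of the members' predictions (assumed aggregation rule)."""
    if not models:
        raise ValueError("the ensemble has no members")
    return mean(m(x) for m in models)


n_models = 3
members: List[Model] = [make_member(seed) for seed in range(n_models)]
print(ensemble_predict(members, 1.0))  # approximately 2.0
```

Averaging over members trained from different random seeds tends to smooth out the variance of any single model, which is the usual motivation for refitting an ensemble rather than a single network.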

class modnet.hyper_opt.fit_genetic.FitGenetic(data, custom_data=None, targets=None, weights=None, sample_threshold=5000, ignore_names=[])

Bases: object

Class optimizing MODNet hyperparameters using a genetic algorithm.

Parameters:

data (MODData) – Training MODData
custom_data (np.ndarray) – Optional array of shape (n_samples, n_custom_props) that will be appended to the targets (column-wise). This can be useful for defining custom loss functions.
targets (List) – Optional (for joint learning only). A nested list of target names that defines the hierarchy of the output layers.
weights (Dict[str, float]) – Optional (for joint learning only). The relative loss weights to apply for each target.
sample_threshold (int, optional) – If the dataset size exceeds this threshold, individuals are trained on sampled subsets of this size. Defaults to 5000.
ignore_names (List) – Optional list of property names to ignore during feature selection. Feature selection will be performed w.r.t. all properties except the ones in ignore_names.
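The custom_data and sample_threshold parameters can be illustrated with NumPy. This is a sketch under stated assumptions: custom columns are stacked to the right of the target array, and subsampling draws rows uniformly without replacement. The helpers append_custom_data and maybe_subsample are hypothetical, and MODNet may draw its subsets differently.

```python
import numpy as np


def append_custom_data(targets: np.ndarray, custom_data: np.ndarray) -> np.ndarray:
    """Append custom properties to the targets column-wise."""
    if targets.ndim == 1:
        targets = targets[:, None]  # treat a single target as one column
    if custom_data.shape[0] != targets.shape[0]:
        raise ValueError("custom_data must have one row per sample")
    return np.hstack([targets, custom_data])


def maybe_subsample(n_samples: int, sample_threshold: int = 5000,
                    seed: int = 0) -> np.ndarray:
    """All row indices if the dataset is small enough, else a random subset."""
    if n_samples <= sample_threshold:
        return np.arange(n_samples)
    rng = np.random.default_rng(seed)
    return rng.choice(n_samples, size=sample_threshold, replace=False)


y = np.zeros((8, 2))     # 8 samples, 2 targets
extra = np.ones((8, 1))  # 1 custom property per sample
y_ext = append_custom_data(y, extra)
print(y_ext.shape)       # (8, 3)
idx = maybe_subsample(8, sample_threshold=5)
print(len(idx))          # 5
```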

initialization_population(size_pop, multi_label, loss='mae', **fit_params)

Creates the initial population (generation 0).

Parameters:

size_pop (int) – Size of population.
multi_label (bool) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.
loss (Union[str, Callable]) – The tf.keras loss to pass to compile(...): either the name of a built-in loss or a callable.
fit_params – Any additional parameters to pass to MODNetModel.fit(...).

Return type:

None
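A generation-0 population is typically a set of independently sampled hyperparameter combinations. The sketch below draws size_pop random dictionaries from a hypothetical SEARCH_SPACE; it ignores multi_label, loss and fit_params, and init_population is an illustrative name rather than the library method.

```python
import random
from typing import Any, Dict, List

# Hypothetical search space; MODNet's actual hyperparameter ranges differ.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "lr": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    "batch_size": [16, 32, 64, 128],
    "n_feat": [50, 100, 200, 300],
    "activation": ["relu", "elu"],
}


def init_population(size_pop: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Draw size_pop independent hyperparameter sets (generation 0)."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
            for _ in range(size_pop)]


population = init_population(size_pop=4)
for individual in population:
    print(individual)
```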

function_fitness(pop, n_jobs, nested=5, val_fraction=0.1, multi_label=False, fast=False)

Calculates the fitness of each individual in pop, i.e. the validation loss of a model trained with that individual's hyperparameters. Returns the validation losses, the trained models and the corresponding individuals.

Parameters:

pop (List[Individual]) – List of individuals
n_jobs (int) – Number of jobs to parallelize on.
nested (int, optional) – Number of cross-validation folds. Defaults to 5. Use <=0 for hold-out validation.
val_fraction (float, optional) – Validation fraction if no CV is used. Defaults to 0.1.
multi_label (Optional[bool]) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.
fast (bool, optional) – Limited epochs for testing and debugging only. Defaults to False.

Returns:

A tuple (val_losses, models, individuals).

Return type:

tuple
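The nested and val_fraction arguments control how each individual is validated. The sketch below, assuming shuffled indices and round-robin fold assignment, returns one (train, validation) split for hold-out validation when nested <= 0, and nested splits for k-fold cross-validation otherwise; an individual's fitness would then be its validation loss averaged over the splits. make_splits is a hypothetical helper, not part of MODNet.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def make_splits(n_samples: int, nested: int = 5, val_fraction: float = 0.1,
                seed: int = 0) -> List[Split]:
    """Hold-out split if nested <= 0, otherwise `nested`-fold CV splits."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    if nested <= 0:
        # Hold-out: a single split with val_fraction of the samples for validation.
        n_val = max(1, round(val_fraction * n_samples))
        return [(idx[n_val:], idx[:n_val])]
    # k-fold CV: assign the shuffled samples to folds round-robin.
    folds = [idx[k::nested] for k in range(nested)]
    splits = []
    for k, val in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, val))
    return splits


cv = make_splits(20, nested=5)
holdout = make_splits(20, nested=0, val_fraction=0.1)
print(len(cv), [len(v) for _, v in cv])  # 5 [4, 4, 4, 4, 4]
print(len(holdout), len(holdout[0][1]))  # 1 2
```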

run(size_pop=20, num_generations=10, prob_mut=None, nested=5, multi_label=False, loss='mae', n_jobs=None, early_stopping=4, refit=5, fast=False, **fit_params)

Runs the genetic algorithm (GA) and returns the best model.

Parameters:

size_pop (int, optional) – Size of the population per generation. Defaults to 20.
num_generations (int, optional) – Number of generations. Defaults to 10.
prob_mut (Optional[float], optional) – Probability of mutation. Defaults to None.
nested (Optional[int], optional) – Number of cross-validation folds. Use 0 for hold-out validation (with a validation fraction of 0.1). Negative values and a value of 1 are equivalent to the default (5).
multi_label (bool) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.
loss (Union[str, Callable]) – The tf.keras loss to pass to compile(...): either the name of a built-in loss or a callable.
n_jobs (Optional[int], optional) – Number of jobs to parallelize on. Defaults to None.
early_stopping (Optional[int], optional) – Number of successive generations without improvement before stopping. Defaults to 4.
refit (Optional[int], optional) – Number of models in the ensemble that is refitted on the whole dataset with the best hyperparameters. If 0, no refit is performed and the best Individual is used instead. Defaults to 5.
fast (bool, optional) – For debugging and testing only. Performs a fast GA run with a small number of epochs, generations, individuals and folds, overriding the size_pop, num_generations and nested arguments. Defaults to False.
fit_params – Any additional parameters to pass to MODNetModel.fit(...).

Returns:

Fitted model with best hyperparameters

Return type:

EnsembleMODNetModel
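To show how size_pop, num_generations, prob_mut, nested and early_stopping interact, here is a toy genetic algorithm over two continuous parameters with a synthetic loss. It is a sketch, not MODNet's implementation: truncation selection (keeping the fitter half), uniform crossover, resampling mutation and the resolve_nested helper are assumptions chosen for brevity, and the refit step is omitted.

```python
import random
from typing import Dict, Optional, Tuple

Params = Dict[str, float]


def resolve_nested(nested: int) -> int:
    """0 -> hold-out (kept as 0); negative values and 1 -> default of 5 folds."""
    if nested < 0 or nested == 1:
        return 5
    return nested


def toy_loss(p: Params) -> float:
    # Synthetic stand-in for a validation loss, minimal at x=0.3, y=-0.7.
    return (p["x"] - 0.3) ** 2 + (p["y"] + 0.7) ** 2


def run_ga(size_pop: int = 20, num_generations: int = 10, prob_mut: float = 0.2,
           early_stopping: int = 4, seed: int = 0) -> Tuple[Params, float]:
    if size_pop < 2 or num_generations < 1:
        raise ValueError("need size_pop >= 2 and num_generations >= 1")
    rng = random.Random(seed)
    pop = [{"x": rng.uniform(-1, 1), "y": rng.uniform(-1, 1)} for _ in range(size_pop)]
    best: Optional[Params] = None
    best_loss, stale = float("inf"), 0
    for _ in range(num_generations):
        ranked = sorted(pop, key=toy_loss)
        if toy_loss(ranked[0]) < best_loss:
            best, best_loss, stale = dict(ranked[0]), toy_loss(ranked[0]), 0
        else:
            stale += 1
            if stale >= early_stopping:  # too many generations without improvement
                break
        parents = ranked[: max(2, size_pop // 2)]  # truncation selection: keep the fitter half
        children = []
        while len(parents) + len(children) < size_pop:
            a, b = rng.sample(parents, 2)
            child = {k: a[k] if rng.random() < 0.5 else b[k] for k in a}  # uniform crossover
            for k in child:
                if rng.random() < prob_mut:  # mutation: resample this gene
                    child[k] = rng.uniform(-1, 1)
            children.append(child)
        pop = parents + children
    return best, best_loss


best, loss = run_ga()
print(resolve_nested(-3), resolve_nested(1), resolve_nested(0))  # 5 5 0
print(best, loss)
```

Because the parents are carried over unchanged, the best loss never increases from one generation to the next, so the early-stopping counter only measures how long the search has stalled.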