modnet.hyper_opt.fit_genetic module

class modnet.hyper_opt.fit_genetic.Individual(max_feat, num_classes, multi_label, loss='mae', targets=None, weights=None, **fit_params)

Bases: object

Class representing a set of hyperparameters for the genetic algorithm.

Parameters:
  • max_feat (int) – Maximum number of features

  • num_classes (dict) – MODData num_classes parameter. Used for distinguishing between regression and classification.

  • multi_label (bool) – Whether the task is a multi-label classification problem.

  • loss (Union[str, Callable]) – The built-in tf.keras loss to pass to compile(...).

  • targets (List) – Optional (for joint learning only). A nested list of target names that defines the hierarchy of the output layers.

  • weights (Dict[str, float]) – Optional (for joint learning only). The relative loss weights to apply for each target.

  • fit_params – Any additional parameters to pass to MODNetModel.fit(...).

Return type:

Individual

crossover(partner)

Performs the crossover of two parents and returns a ‘child’ with a mix of the parents' hyperparameters.

Parameters:

partner (Individual) – Partner individual.

Returns:

Child.

Return type:

Individual
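
The source does not say how genes are mixed, so the following is only a minimal sketch of one common scheme, uniform crossover, in which each hyperparameter is copied from one of the two parents at random. The function, the dictionary representation and the gene names (lr, batch_size, n_feat) are hypothetical and are not taken from MODNet.

```python
import random


def crossover(parent_a: dict, parent_b: dict, rng: random.Random) -> dict:
    """Uniform crossover: copy each gene from one parent chosen at random."""
    return {key: rng.choice((parent_a[key], parent_b[key])) for key in parent_a}


# Hypothetical hyperparameter sets; MODNet's real genes may differ.
parent_a = {"lr": 0.01, "batch_size": 32, "n_feat": 64}
parent_b = {"lr": 0.001, "batch_size": 128, "n_feat": 256}

child = crossover(parent_a, parent_b, random.Random(0))
```

Both parents are left unmodified, which mirrors the documented behaviour of returning a new Individual.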

mutation(prob_mut)

Mutates the hyperparameters in order to maintain diversity in the population.

Parameters:

prob_mut (float) – Probability of mutation, in the range [0, 1].

Return type:

None

Returns: None (in-place operation).
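
As an illustration of what an in-place mutation controlled by prob_mut can look like, the sketch below resamples each gene independently with probability prob_mut. It assumes a hypothetical search space and a dictionary of genes; MODNet's actual mutation rules are not described here and may differ.

```python
import random

# Hypothetical search space, not MODNet's actual one.
SEARCH_SPACE = {"lr": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}


def mutate(genes: dict, prob_mut: float, rng: random.Random) -> None:
    """Resample each gene with probability prob_mut, modifying genes in place."""
    for key, choices in SEARCH_SPACE.items():
        if rng.random() < prob_mut:
            genes[key] = rng.choice(choices)


genes = {"lr": 0.01, "batch_size": 64}

unchanged = dict(genes)
mutate(unchanged, 0.0, random.Random(0))  # prob_mut = 0: no gene is touched

mutated = dict(genes)
result = mutate(mutated, 1.0, random.Random(0))  # prob_mut = 1: every gene is resampled
```

As with the documented method, the function returns None and the caller observes the change on the object it passed in.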

evaluate(train_data, val_data, fast=False)

Evaluates the validation loss and stores it in self.val_loss.

Parameters:
  • train_data (MODData) – Training MODData

  • val_data (MODData) – Validation MODData

  • fast (bool, optional) – Limited epochs for testing or debugging only. Defaults to False.

refit_model(data, n_models=10, n_jobs=1, fast=False)

Refit inner model on specified data.

Parameters:
  • data (MODData) – Training data

  • fast (bool, optional) – Limited epochs for testing or debugging only. Defaults to False.

class modnet.hyper_opt.fit_genetic.FitGenetic(data, custom_data=None, targets=None, weights=None, sample_threshold=5000, ignore_names=[])

Bases: object

Class optimizing the model parameters using a genetic algorithm.

Genetic algorithm hyperparameter optimization for MODNet.

Parameters:
  • data (MODData) – Training MODData

  • custom_data (np.ndarray) – Optional array of shape (n_samples, n_custom_props) that will be appended to the targets (column-wise). This can be useful for defining custom loss functions.

  • targets (List) – Optional (for joint learning only). A nested list of target names that defines the hierarchy of the output layers.

  • weights (Dict[str, float]) – Optional (for joint learning only). The relative loss weights to apply for each target.

  • sample_threshold (int, optional) – If the dataset size exceeds this threshold, individuals are trained on sampled subsets of this size. Defaults to 5000.

  • ignore_names (List) – Optional list of property names to ignore during feature selection. Feature selection will be performed w.r.t. all properties except the ones in ignore_names.
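
The effect of sample_threshold can be pictured with the sketch below. It assumes the subset is drawn uniformly without replacement, which the source does not state; the helper name subsample_indices is hypothetical and not part of MODNet.

```python
import random


def subsample_indices(n_samples: int, sample_threshold: int, rng: random.Random) -> list:
    """Return all row indices, or a random subset of size sample_threshold
    when the dataset is larger than the threshold."""
    if n_samples <= sample_threshold:
        return list(range(n_samples))
    # Assumption: uniform sampling without replacement.
    return sorted(rng.sample(range(n_samples), sample_threshold))


small = subsample_indices(10, 5000, random.Random(0))     # below threshold: keep everything
large = subsample_indices(12000, 5000, random.Random(0))  # above threshold: 5000 rows kept
```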

initialization_population(size_pop, multi_label, loss='mae', **fit_params)

Initializes the initial population (Generation 0).

Parameters:
  • size_pop (int) – Size of population.

  • multi_label (bool) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.

  • loss (Union[str, Callable]) – The built-in tf.keras loss to pass to compile(...).

  • fit_params – Any additional parameters to pass to MODNetModel.fit(...).

Return type:

None

function_fitness(pop, n_jobs, nested=5, val_fraction=0.1, multi_label=False, fast=False)

Calculates the fitness of each model, which has the parameters contained in the pop argument. The function returns a list containing respectively the MAE calculated on the validation set, the model, and the parameters of that model.

Parameters:
  • pop (List[Individual]) – List of individuals

  • n_jobs (int) – Number of jobs to parallelize on.

  • nested (int, optional) – CV fold size. Defaults to 5. Use <=0 for hold-out validation.

  • val_fraction (float, optional) – Validation fraction if no CV is used. Defaults to 0.1.

  • multi_label (Optional[bool]) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.

  • fast (bool, optional) – Limited epochs for testing and debugging only. Defaults to False.

Returns:

val_losses, models, individuals

Return type:

None
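
The interplay between nested and val_fraction can be sketched as follows. This is not MODNet's splitting code: it reads "CV fold size" as the number of folds (an assumption), omits shuffling for brevity, and uses a hypothetical helper name, make_splits.

```python
def make_splits(n_samples: int, nested: int = 5, val_fraction: float = 0.1) -> list:
    """Return a list of (train_indices, val_indices) pairs.

    nested > 0  : k-fold cross-validation with `nested` folds.
    nested <= 0 : a single hold-out split using `val_fraction` of the data.
    """
    indices = list(range(n_samples))
    if nested <= 0:
        n_val = max(1, int(round(n_samples * val_fraction)))
        return [(indices[n_val:], indices[:n_val])]

    splits = []
    for fold in range(nested):
        val = indices[fold::nested]  # every `nested`-th sample goes to this fold
        val_set = set(val)
        train = [i for i in indices if i not in val_set]
        splits.append((train, val))
    return splits


cv_splits = make_splits(100, nested=5)
holdout_splits = make_splits(100, nested=0, val_fraction=0.1)
```

With cross-validation, each individual would be scored on every fold and the validation losses combined; with hold-out validation, a single score is used.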

run(size_pop=20, num_generations=10, prob_mut=None, nested=5, multi_label=False, loss='mae', n_jobs=None, early_stopping=4, refit=5, fast=False, **fit_params)

Run the GA and return best model.

Parameters:
  • size_pop (int, optional) – Size of the population per generation. Defaults to 20.

  • num_generations (int, optional) – Number of generations. Defaults to 10.

  • prob_mut (Optional[int], optional) – Probability of mutation. Defaults to None.

  • nested (Optional[int], optional) – CV fold size. Use <=0 for hold-out validation. Defaults to 5.

  • multi_label (bool) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.

  • loss (Union[str, Callable]) – The built-in tf.keras loss to pass to compile(...).

  • n_jobs (Optional[int], optional) – Number of jobs to parallelize on. Defaults to None.

  • early_stopping (Optional[int], optional) – Number of successive generations without improvement before stopping. Defaults to 4.

  • refit (Optional[int], optional) – Whether to refit (>0) the best hyperparameters on the whole dataset or use the best Individual instead (=0). The amount corresponds to the number of models used in the ensemble. Defaults to 5.

  • fast (bool, optional) – Use only for debugging and testing. A fast GA run with a small number of epochs, generations, individuals and folds. Overrides the size_pop, num_generations and nested arguments. Defaults to False.

  • fit_params – Any additional parameters to pass to MODNetModel.fit(...).

Returns:

Fitted model with best hyperparameters

Return type:

EnsembleMODNetModel
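
A typical call sequence, based only on the signatures documented above, might look like the sketch below. The wrapper function optimize_with_ga is hypothetical, and data is assumed to be a MODData object already prepared for training; the keyword values shown are simply the documented defaults, apart from n_jobs.

```python
def optimize_with_ga(data, n_jobs=2):
    """Run the genetic algorithm on `data` and return the fitted model.

    `data` is assumed to be a MODData instance prepared for training.
    """
    # Imported lazily so this sketch can be loaded without modnet installed.
    from modnet.hyper_opt.fit_genetic import FitGenetic

    ga = FitGenetic(data)
    return ga.run(
        size_pop=20,         # individuals per generation
        num_generations=10,  # upper bound on generations
        nested=5,            # use <=0 for hold-out validation
        n_jobs=n_jobs,
        early_stopping=4,    # stop after 4 generations without improvement
        refit=5,             # refit an ensemble of 5 models on the whole dataset
    )
```

Passing fast=True to run(...) is the documented way to get a short run for debugging, since it overrides size_pop, num_generations and nested.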