modnet.hyper_opt package

Module contents

class modnet.hyper_opt.FitGenetic(data, custom_data=None, targets=None, weights=None, sample_threshold=5000, ignore_names=[])

Bases: object

Class optimizing the model parameters using a genetic algorithm.

Genetic algorithm hyperparameter optimization for MODNet.

Parameters:
  • data (MODData) – Training MODData

  • custom_data (np.ndarray) – Optional array of shape (n_samples, n_custom_props) that will be appended column-wise to the targets. This can be useful for defining custom loss functions.

  • targets (List) – Optional (for joint learning only). A nested list of target names that defines the hierarchy of the output layers.

  • weights (Dict[str, float]) – Optional (for joint learning only). The relative loss weights to apply for each target.

  • sample_threshold (int, optional) – If the dataset size exceeds this threshold, individuals are trained on sampled subsets of this size. Defaults to 5000.

  • ignore_names (List) – Optional list of property names to ignore during feature selection. Feature selection will be performed w.r.t. all properties except the ones in ignore_names.
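The effect of sample_threshold can be pictured with a minimal sketch. It assumes uniform sampling without replacement, which the documentation above does not confirm; the helper name subsample_indices and its seed argument are hypothetical and not part of MODNet.

```python
import random


def subsample_indices(n_samples, sample_threshold=5000, seed=None):
    """Return the row indices one individual would train on (illustrative only)."""
    indices = list(range(n_samples))
    if n_samples <= sample_threshold:
        # Dataset is at or below the threshold: use every row.
        return indices
    # Dataset exceeds the threshold: draw exactly sample_threshold distinct rows.
    rng = random.Random(seed)
    return sorted(rng.sample(indices, sample_threshold))


small = subsample_indices(1200)
large = subsample_indices(20000, seed=0)
print(len(small), len(large))  # 1200 5000
```

Under this assumption, datasets at or below the threshold are used in full, while larger ones are capped at sample_threshold rows per individual.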

initialization_population(size_pop, multi_label, loss='mae', **fit_params)

Creates the initial population (generation 0).

Parameters:
  • size_pop (int) – Size of population.

  • multi_label (bool) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.

  • loss (Union[str, Callable]) – The built-in tf.keras loss to pass to compile(...).

  • fit_params – Any additional parameters to pass to MODNetModel.fit(...).

Return type:

None
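The multi_label switch changes how output scores are normalised. The sketch below uses the textbook definitions of softmax and sigmoid in plain Python to show the difference; it is not MODNet's implementation, and the logits are made-up values.

```python
import math


def softmax(logits):
    """Normalise logits into probabilities that sum to 1 (classes compete)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Score each label independently in (0, 1); several can exceed 0.5 at once."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


logits = [2.0, 1.5, -3.0]  # hypothetical raw outputs for three labels
single_label = softmax(logits)
multi_label = sigmoid(logits)

print(round(sum(single_label), 6))      # 1.0
print([p > 0.5 for p in multi_label])   # [True, True, False]
```

With softmax the scores compete for a total of 1, so only one class can dominate; with a sigmoid each label is scored on its own, which is what a multi-label problem needs.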

function_fitness(pop, n_jobs, nested=5, val_fraction=0.1, multi_label=False, fast=False)

Calculates the fitness of each model whose parameters are given by the individuals in the pop argument. Returns, in order, the MAE calculated on the validation set, the models, and the individuals holding the parameters of those models.

Parameters:
  • pop (List[Individual]) – List of individuals

  • n_jobs (int) – Number of jobs to parallelize on.

  • nested (int, optional) – Number of CV folds. Defaults to 5. Use <=0 for hold-out validation.

  • val_fraction (float, optional) – Validation fraction if no CV is used. Defaults to 0.1.

  • multi_label (Optional[bool]) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.

  • fast (bool, optional) – Limited epochs for testing and debugging only. Defaults to False.

Returns:

val_losses, models, individuals

Return type:

tuple
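How nested and val_fraction select a validation scheme can be sketched as follows. This is a plain-Python illustration that assumes nested is the number of folds and that rows are shuffled before splitting; make_splits and seed are hypothetical names, not MODNet functions.

```python
import random


def make_splits(n_samples, nested=5, val_fraction=0.1, seed=0):
    """Return a list of (train_indices, val_indices) pairs (illustrative only)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    if nested <= 0:
        # Hold-out: one split, with val_fraction of the rows held back.
        n_val = max(1, int(round(val_fraction * n_samples)))
        return [(indices[n_val:], indices[:n_val])]

    # k-fold: every fold is used as the validation set exactly once.
    folds = [indices[i::nested] for i in range(nested)]
    splits = []
    for k, val in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, val))
    return splits


cv = make_splits(100, nested=5)
holdout = make_splits(100, nested=0, val_fraction=0.1)
print(len(cv), len(cv[0][1]))            # 5 20
print(len(holdout), len(holdout[0][1]))  # 1 10
```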

run(size_pop=20, num_generations=10, prob_mut=None, nested=5, multi_label=False, loss='mae', n_jobs=None, early_stopping=4, refit=5, fast=False, **fit_params)

Run the GA and return best model.

Parameters:
  • size_pop (int, optional) – Size of the population per generation. Defaults to 20.

  • num_generations (int, optional) – Number of generations. Defaults to 10.

  • prob_mut (Optional[int], optional) – Probability of mutation. Defaults to None.

  • nested (Optional[int], optional) – Number of CV folds. Use 0 for hold-out validation (fraction of 0.1). Negative values and a value of 1 are equivalent to the default (5).

  • multi_label (bool) – Whether the problem (if classification) is multi-label. In this case the softmax output-activation is replaced by a sigmoid.

  • loss (Union[str, Callable]) – The built-in tf.keras loss to pass to compile(...).

  • n_jobs (Optional[int], optional) – Number of jobs to parallelize on. Defaults to None.

  • early_stopping (Optional[int], optional) – Number of successive generations without improvement before stopping. Defaults to 4.

  • refit (Optional[int], optional) – Whether to refit (>0) the best hyperparameters on the whole dataset or use the best Individual instead (=0). When positive, the value is the number of models used in the ensemble. Defaults to 5.

  • fast (bool, optional) – Use only for debugging and testing. A fast GA run with a small number of epochs, generations, individuals and folds. Overrides the size_pop, num_generations and nested arguments. Defaults to False.

  • fit_params – Any additional parameters to pass to MODNetModel.fit(...).

Returns:

Fitted model with best hyperparameters

Return type:

EnsembleMODNetModel
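A typical call sequence, based only on the signatures documented above, might look like the sketch below. It assumes the modnet package is installed and that train_data is a MODData object already prepared for training; the wrapper tune_with_ga is a hypothetical helper, and the keyword values are simply the documented defaults.

```python
def tune_with_ga(train_data, n_jobs=None):
    """Run the genetic algorithm on a MODData object and return the fitted model.

    Assumes `modnet` is installed and `train_data` is ready for training;
    this wrapper is illustrative and not part of MODNet.
    """
    # Imported lazily so the sketch can be loaded without modnet installed.
    from modnet.hyper_opt import FitGenetic

    ga = FitGenetic(train_data)
    model = ga.run(
        size_pop=20,         # individuals per generation
        num_generations=10,  # upper bound on the number of generations
        nested=5,            # 5-fold CV; 0 switches to hold-out validation
        n_jobs=n_jobs,       # parallel jobs
        early_stopping=4,    # stop after 4 generations without improvement
        refit=5,             # refit an ensemble of 5 models on the whole dataset
    )
    return model
```

Passing fast=True to run(...) instead is the documented route for a quick smoke test, since it overrides size_pop, num_generations and nested.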