modnet.models.ensemble module

This submodule implements the EnsembleMODNetModel, an extension of the vanilla model that bootstraps uncertainties from multiple MODNet models, trained in parallel.

class modnet.models.ensemble.EnsembleMODNetModel(*args, n_models=100, bootstrap=True, models=None, modnet_models=None, random_state=None, **kwargs)

Bases: MODNetModel

Container class for n_models (bootstrapped) MODNetModels that handles setting up the architecture, activations, training and learning curve.


The number of features used in the model.


The relative loss weights for each target.


The list of column names used in training the model.


The keras.models.Model of the network itself.


The list of target names that the model was trained for.

  • *args – See MODNetModel

  • n_models – number of inner MODNetModels; each model has the same architecture, defined by the args and kwargs.

  • bootstrap – whether to bootstrap the samples for each inner MODNet fit.

  • models – list of user-provided MODNetModels, which allows the inner models to have different architectures. n_models is ignored in this case.

  • random_state (Optional[int]) – fix a random state for use with this model.

  • modnet_models – Deprecated. Same argument as models. For backward compatibility only.

  • **kwargs – See MODNetModel
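The bootstrap idea behind n_models, bootstrap and random_state can be illustrated without MODNet itself. The following is a minimal standard-library sketch, not the library's implementation: it assumes that each inner model is fitted on a resample drawn with replacement and that the reported uncertainty is the standard deviation of the members' predictions. The helper names (fit_line, fit_bootstrap_ensemble, predict_with_uncertainty) and the toy linear model are hypothetical stand-ins for the inner MODNetModels.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate resample (all x identical): fall back to a constant model.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_bootstrap_ensemble(xs, ys, n_models=100, bootstrap=True, random_state=None):
    """Fit n_models copies of the same toy model, each on its own resample."""
    rng = random.Random(random_state)  # fixed seed gives reproducible resamples
    n = len(xs)
    models = []
    for _ in range(n_models):
        if bootstrap:
            # Draw n indices with replacement (the bootstrap resample).
            idx = [rng.randrange(n) for _ in range(n)]
        else:
            # Without bootstrapping, every member sees the full data set.
            idx = list(range(n))
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_with_uncertainty(models, x):
    """Return (mean, standard deviation) of the member predictions at x."""
    preds = [a * x + b for a, b in models]
    return statistics.fmean(preds), statistics.pstdev(preds)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
models = fit_bootstrap_ensemble(xs, ys, n_models=50, random_state=0)
mean, unc = predict_with_uncertainty(models, 2.5)
```

In this toy setting, bootstrap=False makes every member identical, so the spread collapses to zero. Neural networks would typically still differ between members through their random initialisation, which the deterministic toy fit does not capture.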

can_return_uncertainty = True
fit(training_data, n_jobs=1, **kwargs)

Train the model on the passed training MODData object.

Parameters match those of MODNetModel.fit().


training_data (MODData) –

Return type:


predict(test_data, return_unc=False, return_prob=False, remap_out_of_bounds=True)

Predict the target values for the passed MODData.

  • test_data (MODData) – A featurized and feature-selected MODData object containing the descriptors used in training.

  • return_prob (bool) – For a classification task only: whether to return the probability of each class OR only return the most probable class.

  • return_unc (bool) – whether to return a second dataframe containing the uncertainties.

  • remap_out_of_bounds (bool) – whether to remap out-of-bounds values to the nearest bound.


A pandas.DataFrame containing the predicted values of the targets.

Return type:
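The effect of remap_out_of_bounds can be pictured as a clamping step applied to the raw predictions. The sketch below is illustrative only: it assumes that the bounds are the minimum and maximum target values seen during training, which this page does not state, and the helper name clamp_to_bounds is hypothetical.

```python
def clamp_to_bounds(predictions, lower, upper):
    """Map each prediction outside [lower, upper] to the nearest bound."""
    return [min(max(p, lower), upper) for p in predictions]


# Assumption: the bounds come from the range of the training targets.
train_targets = [0.2, 1.5, 3.1, 4.8]
lower, upper = min(train_targets), max(train_targets)

raw = [-0.7, 2.0, 5.3]
clipped = clamp_to_bounds(raw, lower, upper)
```

Values already inside the interval are returned unchanged; only the first and last entries of raw are moved.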



Evaluates the target values for the passed MODData by returning the corresponding loss.


test_data (MODData) – A featurized and feature-selected MODData object containing the descriptors used in training.


Loss score

Return type:


fit_preset(data, presets=None, val_fraction=0.15, verbose=0, classification=False, refit=False, fast=False, nested=5, callbacks=None, n_jobs=1)

Selects the best-performing MODNet hyperparameters from a set of presets.

This function implements the “inner loop” of a cross-validation workflow. By modifying the nested argument, it can be run in full nested mode (i.e. train n_fold * n_preset models) or just with a simple random hold-out set.

Several well-performing MODNet presets are first fitted to the data, each evaluated on a validation set (val_fraction of the supplied data, 15% by default).

Sets the self.models attribute to the model with the lowest mean validation loss across all folds.

Note: Inner models (presets) are 5-model bootstraps. The final (refit) model will be a self.n_models bootstrap.
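The selection rule described above (lowest mean validation loss across all folds) can be sketched in a few lines. This is a simplified illustration rather than the library's code: the function name select_best_preset and the loss values are made up for the example.

```python
import statistics


def select_best_preset(val_losses):
    """Pick the preset with the lowest mean validation loss.

    val_losses[i][j] is the validation loss of preset i on fold j.
    Returns (index of the best preset, its mean loss).
    """
    mean_losses = [statistics.fmean(folds) for folds in val_losses]
    best = min(range(len(mean_losses)), key=mean_losses.__getitem__)
    return best, mean_losses[best]


# Hypothetical losses for 3 presets evaluated on 3 folds each.
val_losses = [
    [0.30, 0.28, 0.35],
    [0.22, 0.27, 0.24],
    [0.25, 0.40, 0.21],
]
best_idx, best_loss = select_best_preset(val_losses)
```

Note that the third preset has the single lowest fold loss (0.21) but is not selected, because the comparison is made on the mean over folds.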

  • data (MODData) – MODData object containing training and validation samples.

  • presets (List[Dict[str, Any]]) – A list of dictionaries containing custom presets.

  • verbose (int) – The verbosity level to pass to tf.keras

  • val_fraction (float) – The fraction of the data to use for validation.

  • classification (bool) – Whether or not we are performing classification.

  • refit (bool) – Whether or not to refit the final model for each fold with the best-performing settings.

  • fast (bool) – Used for debugging. If True, only fit the first 2 presets, use 1-model ensembles and reduce the number of epochs.

  • nested (int) – integer specifying whether or not to perform a full nested CV. If 0, a simple validation split is performed based on val_fraction argument. If an integer, use this number of inner CV folds, ignoring the val_fraction argument. Note: If set to 1, the value will be overwritten to a default of 5 folds.

  • n_jobs (int) – number of concurrent processes to use when multiprocessing.

  • callbacks (List[Any]) –
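The interplay between nested and val_fraction described in the parameter list can be summarised as a small decision function. This is a sketch of the documented behaviour only; the function name resolve_validation_scheme and its return format are hypothetical and do not correspond to anything in the library.

```python
def resolve_validation_scheme(nested=5, val_fraction=0.15):
    """Translate (nested, val_fraction) into a validation scheme.

    Returns ("holdout", fraction) or ("kfold", number_of_folds).
    """
    if nested == 0:
        # Simple random hold-out split; val_fraction is used.
        return ("holdout", val_fraction)
    if nested == 1:
        # A single fold is not meaningful; fall back to the default of 5.
        nested = 5
    # Inner cross-validation; val_fraction is ignored.
    return ("kfold", nested)


holdout = resolve_validation_scheme(nested=0, val_fraction=0.2)
default = resolve_validation_scheme(nested=1)
three_fold = resolve_validation_scheme(nested=3, val_fraction=0.5)
```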


  • A list of length num_outer_folds containing lists of MODNet models of length num_inner_folds.

  • A list of validation losses achieved by the best model for each fold during validation (excluding refit).

  • The learning curve of the final (refitted) model (or None if refit is False)

  • A nested list of learning curves for each trained model of lengths (num_outer_folds, num_inner_folds).

  • The settings of the best-performing preset.

Return type:

Tuple[List[List[Any]], numpy.ndarray, Optional[List[float]], List[List[float]], Dict[str, Any]]