modnet.matbench.benchmark module

modnet.matbench.benchmark.matbench_kfold_splits(data, n_splits=5, classification=False)

Return the pre-defined k-fold splits to use when reporting matbench results.

Parameters:

data (MODData) – The featurized MODData.
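
The idea of pre-defined splits can be illustrated with a minimal, standard-library sketch. It is not MODNet's implementation: the helper name fixed_kfold_indices and the seed value are assumptions made for this example. It only shows why a fixed seed makes every run see the same folds, which is what keeps reported results comparable.

```python
import random
from typing import List, Tuple


def fixed_kfold_indices(
    n_samples: int, n_splits: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs that are identical on every call.

    Hypothetical helper for illustration only; the seed is an arbitrary choice.
    """
    indices = list(range(n_samples))
    # A dedicated, seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_splits folds of near-equal size.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    splits = []
    for i, test in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


splits = fixed_kfold_indices(n_samples=10, n_splits=5)
```

Calling the helper twice with the same arguments yields the same list of splits, and every sample appears in exactly one test fold.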

modnet.matbench.benchmark.matbench_benchmark(data, target, target_weights, fit_settings=None, ga_settings=None, classification=False, model_type=<class 'modnet.models.vanilla.MODNetModel'>, save_folds=False, save_models=False, hp_optimization=True, hp_strategy='fit_preset', inner_feat_selection=True, use_precomputed_cross_nmi=True, presets=None, fast=False, n_jobs=None, nested=False, **model_init_kwargs)

Train and cross-validate a model against Matbench data splits, optionally performing hyperparameter optimisation.

Parameters:
  • data (MODData) – The entire dataset as a MODData.

  • target (List[str]) – The list of target names to train on.

  • target_weights (Dict[str, float]) – The target weights to use for the MODNetModel.

  • fit_settings (Optional[Dict[str, Any]]) – Any settings to pass to model.fit(...) directly (typically when not performing hyperparameter optimisation).

  • classification (bool) – Whether all tasks are classification rather than regression.

  • model_type (Type[MODNetModel]) – The type of the model to create and benchmark.

  • save_folds (bool) – Whether to save dataframes with pre-processed fold data (e.g. feature selection).

  • save_models (bool) – Whether to pickle all trained models according to their fold index and performance.

  • hp_optimization (bool) – Whether to perform hyperparameter optimisation.

  • hp_strategy (str) – Which optimization strategy to choose. Use either “fit_preset” or “ga”.

  • inner_feat_selection (bool) – Whether to perform split-level feature selection or try to use pre-computed values.

  • use_precomputed_cross_nmi (bool) – Whether to use the precomputed cross NMI from the Materials Project dataset, or recompute per fold.

  • presets (Optional[List[dict]]) – Override the built-in hyperparameter grid with these presets.

  • fast (bool) – Whether to perform debug training, i.e. reduced presets and epochs, for the fit_preset strategy.

  • n_jobs (Optional[int]) – Try to parallelize the inner fit_preset over this number of processes. Maxes out at number_of_presets * nested_folds.

  • nested (bool) – Whether to perform nested CV for hyperparameter optimisation.

  • **model_init_kwargs – Additional arguments to pass to the model on creation.

  • ga_settings (Optional[Dict[str, float]]) – Settings for the genetic algorithm, presumably used when hp_strategy is “ga”.

Returns:

A dictionary containing all the results from the training, broken down by model and by fold.

Return type:

dict
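
A call to matbench_benchmark might be assembled as in the sketch below. The keyword names come from the signature above, but the helpers build_benchmark_kwargs and run_benchmark, the target name "e_form", and the chosen values are assumptions for illustration. The import is deferred into run_benchmark so the snippet can be read and checked without MODNet installed.

```python
from typing import Any, Dict


def build_benchmark_kwargs(target_name: str, fast: bool = True) -> Dict[str, Any]:
    """Collect keyword arguments for a single-target regression benchmark.

    Hypothetical helper; the values are illustrative, not recommended defaults.
    """
    return {
        "target": [target_name],               # list of target names to train on
        "target_weights": {target_name: 1.0},  # one weight per target
        "classification": False,               # regression task
        "hp_optimization": True,
        "hp_strategy": "fit_preset",           # or "ga"
        "fast": fast,                          # debug run: reduced presets and epochs
        "nested": False,
        "n_jobs": 2,
    }


def run_benchmark(data: Any, target_name: str) -> dict:
    """Run the benchmark on a featurized MODData (requires modnet)."""
    # Deferred import: only needed when the benchmark is actually run.
    from modnet.matbench.benchmark import matbench_benchmark

    return matbench_benchmark(data, **build_benchmark_kwargs(target_name))


kwargs = build_benchmark_kwargs("e_form")
```

Setting fast=True is a reasonable first step when checking that a pipeline runs end to end, since the description above says it reduces the presets and epochs for the fit_preset strategy.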

modnet.matbench.benchmark.train_fold(fold, target, target_weights, fit_settings, ga_settings, model_type=<class 'modnet.models.vanilla.MODNetModel'>, presets=None, hp_optimization=True, hp_strategy='fit_preset', classification=False, save_folds=False, fast=False, save_models=False, nested=False, n_jobs=None, **model_kwargs)

Train one fold of a CV. Unless stated, all arguments have the same meaning as in matbench_benchmark(...).

Returns:

A dictionary summarising the fold results.

Return type:

dict
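
One plausible way to combine the per-fold dictionaries into results broken down by fold is sketched below. The helper collect_fold_results and the "score" key are hypothetical; the actual keys returned by train_fold are not listed in this reference, so treat this as an illustration of the aggregation pattern only.

```python
from collections import defaultdict
from typing import Any, Dict, List


def collect_fold_results(fold_results: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge a list of per-fold dictionaries into one dictionary of lists.

    Hypothetical helper: position i in each list belongs to fold i.
    """
    collected: Dict[str, List[Any]] = defaultdict(list)
    for fold in fold_results:
        for key, value in fold.items():
            collected[key].append(value)
    return dict(collected)


# "score" is a made-up key used only for this example.
example = collect_fold_results([{"score": 0.12}, {"score": 0.15}])
# example == {"score": [0.12, 0.15]}
```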