modnet.matbench.benchmark module

modnet.matbench.benchmark.matbench_kfold_splits(data, n_splits=5, classification=False)

Return the pre-defined k-fold splits to use when reporting matbench results.

Parameters: data (MODData) – The featurized MODData.
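
To illustrate what "pre-defined" means in practice, the sketch below builds shuffled k-fold index splits from a fixed seed using only the standard library, so that every run produces the same folds. It is a stand-in for illustration, not modnet's implementation: the function name `fixed_kfold_indices`, the seed value, and the shuffling scheme are all assumptions.

```python
import random
from typing import List, Tuple


def fixed_kfold_indices(
    n_samples: int, n_splits: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for a seeded, shuffled k-fold.

    Hypothetical helper: the seed and shuffling here are illustrative only.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")

    indices = list(range(n_samples))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for k in range(n_splits):
        stop = start + base + (1 if k < extra else 0)
        test = sorted(indices[start:stop])
        train = sorted(indices[:start] + indices[stop:])
        folds.append((train, test))
        start = stop
    return folds
```

Because the seed is fixed, two calls with the same arguments return identical folds, which is the property that makes results comparable across runs.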

modnet.matbench.benchmark.matbench_benchmark(data, target, target_weights, fit_settings=None, ga_settings=None, classification=False, model_type=<class 'modnet.models.vanilla.MODNetModel'>, save_folds=False, save_models=False, hp_optimization=True, hp_strategy='fit_preset', inner_feat_selection=True, use_precomputed_cross_nmi=True, presets=None, fast=False, n_jobs=None, nested=False, **model_init_kwargs)

Train and cross-validate a model against Matbench data splits, optionally performing hyperparameter optimisation.

Parameters:

data (MODData) – The entire dataset as a MODData.
target (List[str]) – The list of target names to train on.
target_weights (Dict[str, float]) – The target weights to use for the MODNetModel.
fit_settings (Optional[Dict[str, Any]]) – Any settings to pass to model.fit(...) directly (typically when not performing hyperparameter optimisation).
classification (bool) – Whether all tasks are classification rather than regression.
model_type (Type[MODNetModel]) – The type of the model to create and benchmark.
save_folds (bool) – Whether to save dataframes with pre-processed fold data (e.g. feature selection).
save_models (bool) – Whether to pickle all trained models according to their fold index and performance.
hp_optimization (bool) – Whether to perform hyperparameter optimisation.
hp_strategy (str) – Which optimization strategy to choose. Use either “fit_preset” or “ga”.
inner_feat_selection (bool) – Whether to perform split-level feature selection or try to use pre-computed values.
use_precomputed_cross_nmi (bool) – Whether to use the precomputed cross NMI from the Materials Project dataset, or recompute it per fold.
presets (Optional[List[dict]]) – Override the built-in hyperparameter grid with these presets.
fast (bool) – Whether to perform debug training, i.e. reduced presets and epochs, for the fit_preset strategy.
n_jobs (Optional[int]) – The number of processes over which to try to parallelize the inner fit_preset. Capped at number_of_presets*nested_folds.
nested (bool) – Whether to perform nested CV for hyperparameter optimisation.
**model_init_kwargs – Additional arguments to pass to the model on creation.
ga_settings (Optional[Dict[str, float]]) –

Returns:

A dictionary containing all the results from the training, broken down by model and by fold.

Return type:

dict
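
A call might be assembled as in the sketch below. It uses only keyword arguments from the signature above; the helper names, the target names, and the file path are hypothetical, and `MODData.load` is assumed to be available for reading a previously saved, featurized dataset. modnet is imported inside `run_benchmark` so that the argument-building part can be inspected without it installed.

```python
from typing import Any, Dict, List


def build_benchmark_kwargs(targets: List[str], n_jobs: int = 2) -> Dict[str, Any]:
    """Collect keyword arguments for matbench_benchmark (hypothetical helper)."""
    return {
        "target": targets,
        # One weight per target; equal weighting is an illustrative choice.
        "target_weights": {name: 1.0 for name in targets},
        "classification": False,
        "hp_optimization": True,
        "hp_strategy": "fit_preset",
        "fast": True,  # debug run: reduced presets and epochs
        "nested": False,
        "n_jobs": n_jobs,
    }


def run_benchmark(moddata_path: str, targets: List[str]) -> dict:
    """Load a featurized MODData and benchmark it (requires modnet)."""
    # Imported lazily so this code can be read and tested without modnet.
    from modnet.preprocessing import MODData
    from modnet.matbench.benchmark import matbench_benchmark

    data = MODData.load(moddata_path)  # assumed loader for a saved MODData
    return matbench_benchmark(data, **build_benchmark_kwargs(targets))
```

Here `fast=True` requests the reduced presets and epochs described in the parameter list, which suits a quick check rather than a full run.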

modnet.matbench.benchmark.train_fold(fold, target, target_weights, fit_settings, ga_settings, model_type=<class 'modnet.models.vanilla.MODNetModel'>, presets=None, hp_optimization=True, hp_strategy='fit_preset', classification=False, save_folds=False, fast=False, save_models=False, nested=False, n_jobs=None, **model_kwargs)

Train one fold of a cross-validation. Unless stated otherwise, all arguments have the same meaning as in matbench_benchmark(...).

Parameters:

fold (Tuple[int, Tuple[MODData, MODData]]) – A tuple containing the fold index, and another tuple of the training MODData and test MODData.
target (List[str]) –
target_weights (Dict[str, float]) –
fit_settings (Dict[str, Any]) –
ga_settings (Dict[str, float]) –
model_type (Type[MODNetModel]) –

Returns:

A dictionary summarising the fold results.

Return type:

dict
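
The `fold` argument bundles a fold index with a (train, test) pair. The sketch below shows one way to produce tuples of that shape from index splits; `enumerate_folds`, `split_data`, and `split_list` are hypothetical names, and plain lists stand in for MODData objects so that the example runs without modnet.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence, Tuple

IndexSplit = Tuple[Sequence[int], Sequence[int]]


def enumerate_folds(
    data: Any,
    splits: Iterable[IndexSplit],
    split_data: Callable[[Any, Sequence[int], Sequence[int]], Tuple[Any, Any]],
) -> Iterator[Tuple[int, Tuple[Any, Any]]]:
    """Yield (fold_index, (train_part, test_part)) tuples.

    `split_data` is whatever turns index lists into train/test datasets;
    it is passed in because this sketch does not assume a modnet method.
    """
    for fold_index, (train_idx, test_idx) in enumerate(splits):
        yield fold_index, split_data(data, train_idx, test_idx)


def split_list(items, train_idx, test_idx):
    """Stand-in splitter for plain lists."""
    return [items[i] for i in train_idx], [items[i] for i in test_idx]


folds = list(
    enumerate_folds(list("abcd"), [([0, 1], [2, 3]), ([2, 3], [0, 1])], split_list)
)
print(folds[0])  # (0, (['a', 'b'], ['c', 'd']))
```

With real data, the train and test parts would be MODData objects, and each yielded tuple would be passed as the first argument to train_fold.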