modnet.featurizers package

Subpackages

Submodules

Module contents

This submodule defines some MODNet featurizers, which are generally collections of matminer featurizers (or other compatible objects).

class modnet.featurizers.MODFeaturizer(n_jobs=None, drop_allnan=True)

Bases: ABC

Base class for multiple featurization across structure, composition and sites.

Child classes must provide iterables of matminer featurizer objects to be applied to the structure, composition and sites of the structures in the input dataframe.

composition_featurizers

Optional iterable of featurizers to apply to the ‘composition’ column (which will be generated if missing).

Type:

Optional[Iterable[matminer.featurizers.base.BaseFeaturizer]]

oxid_composition_featurizers

Optional iterable of featurizers to apply to the ‘composition_oxid’ column generated by the CompositionToOxidComposition converter.

Type:

Optional[Iterable[matminer.featurizers.base.BaseFeaturizer]]

structure_featurizers

Optional iterable of featurizers to apply to the structure as SiteStatsFingerprint objects. Uses the site_stats attribute to determine which statistics are calculated.

Type:

Optional[Iterable[matminer.featurizers.base.BaseFeaturizer]]

site_stats

Iterable of string statistic names to be used by the SiteStatsFingerprint objects.

Type:

Tuple[str]

featurizer_mode

Whether to apply all featurizers at once (“multi”), i.e., parallelising over structures, or one at a time (“single”), i.e., parallelising over featurizers.

Type:

str
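The difference between the two featurizer_mode values can be pictured with a small standard-library sketch. It is not MODNet's or matminer's implementation: the toy "featurizers" are plain functions on strings, and the names featurize_multi and featurize_single are hypothetical. It only illustrates that both modes yield the same features and differ in how the work is split into parallel tasks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for featurizers: each maps one entry to one number.
FEATURIZERS = {
    "length": len,
    "n_unique": lambda entry: len(set(entry)),
}

ENTRIES = ["NaCl", "MgO", "SiO2"]


def featurize_multi(entries, featurizers, n_jobs=2):
    """One task per entry; each task runs every featurizer on that entry."""
    def all_features(entry):
        return {name: func(entry) for name, func in featurizers.items()}

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(all_features, entries))


def featurize_single(entries, featurizers, n_jobs=2):
    """One task per featurizer; each task runs it on every entry."""
    def one_column(item):
        name, func = item
        return name, [func(entry) for entry in entries]

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        columns = dict(pool.map(one_column, featurizers.items()))

    # Reassemble the per-featurizer columns into one record per entry.
    return [
        {name: values[i] for name, values in columns.items()}
        for i in range(len(entries))
    ]


rows_multi = featurize_multi(ENTRIES, FEATURIZERS)
rows_single = featurize_single(ENTRIES, FEATURIZERS)
print(rows_multi == rows_single)
```

Which split is faster in practice presumably depends on how many structures and featurizers are involved; the sketch makes no claim about that.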

Initialise the MODFeaturizer object with a requested number of threads to use during featurization.

Parameters:
  • n_jobs – The number of threads to use. If None, matminer will use multiprocessing.cpu_count() by default.

  • drop_allnan (bool) – if True, features that are fully NaNs will be removed.
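As a minimal sketch of how a child class might supply featurizers and be initialised, the following defines a hypothetical preset named ExampleFeaturizer. The choice of ElementProperty and DensityFeatures from matminer is an assumption made for illustration, not a preset shipped with MODNet. The imports are deferred into the function so that the sketch can be loaded without matminer or modnet installed.

```python
def build_example_featurizer(n_jobs=2):
    """Return an instance of a small, illustrative MODFeaturizer subclass."""
    # Deferred imports: these packages are only needed when the function runs.
    from matminer.featurizers.composition import ElementProperty
    from matminer.featurizers.structure import DensityFeatures
    from modnet.featurizers import MODFeaturizer

    class ExampleFeaturizer(MODFeaturizer):  # hypothetical preset name
        # Applied to the 'composition' column.
        composition_featurizers = (ElementProperty.from_preset("magpie"),)
        # Applied to the 'structure' column.
        structure_featurizers = (DensityFeatures(),)

    # n_jobs and drop_allnan are the documented constructor arguments.
    return ExampleFeaturizer(n_jobs=n_jobs, drop_allnan=True)
```

Attributes left unset keep the None defaults listed for the class.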

composition_featurizers: Optional[Iterable[matminer.featurizers.base.BaseFeaturizer]] = None
oxid_composition_featurizers: Optional[Iterable[matminer.featurizers.base.BaseFeaturizer]] = None
structure_featurizers: Optional[Iterable[matminer.featurizers.base.BaseFeaturizer]] = None
site_featurizers: Optional[Iterable[matminer.featurizers.base.BaseFeaturizer]] = None
site_stats: Tuple[str] = ('mean', 'std_dev')
featurizer_mode: str = 'multi'
set_n_jobs(n_jobs)

Set the number of threads to pass to matminer for featurizer initialisation.

Parameters:
  • n_jobs (Optional[int]) – The number of threads to use. If None, matminer will use multiprocessing.cpu_count() by default.

set_drop_allnan(drop_allnan=True)
Parameters:

drop_allnan (bool) – if True, features that are fully NaNs will be removed.

featurize(df)

Run all of the preset featurizers on the input dataframe.

Parameters:

df (pandas.DataFrame) – the input dataframe with a "structure" column containing pymatgen Structure objects.

Returns:

The featurized DataFrame.

Return type:

pandas.DataFrame
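The expected input can be sketched as follows, assuming pymatgen and pandas are installed. The helper name featurize_rocksalt, the NaCl lattice parameter and the index label are illustrative choices. The featurizer argument is any MODFeaturizer instance, for example one of the presets.

```python
def featurize_rocksalt(featurizer):
    """Featurize a one-row dataframe holding a rock-salt NaCl structure."""
    # Deferred imports: only needed when the function is actually called.
    import pandas as pd
    from pymatgen.core import Lattice, Structure

    # Build a pymatgen Structure from its space group and two distinct sites.
    nacl = Structure.from_spacegroup(
        "Fm-3m",
        Lattice.cubic(5.64),  # approximate lattice parameter in angstrom
        ["Na", "Cl"],
        [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
    )

    # featurize() expects the structures in a column named "structure".
    df = pd.DataFrame({"structure": [nacl]}, index=["NaCl"])
    return featurizer.featurize(df)
```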

featurize_composition(df)

Decorate input pandas.DataFrame of structures with composition features from matminer, specified by the MODFeaturizer preset.

Currently applies the set of all matminer composition features.

Parameters:

df (pandas.DataFrame) – the input dataframe with a "structure" column containing pymatgen Structure objects.

Returns:

the decorated DataFrame, or an empty DataFrame if no composition/oxidation featurizers exist for this class.

Return type:

pandas.DataFrame

featurize_structure(df)

Decorate input pandas.DataFrame of structures with structural features from matminer, specified by the MODFeaturizer preset.

Currently applies the set of all matminer structure features.

Parameters:

df (pandas.DataFrame) – the input dataframe with a "structure" column containing pymatgen Structure objects.

Returns:

the decorated DataFrame.

Return type:

pandas.DataFrame

featurize_site(df, aliases=None)

Decorate input pandas.DataFrame of structures with site features, specified by the MODFeaturizer preset.

Parameters:
  • df (pandas.DataFrame) – the input dataframe with a "structure" column containing pymatgen Structure objects.

  • aliases (Optional[Dict[str, str]]) – optional dictionary to map matminer output column names to new aliases, mostly used for backwards-compatibility.

Returns:

the decorated DataFrame.

Return type:

pandas.DataFrame

modnet.featurizers.clean_df(df, drop_allnan=True)

Cleans a dataframe by dropping missing values, replacing NaNs and infinities, and selecting only columns containing numerical data.

Parameters:
  • df (pd.DataFrame) – the dataframe to clean.

  • drop_allnan (bool) – if True, clean_df will remove features that are fully NaNs.

Returns:

the cleaned dataframe.

Return type:

pandas.DataFrame
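The steps described for clean_df can be approximated with plain pandas. This is a sketch, not MODNet's own code: the function name clean_df_sketch is hypothetical, and the fill_value parameter is an assumption, since the description above does not say what NaNs and infinities are replaced with.

```python
import numpy as np
import pandas as pd


def clean_df_sketch(df, drop_allnan=True, fill_value=0.0):
    """Approximate the documented clean-up steps; not MODNet's own code."""
    if drop_allnan:
        # Remove features (columns) that contain only NaN.
        df = df.dropna(axis=1, how="all")
    # Keep only columns with a numeric dtype.
    df = df.select_dtypes(include="number")
    # Map infinities and NaN to one placeholder (an assumed choice here).
    return df.replace([np.inf, -np.inf], np.nan).fillna(fill_value)


raw = pd.DataFrame(
    {
        "density": [2.1, np.inf, 3.4],
        "all_missing": [np.nan, np.nan, np.nan],
        "formula": ["NaCl", "MgO", "SiO2"],
        "band_gap": [1.0, np.nan, 2.5],
    }
)
cleaned = clean_df_sketch(raw)
print(list(cleaned.columns))
```

With drop_allnan=False, the all-NaN column is kept and filled with the placeholder rather than removed.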