modnet.sklearn module¶
sklearn API of modnet
This version implements the RR class of the sklearn API Still TODO: MODNetFeaturizer and MODNet classes
The general pipeline will be: from sklearn.pipeline import Pipeline modnet_featurizer = MODNetFeaturizer(…arguments here…) rr_analysis = RR(…arguments here…) modnet_model = MODNet(…arguments here…) p = Pipeline([(‘featurizer’, modnet_featurizer), (‘rr’, rr_analysis), (‘modnet’, modnet_model)])
One note about scikit learn’s steps when performing cross-validation: A given transformer (e.g. PCA, or RR) will not be executed at each step of the cross validation if you cache it, i.e. scikit-learn detects that the inputs are the same (if they are indeed the same) and does not do it several times (see https://scikit-learn.org/stable/modules/compose.html#caching-transformers-avoid-repeated-computation).
Another note about the fit method of a Pipeline object: It is possible to pass fit parameters to each step of a Pipeline object (see https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline.fit)
Another note about the __init__ method of sklearn derived estimators: The method should ONLY set the instance variables. In particular, “every keyword argument accepted by __init__ should correspond to an attribute on the instance” (as stated in sklearn’s developer documentation) (see https://scikit-learn.org/stable/developers/develop.html#instantiation).
- class modnet.sklearn.MODNetFeaturizer(*args, **kwargs)¶
Bases:
TransformerMixin
,BaseEstimator
Constructor for MODNetFeaturizer
- fit(X, y=None)¶
Probably not needed except for some matminer featurizers (e.g. I think SOAP needs to be fitted before).
- transform(X, y=None)¶
Transform the input data (i.e. containing the composition and/or structure to the features
- classmethod from_preset(preset)¶
Initializes a MODNetFeaturizer class based on some preset.
- Parameters:
preset – Name of the preset (e.g. “DeBreuck2020” could trigger the Structure+Composition featurizers)
Notes
See in matminer how it is done, e.g. SiteStatsFingerprint.from_preset method.
- class modnet.sklearn.RR(*args, **kwargs)¶
Bases:
TransformerMixin
,BaseEstimator
Relevance-Redundancy (RR) feature selection. Features are ranked and selected following a relevance-redundancy ratio as developed by De Breuck et al. (2020), see https://arxiv.org/abs/2004.14766.
Use the fit method for computing the most important features. Then use the transform method to truncate the input data to those features.
- n_feat¶
int number of features to keep
- optimal_descriptors¶
list of length (n_feat) ordered list of best descriptors
Constructor for RR transformer.
- Parameters:
n_feat (Union[None, int]) – Number of features to keep and reorder using the RR procedure (default: None, i.e. all features).
rr_parameters (Union[None, Dict]) –
Allows tuning of p and c parameters. Currently allows fixing of p and c to constant values instead of using the dynamical evaluation. Expects to find keys
"p"
and"c"
,- containing either a callable that takes
n
as an argument and returns the desiredp
orc
, or another dictionary containing the key
"value"
that stores a constant value ofp
orc
.
- containing either a callable that takes
- fit(X, y, nmi_feats_target=None, cross_nmi_feats=None)¶
Ranking of the features. This is based on relevance and redundancy provided as NMI dataframes. If not provided (i.e set to None), the NMIs are computed here. Nevertheless, it is strongly recommended to compute them separately and store them locally.
- Parameters:
X – Training input pandas dataframe of shape (n_samples,n_features)
y – Training output pandas dataframe of shape (n_samples,n_features)
nmi_feats_target – NMI between features and targets, pandas dataframe
cross_nmi_feats – NMI between features, pandas dataframe
- Returns:
object Fitted RR transformer
- Return type:
self
- transform(X, y=None)¶
Transform the inputs X based on a fitted RR analysis. The best n_feat features are kept and returned.
- Parameters:
X – input pandas dataframe of shape (n_samples,n_features)
y – ignored
- Returns:
X data containing n_feat rows (best features) as a pandas dataframe
- class modnet.sklearn.MODNet(*args, **kwargs)¶
Bases:
RegressorMixin
,BaseEstimator
MODNet model.
Blabla.
Notes
No assumption on the features here, just a list of numbers. What makes a MODNet model special with respect to using directly keras ? I would say that it is always doing some joint learning, maybe something else ?
Constructor for MODNet model.
Needs some thinking of what to put in __init__ and fit
- fit(X, y)¶
Fit a MODNet regression model.
- predict(X)¶
Predict output based on a fitted MODNet regression model