Preprocessing

Most datasets must be preprocessed or transformed before they can be passed to a model. This module provides common transformers that are compatible with scikit-learn's Pipeline and ColumnTransformer.

DateTransformer

Bases: TransformerMixin, BaseEstimator

Transform date features by deriving useful date/time attributes:

  • date attributes: Year, Month, Week, Day, Dayofweek, Dayofyear, Is_month_end, Is_month_start, Is_quarter_end, Is_quarter_start, Is_year_end, Is_year_start.
  • time attributes: Hour, Minute, Second.
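
As a rough illustration of what these attributes look like, the sketch below derives each of them with the pandas `.dt` accessor. The column name `sold_at` and the `<column>_<Attribute>` naming scheme are assumptions made for this example; the names the transformer actually emits may differ.

```python
import pandas as pd

# Hypothetical input: a single datetime64 column named "sold_at".
df = pd.DataFrame(
    {"sold_at": pd.to_datetime(["2023-01-31 13:45:10", "2024-12-31 23:59:59"])}
)

dates = df["sold_at"].dt
derived = pd.DataFrame(
    {
        # Date attributes
        "sold_at_Year": dates.year,
        "sold_at_Month": dates.month,
        "sold_at_Week": dates.isocalendar().week,  # ISO week number
        "sold_at_Day": dates.day,
        "sold_at_Dayofweek": dates.dayofweek,  # Monday is 0
        "sold_at_Dayofyear": dates.dayofyear,
        "sold_at_Is_month_end": dates.is_month_end,
        "sold_at_Is_month_start": dates.is_month_start,
        "sold_at_Is_quarter_end": dates.is_quarter_end,
        "sold_at_Is_quarter_start": dates.is_quarter_start,
        "sold_at_Is_year_end": dates.is_year_end,
        "sold_at_Is_year_start": dates.is_year_start,
        # Time attributes
        "sold_at_Hour": dates.hour,
        "sold_at_Minute": dates.minute,
        "sold_at_Second": dates.second,
    }
)
print(derived.T)
```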

__init__(date_feats=None, time=False, drop=True)

Parameters:

  • date_feats (Iterable, default None): Date features to transform. If None, all features with datetime64 data type will be used.
  • time (bool, default False): Whether to add time-related derived features (Hour, Minute, Second).
  • drop (bool, default True): Whether to drop the original date features after deriving attributes from them.

fit(X, y=None)

Populate date features if not provided at initialization.

Parameters:

  • X (DataFrame, required): DataFrame that has the date features to transform.
  • y (array | DataFrame | None, default None): Not used. Included only for compatibility with scikit-learn transformers and pipelines.

Returns:

  • self (DateTransformer): Fitted date transformer.
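
One way the "populate if not provided" step could work is to select the datetime64 columns of `X` with pandas. The helper name `infer_date_feats` below is hypothetical and is not part of this module; it only sketches the fallback described above.

```python
import pandas as pd


def infer_date_feats(X, date_feats=None):
    """Return date_feats as a list, falling back to all datetime64 columns of X.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if date_feats is None:
        # Picks up timezone-naive datetime64 columns only.
        return X.select_dtypes(include="datetime64").columns.tolist()
    return list(date_feats)


df = pd.DataFrame(
    {
        "signup": pd.to_datetime(["2024-03-01", "2024-03-15"]),
        "amount": [10.0, 12.5],
    }
)
print(infer_date_feats(df))
print(infer_date_feats(df, date_feats=("signup",)))
```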

transform(X, y=None)

Derive the date/time attributes for all date features.

Parameters:

  • X (DataFrame, required): DataFrame that has the date features to transform.
  • y (array | DataFrame | None, default None): Not used. Included only for compatibility with scikit-learn transformers and pipelines.

Returns:

  • X_tr (DataFrame): DataFrame with derived date/time features and NaN indicators.
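
To show how the `fit`/`transform` contract fits into a scikit-learn Pipeline, here is a minimal stand-in with the same constructor signature. `SketchDateTransformer`, the `date_feats_` attribute, and the output column names are all assumptions made for this example, and the NaN indicators mentioned above are omitted; it is a sketch of the documented behavior, not the library's code.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

DATE_ATTRS = [
    "year", "month", "day", "dayofweek", "dayofyear",
    "is_month_end", "is_month_start", "is_quarter_end",
    "is_quarter_start", "is_year_end", "is_year_start",
]
TIME_ATTRS = ["hour", "minute", "second"]


class SketchDateTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for DateTransformer (no NaN indicators)."""

    def __init__(self, date_feats=None, time=False, drop=True):
        self.date_feats = date_feats
        self.time = time
        self.drop = drop

    def fit(self, X, y=None):
        # Fall back to every datetime64 column when none were given.
        if self.date_feats is None:
            self.date_feats_ = X.select_dtypes(include="datetime64").columns.tolist()
        else:
            self.date_feats_ = list(self.date_feats)
        return self

    def transform(self, X, y=None):
        X_tr = X.copy()
        attrs = DATE_ATTRS + (TIME_ATTRS if self.time else [])
        for feat in self.date_feats_:
            dates = X_tr[feat].dt
            for attr in attrs:
                # e.g. "is_month_end" becomes "<feat>_Is_month_end"
                X_tr[f"{feat}_{attr.capitalize()}"] = getattr(dates, attr)
            X_tr[f"{feat}_Week"] = dates.isocalendar().week
        if self.drop:
            X_tr = X_tr.drop(columns=self.date_feats_)
        return X_tr


df = pd.DataFrame(
    {
        "sold_at": pd.to_datetime(["2023-01-31 13:45:10", "2024-12-31 23:59:59"]),
        "amount": [10.0, 12.5],
    }
)
pipe = Pipeline([("dates", SketchDateTransformer(time=True))])
out = pipe.fit_transform(df)
print(out.columns.tolist())
```

With `drop=True` the original `sold_at` column is removed and only the derived columns remain alongside `amount`; passing `drop=False` would keep it.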