"""
This class implements the SuperVectorizer, which is a preprocessor used to
automatically apply encoders to different types of data, without the need to
manually categorize them beforehand, or construct complex Pipelines.
"""

# Author: Lilian Boulard

import sklearn

import numpy as np
import pandas as pd

from warnings import warn
from typing import Union, Optional, List

from sklearn.base import BaseEstimator, clone
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn import __version__ as sklearn_version

from dirty_cat import GapEncoder, DatetimeEncoder
from dirty_cat.utils import Version, check_input


def _has_missing_values(df: Union) -> bool:
    """
    Returns True if `df` contains missing values, False otherwise.
    """
    [...] isnull())


def _replace_missing_in_col(df: pd.Series, value: str = "missing") -> pd.Series:
    """
    Takes a Series with string data, replaces the missing values, and returns it.
    """
    [...] name
    if dtype_name == 'category' and (value not in df. [...]


class SuperVectorizer(ColumnTransformer):
    """
    Easily transforms a heterogeneous data table (such as a dataframe) to
    a numerical array for machine learning. For this it transforms each
    column depending on its data type.
    It provides a simplified interface for scikit-learn's `ColumnTransformer`.

    .. versionadded:: 0.2.0

    Parameters
    ----------

    cardinality_threshold: int, default=40
        Two lists of features will be created depending on this value:
        strictly under this value, the low cardinality categorical values,
        and above or equal, the high cardinality categorical values.
        Different encoders will be applied to these two groups, defined by
        the parameters `low_card_cat_transformer` and
        `high_card_cat_transformer` respectively.

    low_card_cat_transformer: Transformer or str or None, default=None
        Transformer used on categorical/string features with low cardinality
        (threshold is defined by `cardinality_threshold`).
        Default value None is converted to `OneHotEncoder()`.
        Can either be a transformer object instance (e.g. `OneHotEncoder()`),
        a `Pipeline` containing the preprocessing steps,
        None to apply `remainder`, 'drop' for dropping the columns,
        or 'passthrough' to return the unencoded columns.

    high_card_cat_transformer: Transformer or str or None, default=None
        Transformer used on categorical/string features with high cardinality
        (threshold is defined by `cardinality_threshold`).
        Default value None is converted to `GapEncoder(n_components=30)`.
        [...]
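The `cardinality_threshold` rule described in the docstring (strictly under the threshold is low cardinality, equal or above is high cardinality) can be illustrated with a small standard-library sketch. This is not the dirty_cat implementation: the helper name `split_by_cardinality` and the dict-of-lists table are hypothetical, chosen only to show how the two column groups would be formed before a different encoder is applied to each.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_cardinality(
    columns: Dict[str, Sequence[object]],
    cardinality_threshold: int = 40,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (low_card_columns, high_card_columns)."""
    low_card: List[str] = []
    high_card: List[str] = []
    for name, values in columns.items():
        # Cardinality here means the number of distinct values in the column.
        n_unique = len(set(values))
        if n_unique < cardinality_threshold:
            # Strictly under the threshold: low cardinality group.
            low_card.append(name)
        else:
            # Equal to or above the threshold: high cardinality group.
            high_card.append(name)
    return low_card, high_card


# Toy table: column name -> list of categorical values (illustrative data).
table = {
    "gender": ["F", "M"] * 25,                       # 2 distinct values
    "job_title": [f"title {i}" for i in range(50)],  # 50 distinct values
}

low, high = split_by_cardinality(table, cardinality_threshold=40)
print(low)   # ['gender']
print(high)  # ['job_title']
```

With the default encoders named in the docstring, the first group would go to `OneHotEncoder()` and the second to `GapEncoder(n_components=30)`.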