python - Can I add outlier detection and removal to a scikit-learn Pipeline?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:40:40

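I want to create a pipeline in scikit-learn in which one specific step detects and removes outliers, so that the transformed data can be passed on to other transformers and estimators.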

I have searched SE but could not find an answer anywhere. Is this possible?

Best answer

Yes. Subclass TransformerMixin and build a custom transformer. Here is one that extends one of the existing outlier detection methods:

import numpy as np
from sklearn.base import TransformerMixin
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import Pipeline


class OutlierExtractor(TransformerMixin):
    def __init__(self, **kwargs):
        """
        Create a transformer to remove outliers. A threshold is set for selection
        criteria, and further arguments are passed to the LocalOutlierFactor class

        Keyword Args:
            neg_conf_val (float): The threshold for excluding samples with a lower
                negative outlier factor.

        Returns:
            object: to be used as a transformer method as part of Pipeline()
        """
        # Pull out the threshold; everything else goes to LocalOutlierFactor.
        self.threshold = kwargs.pop('neg_conf_val', -10.0)
        self.kwargs = kwargs

    def transform(self, X, y=None):
        """
        Uses LocalOutlierFactor class to subselect data based on some threshold

        Returns:
            ndarray: subsampled X, or a tuple (X, y) of subsampled arrays
                when y is given

        Notes:
            X should be of shape (n_samples, n_features).
            Pipeline calls transform with X only and does not forward a
            filtered y to later steps, so pass y when calling this method
            directly.
        """
        X = np.asarray(X)
        lcf = LocalOutlierFactor(**self.kwargs)
        lcf.fit(X)
        # Keep only samples whose negative outlier factor is above the threshold.
        mask = lcf.negative_outlier_factor_ > self.threshold
        if y is None:
            return X[mask, :]
        return X[mask, :], np.asarray(y)[mask]

    def fit(self, *args, **kwargs):
        # Nothing to learn up front; the detector is fitted inside transform.
        return self

Then create a pipeline:

pipe = Pipeline([('outliers', OutlierExtractor()), ...])
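
One caveat worth checking against your scikit-learn version: a standard Pipeline passes only X from one step to the next, so a step that drops rows has no way to drop the matching entries of y for a supervised final estimator. A minimal sketch of one workaround, assuming it is acceptable to filter the training data before fitting, applies the same LocalOutlierFactor threshold outside the pipeline and fits on what remains. The helper name remove_outliers, the synthetic data, and the StandardScaler and LogisticRegression steps are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def remove_outliers(X, y, threshold=-10.0, **lof_kwargs):
    """Hypothetical helper: keep samples whose negative outlier factor
    is strictly above `threshold`, filtering X and y together."""
    X = np.asarray(X)
    y = np.asarray(y)
    lof = LocalOutlierFactor(**lof_kwargs)
    lof.fit(X)
    mask = lof.negative_outlier_factor_ > threshold
    return X[mask], y[mask]


# Synthetic data (assumption for illustration): one Gaussian cluster
# plus a single point placed far away from it.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
X[0] = [100.0, 100.0]

# Filter first, so X and y stay aligned ...
X_clean, y_clean = remove_outliers(X, y, threshold=-10.0, n_neighbors=20)

# ... then fit an ordinary pipeline on the filtered data.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X_clean, y_clean)
```

Filtering only at training time also means prediction never discards incoming samples, which is usually what you want when every new sample needs a prediction.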

Regarding "python - Can I add outlier detection and removal to a scikit-learn Pipeline?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52346725/
