
python - Passing extra arguments to a custom scoring function in an sklearn pipeline

Reposted. Author: 太空狗. Updated: 2023-10-30 02:54:15

I need to perform univariate feature selection in sklearn with a custom score, so I am using GenericUnivariateSelect. However, as the documentation states,

Modes for selectors: {‘percentile’, ‘k_best’, ‘fpr’, ‘fdr’, ‘fwe’}
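For reference, one of the built-in modes can be exercised as in the minimal sketch below. The toy data, the seed, and the choice of `k_best` with `param=1` are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.feature_selection import GenericUnivariateSelect, f_classif

# Toy data (assumption): the class label depends only on feature 0.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] > 0.5).astype(int)

# Built-in mode: keep the single best-scoring feature according to f_classif.
selector = GenericUnivariateSelect(score_func=f_classif, mode="k_best", param=1)
X_selected = selector.fit_transform(X, y)
support = selector.get_support()  # boolean mask over the 4 input features
```

None of the built-in modes keeps "every feature whose score exceeds a threshold", which is the gap the custom selector below tries to fill.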

In my case, I need to select the features whose score is above a given value, so I implemented:

from sklearn.feature_selection.univariate_selection import _clean_nans
from sklearn.feature_selection.univariate_selection import f_classif
import numpy as np
import pandas as pd
from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.metrics import make_scorer
from sklearn.feature_selection.univariate_selection import _BaseFilter
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_is_fitted  # used in _get_support_mask below



class SelectMinScore(_BaseFilter):
    # Sklearn documentation: modes for selectors: {'percentile', 'k_best', 'fpr', 'fdr', 'fwe'}
    # Custom selector:
    # select the features whose score is above minScore.
    def __init__(self, score_func=f_classif, minScore=0.7):
        super(SelectMinScore, self).__init__(score_func)
        self.minScore = minScore
        self.score_func = score_func
    [...]
    def _get_support_mask(self):
        check_is_fitted(self, 'scores_')

        if self.minScore == 'all':
            return np.ones(self.scores_.shape, dtype=bool)
        else:
            scores = _clean_nans(self.scores_)
            mask = np.zeros(scores.shape, dtype=bool)

            # Custom part
            # keep only the scores above the minimum
            mask = scores > self.minScore
            # if nothing passes the threshold, keep the best feature
            if not np.any(mask):
                mask[np.argmax(scores)] = True
            return mask
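The thresholding rule in `_get_support_mask` can be tried in isolation. The sketch below is a standalone approximation: `min_score_mask` is a hypothetical helper name, and replacing NaN scores with the smallest representable float is an assumption about what `_clean_nans` does.

```python
import numpy as np


def min_score_mask(scores, min_score):
    """Return a boolean mask of the scores strictly above min_score.

    If no score passes the threshold, the single best score is kept,
    so the selector never returns an empty feature set.
    """
    scores = np.array(scores, dtype=float)  # work on a copy
    # Assumption: NaN scores are treated as the worst possible score.
    scores[np.isnan(scores)] = np.finfo(scores.dtype).min
    mask = scores > min_score
    if not mask.any():
        mask[np.argmax(scores)] = True
    return mask


mask_some = min_score_mask([0.2, 0.9, 0.75], 0.7)    # two scores pass
mask_none = min_score_mask([0.1, np.nan, 0.3], 0.7)  # fallback to the best score
```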

However, I also need to use a custom score function, and it must receive an extra argument (XX). Unfortunately, I could not solve this with make_scorer:
def Custom_Score(X,Y,XX):
    return 1

class myclass():
    def mymethod(self,_XX):

        # Custom_Score is called here with XX only, which raises the TypeError shown below
        custom_filter=GenericUnivariateSelect(Custom_Score(XX=_XX),mode='MinScore',param=0.7)
        custom_filter._selection_modes.update({'MinScore': SelectMinScore})
        MyProcessingPipeline=Pipeline(steps=[('filter_step', custom_filter)])
        # finally
        X=pd.DataFrame(data=np.random.rand(500,3))
        y=pd.DataFrame(data=np.random.rand(500,1))
        MyProcessingPipeline.fit(X,y)
        MyProcessingPipeline.transform(X,y)

_XX=np.random.rand(500,1)
C=myclass()
C.mymethod(_XX)

This raises the following error:

Traceback (most recent call last):
  File "<ipython-input-37-f493745d7e1b>", line 1, in <module>
    runfile('C:/Users/_____/Desktop/pd-sk-integration.py', wdir='C:/Users/_____/Desktop')
  File "C:\Users\______\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\\______\\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/______/Desktop/pd-sk-integration.py", line 65, in <module>
    C.mymethod()
  File "C:/Users/______/Desktop/pd-sk-integration.py", line 55, in mymethod
    custom_filter=GenericUnivariateSelect(Custom_Score(XX=_XX),mode='MinScore',param=0.7)
TypeError: Custom_Score() takes exactly 3 arguments (1 given)

Edit:

As @TomDLT suggested, I tried adding an extra kwarg (XX) to the fit() of my SelectMinScore class and passing it as a fit parameter:

custom_filter = SelectMinScore(minScore=0.7)
pipe = Pipeline(steps=[('filter_step', custom_filter)])
pipe.fit(X,y, filter_step__XX=XX)

However, if I do this, I get:

line 291, in set_params
(key, self.__class__.__name__))
ValueError: Invalid parameter XX for estimator SelectMinScore. Check the list of available parameters with `estimator.get_params().keys()`.

Best answer

As you can see in the code, the score function is not called with any extra arguments, so there is currently no easy way in scikit-learn to pass your sample property XX.
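When XX is known before fitting and does not need to be re-sliced per fold, one possible workaround is to bind it to the score function with `functools.partial`, so the selector still sees a callable taking `(X, y)`. The sketch below is an assumption-laden illustration, not the answer's method: `custom_score` (absolute correlation of each feature with XX), the toy data, and the use of the built-in `k_best` mode are all hypothetical choices.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import GenericUnivariateSelect


def custom_score(X, y, XX):
    """Hypothetical score: absolute Pearson correlation of each feature with XX."""
    X = np.asarray(X, dtype=float)
    xx = np.asarray(XX, dtype=float).ravel()
    Xc = X - X.mean(axis=0)
    xc = xx - xx.mean()
    return np.abs(Xc.T @ xc) / np.sqrt((Xc ** 2).sum(axis=0) * (xc ** 2).sum())


# Toy data (assumption): XX is a noisy copy of feature 1.
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = rng.random(500)
XX = X[:, [1]] + 0.1 * rng.random((500, 1))

# Pass a callable, not the result of calling the function.
score_func = partial(custom_score, XX=XX)
selector = GenericUnivariateSelect(score_func, mode="k_best", param=1).fit(X, y)
support = selector.get_support()
```

Because XX is frozen at construction time, it is not subset along with X and y during cross-validation; when that matters, passing XX through fit, as described next, is the more appropriate route.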

For your problem, a slightly hacky approach would be to change the fit function in SelectMinScore, adding an extra parameter XX:

# Requires: from sklearn.utils import check_X_y
def fit(self, X, y, XX):
    """..."""
    X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)

    if not callable(self.score_func):
        raise TypeError("The score function should be a callable, %s (%s) "
                        "was passed."
                        % (self.score_func, type(self.score_func)))

    self._check_params(X, y)
    # the only change from the stock fit: XX is forwarded to the score function
    score_func_ret = self.score_func(X, y, XX)
    if isinstance(score_func_ret, (list, tuple)):
        self.scores_, self.pvalues_ = score_func_ret
        self.pvalues_ = np.asarray(self.pvalues_)
    else:
        self.scores_ = score_func_ret
        self.pvalues_ = None

    self.scores_ = np.asarray(self.scores_)

    return self

You can then call the pipeline with extra fit params:

custom_filter = SelectMinScore(minScore=0.7)
pipe = Pipeline(steps=[('filter_step', custom_filter)])
pipe.fit(X,y, filter_step__XX=XX)
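A self-contained variant of the same idea, written against public base classes only, might look like the sketch below. `MinScoreSelector`, `custom_score`, and the toy data are hypothetical names and values introduced for illustration. The sketch also assumes scikit-learn's default configuration, in which keyword arguments of the form `step__param` given to `Pipeline.fit` are forwarded to that step's `fit`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


def custom_score(X, y, XX):
    """Hypothetical score: absolute Pearson correlation of each feature with XX."""
    xx = np.asarray(XX, dtype=float).ravel()
    Xc = X - X.mean(axis=0)
    xc = xx - xx.mean()
    return np.abs(Xc.T @ xc) / np.sqrt((Xc ** 2).sum(axis=0) * (xc ** 2).sum())


class MinScoreSelector(TransformerMixin, BaseEstimator):
    """Keep the features whose score exceeds min_score (hypothetical sketch)."""

    def __init__(self, score_func=custom_score, min_score=0.7):
        self.score_func = score_func
        self.min_score = min_score

    def fit(self, X, y=None, XX=None):
        X = np.asarray(X, dtype=float)
        # XX arrives here through the pipeline's fit parameters.
        self.scores_ = np.asarray(self.score_func(X, y, XX), dtype=float)
        mask = self.scores_ > self.min_score
        if not mask.any():
            # Never return an empty selection: fall back to the best feature.
            mask[np.argmax(self.scores_)] = True
        self.support_ = mask
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.support_]


rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = rng.random(500)
XX = X[:, [1]] + 0.1 * rng.random((500, 1))  # toy data: XX tracks feature 1

pipe = Pipeline(steps=[("filter_step", MinScoreSelector(min_score=0.7))])
pipe.fit(X, y, filter_step__XX=XX)
X_selected = pipe.transform(X)
```

Writing the selector from scratch avoids importing private helpers such as `_BaseFilter`, at the cost of not reusing the input validation that the answer's version gets from `check_X_y`.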

Regarding "python - Passing extra arguments to a custom scoring function in an sklearn pipeline", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46606855/
