
python-3.x - TypeError (Singleton array ...) when using train_test_split in a custom class

Reposted. Author: 行者123. Updated: 2023-12-03 14:13:29

TypeError: Singleton array array(<__main__.AZHU_EmailClassifier_2 object at 0x000001D6E7A680D0>, dtype=object) cannot be considered a valid collection.


I get this error when I try to run the train_test_split function inside my custom AZHU_EmailClassifier_2 class.
My class:
class AZHU_EmailClassifier_2:
    import os
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split

    def __init__(self):
        pass

    def retrain_model(self, csv_file):

        MIN_ROW_NUMBER = 500
        TEST_SIZE = 0.25
        RANDOM_STATE = 42

        self.os.chdir(r"c:\LORI\PROJECTS\ALLIANZ\INCOMING_CHANNELS")  # <---- folder of the retraining file

        df = self.pd.read_excel(csv_file, error_bad_lines=False, header=None)

        df.dropna(axis=0, how='any', inplace=True)

        rows_no = df.shape[0]
        if rows_no < MIN_ROW_NUMBER:
            print("Insufficient number of rows (<35.000)! RETRAINING ABORTED")
            return None

        X = df[0]
        y = df[1]

        X_train, X_test, y_train, y_test = self.train_test_split(X, y)
        #X_train, X_test, y_train, y_test=self.train_test_split(X,y,test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y)

        return X_train

The error is triggered when the train_test_split call runs.
The full error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in <module>
      1 instance = AZHU_EmailClassifier_2()
      2
----> 3 instance.retrain_model("retraining_dummy.xlsx")

in retrain_model(self, csv_file)
     28         y=df[1]
     29
---> 30         X_train, X_test, y_train, y_test=self.train_test_split(X,y)
     31         #X_train, X_test, y_train, y_test=self.train_test_split(X,y,test_size=TEST_SIZE,random_state=RANDOM_STATE, stratify=y)
     32

~\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2125         raise TypeError("Invalid parameters passed: %s" % str(options))
   2126
-> 2127     arrays = indexable(*arrays)
   2128
   2129     n_samples = _num_samples(arrays[0])

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    291     """
    292     result = [_make_indexable(X) for X in iterables]
--> 293     check_consistent_length(*result)
    294     return result
    295

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    251     """
    252
--> 253     lengths = [_num_samples(X) for X in arrays if X is not None]
    254     uniques = np.unique(lengths)
    255     if len(uniques) > 1:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in <listcomp>(.0)
    251     """
    252
--> 253     lengths = [_num_samples(X) for X in arrays if X is not None]
    254     uniques = np.unique(lengths)
    255     if len(uniques) > 1:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _num_samples(x)
    194     if hasattr(x, 'shape') and x.shape is not None:
    195         if len(x.shape) == 0:
--> 196             raise TypeError("Singleton array %r cannot be considered"
    197                             " a valid collection." % x)
    198         # Check that shape is returning an integer or default to len

TypeError: Singleton array array(<__main__.AZHU_EmailClassifier_2 object at 0x000001D6E7A68F10>, dtype=object) cannot be considered a valid collection.


I have no idea why it throws this error. Could you point me in the right direction? Any help is appreciated!

Best answer

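You get this error because you imported train_test_split inside the class body. There, train_test_split becomes a class attribute, so self.train_test_split is a bound method rather than a plain function, and every call passes the instance as the first argument. scikit-learn then tries to treat the instance itself as the first array to split. Here is a minimal example that reproduces the situation: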

class test():

    from sklearn.model_selection import train_test_split

    def retrain_model(self):
        print(self.train_test_split)
        print(self.train_test_split())

test_instance = test()
test_instance.retrain_model()
Running this script raises a TypeError:
TypeError: Singleton array array(<__main__.test object at 0x7ffa473ae438>, dtype=object) cannot be considered a valid collection.
The first print shows that self.train_test_split is bound to the object at that same memory address, 0x7ffa473ae438.
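The binding behavior can be seen without scikit-learn. The following is a minimal sketch using only the standard library; the class name Demo and the method describe are made up for illustration, and copy.copy stands in for train_test_split as an ordinary Python function imported in a class body. It also suggests why self.os and self.pd worked in the question: modules are plain attributes and are never bound, whereas functions are.

```python
import inspect


class Demo:
    # Imports in a class body create class attributes.
    import math            # a module: stays a plain attribute
    from copy import copy  # a Python function: binds like a method

    def describe(self):
        # Modules are not descriptors, so self.math is just the math module.
        root = self.math.sqrt(16.0)
        # Functions are descriptors: access through the instance returns
        # a bound method whose __self__ is the instance.
        bound = self.copy
        # Calling it with no arguments is really copy(self).
        clone = self.copy()
        return root, bound, clone


demo = Demo()
root, bound, clone = demo.describe()
print(root)                     # 4.0
print(inspect.ismethod(bound))  # True
print(bound.__self__ is demo)   # True
print(clone is demo)            # False: a shallow copy of the instance
```

In the same way, self.train_test_split(X, y) in the question is effectively train_test_split(self, X, y), so the instance ends up being validated as if it were a data array.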
According to PEP 8:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.


So the simplest solution is to import everything outside the class and call train_test_split directly:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

class AZHU_EmailClassifier_2():

    def __init__(self):
        pass

    def retrain_model(self):

        MIN_ROW_NUMBER = 20
        TEST_SIZE = 0.25
        RANDOM_STATE = 42

        df = pd.DataFrame({0: np.linspace(1, 100, 100), 1: np.random.rand(100)})
        X = df[0]
        y = df[1]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE)

        return X_train

test = AZHU_EmailClassifier_2()
test.retrain_model()

Regarding "python-3.x - TypeError (Singleton array ...) when using train_test_split in a custom class", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65082283/
