
python - Using multiprocessing Pool with sklearn: the code runs but the cores show no activity

Reposted. Author: 行者123. Updated: 2023-11-28 18:08:39

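I'm trying to do some machine learning on about 31,000 rows and 1,000 columns. It takes a long time, so I thought I could parallelize the work: I turned it into a function and tried to run it with multiprocessing on Windows 10 in a Jupyter notebook. The code runs, but when I look at my cores in Task Manager they are not doing any work. Is there something wrong with the code, or is this simply not supported?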

import pandas as pd  # needed for pd.DataFrame below
from sklearn.model_selection import train_test_split
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
# Note: Imputer was removed in scikit-learn 0.22; sklearn.impute.SimpleImputer replaces it.
from sklearn.preprocessing import Imputer
from sklearn.metrics import accuracy_score
from multiprocessing import Pool
from datetime import datetime as dt

def tree_paralel(x):
    # Cross-validate a decision tree of depth x and return the per-fold accuracies.
    tree = DecisionTreeClassifier(criterion="gini", max_depth=x, random_state=1)
    accuracy_ = []
    for train_idx, val_idx in kfolds.split(X_dev, y_dev):

        X_train, y_train = X_dev.iloc[train_idx], y_dev.iloc[train_idx]
        X_val, y_val = X_dev.iloc[val_idx], y_dev.iloc[val_idx]

        # Fit the imputer on the training fold only, then apply it to the validation fold.
        X_train = pd.DataFrame(im.fit_transform(X_train), index=X_train.index)
        X_val = pd.DataFrame(im.transform(X_val), index=X_val.index)
        tree.fit(X_train, y_train)
        y_pred = tree.predict(X_val)
        accuracy_.append(accuracy_score(y_val, y_pred))
    print("This was the " + str(x) + " iteration", (dt.now() - start).total_seconds())
    return accuracy_

Then I use the multiprocessing pool:

kfolds = KFold(n_splits=10)
accuracy = []
im = Imputer()

p = Pool(5)

input_ = range(1,11)
output_ = []
start = dt.now()
for result in p.imap(tree_paralel, input_):
    output_.append(result)
p.close()
print("Time:", (dt.now() - start).total_seconds())

Best answer

This is a known issue when using interactive Python.
Quoting the note from the "Using a pool of workers" section of the multiprocessing documentation:

Note: Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter.

See also the multiprocessing Programming guidelines.
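As a rough illustration of what the quoted note implies, the sketch below puts the worker function at the top level of a regular script and creates the pool under an `if __name__ == "__main__":` guard. The file name `pool_demo.py` and the functions `cpu_bound_task` and `run_in_pool` are hypothetical; `cpu_bound_task` is only a stand-in for `tree_paralel` so the sketch runs without scikit-learn or data. It is meant to be run with `python pool_demo.py` from a command line rather than from a notebook cell.

```python
# pool_demo.py (hypothetical file name): run as a script, not from a notebook cell.
from multiprocessing import Pool


def cpu_bound_task(max_depth):
    """Stand-in for tree_paralel: a top-level function taking one argument."""
    # Placeholder arithmetic so the sketch needs no data or scikit-learn.
    return sum(i * i for i in range(max_depth * 1000))


def run_in_pool(depths, processes=5):
    """Apply cpu_bound_task to each depth using a pool of worker processes."""
    with Pool(processes) as pool:
        # imap yields results in input order; list() consumes them all
        # before the pool is torn down on leaving the with-block.
        return list(pool.imap(cpu_bound_task, depths))


if __name__ == "__main__":
    # The guard stops child processes from re-running this block when they
    # import the module, which matters with the spawn start method on Windows.
    results = run_in_pool(range(1, 11))
    print(len(results), "results collected")
```

Because the worker lives in an importable module, the child processes can locate it by name, which is the requirement the note describes.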

By the way, I don't understand what your code needs to accomplish. Wouldn't using GridSearchCV with n_jobs=5 solve your problem (and greatly simplify your code)?
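A minimal sketch of that suggestion might look like the following. It is not the asker's actual setup: the data comes from `make_classification` as a synthetic placeholder, `SimpleImputer` is swapped in for the older `Imputer`, and the helper name `run_search` and the pipeline step names `"impute"` and `"tree"` are illustrative choices. GridSearchCV delegates the parallel work to joblib, so no explicit `Pool` is created.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def run_search(n_jobs=5):
    """Grid-search max_depth 1..10 with 10-fold cross-validation."""
    # Synthetic placeholder data (assumption); substitute the real X and y.
    X, y = make_classification(n_samples=500, n_features=20, random_state=1)
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=0.2, random_state=1
    )

    # Putting the imputer in the pipeline means it is refit on each
    # training fold, as the hand-written loop in the question does.
    pipeline = Pipeline([
        ("impute", SimpleImputer()),
        ("tree", DecisionTreeClassifier(criterion="gini", random_state=1)),
    ])

    search = GridSearchCV(
        pipeline,
        param_grid={"tree__max_depth": list(range(1, 11))},
        cv=KFold(n_splits=10),
        scoring="accuracy",
        n_jobs=n_jobs,  # number of parallel jobs
    )
    search.fit(X_dev, y_dev)
    return search


search = run_search()
print(search.best_params_)
```

The per-depth mean accuracies are then available in `search.cv_results_["mean_test_score"]`, which replaces the manually collected `output_` list.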

Regarding "python - Using multiprocessing Pool with sklearn: the code runs but the cores show no activity", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52045028/
