multiprocessing - BrokenPipeError: [WinError 109] The pipe has been ended during data extraction

Reposted. Author: 行者123. Last updated: 2023-12-05 03:10:42
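I am new to multiprocessing in Python. I am extracting features from a list of 70,000 URLs, which come from 2 different files. After feature extraction, I collect the results in a list and then write them to a CSV file.

The code runs but then stops with an error. I tried to catch the error, but that produced another one.

Python version = 3.5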

from feature_extractor import Feature_extraction
import pandas as pd
from pandas.core.frame import DataFrame
import sys
from multiprocessing.dummy import Pool as ThreadPool
import threading as thread
from multiprocessing import Process,Manager,Array
import time

class main():

    lst = None

    def __init__(self):
        # shared list served by a separate Manager process
        manager = Manager()
        self.lst = manager.list()
        self.dostuff()
        self.read_lst()

    def feature_extraction(self,url):
        if self.lst is None:
            self.lst = []

        features = Feature_extraction(url)
        # every append is a round trip to the Manager process
        self.lst.append(features.get_features())
        print(len(self.lst))

    def Pool(self,url):
        pool = ThreadPool(8)
        results = pool.map(self.feature_extraction, url)

    def dostuff(self):
        df = pd.read_csv('verified_online.csv',encoding='latin-1')
        df['label'] = df['phish_id'] * 0
        mal_urls = df['url']

        df2 = pd.read_csv('new.csv')
        df2['label'] = df['phish_id']/df['phish_id']
        ben_urls = df2['urls']
        t = Process(target=self.Pool,args=(mal_urls,))
        t2 = Process(target=self.Pool,args=(ben_urls,))
        t.start()
        t2.start()
        t.join()
        t2.join  # no parentheses: the method is not called, so t2 is never waited for

    def read_lst(self):
        nw_df = DataFrame(list(self.lst))

        nw_df.columns = ['Redirect count','ssl_classification','url_length','hostname_length','subdomain_count','at_sign_in_url','exe_extension_in_request_url','exe_extension_in_landing_url',
                         'ip_as_domain_name','no_of_slashes_in requst_url','no_of_slashes_in_landing_url','no_of_dots_in_request_url','no_of_dots_in_landing_url','tld_value','age_of_domain',
                         'age_of_last_modified','content_length','same_landing_and_request_ip','same_landing_and_request_url']
        # df and df2 are local variables of dostuff(); they are not defined here
        frames = [df['label'],df2['label']]
        new_df = pd.concat(frames)
        new_df = new_df.reset_index()
        nw_df['label'] = new_df['label']
        nw_df.to_csv('dataset.csv', sep=',', encoding='latin-1')

if __name__ == '__main__':
    # time.clock() is deprecated since Python 3.3 and was removed in 3.8
    start_time = time.clock()
    try:
        main()
    except BrokenPipeError:
        print("broken pipe....")
        pass

    print (time.clock() - start_time, "seconds")

Error traceback

Process Process-3:
Traceback (most recent call last):
  File "F:\Continuum\Anaconda3\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "F:\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "H:\Projects\newoproject\src\main.py", line 33, in Pool
    results = pool.map(self.feature_extraction, url)
  File "F:\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "F:\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
  File "F:\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "F:\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "H:\Projects\newoproject\src\main.py", line 26, in feature_extraction
    self.lst.append(features.get_features())
  File "<string>", line 2, in append
  File "F:\Continuum\Anaconda3\lib\multiprocessing\managers.py", line 717, in _callmethod
    kind, result = conn.recv()
  File "F:\Continuum\Anaconda3\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "F:\Continuum\Anaconda3\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError
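
A note on reading this traceback: the failing call is `self.lst.append(...)`, which is a proxy call to the Manager process, and it fails inside a child process (`Process-3`) with `EOFError` because the pipe to the Manager was closed from the other end. One plausible cause, given that `t2.join` in the code is never actually called, is that the parent finishes (or crashes) and shuts the Manager down while the second child is still appending. A separate way to sidestep the proxy entirely is to let `pool.map` return the results instead of appending to a shared list. The following is only a minimal sketch under assumptions: `extract_features` is a hypothetical stand-in for `Feature_extraction(url).get_features()`, and `extract_all` is a name made up for this illustration.

```python
from multiprocessing.dummy import Pool as ThreadPool


def extract_features(url):
    """Hypothetical stand-in for Feature_extraction(url).get_features()."""
    # Three toy features: URL length, number of slashes, number of dots.
    return [len(url), url.count("/"), url.count(".")]


def extract_all(urls, workers=8):
    """Run extract_features over urls on a thread pool.

    pool.map returns the results in input order, so no shared
    Manager list (and no pipe to a Manager process) is involved.
    """
    with ThreadPool(workers) as pool:
        return pool.map(extract_features, urls)
```

Because the rows come back in the same order as the input URLs, labels can be attached afterwards by position, for example by building a DataFrame from `extract_all(list(df['url']))`.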

Best answer

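My response is late and does not directly address the posted question, but hopefully it gives a clue to others who run into similar errors.

Errors I encountered: BrokenPipeError, WinError 109 "The pipe has been ended" and WinError 232 "The pipe is being closed".

Observed with Python 3.6 on Windows 7 when: (1) the same async function was submitted multiple times, each time with a different instance of a multiprocessing data store, in my case a Queue (multiprocessing.Manager().Queue()), and (2) the references to the Queues were held in short-lived local variables in the enveloping functions.

The errors occurred even though the Queues, shared with the successfully spawned and running async functions, contained items and were still in active use (put() and get()) at the time of the exception.

The error occurred consistently when the same async_func was called a second time with a second Queue instance. Immediately after apply_async() of the function, the connection to the first Queue, the one supplied to async_func on the first call, would break.

The issue was resolved when the references to the Queues were held in non-overlapping variables (such as a list of Queues) with a longer lifetime (such as variables returned to functions higher in the call stack).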
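
The fix described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the answerer's actual code: `producer` and `run_jobs` are hypothetical names, and the underlying assumption is that the parent-side queue proxy is what keeps its Manager server process alive, so dropping the last parent reference can shut the server down while workers are still using the queue. Here every queue gets its own slot in a list that outlives all the async calls.

```python
import multiprocessing as mp


def producer(queue, label, count):
    """Worker (hypothetical): put `count` labelled items on the shared queue."""
    for i in range(count):
        queue.put((label, i))
    return count


def run_jobs(labels, count=3):
    """Submit one async job per label, each with its own Manager queue."""
    # One reference per job, kept in a list that lives until every job
    # has finished; no reference is overwritten by a later job.
    queues = [mp.Manager().Queue() for _ in labels]

    with mp.Pool(processes=2) as pool:
        pending = [
            pool.apply_async(producer, (queue, label, count))
            for queue, label in zip(queues, labels)
        ]
        for job in pending:
            job.get(timeout=60)  # re-raises any exception from the worker

    # Drain each queue while its reference is still held.
    return [[queue.get(timeout=5) for _ in range(count)] for queue in queues]
```

On Windows, `run_jobs` should be called from under an `if __name__ == '__main__':` guard, as in the question's script. Holding a single long-lived Manager and creating all queues from it would be a lighter alternative; separate Managers are used here only to mirror the `multiprocessing.Manager().Queue()` pattern from the answer.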

Regarding "multiprocessing - BrokenPipeError: [WinError 109] The pipe has been ended during data extraction", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39078025/
