
python - "The process has forked" error when using multiprocessing with Pandas

Reposted. Author: 行者123. Last updated: 2023-12-04 10:50:31

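I have a huge DataFrame that I want to split up with multiprocessing, do some work on each piece, and write the results to files. However, when I run the code, I get the following error: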

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug
The code looks like this:
import multiprocessing

import pandas as pd
from selenium import webdriver


def func(d):
    # Name the output file after the id of the first row in this chunk.
    first = d.iloc[0].id
    _f = open('output_' + str(first) + '.json', 'w')
    options = webdriver.chrome.options.Options()
    options.headless = True
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome('chromedriver', options=options)
    driver.set_page_load_timeout(5)
    for index, row in d.iterrows():
        # dfApply is my own helper (not shown here).
        res = dfApply(row, driver, _f)
        # in this function I use selenium to scrape a website and write some results in the
        # json file. If the scraping returns True (got result), I delete the row. If not, I
        # let it there in the dataframe.
        if res:
            d.drop(index, inplace=True)
    # Release the browser and the output file once the chunk is done.
    driver.quit()
    _f.close()
    return row


if __name__ == '__main__':

    df = pd.read_csv('rest.csv', nrows=100)
    print('Dataframe size:', df.shape)

    num_processes = multiprocessing.cpu_count()

    # At least one row per chunk, so range() below never gets a step of 0.
    chunk_size = max(1, df.shape[0] // num_processes)

    # df.ix was removed from pandas; positional slicing with iloc does the same here.
    chunks = [df.iloc[i:i + chunk_size] for i in range(0, df.shape[0], chunk_size)]

    pool = multiprocessing.Pool(processes=num_processes)

    result = pool.map(func, chunks)

Best answer

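This is a macOS-specific error. It occurs because new processes are started with fork (the default start method on Python < 3.8) without a subsequent exec(), the call that overwrites the process image.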

From the bug tracker:

... The problem is in higher-level APIs (CoreFoundation, Foundation, AppKit, ...), and appears to be related to using multi-threading in those libraries without spending effort on pre/post fork handlers to ensure that new processes are in a sane state after fork(). In older macOS versions this could result in hard to debug issues, in newer versions APIs seem to guard against this by aborting when the detect that the pid changed.
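As a quick diagnostic, the standard library can report which start methods the platform supports and whether one has already been fixed. This is only a sketch; the helper name describe_start_methods is made up for illustration. It passes allow_none=True so that merely asking does not lock in the default, which would make a later set_start_method() call fail.

```python
import multiprocessing


def describe_start_methods():
    """Return (current, available) start methods without fixing the default."""
    # allow_none=True returns None when no start method has been fixed yet,
    # instead of fixing the platform default as a side effect.
    current = multiprocessing.get_start_method(allow_none=True)
    available = multiprocessing.get_all_start_methods()
    return current, available


if __name__ == "__main__":
    current, available = describe_start_methods()
    print("start method already fixed:", current)
    print("available on this platform:", available)
```

If the first value is None, nothing has chosen a start method yet and the platform default will apply when the first pool or process is created.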



Switch the start method for new processes to "spawn" to fix it:
...
if __name__ == '__main__':
    multiprocessing.set_start_method("spawn")

On Python 3.8+, the start method on macOS defaults to "spawn".
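If the script, or a library it imports, may already have chosen a start method, a second set_start_method() call raises RuntimeError. One alternative is multiprocessing.get_context("spawn"), which returns a context object with its own Pool and leaves the global setting alone. The sketch below uses plain lists instead of a DataFrame to stay self-contained; split_into_chunks and process_chunk are hypothetical stand-ins for the chunking and scraping code in the question.

```python
import math
import multiprocessing


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    # Round up so every item lands in a chunk; never use a step of 0.
    size = max(1, math.ceil(len(items) / n_chunks))
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """Stand-in for the real per-chunk work; here it just sums the values."""
    return sum(chunk)


def main():
    chunks = split_into_chunks(list(range(10)), 3)
    # The context is local to this call site and does not touch the
    # process-wide start method.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=3) as pool:
        return pool.map(process_chunk, chunks)


if __name__ == "__main__":
    print(main())
```

With spawn, each worker starts a fresh interpreter and re-imports the main module, so the worker function has to be defined at module level and the pool has to be created under the `if __name__ == "__main__":` guard.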

Regarding python - "The process has forked" error when using multiprocessing with Pandas, the original question can be found on Stack Overflow: https://stackoverflow.com/questions/59500295/
