
apache-spark - Why does a Spark job fail with "too many open files"?


During the shuffle phase of my Spark job I get "too many open files". Why does my job open so many files, and what steps can I take to get it to succeed?

Best answer

This has been answered on the Spark user list:

The best way is definitely just to increase the ulimit if possible; this is sort of an assumption we make in Spark that clusters will be able to move it around.

You might be able to hack around this by decreasing the number of reducers [or cores used by each node], but this could have some performance implications for your job.

In general, if a node in your cluster has C assigned cores and you run a job with X reducers, then Spark will open C*X files in parallel and start writing. Shuffle consolidation will help decrease the total number of files created, but the number of file handles open at any time doesn't change, so it won't help the ulimit problem.

-Patrick Wendell
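
To see whether the ulimit mentioned above is actually the binding constraint, one option is to read the JVM's own file-descriptor counters on the driver or an executor. This is a minimal sketch, assuming a HotSpot/OpenJDK JVM on a Unix-like node (the UnixOperatingSystemMXBean cast is not available on every JVM); it can be pasted into spark-shell. The actual fix is still raising the OS limit, which happens outside the JVM.

import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Report how many file descriptors this JVM has open versus its hard cap.
ManagementFactory.getOperatingSystemMXBean match {
  case os: UnixOperatingSystemMXBean =>
    println(s"open file descriptors: ${os.getOpenFileDescriptorCount}")
    println(s"max file descriptors:  ${os.getMaxFileDescriptorCount}")
  case _ =>
    println("Not a Unix-like JVM; file descriptor counts unavailable")
}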
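
For the workaround of lowering the number of reducers (X) or the cores used per node (C), both are controlled through Spark configuration and the shuffle's partition count. Below is a minimal sketch against the classic RDD API; the input/output paths, the partition count of 64, and the executor-core setting are illustrative assumptions, not recommended values. Dropping either number trades shuffle parallelism for fewer simultaneously open files, which is exactly the performance implication the answer warns about.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fewer-open-files")
  .set("spark.executor.cores", "2")        // lower C: cores running tasks per executor
  .set("spark.default.parallelism", "64")  // lower X: default number of reduce tasks
val sc = new SparkContext(conf)

val counts = sc.textFile("hdfs:///input/path")   // hypothetical input path
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 64)                        // or set X explicitly per shuffle
counts.saveAsTextFile("hdfs:///output/path")     // hypothetical output path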
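
To make the C*X figure concrete, here is a quick back-of-the-envelope calculation; the 8 cores and 200 reducers are made-up example values, not anything read from a real cluster.

// Estimate of shuffle files open at once on a single node.
val coresPerNode = 8      // C: cores Spark schedules tasks on per node
val numReducers  = 200    // X: reduce-side partitions of the shuffle
println(s"~${coresPerNode * numReducers} shuffle files open at once per node")  // ~1600

// Note: the older hash-based shuffle in Spark 1.x exposed
// spark.shuffle.consolidateFiles=true, but as the answer explains, that
// reduces the total number of files written, not the number open
// concurrently, so the ulimit still has to cover roughly C*X.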

A similar question on Stack Overflow: https://stackoverflow.com/questions/25707629/
