gpt4 book ai didi

pyspark - Dataproc上的PySpark通过SocketTimeoutException停止

转载 作者:行者123 更新时间:2023-12-03 22:06:46 26 4
gpt4 key购买 nike

我们目前正在尝试使用PySpark 2.2.0在Dataproc集群上运行Spark作业,除了Spark作业在看似随机的时间流逝并出现以下错误消息后停止:

17/07/25 00:52:48 ERROR org.apache.spark.api.python.PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:545
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:702)


该错误有时可能只需要花费几分钟,或者可能需要3个小时。根据个人经验,Spark作业会运行30分钟到1个小时,然后再出现错误。

一旦Spark作业遇到错误,它就会停止。无论我等待多长时间,它都不会输出任何内容。在YARN ResourceManager上,应用程序状态仍标记为“正在运行”,我必须按Ctrl + C才能终止程序。此时,该应用程序被标记为“完成”。

我在主节点的控制台上使用 /path/to/spark/bin/spark-submit --jars /path/to/jar/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar spark_job.py命令运行Spark作业。该JAR文件是必需的,因为Spark作业流化来自Kafka的消息(在与Spark作业相同的群集上运行),并将某些消息推回到同一Kafka上以另一个主题。

我已经查看了该站点上的其他答案(主要是 thisthis),它们虽然有所帮助,但是我们无法在日志中找到可能导致执行者执行错误的原因。死。到目前为止,我已经通过YARN ResourceManager监视了任务期间的节点,并遍历了每个节点中 /var/logs/hadoop-yarn目录中的日志。我在日志中可以找到的唯一“线索”是 org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM,这是唯一写入死执行者日志的行。

作为最后的努力,我们试图增加群集的内存大小,希望问题不会消失。最初,该群集在具有4vCPU,15GB内存的1主2工作者群集上运行。我们创建了一个新的Dataproc集群,这次只有1个master和3个worker,每个worker具有8vCPU 52GB内存(master具有与以前相同的规格)。

我们想知道的是:
1.在哪里/如何查看导致执行程序终止的异常?
2.这与Spark的配置有关吗?
3. Dataproc映像版本为“预览”。这可能是错误的原因吗?
最终,
4.我们如何解决这个问题?我们还能采取什么其他步骤?

此Spark作业需要在不确定的时间内连续不断地从Kafka流式传输,因此我们希望此错误得以解决,而不是延长该错误发生所需的时间。

以下是YARN ResourceManager的一些屏幕截图,用于演示我们所看到的内容:

集群指标
Cluster Metrics

执行器摘要
enter image description here

屏幕截图来自Spark作业因错误而停止之前。

这是位于 /path/to/spark/conf/spark-defaults.conf中的Spark配置文件(未更改Dataproc的默认设置):

spark.master yarn
spark.submit.deployMode client
spark.yarn.jars=local:/usr/lib/spark/jars/*
spark.eventLog.enabled true
spark.eventLog.dir hdfs://highmem-m/user/spark/eventlog

# Dynamic allocation on YARN
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.executor.instances 10000
spark.dynamicAllocation.maxExecutors 10000
spark.shuffle.service.enabled true
spark.scheduler.minRegisteredResourcesRatio 0.0

spark.yarn.historyServer.address highmem-m:18080
spark.history.fs.logDirectory hdfs://highmem-m/user/spark/eventlog

spark.executor.cores 2
spark.executor.memory 4655m
spark.yarn.executor.memoryOverhead 465

# Overkill
spark.yarn.am.memory 4655m
spark.yarn.am.memoryOverhead 465

spark.driver.memory 3768m
spark.driver.maxResultSize 1884m
spark.rpc.message.maxSize 512

# Add ALPN for Bigtable
spark.driver.extraJavaOptions
spark.executor.extraJavaOptions

# Disable Parquet metadata caching as its URI re-encoding logic does
# not work for GCS URIs (b/28306549). The net effect of this is that
# Parquet metadata will be read both driver side and executor side.
spark.sql.parquet.cacheMetadata=false

# User-supplied properties.
#Mon Jul 24 23:12:12 UTC 2017
spark.executor.cores=4
spark.executor.memory=18619m
spark.driver.memory=3840m
spark.driver.maxResultSize=1920m
spark.yarn.am.memory=640m
spark.executorEnv.PYTHONHASHSEED=0


我不太确定 User-supplied properties的来源。

编辑:
有关群集的一些其他信息:
我按照Zookeeper-> kafka-> jupyter的顺序使用在 https://github.com/GoogleCloudPlatform/dataproc-initialization-actions上找到的zookeeper,kafka和jupyter初始化操作脚本(不幸的是,我现在没有足够的声誉来发布两个以上的链接)

编辑2:
通过@Dennis的有见地的问题,我们运行了Spark工作,同时特别注意使用的On Heap Storage Memory较高的执行程序。我注意到的是,与其他执行程序相比,始终是工作者#0的执行程序具有显着更高的存储内存使用率。工人#0的执行者的标准输出文件始终为空。这三行在stderr中重复了很多遍:

17/07/27 16:32:01 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:01 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:01 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to
17/07/27 16:32:04 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:04 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:04 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to
17/07/27 16:32:07 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:07 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:07 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to
17/07/27 16:32:09 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:09 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:09 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to
17/07/27 16:32:10 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:10 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:10 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to
17/07/27 16:32:13 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:13 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:13 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to
17/07/27 16:32:14 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:14 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:14 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to
17/07/27 16:32:15 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:15 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:15 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to
17/07/27 16:32:18 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:18 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to
17/07/27 16:32:18 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to


它似乎每1〜3秒重复一次。

至于来自其他工作程序节点的其他执行程序的stdout和stderr,它们为空。

编辑3:
正如@Dennis的评论中提到的那样,我们保留了Spark作业消耗的Kafka主题,其复制因子为1。我还发现我忘记了在Kafka配置文件中将worker#2添加到zookeeper.connect中,也忘记了为使用者提供来自Spark中来自Kafka的流消息的组ID。我已经修复了这些位置(复制因子为3的重新生成的主题),并观察到现在工作负载主要集中在工作人员#1上。遵循@Dennis的建议,在SSH切换到worker#1之后,我运行了 sudo jps并获得以下输出:

[Removed this section to save character space; it was only the error messages from a failed call to jmap so it didn't hold any useful information]

编辑4:
我现在在工作者#1执行者的stdout文件中看到这一点:

2017-07-27 22:16:24
Full thread dump OpenJDK 64-Bit Server VM (25.131-b11 mixed mode):
===Truncated===
Heap
PSYoungGen total 814592K, used 470009K [0x000000063c180000, 0x000000069e600000, 0x00000007c0000000)
eden space 799744K, 56% used [0x000000063c180000,0x0000000657e53598,0x000000066ce80000)
from space 14848K, 97% used [0x000000069d780000,0x000000069e5ab1b8,0x000000069e600000)
to space 51200K, 0% used [0x0000000698200000,0x0000000698200000,0x000000069b400000)
ParOldGen total 574464K, used 180616K [0x0000000334400000, 0x0000000357500000, 0x000000063c180000)
object space 574464K, 31% used [0x0000000334400000,0x000000033f462240,0x0000000357500000)
Metaspace used 49078K, capacity 49874K, committed 50048K, reserved 1093632K
class space used 6054K, capacity 6263K, committed 6272K, reserved 1048576K




2017-07-27 22:06:44
Full thread dump OpenJDK 64-Bit Server VM (25.131-b11 mixed mode):
===Truncated===
Heap
PSYoungGen total 608768K, used 547401K [0x000000063c180000, 0x000000066a280000, 0x00000007c0000000)
eden space 601088K, 89% used [0x000000063c180000,0x000000065d09c498,0x0000000660c80000)
from space 7680K, 99% used [0x0000000669b00000,0x000000066a2762c8,0x000000066a280000)
to space 36864K, 0% used [0x0000000665a80000,0x0000000665a80000,0x0000000667e80000)
ParOldGen total 535552K, used 199304K [0x0000000334400000, 0x0000000354f00000, 0x000000063c180000)
object space 535552K, 37% used [0x0000000334400000,0x00000003406a2340,0x0000000354f00000)
Metaspace used 48810K, capacity 49554K, committed 49792K, reserved 1093632K
class space used 6054K, capacity 6263K, committed 6272K, reserved 1048576K


错误发生时,工作人员2的执行者收到 SIGNAL TERM并被标记为死亡。这时,它是唯一死掉的执行者。

奇怪的是,大约10分钟后,Spark作业又恢复了。查看Spark UI界面,只有工作人员#1的执行者处于活动状态,其余执行者都死了。第一次发生这种情况。

编辑5:
同样,按照@Dennis的建议(谢谢@Dennis!),这次运行 sudo -u yarn jmap -histo <pid>。大约10分钟后,这是 CoarseGrainedExecutorBackend中内存占用最多的十大类:

 num     #instances         #bytes  class name
----------------------------------------------
1: 244824 358007944 [B
2: 194242 221184584 [I
3: 2062554 163729952 [C
4: 746240 35435976 [Ljava.lang.Object;
5: 738 24194592 [Lorg.apache.spark.unsafe.memory.MemoryBlock;
6: 975513 23412312 java.lang.String
7: 129645 13483080 java.io.ObjectStreamClass
8: 451343 10832232 java.lang.StringBuilder
9: 38880 10572504 [Z
10: 120807 8698104 java.lang.reflect.Field


另外,我遇​​到了一种新型错误,导致执行者死亡。它产生了一些失败的任务,这些任务在Spark UI中突出显示,并在执行程序的stderr中找到了:

17/07/28 00:44:03 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 6821.0 (TID 2585)
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
17/07/28 00:44:03 ERROR org.apache.spark.executor.Executor: Exception in task 0.1 in stage 6821.0 (TID 2586)
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
17/07/28 00:44:03 ERROR org.apache.spark.util.Utils: Uncaught exception in thread stdout writer for /opt/conda/bin/python
java.lang.AssertionError: assertion failed: Block rdd_5480_0 is not locked for reading
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:720)
at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:516)
at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
17/07/28 00:44:03 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for /opt/conda/bin/python,5,main]
java.lang.AssertionError: assertion failed: Block rdd_5480_0 is not locked for reading
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:720)
at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:516)
at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)


编辑6:
这次,我在运行40分钟后选择了 jmap

 num     #instances         #bytes  class name
----------------------------------------------
1: 23667 391136256 [B
2: 25937 15932728 [I
3: 159174 12750016 [C
4: 334 10949856 [Lorg.apache.spark.unsafe.memory.MemoryBlock;
5: 78437 5473992 [Ljava.lang.Object;
6: 125322 3007728 java.lang.String
7: 40931 2947032 java.lang.reflect.Field
8: 63431 2029792 com.esotericsoftware.kryo.Registration
9: 20897 1337408 com.esotericsoftware.kryo.serializers.UnsafeCacheFields$UnsafeObjectField
10: 20323 975504 java.util.HashMap


这些是 ps ux的结果:

  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
yarn 601 0.8 0.9 3008024 528812 ? Sl 16:12 1:17 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Dproc_nodema
yarn 6086 6.3 0.0 96764 24340 ? R 18:37 0:02 /opt/conda/bin/python -m pyspark.daemon
yarn 8036 8.2 0.0 96296 24136 ? S 18:37 0:00 /opt/conda/bin/python -m pyspark.daemon
yarn 8173 9.4 0.0 97108 24444 ? S 18:37 0:00 /opt/conda/bin/python -m pyspark.daemon
yarn 8240 9.0 0.0 96984 24576 ? S 18:37 0:00 /opt/conda/bin/python -m pyspark.daemon
yarn 8329 7.6 0.0 96948 24720 ? S 18:37 0:00 /opt/conda/bin/python -m pyspark.daemon
yarn 8420 8.5 0.0 96240 23788 ? R 18:37 0:00 /opt/conda/bin/python -m pyspark.daemon
yarn 8487 6.0 0.0 96864 24308 ? S 18:37 0:00 /opt/conda/bin/python -m pyspark.daemon
yarn 8554 0.0 0.0 96292 23724 ? S 18:37 0:00 /opt/conda/bin/python -m pyspark.daemon
yarn 8564 0.0 0.0 19100 2448 pts/0 R+ 18:37 0:00 ps ux
yarn 31705 0.0 0.0 13260 2756 ? S 17:56 0:00 bash /hadoop/yarn/nm-local-dir/usercache/<user_name>/app
yarn 31707 0.0 0.0 13272 2876 ? Ss 17:56 0:00 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java
yarn 31713 0.4 0.7 2419520 399072 ? Sl 17:56 0:11 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx6
yarn 31771 0.0 0.0 13260 2740 ? S 17:56 0:00 bash /hadoop/yarn/nm-local-dir/usercache/<user_name>/app
yarn 31774 0.0 0.0 13284 2800 ? Ss 17:56 0:00 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java
yarn 31780 11.1 1.4 21759016 752132 ? Sl 17:56 4:31 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx1
yarn 31883 0.1 0.0 96292 27308 ? S 17:56 0:02 /opt/conda/bin/python -m pyspark.daemon


在这种情况下, pidCoarseGrainedExecutorBackEnd31780

编辑7:
在Spark设置中增加 heartbeatInterval并没有任何改变,这事后看来很有意义。

我创建了一个简短的bash脚本,该脚本通过控制台使用者从Kafka读取5秒钟,并将消息写入文本文件。文本文件将上传到Hadoop,Spark从中流式传输。我们通过这种方法测试了Timeout是否与Kafka有关。


从Hadoop流传输并从Spark输出到Kafka导致SocketTimeout
直接从Kafka流式传输而不从Spark输出到Kafka导致SocketTimeout
从Hadoop流式传输但未从Spark输出到Kafka导致SocketTimeout


因此,我们继续假设Kafka与超时无关。

我们安装了Stackdriver Monitoring以查看发生超时时的内存使用情况。指标没有什么真正有趣的;整个内存使用情况看起来相对稳定(最繁忙的节点最多在10%到15%之间徘徊)。

我们猜测可能与工作节点之间的通信有关,这可能是导致问题的原因。目前,我们的数据流量非常低,因此,即使是一名工人也可以相对轻松地处理所有工作负载。

在来自不同群集的Kafka代理进行流式传输时,在单节点群集上运行Spark作业似乎已经停止了SocketTimeout ...,除了上面记录的 AssertionError现在经常发生。

根据@Dennis的建议,这次我创建了一个没有jupyter初始化脚本的新集群(也是单节点),这意味着Spark现在可以在Python v2.7.9上运行(没有Anaconda)。第一次运行时,Spark在短短15秒内遇到了 SocketTimeoutException。第二次运行仅2个多小时,以相同的 AssertionError失败。我开始怀疑这是否是Spark内部的问题。第三次运行大约40分钟,然后进入 SocketTimeoutException

最佳答案

我的一个客户发现Google Cloud Dataproc中的各种生产Pyspark作业(Spark版本2.2.1)间歇性地失败,并且堆栈跟踪与您的非常相似:

ERROR org.apache.spark.api.python.PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:711)


我发现在Dataproc群集VM上禁用ipv6似乎可以解决此问题。一种方法是将这些行添加到Dataproc初始化脚本中,以便它们在集群创建时运行:

printf "\nnet.ipv6.conf.default.disable_ipv6 = 1\nnet.ipv6.conf.all.disable_ipv6=1\n" >> /etc/sysctl.conf
sysctl -p

关于pyspark - Dataproc上的PySpark通过SocketTimeoutException停止,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45315055/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com