gpt4 book ai didi

sparkRPC超时造成任务异常Attemptedtogetexecutorlossreasonforexecutorid17atRPCaddress192.168.48.172:59070,butgotnoresponse.Markingasslavelost.

转载 作者:我是一只小鸟 更新时间:2023-01-15 06:31:21 26 4
gpt4 key购买 nike

日志信息如下

                        
                           Attempted to get executor loss reason for executor id 17 at RPC address 192.168.48.172:59070, but got no response. Marking as slave lost. java.io.IOException: Failed to send RPC 9102760012410878153 to /192.168.48.172:59047: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237) ~[spark-network-common_2.11-2.2.0.jar:2.2.0] at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101] Caused by: java.nio.channels.ClosedChannelException at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] 
                        
                      

现象

driver端显示日志内容为RPC通信错误,从而认为心跳超时,执行器被yarn杀掉,该问题有两种解决思路 。

  1. driver或executor内存不足,GC时无法进行RPC通信从而心跳超时,定位方法
  • driver端:查询driver的pid,jstat -gcutil pid查看内存使用情况,或jmap -heap pid查看内存使用
  • executor端:查询executor的pid(可以从spark UI的执行器页面查看到执行器的ip和端口,通过ip和端口查询到executor所在的服务器和pid),根据pid查看内存使用情况
  1. driver所在服务器与executor所在服务器之间的时间相差较多,相差1分钟以上就应该及时修改时间了,究其根本原因也很简单,两台服务器时间相差过大,造成本来就1ms内完成的通信,由于两个java进程计算的时间戳不同,造成driver认为响应超时,目前看大部分文章给的解决方式都是第一种,直接加executor内存,未必能解决问题,我们大部分集群都做了时钟同步,为什么还会造成时间相差很大呢,此时需要查看服务器是否开启了chronyd,如果你使用的是ntp,chronyd会对ntp有干扰,可以关闭chronyd 。

    关闭chronyd方法 。

                                
                                   systemctl disable chronyd systemctl stop chronyd systemctl enable ntpd systemctl start ntpd 
                                
                              

最后此篇关于sparkRPC超时造成任务异常Attemptedtogetexecutorlossreasonforexecutorid17atRPCaddress192.168.48.172:59070,butgotnoresponse.Markingasslavelost.的文章就讲到这里了,如果你想了解更多关于sparkRPC超时造成任务异常Attemptedtogetexecutorlossreasonforexecutorid17atRPCaddress192.168.48.172:59070,butgotnoresponse.Markingasslavelost.的内容请搜索CFSDN的文章或继续浏览相关文章,希望大家以后支持我的博客! 。

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com