
apache-spark - PySpark hangs on a simple command

Reposted — author: 行者123, updated: 2023-12-05 07:32:51

PySpark hangs on the following input. Note that the same command does not hang in the Scala console.

Python 3.6.5 (default, Jun 17 2018, 12:13:06) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2018-06-21 10:27:37 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 3.6.5 (default, Jun 17 2018 12:13:06)
SparkSession available as 'spark'.
>>> sc.parallelize((1,1)).count() <-----------HANGS!

Does anyone know why this happens? I have tried reinstalling everything — Java, Spark, Homebrew — and even deleted the entire /usr/local directory. I am completely out of ideas.

A different test program:

from pyspark import SparkContext
sc = SparkContext.getOrCreate()
x = sc.parallelize((1,1)).count()
print("count: ", x)

spark-submit output for a similar test Python file:
2018-06-21 10:31:47 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-21 10:31:47 INFO SparkContext:54 - Running Spark version 2.3.1
2018-06-21 10:31:47 INFO SparkContext:54 - Submitted application: test_spark.py
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing view acls to: jonedoe
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing modify acls to: jonedoe
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing view acls groups to:
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing modify acls groups to:
2018-06-21 10:31:47 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jonedoe); groups with view permissions: Set(); users with modify permissions: Set(jonedoe); groups with modify permissions: Set()
2018-06-21 10:31:47 INFO Utils:54 - Successfully started service 'sparkDriver' on port 61556.
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering MapOutputTracker
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-06-21 10:31:47 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-06-21 10:31:47 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-06-21 10:31:47 INFO DiskBlockManager:54 - Created local directory at /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/blockmgr-5c0bfcf2-9009-46b5-bcd7-4fa5ec605a89
2018-06-21 10:31:47 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-06-21 10:31:48 INFO log:192 - Logging initialized @2297ms
2018-06-21 10:31:48 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-06-21 10:31:48 INFO Server:414 - Started @2378ms
2018-06-21 10:31:48 INFO AbstractConnector:278 - Started ServerConnector@84802a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-21 10:31:48 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@79c67e6f{/jobs,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6889c329{/jobs/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a8c9a58{/jobs/job,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e04f8ff{/jobs/job/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4832ee9d{/stages,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1632f399{/stages/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@398a3a30{/stages/stage,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2eb62024{/stages/stage/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4685c478{/stages/pool,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31053558{/stages/pool/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@537d3185{/storage,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4c559cce{/storage/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@249b3738{/storage/rdd,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c2c6906{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e7861f{/environment,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@66b4d9e1{/environment/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1b6b10f8{/executors,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@44502eca{/executors/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7ebd8f21{/executors/threadDump,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e862ac6{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7d29113e{/static,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@388c37ce{/,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22374681{/api,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@dcbeb70{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@322ceede{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://ip-192-168-65-180.ec2.internal:4040
2018-06-21 10:31:48 INFO SparkContext:54 - Added file file:/Users/jonedoe/code/test_spark.py at file:/Users/jonedoe/code/test_spark.py with timestamp 1529602308500
2018-06-21 10:31:48 INFO Utils:54 - Copying /Users/jonedoe/code/test_spark.py to /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/spark-99983724-420e-4bc0-ad1f-3bc41bba9114/userFiles-999bdcde-1e5d-4e9a-98ce-c6ecdaee0739/test_spark.py
2018-06-21 10:31:48 INFO Executor:54 - Starting executor ID driver on host localhost
2018-06-21 10:31:48 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61557.
2018-06-21 10:31:48 INFO NettyBlockTransferService:54 - Server created on ip-192-168-65-180.ec2.internal:61557
2018-06-21 10:31:48 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-06-21 10:31:48 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManagerMasterEndpoint:54 - Registering block manager ip-192-168-65-180.ec2.internal:61557 with 366.3 MB RAM, BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2d1fafea{/metrics/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:49 INFO SparkContext:54 - Starting job: count at /Users/jonedoe/code/test_spark.py:4
2018-06-21 10:31:49 INFO DAGScheduler:54 - Got job 0 (count at /Users/jonedoe/code/test_spark.py:4) with 8 output partitions
2018-06-21 10:31:49 INFO DAGScheduler:54 - Final stage: ResultStage 0 (count at /Users/jonedoe/code/test_spark.py:4)
2018-06-21 10:31:49 INFO DAGScheduler:54 - Parents of final stage: List()
2018-06-21 10:31:49 INFO DAGScheduler:54 - Missing parents: List()
2018-06-21 10:31:49 INFO DAGScheduler:54 - Submitting ResultStage 0 (PythonRDD[1] at count at /Users/jonedoe/code/test_spark.py:4), which has no missing parents
2018-06-21 10:31:49 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 5.0 KB, free 366.3 MB)
2018-06-21 10:31:49 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.4 KB, free 366.3 MB)
2018-06-21 10:31:49 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on ip-192-168-65-180.ec2.internal:61557 (size: 3.4 KB, free: 366.3 MB)
2018-06-21 10:31:49 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-06-21 10:31:49 INFO DAGScheduler:54 - Submitting 8 missing tasks from ResultStage 0 (PythonRDD[1] at count at /Users/jonedoe/code/test_spark.py:4) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
2018-06-21 10:31:49 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 8 tasks
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7858 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7858 bytes)
2018-06-21 10:31:49 INFO Executor:54 - Running task 3.0 in stage 0.0 (TID 3)
2018-06-21 10:31:49 INFO Executor:54 - Running task 2.0 in stage 0.0 (TID 2)
2018-06-21 10:31:49 INFO Executor:54 - Running task 4.0 in stage 0.0 (TID 4)
2018-06-21 10:31:49 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-06-21 10:31:49 INFO Executor:54 - Running task 6.0 in stage 0.0 (TID 6)
2018-06-21 10:31:49 INFO Executor:54 - Running task 7.0 in stage 0.0 (TID 7)
2018-06-21 10:31:49 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-06-21 10:31:49 INFO Executor:54 - Running task 5.0 in stage 0.0 (TID 5)
2018-06-21 10:31:49 INFO Executor:54 - Fetching file:/Users/jonedoe/code/test_spark.py with timestamp 1529602308500
2018-06-21 10:31:49 INFO Utils:54 - /Users/jonedoe/code/test_spark.py has been previously copied to /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/spark-99983724-420e-4bc0-ad1f-3bc41bba9114/userFiles-999bdcde-1e5d-4e9a-98ce-c6ecdaee0739/test_spark.py
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 397, boot = 389, init = 8, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 399, boot = 396, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 406, boot = 403, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 413, boot = 410, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 420, boot = 417, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 426, boot = 423, init = 2, finish = 1
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 433, boot = 430, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 441, boot = 437, init = 3, finish = 1
2018-06-21 10:31:49 INFO Executor:54 - Finished task 5.0 in stage 0.0 (TID 5). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 2.0 in stage 0.0 (TID 2). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 3.0 in stage 0.0 (TID 3). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 6.0 in stage 0.0 (TID 6). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 7.0 in stage 0.0 (TID 7). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 4.0 in stage 0.0 (TID 4). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 1310 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 1310 bytes result sent to driver
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 5.0 in stage 0.0 (TID 5) in 580 ms on localhost (executor driver) (1/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 3.0 in stage 0.0 (TID 3) in 586 ms on localhost (executor driver) (2/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 587 ms on localhost (executor driver) (3/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 6.0 in stage 0.0 (TID 6) in 583 ms on localhost (executor driver) (4/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 4.0 in stage 0.0 (TID 4) in 586 ms on localhost (executor driver) (5/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 7.0 in stage 0.0 (TID 7) in 584 ms on localhost (executor driver) (6/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 608 ms on localhost (executor driver) (7/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 590 ms on localhost (executor driver) (8/8)
2018-06-21 10:31:50 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-06-21 10:31:50 INFO DAGScheduler:54 - ResultStage 0 (count at /Users/jonedoe/code/test_spark.py:4) finished in 0.774 s
2018-06-21 10:31:50 INFO DAGScheduler:54 - Job 0 finished: count at /Users/jonedoe/code/test_spark.py:4, took 0.825530 s

...and it hangs at this point.

Best answer

It turns out my antivirus software (Bitdefender) was the culprit.

For some reason, it was blocking Spark.
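This failure mode is consistent with the log above: the driver, executors, and Python worker processes in PySpark exchange data over local TCP sockets, so if security software silently drops that traffic, the job stalls after the scheduler reports the stage finished. As a rough, standard-library-only sanity check (my sketch, not part of the original answer), you can verify that plain loopback TCP connections work at all:

```python
# Minimal loopback TCP check: open a listening socket on 127.0.0.1,
# connect to it, and echo a few bytes through. If this hangs or fails,
# something (e.g. antivirus or a firewall) is interfering with local
# socket traffic -- the same traffic PySpark depends on.
import socket

def loopback_ok(timeout=2.0):
    """Return True if a client can connect to a local listening socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # let the OS pick a free ephemeral port
    server.listen(1)
    port = server.getsockname()[1]
    try:
        client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
        conn, _ = server.accept()
        client.sendall(b"ping")
        ok = conn.recv(4) == b"ping"   # did the bytes make the round trip?
        client.close()
        conn.close()
        return ok
    except OSError:
        return False
    finally:
        server.close()

print("loopback TCP ok:", loopback_ok())
```

If this check fails or times out, the fix is in the security software's configuration (e.g. whitelisting Java/Python), not in the Spark installation.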

Regarding "apache-spark - PySpark hangs on a simple command", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50974452/
