
python - Error running my first Spark Python program


I have been working in Spark (on Hadoop 2.7) with Python on Eclipse, and I am trying to run the classic "word count" example. Here is my code:
# Imports
# Note: unused imports (and unused variables) must all be commented
# out, otherwise an error occurs at execution time. Neither the
# "@PydevCodeAnalysisIgnore" nor the "@UnusedImport" directive
# solves the problem.
# from pyspark.mllib.clustering import KMeans
from pyspark import SparkConf, SparkContext
import os

# Configure the Spark environment
sparkConf = SparkConf().setAppName("WordCounts").setMaster("local")
sc = SparkContext(conf = sparkConf)

# The WordCounts Spark program
textFile = sc.textFile(os.environ["SPARK_HOME"] + "/README.md")
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect(): print(wc)
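
For reference, the flatMap / map / reduceByKey pipeline above can be sanity-checked on a tiny in-memory RDD, with no file I/O involved. A minimal sketch, assuming a working SparkContext sc:

lines = sc.parallelize(["to be or", "not to be"])
# Split each line into words, pair each word with 1, then sum the 1s per word
counts = lines.flatMap(lambda line: line.split()) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda a, b: a + b)
print(counts.collect())  # e.g. [('or', 1), ('not', 1), ('to', 2), ('be', 2)]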

Then I get the following error:
17/08/07 12:28:13 WARN NativeCodeLoader: Unable to load native-hadoop     library for your platform... using builtin-java classes where applicable
17/08/07 12:28:16 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "/home/hduser/eclipse-workspace/PythonSpark/src/WordCounts.py", line 12, in <module>
    sc = SparkContext(conf = sparkConf)
  File "/usr/local/spark/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/usr/local/spark/python/pyspark/context.py", line 186, in _do_init
    self._accumulatorServer = accumulators._start_update_server()
  File "/usr/local/spark/python/pyspark/accumulators.py", line 259, in _start_update_server
    server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
  File "/usr/lib/python2.7/SocketServer.py", line 417, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/SocketServer.py", line 431, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -3] Temporary failure in name resolution
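
The last frame is the telling one: PySpark's accumulator server tries to bind to ("localhost", 0), and the bind fails because "localhost" itself cannot be resolved. A quick diagnostic sketch (not from the original post) to confirm whether the OS resolves "localhost":

import socket

# If this raises socket.gaierror, "localhost" is not resolvable on this
# machine; on Linux this usually means /etc/hosts is missing the line
# "127.0.0.1 localhost".
print(socket.gethostbyname("localhost"))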

Any help? I can run any Spark project in Scala with spark-shell, and I can also run any (non-Spark) Python program on Eclipse without errors. So I think my problem has something to do with pyspark?

Best answer

You can try this: just creating the SparkContext is enough, and it works fine.

sc = SparkContext()
# The WordCounts Spark program
textFile = sc.textFile("/home/your/path/Test.txt")  # or right-click the file and copy its path here
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect():
    print(wc)
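
If the name-resolution error persists even with a bare SparkContext, one workaround worth trying (an assumption, not part of the original answer) is to pin Spark to the loopback address before creating the context, using Spark's documented SPARK_LOCAL_IP environment variable:

import os
# Assumption: binding to 127.0.0.1 sidesteps the failing "localhost" lookup
os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"

from pyspark import SparkContext
sc = SparkContext()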

Regarding "python - Error running my first Spark Python program", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45544920/
