python - Spark Python script not writing to HBase

Reposted · Author: 太空宇宙 · Updated: 2023-11-04 03:19:38

I am trying to run the script from this blog:

import sys
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

def SaveRecord(rdd):
    host = 'sparkmaster.example.com'
    table = 'cats'
    keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"
    conf = {"hbase.zookeeper.quorum": host,
            "hbase.mapred.outputtable": table,
            "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
            "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
            "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}
    datamap = rdd.map(lambda x: (str(json.loads(x)["id"]),
                                 [str(json.loads(x)["id"]), "cfamily", "cats_json", x]))
    datamap.saveAsNewAPIHadoopDataset(conf=conf, keyConverter=keyConv, valueConverter=valueConv)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: StreamCatsToHBase.py <hostname> <port>")
        exit(-1)

    sc = SparkContext(appName="StreamCatsToHBase")
    ssc = StreamingContext(sc, 1)
    lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))
    lines.foreachRDD(SaveRecord)

    ssc.start()             # Start the computation
    ssc.awaitTermination()  # Wait for the computation to terminate
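The core of the script is the `rdd.map` step: `StringListToPutConverter` expects each element to be a `(rowkey, [rowkey, columnFamily, qualifier, value])` tuple, which it turns into an HBase Put. A minimal standalone sketch of that mapping (plain Python, no Spark needed; the sample record is made up for illustration):

```python
import json

def to_hbase_put(x):
    """Map a JSON string to the (rowkey, [rowkey, cf, qualifier, value])
    tuple shape expected by StringListToPutConverter."""
    record = json.loads(x)  # parse once instead of twice, as the lambda in the script does
    rowkey = str(record["id"])
    return (rowkey, [rowkey, "cfamily", "cats_json", x])

line = '{"id": 42, "name": "whiskers"}'
key, cells = to_hbase_put(line)
print(key)    # 42
print(cells)  # ['42', 'cfamily', 'cats_json', '{"id": 42, "name": "whiskers"}']
```

Note the row key appears twice: once as the key of the pair (converted to `ImmutableBytesWritable`) and once as the first element of the list (consumed by the Put converter).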

I cannot get it to run. I have tried three different command-line options, but none of them produces any output or writes data to the HBase table.

Here are the command-line options I tried:

spark-submit --jars /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar --jars /usr/local/hbase/lib/hbase-examples-1.1.2.jar sp_json.py localhost 2389 > sp_json.log

spark-submit --driver-class-path /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar sp_json.py localhost 2389 > sp_json.log

spark-submit --driver-class-path /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar --jars /usr/local/hbase/lib/hbase-examples-1.1.2.jar sp_json.py localhost 2389 > sp_json.log

Here is the logfile. It is extremely verbose; Apache Spark spits out so much information that it is one of the reasons debugging is difficult.

Best Answer

Finally got it working with the following command syntax: spark-submit --jars /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar,/usr/local/hbase/lib/hbase-examples-1.1.2.jar sp_json.py localhost 2399 > sp_json.log
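The key difference from the failed attempts is that spark-submit takes a single --jars option whose value is one comma-separated list; when --jars is passed twice, the second occurrence likely replaces the first, so only one of the two jars reaches the classpath. A sketch of building the list (paths taken from the question):

```shell
# Build ONE comma-separated --jars value instead of repeating the flag.
JARS="/usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar"
JARS="$JARS,/usr/local/hbase/lib/hbase-examples-1.1.2.jar"
echo "spark-submit --jars $JARS sp_json.py localhost 2399"
```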

Regarding "python - Spark Python script not writing to HBase", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35097686/
