
python - Warning when writing DStream data to Elasticsearch with saveAsNewAPIHadoopFile in PySpark

Reposted. Author: 行者123. Updated: 2023-12-01 08:11:45

Here is my code:

    es_write_conf = {
        "es.nodes": ES_IP,
        "es.port": ES_PORT,
        "es.resource": "%s/%s" % (index, doc_type),
        "es.input.json": "true",
        # "es.mapping.rich.date": "true",
        # "es.mapping.id": "guid",
    }

    dstream.foreachRDD(lambda es_rdd: es_rdd.saveAsNewAPIHadoopFile(
        path="-",
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_write_conf))

I get this warning:

WARN EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption

How can I resolve this warning?

Best answer

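I solved it by disabling Hadoop speculative execution for both map and reduce tasks in the write configuration: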

    es_write_conf = {
        "es.nodes": ES_IP,
        "es.port": ES_PORT,
        "es.resource": "%s/%s" % (index, doc_type),
        "es.input.json": "true",
        # Disable Hadoop speculative execution so that duplicate task attempts
        # cannot write the same documents to Elasticsearch more than once.
        "mapred.reduce.tasks.speculative.execution": "false",
        "mapred.map.tasks.speculative.execution": "false",
        # "es.mapping.rich.date": "true",
        # "es.mapping.id": "guid",
    }

    dstream.foreachRDD(lambda es_rdd: es_rdd.saveAsNewAPIHadoopFile(
        path="-",
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_write_conf))
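If the same write settings are reused across several streams, a small helper can add the speculative-execution switches to any existing configuration dict. This is a minimal sketch: `with_speculation_disabled` and `SPECULATION_OFF` are hypothetical names, not part of PySpark or elasticsearch-hadoop. Besides the legacy `mapred.*` keys from the answer, it also sets the Hadoop 2 names `mapreduce.map.speculative` and `mapreduce.reduce.speculative`; treat accepting both forms as an assumption to verify against your Hadoop version. The helper only builds a plain dict, so it can be checked without a running Spark cluster.

```python
from typing import Dict

# Legacy (mapred.*) keys, as used in the answer, plus the Hadoop 2
# (mapreduce.*) names they were renamed to. Assumption: your Hadoop
# version accepts both forms.
SPECULATION_OFF = {
    "mapred.map.tasks.speculative.execution": "false",
    "mapred.reduce.tasks.speculative.execution": "false",
    "mapreduce.map.speculative": "false",
    "mapreduce.reduce.speculative": "false",
}


def with_speculation_disabled(conf: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: return a copy of an es-hadoop write config
    with Hadoop speculative execution turned off for map and reduce tasks."""
    merged = dict(conf)  # copy so the caller's dict is not mutated
    merged.update(SPECULATION_OFF)
    return merged


if __name__ == "__main__":
    # Placeholder values; substitute your own ES_IP, ES_PORT, index, doc_type.
    base = {
        "es.nodes": "127.0.0.1",
        "es.port": "9200",
        "es.resource": "%s/%s" % ("my_index", "my_type"),
        "es.input.json": "true",
    }
    print(with_speculation_disabled(base))
```

You would then pass `conf=with_speculation_disabled(es_write_conf)` to `saveAsNewAPIHadoopFile` inside `foreachRDD`, exactly as in the answer. Returning a copy keeps the original `es_write_conf` unchanged in case it is shared elsewhere.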

Regarding "python - Warning when writing DStream data to Elasticsearch with saveAsNewAPIHadoopFile in PySpark", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55205679/
