scala - How to express an Elasticsearch DSL query in Spark using Scala?

Reposted. Author: 行者123. Updated: 2023-12-03 01:58:44

How can I express the following Elasticsearch query in Spark using Scala?

Request

GET importsmethods/typeimportsmethods/_search?search_type=count
{
  "size": 0,
  "aggs": {
    "group_by_imports": {
      "terms": {
        "field": "tokens.importName"
      }
    }
  }
}

Response
{
  "took": 2064,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1297362,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_imports": {
      "doc_count_error_upper_bound": 4939,
      "sum_other_doc_count": 1960640,
      "buckets": [
        {
          "key": "java.util.list",
          "doc_count": 129986
        },
        {
          "key": "java.util.map",
          "doc_count": 103525
        }
      ]
    }
  }
}

Spark code
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val conf = new SparkConf().setMaster("local[2]").setAppName("test")

conf.set("es.nodes", "localhost")
conf.set("es.port", "9200")
conf.set("es.index.auto.create", "true")
conf.set("es.resource", "importsmethods/typeimportsmethods/_search")
conf.set("es.query", """?search_type=count&ignore_unavailable=true {
  "size": 0,
  "aggs": {
    "group_by_imports": {
      "terms": {
        "field": "tokens.importName"
      }
    }
  }
}""")

val sc = new SparkContext(conf)
val importMethodsRDD = sc.esRDD()
val rddVal = importMethodsRDD.map(x => x._2)

rddVal.saveAsTextFile("../")

Exception

Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Index [importsmethods/typeimportsmethods/_search] missing and settings [es.field.read.empty.as.null] is set to false

Best answer

You only need to fix the following line: es.resource should contain just index/type, without the _search endpoint appended.

conf.set("es.resource","importsmethods/typeimportsmethods")
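
To make the expected shape concrete, here is a small sketch that uses only the Scala standard library. The helper `stripSearchEndpoint` is hypothetical (it is not part of the connector); it simply removes a trailing `_search` segment so that only `index/type` remains.

```scala
object EsResourceCheck {
  // Hypothetical helper, not a connector API: drop a trailing slash and a
  // trailing "/_search" so the value is just "index/type".
  def stripSearchEndpoint(resource: String): String =
    resource.stripSuffix("/").stripSuffix("/_search")

  def main(args: Array[String]): Unit = {
    // prints "importsmethods/typeimportsmethods"
    println(stripSearchEndpoint("importsmethods/typeimportsmethods/_search"))
  }
}
```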

Also, es.query does not need the URL query string; pass only the query DSL part:
conf.set("es.query", """{
  "size": 0,
  "aggs": {
    "group_by_imports": {
      "terms": {
        "field": "tokens.importName"
      }
    }
  }
}""")
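
Putting both fixes together, the read path might look like the sketch below. It assumes the elasticsearch-spark connector is on the classpath and that a node is reachable at localhost:9200. The object name `ImportsQuery`, the `esSettings` map, and the `take(10)` preview are illustrative and not part of the original post. One caveat worth checking against the connector documentation: as far as I know, `esRDD` streams the matching documents rather than the `aggregations` section of a search response, so the buckets may still have to be computed in Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD() to SparkContext

object ImportsQuery {
  // Connector settings with both fixes applied (values taken from the post).
  val esSettings: Map[String, String] = Map(
    "es.nodes"    -> "localhost",
    "es.port"     -> "9200",
    // index/type only, no _search endpoint
    "es.resource" -> "importsmethods/typeimportsmethods",
    // query DSL only, no URL query string in front of it
    "es.query"    ->
      """{
        |  "size": 0,
        |  "aggs": {
        |    "group_by_imports": {
        |      "terms": { "field": "tokens.importName" }
        |    }
        |  }
        |}""".stripMargin
  )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("test")
      .setAll(esSettings)
    val sc = new SparkContext(conf)
    try {
      // Each element is (document id, document source as a Map).
      val docs = sc.esRDD().map(_._2)
      // Illustrative preview only; the post writes the RDD to a text file instead.
      docs.take(10).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the settings in a plain `Map` makes it easy to inspect them before a `SparkContext` exists, which helps catch a stray `_search` or query string early.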

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/34606449/
