scala - Scala: Reading array values from Elasticsearch with Spark


I am trying to read data from Elasticsearch, but the document I want to read contains a nested array (which I also want to read).

I added the option "es.read.field.as.array.include" like this:

// reader is a DataFrameReader already configured for the elasticsearch-spark connector
val dataframe = reader
  .option("es.read.field.as.array.include", "arrayField")
  .option("es.query", "someQuery")
  .load("Index/Document")

but I got the following error:
java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to java.lang.Float

How should I map the array so that it can be read?

Sample data from ES:
{
  "_index": "Index",
  "_type": "Document",
  "_id": "ID",
  "_score": 1,
  "_source": {
    "currentTime": 1516211640000,
    "someField": someValue,
    "arrayField": [
      {
        "id": "000",
        "field1": 14,
        "field2": 20.23871387052084,
        "innerArray": [[55.2754, 25.1909], [55.2754, 25.190929], [55.27, 25.190]]
      }, ...
    ],
    "meanError": 0.3082
  }
}

Best Answer

In your sample data, innerArray is a two-dimensional array, so it needs to be declared with an array depth of 2 (the ":2" suffix below).

You can try this:

val es = spark.read.format("org.elasticsearch.spark.sql")
  // ":2" declares arrayField.innerArray as a two-dimensional array
  .option("es.read.field.as.array.include", "arrayField,arrayField.innerArray:2")
  .option("es.query", "someQuery")
  .load("Index/Document")

|-- arrayField: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- field1: long (nullable = true)
| | |-- field2: float (nullable = true)
| | |-- id: string (nullable = true)
| | |-- innerArray: array (nullable = true)
| | | |-- element: array (containsNull = true)
| | | | |-- element: float (containsNull = true)
|-- currentTime: long (nullable = true)
|-- meanError: float (nullable = true)
|-- someField: string (nullable = true)


+--------------------+-------------+---------+---------+
| arrayField| currentTime|meanError|someField|
+--------------------+-------------+---------+---------+
|[[14,20.238714,00...|1516211640000| 0.3082|someValue|
+--------------------+-------------+---------+---------+
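
If you then want to work with the individual coordinate pairs, here is a minimal sketch of how the nested arrays could be flattened. It assumes the DataFrame above is named es; explode and the column names come straight from the schema, the rest is illustrative:

import org.apache.spark.sql.functions.{col, explode}

// one row per element of arrayField
val elements = es.select(explode(col("arrayField")).as("elem"), col("currentTime"))

// one row per coordinate pair in innerArray; "point" is an array<float> like [55.2754, 25.1909]
val points = elements.select(
  col("elem.id"),
  col("elem.field1"),
  col("elem.field2"),
  explode(col("elem.innerArray")).as("point")
)

points.show(false)

With the array depth declared correctly, these values come back as Float arrays instead of triggering the ClassCastException from the question.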

Regarding scala - Scala: Reading array values from Elasticsearch with Spark, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49820665/
