
elasticsearch - Finding distinct inner objects in Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-02 23:01:36

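We are trying to find distinct inner objects in Elasticsearch. Below is a minimal example of our case. Our problem involves a mapping like the following (changing field types or index settings, or adding new fields, would be fine, but the structure should remain as it is):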

{
  "building": {
    "properties": {
      "street": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "house number": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "city": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "people": {
        "type": "object",
        "store": "yes",
        "index": "not_analyzed",
        "properties": {
          "firstName": {
            "type": "string",
            "store": "yes",
            "index": "not_analyzed"
          },
          "lastName": {
            "type": "string",
            "store": "yes",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

Suppose we have this sample data:

{
  "buildings": [
    {
      "street": "Baker Street",
      "house number": "221 B",
      "city": "London",
      "people": [
        {
          "firstName": "John",
          "lastName": "Doe"
        },
        {
          "firstName": "Jane",
          "lastName": "Doe"
        }
      ]
    },
    {
      "street": "Baker Street",
      "house number": "5",
      "city": "London",
      "people": [
        {
          "firstName": "John",
          "lastName": "Doe"
        }
      ]
    },
    {
      "street": "Garden Street",
      "house number": "1",
      "city": "London",
      "people": [
        {
          "firstName": "Jane",
          "lastName": "Smith"
        }
      ]
    }
  ]
}

When we query for the street "Baker Street" (plus whatever other options are needed), we want to get the following list:

[
  {
    "firstName": "John",
    "lastName": "Doe"
  },
  {
    "firstName": "Jane",
    "lastName": "Doe"
  }
]

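The format does not matter much, but we must be able to parse the first name and last name. Since our real data set is much larger, however, the entries need to be distinct.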
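To pin down the expected result, here is a minimal local sketch of the intended semantics in plain Python, with no Elasticsearch involved: keep only buildings on the requested street, then deduplicate people by the (firstName, lastName) pair in first-seen order. The function name `distinct_people` is hypothetical; the data is the sample above.

```python
from typing import Dict, List, Set, Tuple


def distinct_people(buildings: List[dict], street: str) -> List[Dict[str, str]]:
    """Return the distinct people living on `street`, in first-seen order."""
    seen: Set[Tuple[str, str]] = set()
    result: List[Dict[str, str]] = []
    for building in buildings:
        if building["street"] != street:
            continue  # only buildings on the requested street
        for person in building["people"]:
            key = (person["firstName"], person["lastName"])
            if key not in seen:  # a person counts as "the same" by name pair
                seen.add(key)
                result.append({"firstName": key[0], "lastName": key[1]})
    return result


buildings = [
    {"street": "Baker Street", "house number": "221 B", "city": "London",
     "people": [{"firstName": "John", "lastName": "Doe"},
                {"firstName": "Jane", "lastName": "Doe"}]},
    {"street": "Baker Street", "house number": "5", "city": "London",
     "people": [{"firstName": "John", "lastName": "Doe"}]},
    {"street": "Garden Street", "house number": "1", "city": "London",
     "people": [{"firstName": "Jane", "lastName": "Smith"}]},
]

print(distinct_people(buildings, "Baker Street"))
```

This only works in memory; the point of the question is to get the same list from Elasticsearch without loading every matching document.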

We are using Elasticsearch 1.7.

Best answer

We finally solved the problem.

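Our solution (as we expected) is a precomputed people_all field. However, instead of using copy_to or transform, we write it ourselves at import time, just like the other fields. The field looks like this: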

"people": {
  "type": "nested",
  ..
  "properties": {
    "firstName": {
      "type": "string",
      "store": "yes",
      "index": "not_analyzed"
    },
    "lastName": {
      "type": "string",
      "store": "yes",
      "index": "not_analyzed"
    },
    "people_all": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
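Because people_all is written by the importer rather than by copy_to or transform, each person object needs the value filled in before the document is indexed. A minimal sketch of that step, assuming the value is "firstName lastName" joined by a single space (inferred from the bucket keys such as "John Doe" in the response further down). The function name `with_people_all` is hypothetical, and the actual indexing call is left out.

```python
import copy


def with_people_all(building: dict) -> dict:
    """Return a copy of `building` in which every person carries a
    precomputed `people_all` value of the form "<firstName> <lastName>"."""
    doc = copy.deepcopy(building)  # leave the caller's data untouched
    for person in doc.get("people", []):
        person["people_all"] = f'{person["firstName"]} {person["lastName"]}'
    return doc


building = {
    "street": "Baker Street", "house number": "221 B", "city": "London",
    "people": [{"firstName": "John", "lastName": "Doe"},
               {"firstName": "Jane", "lastName": "Doe"}],
}
doc = with_people_all(building)
print([p["people_all"] for p in doc["people"]])  # ['John Doe', 'Jane Doe']
```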

Note the "index": "not_analyzed" on the people_all field. It is what makes each bucket hold the whole value. Without it, our example would return three buckets: "john", "jane" and "doe".
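The effect of not_analyzed can be illustrated locally. The sketch below uses a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters; this is an assumption and not the real Lucene tokenizer) and counts, like a terms aggregation does, each nested person document once per distinct term. The input is the three people_all values on Baker Street; the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List


def approx_standard_analyzer(text: str) -> List[str]:
    # Rough approximation of an analyzed string field: lowercase the text
    # and split it on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def bucket_counts(values: Iterable[str], analyzed: bool) -> Counter:
    # A terms aggregation counts each document once per distinct term it holds.
    counts: Counter = Counter()
    for value in values:
        terms = set(approx_standard_analyzer(value)) if analyzed else {value}
        counts.update(terms)
    return counts


# people_all values of the three nested "people" documents on Baker Street
values = ["John Doe", "Jane Doe", "John Doe"]
print(sorted(bucket_counts(values, analyzed=False).items()))
# [('Jane Doe', 1), ('John Doe', 2)]
print(sorted(bucket_counts(values, analyzed=True).items()))
# [('doe', 3), ('jane', 1), ('john', 2)]
```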

Once this new field is written, we can run the aggregation as follows:

{
  "size": 0,
  "query": {
    "term": {
      "street": "Baker Street"
    }
  },
  "aggs": {
    "people_distinct": {
      "nested": {
        "path": "people"
      },
      "aggs": {
        "people_all_distinct": {
          "terms": {
            "field": "people.people_all",
            "size": 0
          }
        }
      }
    }
  }
}
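When the street comes from user input, the same request body can be generated rather than hand-edited. A minimal sketch that rebuilds exactly the body above for any street; the helper name `distinct_people_query` is hypothetical, and since the question does not name the index, sending the request is left out. In Elasticsearch 1.x a terms `size` of 0 means "return all terms".

```python
import json


def distinct_people_query(street: str) -> dict:
    """Build the search body from the answer for an arbitrary street."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "query": {"term": {"street": street}},  # exact match on a not_analyzed field
        "aggs": {
            "people_distinct": {
                "nested": {"path": "people"},  # aggregate over the nested person docs
                "aggs": {
                    "people_all_distinct": {
                        # size 0 asks Elasticsearch 1.x for all terms
                        "terms": {"field": "people.people_all", "size": 0}
                    }
                },
            }
        },
    }


print(json.dumps(distinct_people_query("Baker Street"), indent=2))
```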

We get the following response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "people_distinct": {
      "doc_count": 3,
      "people_all_distinct": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "John Doe",
            "doc_count": 2
          },
          {
            "key": "Jane Doe",
            "doc_count": 1
          }
        ]
      }
    }
  }
}

From the buckets in the response, we can now build the distinct person objects.
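Building the person objects means splitting each bucket key back into its two parts. A minimal sketch, assuming keys of the form "firstName lastName" and first names without spaces ("Mary Ann Smith" would be split wrongly; a separator that never occurs in names would be more robust). The inner aggregation name is a parameter because it must match the name used in the request (people_all_distinct); the function name `people_from_buckets` is hypothetical.

```python
from typing import Dict, List


def people_from_buckets(response: dict,
                        agg_name: str = "people_all_distinct") -> List[Dict[str, str]]:
    """Turn "firstName lastName" bucket keys back into person objects."""
    buckets = response["aggregations"]["people_distinct"][agg_name]["buckets"]
    people: List[Dict[str, str]] = []
    for bucket in buckets:
        # Split at the first space only; assumes first names contain no spaces.
        first, _, last = bucket["key"].partition(" ")
        people.append({"firstName": first, "lastName": last})
    return people


# Trimmed-down copy of the aggregation part of the response above.
response = {
    "aggregations": {
        "people_distinct": {
            "doc_count": 3,
            "people_all_distinct": {
                "buckets": [
                    {"key": "John Doe", "doc_count": 2},
                    {"key": "Jane Doe", "doc_count": 1},
                ]
            },
        }
    }
}

print(people_from_buckets(response))
```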

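If there is a better way to achieve our goal, please let us know. Parsing the bucket keys is not an ideal solution; it would be nicer to have firstName and lastName as separate fields in each bucket.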
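One alternative worth testing (not part of the answer above, and not verified against Elasticsearch 1.7) is to nest a terms aggregation on people.lastName inside one on people.firstName, both under the same nested aggregation. Because both run over the same nested person documents, each first-name bucket should only contain the last names of persons with that first name, so every pair arrives as separate keys and no string parsing is needed. The sketch builds such a body and flattens the two bucket levels; the aggregation names `by_first` and `by_last` and both function names are hypothetical.

```python
from typing import Dict, List


def two_level_query(street: str) -> dict:
    """Search body with firstName buckets that contain lastName sub-buckets."""
    return {
        "size": 0,
        "query": {"term": {"street": street}},
        "aggs": {
            "people_distinct": {
                "nested": {"path": "people"},
                "aggs": {
                    "by_first": {  # hypothetical aggregation name
                        "terms": {"field": "people.firstName", "size": 0},
                        "aggs": {
                            "by_last": {  # hypothetical aggregation name
                                "terms": {"field": "people.lastName", "size": 0}
                            }
                        },
                    }
                },
            }
        },
    }


def flatten_pairs(response: dict) -> List[Dict[str, str]]:
    """Turn the two bucket levels into {firstName, lastName} objects."""
    people: List[Dict[str, str]] = []
    agg = response["aggregations"]["people_distinct"]["by_first"]
    for first in agg["buckets"]:
        for last in first["by_last"]["buckets"]:
            people.append({"firstName": first["key"], "lastName": last["key"]})
    return people
```

The order of the resulting list follows the bucket order Elasticsearch returns, which for terms aggregations defaults to descending doc_count.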

A similar question about finding distinct inner objects in Elasticsearch can be found on Stack Overflow: https://stackoverflow.com/questions/33349315/
