
php - Elasticsearch: filter and count operations based on nested objects

Reposted. Author: 行者123. Updated: 2023-12-02 22:09:55

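I am new to Elasticsearch and am trying to use it for analytics. I don't know whether this is possible, but I am trying to find customers with 0 purchases. I store orders as an array of nested objects on each customer. Here are sample mapping properties for the customers index: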

"first_name" => [
    "type" => "text"
],
"last_name" => [
    "type" => "text"
],
"email" => [
    "type" => "text"
],
"total_spent" => [
    "type" => "text"
],
"aov" => [
    "type" => "float"
],
"orders_count" => [
    "type" => "integer"
],
"orders" => [
    "type" => "nested",
    "properties" => [
        "order_id" => [
            "type" => "text"
        ],
        "total_price" => [
            "type" => "float"
        ]
    ]
]
Sample documents from the customers index:
[
  {
    "_index": "customers_index",
    "_type": "_doc",
    "_id": "1",
    "_score": 1,
    "_source": {
      "first_name": "Stephen",
      "last_name": "Long",
      "email": "egnition_sample_91@egnition.com",
      "total_spent": "0.00",
      "aov": 0,
      "orders": []
    }
  },
  {
    "_index": "customers_index",
    "_type": "_doc",
    "_id": "2",
    "_score": 1,
    "_source": {
      "first_name": "Reece",
      "last_name": "Dixon",
      "email": "egnition_sample_57@egnition.com",
      "total_spent": "0.10",
      "aov": "0.1",
      "orders": [
        {
          "total_price": "0.10",
          "placed_at": "2020-09-24T20:08:35.000000Z",
          "order_id": 2723671867546
        }
      ]
    }
  },
  {
    "_index": "customers_index",
    "_type": "_doc",
    "_id": "3",
    "_score": 1,
    "_source": {
      "first_name": "John",
      "last_name": "Marshall",
      "email": "egnition_sample_94@egnition.com",
      "total_spent": "0.10",
      "aov": "0.04",
      "orders": [
        {
          "total_price": "0.10",
          "placed_at": "2020-09-24T20:10:52.000000Z",
          "order_id": 2723675930778
        },
        {
          "total_price": "0.30",
          "placed_at": "2020-09-24T20:09:45.000000Z",
          "order_id": 2723673899162
        },
        {
          "total_price": "0.10",
          "placed_at": "2020-09-16T09:55:22.000000Z",
          "order_id": 2704717414554
        }
      ]
    }
  }
]
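First, do you think this mapping suits the nature of Elasticsearch? For example, I can group customers by a specific date range and get the sum of total_spent as aggregated data. What I would like to know is whether customers with no orders can be found by filtering the nested orders array on a specific date range. Do you think such a query has performance problems?
I am not familiar with NoSQL databases; I come from an RDBMS background, so I am trying to understand Elasticsearch as an analytics database.
Thanks for your replies.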
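Edit:
I am trying to count the nested objects that fall within a date-range filter. Is this possible in Elasticsearch, and does it make sense? Simply put, I want to see the customers who placed one or more orders within the specified dates.
I know how to get daily customer counts, but what if I want to count, in a set of daily results, the customers who have 1 order within a specified date range?
A possible response I would expect: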
{
  ...
  "aggregations": [
    {
      "date": "2020-09-01",
      "total_customers_zero_purchased": 15
    }
    ...
  ]
}

Best answer

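There are a lot of questions raised here, so I will focus on the most essential part.
First, it is customary to map certain text fields as keyword so that we can aggregate on them later. For the email field, that means: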

PUT customers_index
{
  "mappings": {
    "properties": {
      "email": {
        "type": "keyword"
      }
    }
  }
}
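For reference, a fuller mapping might look like the sketch below, written as a Python dict that mirrors the request body. It is only an illustration under assumptions: orders stays nested as in the question, and an orders.placed_at field of type date is added because the sample documents contain it even though the question's mapping does not list it. The helper name build_customers_mapping is hypothetical, and the remaining customer fields are left out for brevity.

```python
def build_customers_mapping():
    """Return an illustrative request body for `PUT customers_index`."""
    return {
        "mappings": {
            "properties": {
                # keyword (not text) so the field can be used in aggregations
                "email": {"type": "keyword"},
                "orders": {
                    # nested, as in the question, so each order is matched on its own
                    "type": "nested",
                    "properties": {
                        "total_price": {"type": "float"},
                        # Assumption: not in the question's mapping, but present
                        # in the sample documents and needed for range filters.
                        "placed_at": {"type": "date"},
                    },
                },
            }
        }
    }
```

The returned dict can be serialized with json.dumps and sent as the body of the PUT request by whichever client is in use.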
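After that we can proceed to the query, but note that when iterating over date ranges we need to specify one date field. This means:
  • the iteration ranges are built automatically from the available/present values (we can use a filter to restrict their range)
  • when a document does not contain a date in a given range, it is, understandably, skipped.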
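The second point can be illustrated without a cluster. The sketch below is plain Python, not an Elasticsearch call: it groups the question's sample customers by the calendar day of each order, roughly the way a date-based bucketing over orders.placed_at would. The function name customers_per_order_day and the trimmed sample list are assumptions made for illustration.

```python
from collections import defaultdict

# Trimmed copy of the question's sample documents (only the fields used here).
CUSTOMERS = [
    {"email": "egnition_sample_91@egnition.com", "orders": []},
    {"email": "egnition_sample_57@egnition.com",
     "orders": [{"placed_at": "2020-09-24T20:08:35.000000Z"}]},
    {"email": "egnition_sample_94@egnition.com",
     "orders": [{"placed_at": "2020-09-24T20:10:52.000000Z"},
                {"placed_at": "2020-09-24T20:09:45.000000Z"},
                {"placed_at": "2020-09-16T09:55:22.000000Z"}]},
]


def customers_per_order_day(customers):
    """Map each day that has at least one order to the set of buyers' emails."""
    buckets = defaultdict(set)
    for customer in customers:
        for order in customer["orders"]:
            day = order["placed_at"][:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
            buckets[day].add(customer["email"])
    return dict(buckets)
```

With this sample, only 2020-09-16 and 2020-09-24 get a bucket, and the customer whose orders array is empty appears in none of them. That is why a "zero purchases" figure has to be derived by subtracting buyers from the total rather than read from a date bucket.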

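In effect, we cannot obtain a daily rolling aggregation (because we don't know what we don't know), only a metric for a single day (or a single range) at a time. For example: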
GET customers_index/_search
{
  "size": 0,
  "aggs": {
    "multibucket_simulator": {
      "filters": {
        "filters": {
          "all": {
            "match_all": {}
          }
        }
      },
      "aggs": {
        "all_customers": {
          "cardinality": {
            "field": "email"
          }
        },
        "customers_who_purchased_at_date": {
          "filter": {
            "nested": {
              "path": "orders",
              "query": {
                "range": {
                  "orders.placed_at": {
                    "gte": "2020-09-16T00:00:00.000000Z",
                    "lt": "2020-09-26T00:00:00.000000Z"
                  }
                }
              }
            }
          },
          "aggs": {
            "customer_count": {
              "cardinality": {
                "field": "email"
              }
            }
          }
        },
        "total_customers_zero_purchased": {
          "bucket_script": {
            "buckets_path": {
              "all": "all_customers.value",
              "filtered": "customers_who_purchased_at_date>customer_count.value"
            },
            "script": "params.all - params.filtered"
          }
        }
      }
    }
  }
}
yielding
"aggregations" : {
  "multibucket_simulator" : {
    ...
    "buckets" : {
      "all" : {
        ...
        "customers_who_purchased_at_date" : {
          ...
        },
        "all_customers" : {
          ...
        },
        "total_customers_zero_purchased" : {    <---
          "value" : 1.0
        }
      }
    }
  }
}
which answers the question:

    How many customers did not purchase anything between 09/16 and 09/25 inclusive?
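To reuse the same request for other date ranges, the body could be generated rather than written by hand. The sketch below is one possible way to do that in Python, plus a local cross-check of the subtraction on the question's sample data. The names build_zero_purchase_query, count_zero_purchasers and SAMPLE_CUSTOMERS are hypothetical; nothing here talks to a cluster, and the local check assumes timestamps in exactly the format shown in the sample documents.

```python
from datetime import datetime, timezone

# Trimmed copy of the question's sample documents (assumption: only these fields matter).
SAMPLE_CUSTOMERS = [
    {"email": "egnition_sample_91@egnition.com", "orders": []},
    {"email": "egnition_sample_57@egnition.com",
     "orders": [{"placed_at": "2020-09-24T20:08:35.000000Z"}]},
    {"email": "egnition_sample_94@egnition.com",
     "orders": [{"placed_at": "2020-09-24T20:10:52.000000Z"},
                {"placed_at": "2020-09-24T20:09:45.000000Z"},
                {"placed_at": "2020-09-16T09:55:22.000000Z"}]},
]


def build_zero_purchase_query(start, end):
    """Build the answer's search body for the half-open range [start, end)."""
    in_range = {"nested": {"path": "orders", "query": {
        "range": {"orders.placed_at": {"gte": start, "lt": end}}}}}
    return {
        "size": 0,
        "aggs": {"multibucket_simulator": {
            # A single-bucket `filters` agg gives bucket_script a parent to live in.
            "filters": {"filters": {"all": {"match_all": {}}}},
            "aggs": {
                "all_customers": {"cardinality": {"field": "email"}},
                "customers_who_purchased_at_date": {
                    "filter": in_range,
                    "aggs": {"customer_count": {"cardinality": {"field": "email"}}},
                },
                "total_customers_zero_purchased": {"bucket_script": {
                    "buckets_path": {
                        "all": "all_customers.value",
                        "filtered": "customers_who_purchased_at_date>customer_count.value",
                    },
                    "script": "params.all - params.filtered",
                }},
            },
        }},
    }


def _parse(ts):
    """Parse timestamps shaped like 2020-09-24T20:08:35.000000Z as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def count_zero_purchasers(customers, start, end):
    """Local cross-check: customers with no order placed in [start, end)."""
    lo, hi = _parse(start), _parse(end)
    everyone = {c["email"] for c in customers}
    buyers = {c["email"] for c in customers
              if any(lo <= _parse(o["placed_at"]) < hi for o in c["orders"])}
    return len(everyone - buyers)
```

For the range used in the answer (2020-09-16 up to, but not including, 2020-09-26), the local check on the three sample customers gives 1, in line with the total_customers_zero_purchased value in the response. A per-day series like the one sketched in the question would need one such body per one-day window, sent as separate requests.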

A similar question about php - Elasticsearch filter and count operations based on nested objects can be found on Stack Overflow: https://stackoverflow.com/questions/64289136/
