
elasticsearch - Elasticsearch: Is the relevance of a filtered query affected by records that are not in the filter?

Reposted. Author: 行者123. Last updated: 2023-12-02 22:21:23

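Imagine I have three sets of data (SetA, SetB, SetC) and three customers. My first customer has access to SetA and SetB, the second customer has access to SetA and SetC, and the third customer has access to SetB and SetC. I could create one Elasticsearch index per customer, so the indexes would hold the following data sets...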

Index 1   Index 2   Index 3
-------   -------   -------
SetA      SetA      SetB
SetB      SetC      SetC

Then I query the correct index depending on the customer. This is simple, but it does involve duplicating data.

Alternatively, I could create a single index that contains all three sets of data.

Index
-----
SetA
SetB
SetC

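I would then add a filter to the query so that only records from the correct sets are considered as results. This would work, but I am concerned that the single-index solution will not give the same query results as the multi-index approach.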

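I think, though I am happy to be corrected, that the index takes all of its records into account for internal scoring such as relevance and frequency. A single index with filtering would therefore not return the same results as the multi-index approach. Is this assumption correct?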

Best answer

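If you filter the results by customer ID first and only then run the search, the filter clause itself adds nothing to the relevance score, so you should combine these data sets in one Elasticsearch index rather than creating 3 different indexes for this purpose.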

You can read more about query and filter context and their impact on the score.

Let me show you with a small example:

Index definition

{
  "mappings": {
    "properties": {
      "setA": {
        "type": "text"
      },
      "setB": {
        "type": "text"
      },
      "setC": {
        "type": "text"
      },
      "customer-id": {
        "type": "long"
      }
    }
  }
}

Index two sample documents for each customer

{
  "setA": "first customer",
  "setB": "first customer",
  "setC": "",
  "customer-id": 1
}

{
  "setA": "first customer set A",
  "setB": "first customer set B",
  "setC": "",
  "customer-id": 1
}

{
  "setA": "second customer",
  "setC": "second customer",
  "customer-id": 2
}

{
  "setA": "second customer set A",
  "setC": "second customer set C",
  "customer-id": 2
}

{
  "setB": "third customer",
  "setC": "third customer",
  "customer-id": 3
}

{
  "setB": "third customer set B",
  "setC": "third customer set C",
  "customer-id": 3
}
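
For anyone who wants to load these sample documents in one request, here is a minimal sketch that builds a newline-delimited body in the shape the Elasticsearch Bulk API expects (an action line followed by a source line for each document). The helper name `build_bulk_payload` is hypothetical. The index name `so_query_filter` and the IDs "1" and "2" are taken from the search results shown in this answer; IDs for the remaining documents would be your own choice.

```python
import json


def build_bulk_payload(index_name, docs_by_id):
    """Build an NDJSON body for a bulk index request (hypothetical helper)."""
    lines = []
    for doc_id, doc in docs_by_id.items():
        # Action line: tells Elasticsearch where to index the next document.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# The two sample documents for customer 1, keyed by document ID.
customer_one_docs = {
    "1": {"setA": "first customer", "setB": "first customer",
          "setC": "", "customer-id": 1},
    "2": {"setA": "first customer set A", "setB": "first customer set B",
          "setC": "", "customer-id": 1},
}

payload = build_bulk_payload("so_query_filter", customer_one_docs)
print(payload)
```

The resulting string can be sent as the body of a `POST _bulk` request with the `application/x-ndjson` content type.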

Search query that first filters to the first customer and then ranks the matching documents by relevance score. The must clause matches documents and orders them by relevance score; the filter clause restricts the results to the documents of customer 1.

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "setA": "first"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "customer-id": 1
          }
        }
      ]
    }
  }
}
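
If the same query has to be issued for different customers, the body can be generated rather than written by hand. This is a small sketch only; the function name `build_customer_query` and its parameters are hypothetical, and the field name `customer-id` comes from the mapping above.

```python
import json


def build_customer_query(customer_id, field, text):
    """Return a bool query: scored match in `must`, unscored term in `filter`."""
    return {
        "query": {
            "bool": {
                # Scored clause: decides which documents match and how they rank.
                "must": [{"match": {field: text}}],
                # Unscored clause: keeps only this customer's documents.
                "filter": [{"term": {"customer-id": customer_id}}],
            }
        }
    }


body = build_customer_query(1, "setA", "first")
print(json.dumps(body, indent=2))
```

Calling it with `customer_id=2` would produce the same query restricted to the second customer's documents.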

Search results. Only the documents of customer 1 are returned. The first hit has the higher relevance score; the second hit scores lower because its setA field contains more words than that of the first.

"hits": [
  {
    "_index": "so_query_filter",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.8025915,
    "_source": {
      "setA": "first customer",
      "setB": "first customer",
      "setC": "",
      "customer-id": 1
    }
  },
  {
    "_index": "so_query_filter",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.60996956,
    "_source": {
      "setA": "first customer set A",
      "setB": "first customer set B",
      "setC": "",
      "customer-id": 1
    }
  }
]
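
The two scores above can be reproduced by hand, which helps show where the numbers come from. The sketch below assumes the default BM25 similarity as reported by Elasticsearch 7.x explain output (k1 = 1.2, b = 0.75, a constant factor of k1 + 1), a single shard, and the standard analyzer, so that "first customer" is 2 terms and "first customer set A" is 4. None of these settings are stated in the answer, so treat them as assumptions. Under them, the reported scores come out only when the term statistics are counted over all four documents that have a setA field, including those of customer 2. In other words, the filter clause contributes nothing to the score and the order of these two hits is unchanged, but the absolute scores would not be the same as in a hypothetical index holding only the documents of customer 1.

```python
import math


def bm25_score(freq, doc_len, avg_doc_len, doc_count, docs_with_term,
               k1=1.2, b=0.75):
    """BM25 score of one term in one field (assumed ES 7.x default formula)."""
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return (k1 + 1) * idf * tf


# Whole index: 4 documents have a setA field (lengths 2, 4, 2, 4), 2 contain "first".
shared = {"avg_doc_len": 3.0, "doc_count": 4, "docs_with_term": 2}
score_doc1 = bm25_score(freq=1, doc_len=2, **shared)
score_doc2 = bm25_score(freq=1, doc_len=4, **shared)

# Hypothetical per-customer index: only customer 1's two documents (lengths 2, 4).
separate = {"avg_doc_len": 3.0, "doc_count": 2, "docs_with_term": 2}
separate_doc1 = bm25_score(freq=1, doc_len=2, **separate)
separate_doc2 = bm25_score(freq=1, doc_len=4, **separate)

print(score_doc1, score_doc2)        # compare with the _score values above
print(separate_doc1, separate_doc2)  # lower scores, same ordering
```

If only the ranking within one customer's results matters, the difference may be harmless; if absolute scores are compared or thresholded, it is worth checking with the explain API on real data.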

Regarding "elasticsearch - Elasticsearch: Is the relevance of a filtered query affected by records that are not in the filter?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60911333/
