elasticsearch - How to percolate simple_query_string/query_string queries

Reposted. Author: 行者123. Last updated: 2023-11-29 02:48:30

Index:

{
"settings": {
"index.percolator.map_unmapped_fields_as_text": true
},
"mappings": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}

This test percolator query works:

{
"query": {
"match": {
"message": "blah"
}
}
}

This query does not work:

{
"query": {
"simple_query_string": {
"query": "bl*"
}
}
}

Result:

{"took":15,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.13076457,"hits":[{"_index":"my-index","_type":"_doc","_id":"1","_score":0.13076457,"_source":{"query":{"match":{"message":"blah"}}},"fields":{"_percolator_document_slot":[0]}}]}}

Why doesn't this simple_query_string query match the document?

Best answer

I am not entirely sure what you are asking either. Perhaps you are not very familiar with the percolator? Here is an example I just tried.

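Suppose you have an index, let's call it test, into which you want to index some documents. That index has the following mapping (just a random test index from my test setup):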

{  
"settings": {
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": true,
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)",
"([^-@]+)"
]
}
},
"analyzer": {
"email": {
"tokenizer": "uax_url_email",
"filter": [
"email",
"lowercase",
"unique"
]
}
}
}
},
"mappings": {
"properties": {
"code": {
"type": "long"
},
"date": {
"type": "date"
},
"part": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"val": {
"type": "long"
},
"email": {
"type": "text",
"analyzer": "email"
}
}
}
}

Notice that it has a custom email analyzer, which splits something like foo@bar.com into these tokens: foo@bar.com, foo, bar.com, bar, com.
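To see where those five tokens come from, here is a rough Python sketch of what the email filter chain does to a single token. It is not Elasticsearch code: the function name email_tokens is made up for illustration, it assumes the uax_url_email tokenizer hands over the whole address as one token, and it substitutes [^\W\d_]+ for \p{L}+ because Python's re module does not support \p{L}. The order of the returned tokens is not meant to mirror Lucene's emission order.

```python
import re

# Patterns copied from the "email" pattern_capture filter above.
# Assumption: [^\W\d_]+ stands in for \p{L}+ (letters only).
PATTERNS = [
    r"([^@]+)",
    r"([^\W\d_]+)",
    r"(\d+)",
    r"@(.+)",
    r"([^-@]+)",
]


def email_tokens(token):
    """Approximate the tokens the 'email' analyzer yields for one input token."""
    tokens = [token]  # preserve_original: true keeps the input itself
    for pattern in PATTERNS:
        # pattern_capture emits every capture group of every match
        for match in re.finditer(pattern, token):
            tokens.append(match.group(1))
    # lowercase filter, then unique filter (first occurrence is kept)
    unique = []
    for tok in (t.lower() for t in tokens):
        if tok not in unique:
            unique.append(tok)
    return unique


print(email_tokens("foo@bar.com"))  # ['foo@bar.com', 'foo', 'bar.com', 'bar', 'com']
print(email_tokens("foo"))          # ['foo']
```

On a running cluster, the _analyze API with "analyzer": "email" is the authoritative way to check the real token stream.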

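As the documentation describes, you can create a separate percolator index that holds only your percolator queries, not the documents themselves. Even though the percolator index does not contain the documents, it must still carry the mapping of the index the documents are meant to go into (test, in our case).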

Here is the mapping of the percolator index (which I call percolator_index); it also has the special analyzer used to split the email field:

{  
"settings": {
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": true,
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)",
"([^-@]+)"
]
}
},
"analyzer": {
"email": {
"tokenizer": "uax_url_email",
"filter": [
"email",
"lowercase",
"unique"
]
}
}
}
},
"mappings": {
"properties": {
"query": {
"type": "percolator"
},
"code": {
"type": "long"
},
"date": {
"type": "date"
},
"part": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"val": {
"type": "long"
},
"email": {
"type": "text",
"analyzer": "email"
}
}
}
}

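Its mappings and settings are almost identical to those of my original index; the only difference is the additional query field of type percolator in the mapping.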
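Since the two definitions differ only by that one field, the percolator index body can be derived from the original one instead of being maintained by hand. The following is a minimal sketch of that idea using plain dictionaries; the helper name with_percolator_field and the trimmed test_index stand-in are hypothetical and only serve the illustration.

```python
import copy


def with_percolator_field(index_body, field="query"):
    """Copy an index definition and add a percolator field to its mapping."""
    body = copy.deepcopy(index_body)  # leave the original definition untouched
    properties = body.setdefault("mappings", {}).setdefault("properties", {})
    if field in properties:
        raise ValueError(f"field {field!r} already exists in the mapping")
    properties[field] = {"type": "percolator"}
    return body


# Trimmed stand-in for the test index above (settings omitted for brevity).
test_index = {
    "mappings": {
        "properties": {
            "part": {"type": "text"},
            "email": {"type": "text", "analyzer": "email"},
        }
    }
}

percolator_index = with_percolator_field(test_index)
print(sorted(percolator_index["mappings"]["properties"]))  # ['email', 'part', 'query']
```

In a real setup the full body, including the analysis settings, would be passed in, so that the email analyzer is also defined in the percolator index.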

The query you are interested in, simple_query_string, should go into a document inside percolator_index. Like this:

PUT /percolator_index/_doc/1?refresh
{
"query": {
"simple_query_string" : {
"query" : "month foo@bar.com",
"fields": ["part", "email"]
}
}
}

To make it more interesting, I listed the email field among the fields the query searches explicitly (by default, all fields are searched).

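Now the goal is to test a document that should eventually go into the test index against this simple_query_string query stored in your percolator index. For example: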

GET /percolator_index/_search
{
"query": {
"percolate": {
"field": "query",
"document": {
"date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo@bar.com"
}
}
}
}

What is under document is, obviously, your future (not yet existing) document. It is checked against the simple_query_string defined above and produces a match:

{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.39324823,
"hits": [
{
"_index": "percolator_index",
"_type": "_doc",
"_id": "1",
"_score": 0.39324823,
"_source": {
"query": {
"simple_query_string": {
"query": "month foo@bar.com",
"fields": [
"part",
"email"
]
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
}
]
}
}
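The percolate request above always has the same shape; only the candidate document changes. As a small sketch, the body can be built from any document like this (the helper name percolate_body is hypothetical, and nothing here talks to a cluster; it only produces the JSON to send to GET /percolator_index/_search):

```python
import json


def percolate_body(document, field="query"):
    """Build a percolate search body for one candidate document."""
    return {"query": {"percolate": {"field": field, "document": document}}}


# The candidate document from the request above.
candidate = {
    "date": "2004-07-31T11:57:52.000Z",
    "part": "month",
    "code": 109,
    "val": 0,
    "email": "foo@bar.com",
}

body = percolate_body(candidate)
print(json.dumps(body, indent=2))
```

Swapping in a different candidate document is then a one-line change.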

What happens if I percolate this document instead:

{
"query": {
"percolate": {
"field": "query",
"document": {
"date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo"
}
}
}
}

(Note that the email is just foo.) This is the result:

{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.26152915,
"hits": [
{
"_index": "percolator_index",
"_type": "_doc",
"_id": "1",
"_score": 0.26152915,
"_source": {
"query": {
"simple_query_string": {
"query": "month foo@bar.com",
"fields": [
"part",
"email"
]
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
}
]
}
}

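Note that the score is somewhat lower than for the first percolated document. This is probably because foo (my email) matches only one of the terms produced by analyzing foo@bar.com, whereas foo@bar.com matches all of them (and so gets a better score).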

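I am not sure which analyzer you are referring to, though. I think the example above covers the only analyzer-related question or unknown that I thought might be causing the confusion.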

Regarding elasticsearch - how to percolate simple_query_string/query_string queries, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58651658/
