
elasticsearch - Fuzzy matching of 2 words

Reposted. Author: 行者123. Last updated: 2023-12-03 02:25:41

This query:

 {
""query"": {
""match"": {
""attachment.content"": {
""query"": ""hello world"",
""minimum_should_match"": 2,
""fuzziness"": 1
}
}
}
}

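is intended to return items that contain:
hello world
hello Vorld
pello world

In other words, at most one character may differ. However, it may also return items that contain only:
hello

Why does that happen, given that I specified minimum_should_match = 2, which should effectively impose an AND?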

PS:

Part of the relevant mapping:
{
  "my_great_index" : {
    "mappings" : {
      "properties" : {
        "attachment" : {
          "properties" : {
            "author" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "containsMetadata" : {
              "type" : "boolean"
            },
            "content" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "content_length" : {
              "type" : "long"
            },
            "content_type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "date" : {
              "type" : "date"
            },
            "detect_language" : {
              "type" : "boolean"
            },
            "indexed_chars" : {
              "type" : "long"
            },
            "keywords" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "language" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "title" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "something_else" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
        ....

PPS:

This is how I create the index in C#:

https://www.elastic.co/blog/the-future-of-attachments-for-elasticsearch-and-dotnet
public static void CreateIndex(ElasticClient client, string indexName)
{
    var createIndexResponse = client.Indices.Create(indexName, c => c
        .Settings(s => s
            .Analysis(a => a
                .Analyzers(ad => ad
                    .Custom("windows_path_hierarchy_analyzer", ca => ca
                        .Tokenizer("windows_path_hierarchy_tokenizer")
                    )
                )
                .Tokenizers(t => t
                    .PathHierarchy("windows_path_hierarchy_tokenizer", ph => ph
                        .Delimiter('\\')
                    )
                )
            )
        )
        .Map<MyItem>(mp => mp
            .AutoMap()
            .Properties(ps => ps
                .Text(s => s
                    .Name(n => n.Id)
                    .Analyzer("windows_path_hierarchy_analyzer")
                )
                .Object<Attachment>(a => a
                    .Name(n => n.Attachment)
                    .AutoMap()
                )
            )
        )
    );

    var putPipelineResponse = client.Ingest.PutPipeline("attachments", p => p
        .Description("Document attachment pipeline")
        .Processors(pr => pr
            .Attachment<MyItem>(a => a
                .Field(f => f.Content)
                .TargetField(f => f.Attachment)
            )
            .Remove<MyItem>(r => r
                .Field(ff => ff
                    .Field(f => f.Content)
                )
            )
        )
    );
}
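
For readers unfamiliar with the path hierarchy tokenizer configured above, the following Python sketch approximates what it emits for a Windows-style path when the delimiter is a backslash. The function name path_hierarchy_tokens and the sample path are hypothetical, and the sketch ignores tokenizer options such as replacement, skip and reverse, so treat it as an illustration rather than a reproduction of Elasticsearch's behavior.

```python
def path_hierarchy_tokens(path: str, delimiter: str = "\\") -> list[str]:
    """Return every leading sub-path of `path`, shortest first.

    Rough approximation of a path hierarchy tokenizer (assumption:
    no replacement, skip or reverse options are set).
    """
    parts = path.split(delimiter)
    tokens = []
    for end in range(1, len(parts) + 1):
        token = delimiter.join(parts[:end])
        if token:  # skip the empty token produced by a leading delimiter
            tokens.append(token)
    return tokens


# Hypothetical sample path, not taken from the question.
sample_tokens = path_hierarchy_tokens("C:\\Users\\me\\report.pdf")
for token in sample_tokens:
    print(token)
```

Note that this analyzer is applied only to the Id field in the code above, so it does not affect how attachment.content is analyzed or searched.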

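Best answer

I just tried your example on Elasticsearch 7.6 and it works for me. Could you share how you index the data, for example sample documents, and your Elasticsearch version?

The query you provided is also syntactically incorrect.

Index definition with fewer fields: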

{
  "mappings": {
    "properties": {
      "attachment": {
        "properties": {
          "author": {
            "type": "text"
          },
          "content": {
            "type": "text"
          }
        }
      }
    }
  }
}

I indexed the 3 documents you expect to match:
{
  "attachment.author": "bar",
  "attachment.content": "pello world"
}

{
  "attachment.author": "bar",
  "attachment.content": "hello world"
}

{
  "attachment.author": "bar",
  "attachment.content": "hello vorld"
}
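
If you want to reproduce this setup, one option is to send the sample documents in a single request to the bulk API, which expects newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below only builds that request body. The helper name bulk_body is hypothetical, and the index name fuzzy is taken from the search results in this answer; letting Elasticsearch assign the document IDs is an assumption.

```python
import json


def bulk_body(index_name: str, documents: list[dict]) -> str:
    """Build a newline-delimited JSON body for the _bulk endpoint.

    Each document becomes two lines: an "index" action and its source.
    IDs are omitted so that Elasticsearch generates them (assumption).
    """
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


sample_docs = [
    {"attachment.author": "bar", "attachment.content": "pello world"},
    {"attachment.author": "bar", "attachment.content": "hello world"},
    {"attachment.author": "bar", "attachment.content": "hello vorld"},
]

ndjson = bulk_body("fuzzy", sample_docs)
print(ndjson)
```

The resulting string would be posted to the _bulk endpoint with the Content-Type header set to application/x-ndjson.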

The same search query you provided, with correct syntax:
{
  "query": {
    "match" : {
      "attachment.content" : {
        "query" : "hello world", --> properly closed quotes
        "minimum_should_match": 2,
        "fuzziness": 1
      }
    }
  }
}
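
One way to avoid quoting mistakes in a hand-written request body is to build it as a data structure and let a JSON serializer produce the text. The sketch below does that for the query above. The helper name build_match_query is hypothetical; the field name and parameter values are the ones from the question.

```python
import json


def build_match_query(field: str, text: str,
                      minimum_should_match: int, fuzziness: int) -> dict:
    """Return a match query body as a plain dict (hypothetical helper)."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "minimum_should_match": minimum_should_match,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


body = build_match_query("attachment.content", "hello world", 2, 1)

# json.dumps always emits balanced quotes and braces, so the payload
# can be sent to the _search endpoint as-is.
payload = json.dumps(body, indent=2)
print(payload)
```

Parsing the payload back with json.loads is a cheap sanity check that the body is valid JSON before it reaches the cluster.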

Search results:
"hits": [
  {
    "_index": "fuzzy",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.9400072,
    "_source": {
      "attachment.author": "foo",
      "attachment.content": "hello world"
    }
  },
  {
    "_index": "fuzzy",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.8460065,
    "_source": {
      "attachment.author": "bar",
      "attachment.content": "hello vorld"
    }
  },
  {
    "_index": "fuzzy",
    "_type": "_doc",
    "_id": "3",
    "_score": 0.8460065,
    "_source": {
      "attachment.author": "bar",
      "attachment.content": "pello world"
    }
  }
]

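The other part of your question is that documents containing only hello appear in the search results even with minimum_should_match=2. That also works correctly for me. I indexed one more document:
{
  "attachment.author": "bar",
  "attachment.content": "my world" --> only world
}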

Again, the same search query returns only the earlier 3 documents, but if we change minimum_should_match to 1, it returns all 4 documents:
"hits": [
  {
    "_index": "fuzzy",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0498221,
    "_source": {
      "attachment.author": "foo",
      "attachment.content": "hello world"
    }
  },
  {
    "_index": "fuzzy",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.9784871,
    "_source": {
      "attachment.author": "bar",
      "attachment.content": "hello vorld"
    }
  },
  {
    "_index": "fuzzy",
    "_type": "_doc",
    "_id": "3",
    "_score": 0.91119266,
    "_source": {
      "attachment.author": "bar",
      "attachment.content": "pello world"
    }
  },
  {
    "_index": "fuzzy",
    "_type": "_doc",
    "_id": "4",
    "_score": 0.35667494,
    "_source": {
      "attachment.author": "bar",
      "attachment.content": "my world" --> note last 4 doc
    }
  }
]
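
The behavior seen in these results can be approximated without a cluster. The Python sketch below treats each query term as one optional clause, counts a clause as satisfied when some document term is within the allowed edit distance, and requires at least minimum_should_match satisfied clauses. It is a simplification under stated assumptions: tokenization is lowercase plus whitespace split, plain Levenshtein distance is used (Elasticsearch by default also counts a transposition of adjacent characters as a single edit), and scoring, prefix_length and max_expansions are ignored. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def matches(query: str, content: str,
            fuzziness: int, minimum_should_match: int) -> bool:
    """Approximate a fuzzy match query with minimum_should_match."""
    query_terms = query.lower().split()
    doc_terms = content.lower().split()
    satisfied = sum(
        1 for q in query_terms
        if any(edit_distance(q, d) <= fuzziness for d in doc_terms)
    )
    return satisfied >= minimum_should_match


docs = ["hello world", "hello vorld", "pello world", "my world"]
with_two = [d for d in docs if matches("hello world", d, 1, 2)]
with_one = [d for d in docs if matches("hello world", d, 1, 1)]
print(with_two)
print(with_one)
```

Under these assumptions a document containing only hello satisfies one clause, so it is excluded when minimum_should_match is 2. If such documents still come back, the index contents, the analyzer, or the request actually sent are the places to look.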

Regarding elasticsearch - fuzzy matching of 2 words, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61122471/
