
elasticsearch - Multi-field text and keyword fields in Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-02 22:16:36

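I am considering switching from Solr to Elasticsearch and have indexed a bunch of documents without providing a schema/mapping. Many of the fields that I previously had set up as indexed strings in Solr have been set up as both text and keyword fields using multi-fields.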

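Is there any benefit to having a keyword field also indexed as a text field via multi-fields? In my case most of the values in these fields are single words, so I assume it doesn't matter whether they are sent through an analyzer, but the ES docs seem to imply that keyword fields are not taken into account when searching, or are at least treated differently.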

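If I search for the term "ipad", will a document score higher if it has "ipad" in a keyword field as well as in some other text fields, compared with the same document without the keyword field? Will a document still match if "ipad" appears only in a keyword field?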

Best answer

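To answer my own question, I created a quick test. In it, keyword and text fields behaved almost identically when searching, and a multi-field appeared to get the same score as its primary type, so I'm guessing the second field has no effect on search scoring.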

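Strangely, multi-word values in the keyword and text fields got the same score. I would have expected the keyword field to score lower or not match at all, but for my purposes that is fine, so I am not going to investigate it any further.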

Index creation

PUT test_index
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "test_type": {
      "properties": {
        "multifield": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "keywordfield": {
          "type": "keyword"
        },
        "textfield": {
          "type": "text"
        }
      }
    }
  }
}
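
To see why a keyword field and a text field might be expected to behave differently, the `_analyze` API can report the terms each field would index. This is a minimal sketch that assumes the `test_index` mapping above; it is not part of the original test. Sent as the body of `GET /test_index/_analyze`:

```json
{
  "field": "keywordfield",
  "text": "a green ipad"
}
```

With `keywordfield` the value should come back as the single term `a green ipad`. Changing `field` to `textfield` should return the separate terms `a`, `green` and `ipad`.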

Data insertion

POST /_bulk
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 1 } }
{ "doc" : { "multifield" : "ipad" }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 2 } }
{ "doc" : { "keywordfield" : "ipad" }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 3 } }
{ "doc" : { "keywordfield" : "a green ipad" }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 4 } }
{ "doc" : { "textfield" : "a yellow ipad" }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 5 } }
{ "doc" : { "keywordfield" : "ipad", "textfield" : "ipad" }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 6 } }
{ "doc" : { "keywordfield" : "unrelated", "textfield" : "hopefully this wont show up" }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 7 } }
{ "doc" : { "textfield" : "ipad" }, "doc_as_upsert" : true }

Results

GET /test_index/_search?q=ipad

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0.28122374,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "5",
        "_score": 0.28122374,
        "_source": {
          "keywordfield": "ipad",
          "textfield": "ipad"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 0.2734406,
        "_source": {
          "multifield": "ipad"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 0.2734406,
        "_source": {
          "keywordfield": "ipad"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.2734406,
        "_source": {
          "textfield": "ipad"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "3",
        "_score": 0.16417998,
        "_source": {
          "keywordfield": "a green ipad"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 0.16417998,
        "_source": {
          "textfield": "a yellow ipad"
        }
      }
    ]
  }
}
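
One possible explanation for the identical scores, assuming the test ran on Elasticsearch 5.x with default settings: a `q=ipad` request that names no field is run against the catch-all `_all` field, which is analyzed like a text field, so every hit above may have matched through `_all` rather than through `keywordfield` or `textfield` themselves. A query that names the field tests each type in isolation. A minimal sketch, not part of the original test, sent as the body of `GET /test_index/_search`:

```json
{
  "query": {
    "term": {
      "keywordfield": "ipad"
    }
  }
}
```

Against the test data this should match only documents 2 and 5, whose `keywordfield` is exactly `ipad`, and skip document 3 (`a green ipad`). Replacing `keywordfield` with `textfield` should match documents 4, 5 and 7, and `multifield.keyword` addresses the keyword sub-field of the multi-field.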

Regarding "elasticsearch - Multi-field text and keyword fields in Elasticsearch", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43592695/
