
elasticsearch - Highlighting ElasticSearch autocomplete

Reposted. Author: 行者123. Updated: 2023-11-29 02:50:12

I have the following data that I want to index in ElasticSearch.

[Image not preserved: the sample documents to be indexed]

I want to implement autocomplete and highlight why a particular document matched the query.

Here are my index settings:

{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 15
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
}
}

Index-time analysis:

  • Splits the text on word boundaries.
  • Removes punctuation.
  • Lowercases.
  • Emits edge n-grams for each token.

So the inverted index looks like this:

[Image not preserved: the resulting inverted index]
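The analysis chain above can be approximated in a few lines of PHP to show which grams end up in the inverted index for a given name. This is a minimal sketch, not Elasticsearch code: `autocompleteTokens` is a hypothetical helper, and it assumes ASCII input and treats any run of non-alphanumeric characters as a word boundary, which is a simplification of the standard tokenizer.

```php
<?php
// Rough approximation of the "autocomplete" analyzer: tokenize, lowercase, edge n-grams.
// Assumptions: ASCII input; splitting on runs of non-alphanumeric characters
// stands in for the standard tokenizer's word-boundary rules.
function autocompleteTokens(string $text, int $minGram = 1, int $maxGram = 15): array
{
    $tokens = [];
    // Split on word boundaries; whitespace and punctuation are discarded.
    $words = preg_split('/[^a-z0-9]+/i', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $word = strtolower($word);
        // Emit every prefix whose length lies in [min_gram, max_gram].
        for ($n = $minGram; $n <= min($maxGram, strlen($word)); $n++) {
            $tokens[] = substr($word, 0, $n);
        }
    }
    return $tokens;
}

echo implode(' ', autocompleteTokens('Software AG')), "\n";
// s so sof soft softw softwa softwar software a ag
```

Because every prefix of every word is indexed, a search term such as "soft" only needs to equal one of these grams for the document to match.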

This is how I define the mapping for the name field:

{
"index_type": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}

When I run this query:

GET http://localhost:9200/index/type/_search

{
"query": {
"match": {
"name": "soft"
}
},
"highlight": {
"fields" : {
"name" : {}
}
}
}

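Search: soft

The standard tokenizer is applied and the term "soft" is looked up in the inverted index. This search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I want only "soft" to be highlighted rather than the whole word: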

{
"hits": [
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
},
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> AG"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> AG2"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> AG good <em>software</em> better"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> AG"
]
}
},
{
"_source": {
"name": "is soft ware ok"
},
"highlight": {
"name": [
"is <em>soft</em> ware ok"
]
}
}
]
}
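The whole-word highlighting in this output can be reproduced with a small model. It assumes, as the output above suggests and as Lucene's EdgeNGramTokenFilter has done since Lucene 4.4, that every gram keeps the start/end offsets of the word it was cut from, and that the highlighter wraps the offsets of each token equal to the query term. The functions `gramsWithOffsets` and `highlightByOffsets` are hypothetical illustrations, not Elasticsearch's highlighter.

```php
<?php
// Model of why "soft" highlights the whole word "SoftwareRocks".
// Assumption: each edge n-gram inherits the offsets of its parent word (ASCII input).
function gramsWithOffsets(string $text, int $maxGram = 15): array
{
    $grams = [];
    // PREG_OFFSET_CAPTURE returns each word together with its start offset.
    preg_match_all('/[a-z0-9]+/i', $text, $m, PREG_OFFSET_CAPTURE);
    foreach ($m[0] as [$word, $start]) {
        $end = $start + strlen($word);
        $lower = strtolower($word);
        for ($n = 1; $n <= min($maxGram, strlen($lower)); $n++) {
            // The gram's offsets cover the whole word, not just the prefix.
            $grams[] = ['term' => substr($lower, 0, $n), 'start' => $start, 'end' => $end];
        }
    }
    return $grams;
}

// Wrap the offsets of every gram equal to the query term, as an offset-based highlighter would.
function highlightByOffsets(string $text, string $term): string
{
    $spans = [];
    foreach (gramsWithOffsets($text) as $g) {
        if ($g['term'] === strtolower($term)) {
            $spans[$g['start']] = $g['end'];
        }
    }
    krsort($spans); // insert tags right to left so earlier offsets stay valid
    foreach ($spans as $start => $end) {
        $text = substr($text, 0, $start) . '<em>' . substr($text, $start, $end - $start)
            . '</em>' . substr($text, $end);
    }
    return $text;
}

echo highlightByOffsets('SoftwareRocks everytime', 'soft'), "\n";
// <em>SoftwareRocks</em> everytime
```

Under these assumptions the matching gram "soft" is found correctly, but its offsets span the full word, so the tags land around "SoftwareRocks" instead of "Soft".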

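Search: software ag

The standard tokenizer turns "software ag" into "software" and "ag", which are looked up in the inverted index. This search matches documents 1, 3, 4, 5 and 6, which is correct, but I want only "software" and "ag" to be highlighted, not the whole words that contain them: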

{
"hits": [
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG2</em>"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em> good <em>software</em> better"
]
}
},
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
}
]
}
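Why "SoftwareRocks everytime" is among these hits while "is soft ware ok" is not can be checked with a sketch of the match step. It assumes ASCII input, that the search analyzer yields lowercased word tokens, and that the match query uses its default `or` operator, so a document is kept when any query token equals one of its indexed grams (a prefix of one of its words, at most `max_gram` characters long). `matchesAnyTerm` is a hypothetical helper and ignores scoring.

```php
<?php
// Sketch of the match step for "software ag" against edge n-gram indexed names.
function matchesAnyTerm(string $name, string $query, int $maxGram = 15): bool
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($name), -1, PREG_SPLIT_NO_EMPTY);
    $terms = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        if (strlen($term) > $maxGram) {
            continue; // longer than max_gram, so it was never indexed as a gram
        }
        foreach ($words as $word) {
            if (substr($word, 0, strlen($term)) === $term) {
                return true; // with "or", one matching token is enough
            }
        }
    }
    return false;
}

$names = [
    'SoftwareRocks everytime',
    'Software AG',
    'Software AG2',
    'Op Software AG good software better',
    'Op Software AG',
    'is soft ware ok',
];
foreach ($names as $name) {
    echo (matchesAnyTerm($name, 'software ag') ? 'match: ' : 'no match: '), $name, "\n";
}
```

"software" is a prefix of "softwarerocks", which is why that document matches even though it contains no "ag"; "is soft ware ok" has no word starting with either token.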

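I read the Elasticsearch highlighting documentation, but I can't work out how highlighting is performed. For both examples above, I want only the token that matched in the inverted index to be highlighted, not the whole word. Can anyone explain how to highlight only the value that was passed in?

Update

So it seems that on the ElasticSearch website the server-side autocomplete works like my implementation, but they appear to highlight the matched query on the client side. If that is the case, I'm starting to think there is no proper solution on the ElasticSearch side, so I implemented the highlighting myself on the server side (rather than on the client side, as they appear to do).

My server-side implementation (in PHP) is: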

public function search($term)
{
$params = [
'index' => $this->getIndexName(),
'type' => $this->getIndexType(),
'body' => [
'query' => [
'match' => [
'name' => $term
]
]
]
];

$results = $this->client->search($params);

$hits = $results['hits']['hits'];

$data = [];

$wrapBefore = '<strong>';
$wrapAfter = '</strong>';

foreach ($hits as $hit) {
$data[] = [
$hit['_source']['id'],
$hit['_source']['name'],
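// preg_quote() escapes regex metacharacters (and the "/" delimiter) in the user's term
preg_replace('/(' . preg_quote($term, '/') . ')/i', "$wrapBefore$1$wrapAfter", strip_tags($hit['_source']['name']))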
];
}

return $data;
}
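The `preg_replace` call above treats the whole query as a single pattern, so for "software ag" it would wrap "Software AG" as one span and leave a later standalone "software" unmarked. A variant closer to the edge n-gram semantics wraps each query token only where it starts a word. `highlightPrefixes` is a hypothetical helper, not part of Elasticsearch or the PHP client; it assumes ASCII input and splits the query on non-alphanumeric characters, roughly as the standard analyzer would.

```php
<?php
// Hypothetical helper: wrap only the word-prefix that matches each query token,
// instead of the whole word (mirrors how edge n-grams match).
function highlightPrefixes(string $text, string $query, string $before = '<strong>', string $after = '</strong>'): string
{
    $terms = preg_split('/[^a-z0-9]+/i', $query, -1, PREG_SPLIT_NO_EMPTY);
    if (!$terms) {
        return $text; // nothing to highlight
    }
    // Longest terms first so "software" wins over "soft" when both are present.
    usort($terms, function ($a, $b) { return strlen($b) - strlen($a); });
    $alternatives = implode('|', array_map(function ($t) { return preg_quote($t, '/'); }, $terms));
    // (?<![a-z0-9]) only allows a match at the start of a word.
    return preg_replace('/(?<![a-z0-9])(' . $alternatives . ')/i', $before . '$1' . $after, $text);
}

echo highlightPrefixes('SoftwareRocks everytime', 'soft'), "\n";
// <strong>Soft</strong>wareRocks everytime
echo highlightPrefixes('Software AG2', 'software ag'), "\n";
// <strong>Software</strong> <strong>AG</strong>2
```

In the `search()` method it could stand in for the `preg_replace(...)` element, e.g. `highlightPrefixes(strip_tags($hit['_source']['name']), $term)`.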

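The output I'm aiming for:

[Image not preserved: the desired highlighted autocomplete output]

I've added a bounty to see whether there is an ElasticSearch-level solution for what I described above.

Best answer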

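As of now, this is not possible with the latest version of Elastic, because the highlighting documentation does not mention any setting or query for it. I inspected the autocomplete on the Elastic website in the browser console under the XHR tab and found the following response when autocompleting the keyword "att":

URL: https://search.elastic.co/suggest?q=att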
{
"current_page": 1,
"last_page": 4,
"total_hits": 49,
"hits": [
{
"tags": [],
"url": "/elasticon/tour/2016/jp/not-attending",
"section": "Elasticon",
"title": "Not <em>Attending</em> - JP"
},
{
"section": "Elasticon",
"title": "<em>Attending</em> from Training - JP",
"tags": [],
"url": "/elasticon/tour/2016/jp/attending-training"
},
{
"tags": [],
"url": "/elasticon/tour/2016/jp/attending-keynote",
"title": "<em>Attending</em> from Keynote - JP",
"section": "Elasticon"
},
{
"tags": [],
"url": "/elasticon/tour/2016/not-attending",
"section": "Elasticon",
"title": "Thank You - Not <em>Attending</em>"
},
{
"tags": [],
"url": "/elasticon/tour/2016/attending",
"section": "Elasticon",
"title": "Thank You - <em>Attending</em>"
},
{
"section": "Blog",
"title": "What It's Like to <em>Attend</em> Elastic Training",
"tags": [],
"url": "/blog/what-its-like-to-attend-elastic-training"
},
{
"tags": "Elasticsearch",
"url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html",
"section": "Docs/",
"title": "Highlighting <em>attachments</em>"
},
{
"title": "<em>attachments</em> » email",
"section": "Docs/",
"tags": "Logstash",
"url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments"
},
{
"section": "Docs/",
"title": "Configuring Email <em>Attachments</em> » Actions",
"tags": "Watcher",
"url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments"
},
{
"url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes",
"tags": "Watcher",
"title": "HipChat Action <em>Attributes</em> » Actions",
"section": "Docs/"
},
{
"title": "Slack Action <em>Attributes</em> » Actions",
"section": "Docs/",
"tags": "Watcher",
"url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes"
}
],
"aggs": {
"sections": [
{
"Elasticon": 5
},
{
"Blog": 1
},
{
"Docs/": 43
}
],
"top_tags": [
{
"XPack": 14
},
{
"Elasticsearch": 12
},
{
"Watcher": 9
},
{
"Logstash": 4
},
{
"Clients": 3
},
{
"Shield": 1
}
]
}
}

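On the front end, however, only "att" is highlighted in the autosuggest results, so they handle highlighting in the browser layer.

A similar question about elasticsearch - Highlighting ElasticSearch autocomplete can be found on Stack Overflow: https://stackoverflow.com/questions/40551024/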
