elasticsearch - Applying html_strip and lowercase filters to a keyword-analyzed field

Reposted. Author: 行者123. Updated: 2023-12-02 22:31:05

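I am trying to apply the html_strip character filter and the lowercase token filter to a field analyzed with the keyword tokenizer. When searching, I noticed that the results are not what I expected.

This is the index we are trying to create: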

PUT /test_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "ExportPrimaryAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": "lowercase",
          "char_filter": "html_strip"
        },
        "ExportRawAnalyzer": {
          "type": "custom",
          "buffer_size": "1000",
          "tokenizer": "keyword",
          "filter": "lowercase",
          "char_filter": "html_strip"
        }
      }
    }
  },
  "mappings": {
    "test_type": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer": "ExportPrimaryAnalyzer"
        },
        "city_raw": {
          "type": "string",
          "analyzer": "ExportRawAnalyzer"
        }
      }
    }
  }
}

Here is a sample document:
PUT test_index/test_type/4
{
"city": "<p>I am from Pune</p>",
"city_raw": "<p>I am from Pune</p>"
}

When we run a wildcard query against it, we get no results. This is the query we are trying to run:
{
  "query": {
    "wildcard": {
      "city_raw": "i am*"
    }
  }
}

Any help is appreciated.

Best answer

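The html_strip character filter replaces block-level HTML elements such as <p> with newlines.
With the keyword tokenizer the whole value becomes a single token, so that token starts with a newline and the pattern "i am*" cannot match it from the beginning. You therefore need an additional character filter that replaces newlines with an empty string.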
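
To see why a leading newline is enough to break the wildcard, here is a minimal Python sketch that imitates the analysis chain outside Elasticsearch. It is an illustration under stated assumptions, not Elasticsearch code: the function names `export_raw_analyzer` and `wildcard_matches` are hypothetical, the regex handles only `<p>` tags, and `fnmatchcase` stands in for whole-term wildcard matching. On a real cluster, the `_analyze` API is the way to inspect the tokens an analyzer actually emits.

```python
import re
from fnmatch import fnmatchcase

# Assumption: html_strip turns block-level tags into newlines.
# Only <p> and </p> are handled in this simplified stand-in.
BLOCK_TAG = re.compile(r"</?p\s*>", re.IGNORECASE)


def export_raw_analyzer(text, remove_new_line=False):
    """Rough imitation of ExportRawAnalyzer: char filters, keyword tokenizer, lowercase."""
    stripped = BLOCK_TAG.sub("\n", text)           # simplified html_strip
    if remove_new_line:
        stripped = stripped.replace("\n", "")      # the "\\n =>" mapping char filter
    # The keyword tokenizer keeps the whole value as one token; lowercase it.
    return stripped.lower()


def wildcard_matches(token, pattern):
    """Stand-in for a wildcard query: the pattern must match the entire token."""
    return fnmatchcase(token, pattern)


if __name__ == "__main__":
    doc = "<p>I am from Pune</p>"

    without_fix = export_raw_analyzer(doc)
    print(repr(without_fix), wildcard_matches(without_fix, "i am*"))

    with_fix = export_raw_analyzer(doc, remove_new_line=True)
    print(repr(with_fix), wildcard_matches(with_fix, "i am*"))
```

Running the script prints the simulated token and the match result for each variant, so the effect of the extra character filter can be compared side by side.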

Example:

PUT test
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "char_filter": {
        "remove_new_line": {
          "type": "mapping",
          "mappings": [
            "\\n =>"
          ]
        }
      },
      "analyzer": {
        "ExportPrimaryAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ],
          "char_filter": [
            "html_strip"
          ]
        },
        "ExportRawAnalyzer": {
          "type": "custom",
          "buffer_size": "1000",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ],
          "char_filter": [
            "html_strip",
            "remove_new_line"
          ]
        }
      }
    }
  },
  "mappings": {
    "test_type": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer": "ExportPrimaryAnalyzer"
        },
        "city_raw": {
          "type": "string",
          "analyzer": "ExportRawAnalyzer"
        }
      }
    }
  }
}

PUT test/test_type/4
{
  "city": "<p>I am from Bangalore I like Pune too</p>",
  "city_raw": "<p>I am from Bangalore I like Pune too</p>"
}

POST test/_search
{
  "query": {
    "wildcard": {
      "city_raw": "i am *"
    }
  }
}

Result:
"hits": [
  {
    "_index": "test",
    "_type": "test_type",
    "_id": "4",
    "_score": 1,
    "_source": {
      "city": "<p>I am from Bangalore I like Pune too</p>",
      "city_raw": "<p>I am from Bangalore I like Pune too</p>"
    }
  }
]

Source: a similar question on Stack Overflow, "elasticsearch - Applying html_strip and lowercase filters to a keyword-analyzed field": https://stackoverflow.com/questions/39054721/
