
django - How to configure Haystack/Elasticsearch to handle contractions and apostrophes near the start of a word

Reposted. Author: 行者123. Updated: 2023-11-29 02:49:38

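I am having a hard time handling apostrophe characters near the start or in the middle of words. I can handle English possessives, but I am also trying to cater for French and handle words like "d'action", where the apostrophe comes near the start of the word rather than near the end as in "her's".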

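Searching for "d action" through haystack auto_query returns results, but "d'action" returns nothing. If I query the elasticsearch _search API directly (_search?q=D%27ACTION), I do get results for "d'action". So I am wondering whether this is a haystack engine issue.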
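
For reference, the direct query string quoted above can be reproduced with the standard library. This is only a sketch of how the apostrophe ends up as %27 in the URL; the helper name build_search_path is hypothetical and not part of haystack or Elasticsearch.

```python
from urllib.parse import quote


def build_search_path(query: str) -> str:
    """Return a relative _search path with the query percent-encoded."""
    # safe="" forces every reserved character, including the apostrophe,
    # to be percent-encoded.
    return "_search?q=" + quote(query, safe="")


print(build_search_path("D'ACTION"))
```

Sending this path straight to Elasticsearch bypasses haystack's own query building, which makes it a useful way to tell index problems apart from query problems.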

My configuration:

'settings': {
    "analysis": {
        "char_filter": {
            "quotes": {
                "type": "mapping",
                "mappings": [
                    "\\u0091=>\\u0027",
                    "\\u0092=>\\u0027",
                    "\\u2018=>\\u0027",
                    "\\u2019=>\\u0027",
                    "\\u201B=>\\u0027"
                ]
            }
        },
        "analyzer": {
            "ch_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ['ch_en_possessive_word_delimiter', 'ch_fr_stemmer'],
                "char_filter": ['html_strip', 'quotes'],
            },
        },

        "filter": {
            "ch_fr_stemmer": {
                "type": "snowball",
                "language": "French"
            },
            "ch_en_possessive_word_delimiter": {
                "type": "word_delimiter",
                "stem_english_possessive": True
            }
        }
    }
}
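
To make the intent of the "quotes" char_filter concrete, here is a rough Python analogue of the same mapping. It is a sketch only: it does not run inside Elasticsearch, and the names QUOTE_MAP and normalize_quotes are made up for illustration.

```python
# Each key mirrors one "source=>target" entry of the "quotes" char_filter:
# several typographic quote characters are all mapped to U+0027.
QUOTE_MAP = str.maketrans({
    "\u0091": "'",
    "\u0092": "'",
    "\u2018": "'",
    "\u2019": "'",
    "\u201b": "'",
})


def normalize_quotes(text: str) -> str:
    """Replace typographic apostrophes with a plain ASCII apostrophe."""
    return text.translate(QUOTE_MAP)


print(normalize_quotes("d\u2019action"))
```

The point of the filter is that "d’action" typed with a curly apostrophe and "d'action" typed with a straight one should reach the tokenizer as the same text.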

I have also subclassed ElasticsearchSearchBackend and BaseEngine so that I can add the configuration above:

import haystack
from django.conf import settings
from haystack.backends import BaseEngine
from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchQuery,
)


class ConfigurableESBackend(ElasticsearchSearchBackend):
    # Words reserved by Elasticsearch for special use.
    RESERVED_WORDS = (
        'AND',
        'NOT',
        'OR',
        'TO',
    )

    # Characters reserved by Elasticsearch for special use.
    # The '\\' must come first, so as not to overwrite the other slash replacements.
    RESERVED_CHARACTERS = (
        '\\', '+', '-', '&&', '||', '!', '(', ')', '{', '}',
        '[', ']', '^', '"', '~', '*', '?', ':',
    )

    def setup(self):
        """
        Defers loading until needed.
        """
        # Get the existing mapping & cache it. We'll compare it
        # during the ``update`` & if it doesn't match, we'll put the new
        # mapping.
        try:
            self.existing_mapping = self.conn.get_mapping(index=self.index_name)
        except Exception:
            if not self.silently_fail:
                raise

        unified_index = haystack.connections[self.connection_alias].get_unified_index()
        self.content_field_name, field_mapping = self.build_schema(unified_index.all_searchfields())
        current_mapping = {
            'modelresult': {
                'properties': field_mapping,
                '_boost': {
                    'name': 'boost',
                    'null_value': 1.0
                }
            }
        }

        if current_mapping != self.existing_mapping:
            try:
                # Make sure the index is there first, created with the
                # custom analysis settings from the Django settings module.
                self.conn.create_index(self.index_name, settings.ELASTICSEARCH_INDEX_SETTINGS)
                self.conn.put_mapping(self.index_name, 'modelresult', mapping=current_mapping)
                self.existing_mapping = current_mapping
            except Exception:
                if not self.silently_fail:
                    raise

        self.setup_complete = True


class CHElasticsearchSearchEngine(BaseEngine):
    backend = ConfigurableESBackend
    query = ElasticsearchSearchQuery

Best answer

OK, so this had nothing to do with the configuration. The problem was in the .txt template used for haystack indexing.

I had:

{{ object.some_model.name_en }}
{{ object.some_model.name_fr }}

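This caused characters such as ' to be converted to HTML entities (&#39;) in the indexed text, so the search never found any results. Using the "safe" filter fixed the problem: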

{{ object.some_model.name_en|safe }}
{{ object.some_model.name_fr|safe }}
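
The effect can be illustrated outside Django with the standard library. This is a sketch under assumptions: html.escape stands in for the template engine's autoescaping, and the exact entity differs by implementation (the stdlib emits &#x27;, while the answer above saw &#39;). The function index_text and the sample value are hypothetical.

```python
import html


def index_text(value: str, mark_safe: bool) -> str:
    """Mimic what the indexing template writes for one field.

    mark_safe=False models the default autoescaped output;
    mark_safe=True models the output with the |safe filter.
    """
    return value if mark_safe else html.escape(value)


name_fr = "plan d'action"

escaped = index_text(name_fr, mark_safe=False)
unescaped = index_text(name_fr, mark_safe=True)

# The escaped text no longer contains the literal term "d'action",
# so a query for that term has nothing to match against.
print(escaped)
print("d'action" in escaped, "d'action" in unescaped)
```

Because the index document is plain text for the search engine rather than HTML for a browser, skipping the escaping here is the intended behaviour and not a shortcut.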

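For this question (django - How to configure Haystack/Elasticsearch to handle contractions and apostrophes near the start of a word), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25667893/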
