python - How to read the configuration of a search analyzer?

Reposted. Author: 行者123. Updated: 2023-12-01 02:41:56

As I understand it, Elasticsearch uses two analyzers: one at index time and another at query (search) time.

I managed to add an index analyzer that tokenizes my documents, but I don't know how to apply the analyzer to search queries.
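For reference, the Elasticsearch match query accepts an "analyzer" option that names the analyzer used to tokenize the query text; when set, it is used instead of the analyzer configured for the field in the mapping. (In the mapping in the JSON file below, the text field already sets search_analyzer to my_analyzer, so plain match queries on that field should use it anyway.) A minimal sketch that builds such a request body as a plain dict; build_match_query is a hypothetical helper name, not part of elasticsearch-py:

```python
import json


def build_match_query(field, text, analyzer="my_analyzer"):
    """Build a search body whose match query analyzes `text` with `analyzer`.

    build_match_query is a hypothetical helper; the "analyzer" option of the
    match query is standard Elasticsearch query DSL.
    """
    return {"query": {"match": {field: {"query": text, "analyzer": analyzer}}}}


# Print the body that would be sent as the search request
print(json.dumps(build_match_query("text", "Visit http://example.com"), indent=2))
```

The resulting dict would be passed as the body of a search request, for example es_conn.search(index=index_name, body=...), with the older clients that still accept a body argument.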

Here is my initial Python code:

from elasticsearch import Elasticsearch

import config  # the asker's own settings module (defines ES_HOSTS)

es_conn = Elasticsearch(config.ES_HOSTS)


def analyze_query(text, es_conn, index_name):
    '''
    analyzes any text with my_analyzer defined in es_settings.json
    input:
    - text: a query text
    - es_conn: elasticsearch connection
    - index_name: name of index
    output:
    - a list of tokens
    '''

    tokens = es_conn.indices.analyze(
        index=index_name,
        body={"text": text},
        # how to point the analyzer to the json file???
        analyzer='my_analyzer')["tokens"]
    return [token_row["token"].encode('utf-8') for token_row in tokens]

This raises the following error:

elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'[1aVYakX][127.0.0.1:9300][indices:admin/analyze[s]]')

The problem is the analyzer = 'my_analyzer' part: I'm not sure how to make it point to the JSON file that defines the Elasticsearch settings.

Update

The JSON file:

{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase",
            "stop"
          ],
          "char_filter": [
            "html_strip"
          ],
          "type": "custom",
          "tokenizer": "uax_url_email"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "filename": {
          "type": "keyword",
          "index": false,
          "doc_values": false
        },
        "path": {
          "type": "keyword",
          "index": false,
          "doc_values": false
        },
        "text": {
          "type": "text",
          "analyzer": "my_analyzer",
          "search_analyzer": "my_analyzer"
        }
      }
    }
  }
}
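A custom analyzer such as my_analyzer only exists on an index that was created with these settings, so the file has to be applied when the index is created before the name can be resolved. A minimal sketch, assuming an elasticsearch-py client from the 5.x to 7.x era whose indices.create(index=..., body=...) accepts a settings/mappings body; create_index_from_settings is a hypothetical helper name:

```python
import json


def create_index_from_settings(es_conn, index_name, settings_path):
    """Create index_name using the settings and mappings stored in a JSON file.

    Hypothetical helper; es_conn is assumed to be an elasticsearch-py client
    whose indices.create accepts index= and body= keyword arguments.
    """
    with open(settings_path) as f:
        body = json.load(f)  # {"settings": {...}, "mappings": {...}}
    # Analyzers defined under settings.analysis are registered on this index only
    return es_conn.indices.create(index=index_name, body=body)
```

Once the index exists with these settings, the analyzer can usually be referred to by name inside the analyze request body, for example body={"analyzer": "my_analyzer", "text": text}.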

Best answer

The analyze function does not take the analyzer as a parameter. For more details, see: http://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.IndicesClient.analyze

Instead, you should pass the analyzer definition through the body parameter. Replace your analyze_query function with the one below.

import json

import config  # the asker's own module; assumed to define SETTINGS_PATH


def analyze_query(text, es_conn, index_name, settings_path=config.SETTINGS_PATH):
    # Read the "analysis" block from the same JSON file used to create the index
    with open(settings_path) as json_data:
        settings = json.load(json_data)["settings"]["analysis"]

    # The settings file may not define custom filters, so default to an empty dict
    filter_settings = settings.get("filter", {})
    analyzer_settings = settings["analyzer"]["my_analyzer"]

    # Rebuild my_analyzer inline in the request body
    body = {"text": text, "tokenizer": analyzer_settings["tokenizer"]}
    if "char_filter" in analyzer_settings:
        body["char_filter"] = analyzer_settings["char_filter"]

    # Inline custom filter definitions; built-in names such as "lowercase" pass through
    body["filter"] = [filter_settings.get(f, f) for f in analyzer_settings.get("filter", [])]

    tokens = es_conn.indices.analyze(index=index_name, body=body)["tokens"]
    return [token_row["token"].encode('utf-8') for token_row in tokens]
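The body-building part of this approach does not need a running cluster, so it can be split out and checked on its own. A minimal sketch, assuming the settings["analysis"] layout from the question's JSON file; build_analyze_body is a hypothetical helper name, and unlike the function above it also inlines custom char_filter definitions:

```python
def build_analyze_body(text, analysis, analyzer_name="my_analyzer"):
    """Turn a custom analyzer definition into an inline _analyze request body.

    Hypothetical helper; `analysis` is the dict under settings["analysis"].
    Names defined under analysis["filter"] or analysis["char_filter"] are
    replaced by their definitions; built-in names pass through unchanged.
    """
    custom_filters = analysis.get("filter", {})
    custom_char_filters = analysis.get("char_filter", {})
    analyzer = analysis["analyzer"][analyzer_name]

    body = {"text": text, "tokenizer": analyzer["tokenizer"]}
    if "char_filter" in analyzer:
        body["char_filter"] = [custom_char_filters.get(c, c) for c in analyzer["char_filter"]]
    body["filter"] = [custom_filters.get(f, f) for f in analyzer.get("filter", [])]
    return body


analysis = {
    "analyzer": {
        "my_analyzer": {
            "type": "custom",
            "tokenizer": "uax_url_email",
            "char_filter": ["html_strip"],
            "filter": ["lowercase", "stop"],
        }
    }
}
print(build_analyze_body("<b>Hi</b>", analysis))
# {'text': '<b>Hi</b>', 'tokenizer': 'uax_url_email', 'char_filter': ['html_strip'], 'filter': ['lowercase', 'stop']}
```

Keeping the body construction separate makes it easy to confirm what will be sent before passing it to es_conn.indices.analyze(index=index_name, body=...).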

Good luck with your work.

A similar question, python - How to read the configuration of a search analyzer?, can be found on Stack Overflow: https://stackoverflow.com/questions/45602350/
