
Django Haystack - How to force exact attribute match without stemming?

Reposted. Author: 行者123. Updated: 2023-11-29 02:55:19

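I'm using Django 1.5 with django-haystack 2.0 and an Elasticsearch backend. I'm trying to search by exact attribute match. However, I get "similar" results even when I use both the __exact operator and the Exact() class. How can I prevent this behavior?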

For example:

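# models.py
from django.db import models


class Person(models.Model):
    name = models.TextField()


# search_indexes.py
from haystack import indexes

from .models import Person


class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    # Haystack's Elasticsearch backend maps a plain CharField to the
    # "snowball" analyzer, which stems terms ("Simons" -> "simon").
    name = indexes.CharField(model_attr="name")

    def get_model(self):
        return Person

    def index_queryset(self, using=None):
        return self.get_model().objects.all()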


# templates/search/indexes/people/person_text.txt
{{ object.name }}


>>> p1 = Person(name="Simon")
>>> p1.save()
>>> p2 = Person(name="Simons")
>>> p2.save()

$ ./manage.py rebuild_index

>>> from haystack.inputs import Exact
>>> from haystack.query import SearchQuerySet
>>> person_sqs = SearchQuerySet().models(Person)
>>> person_sqs.filter(name__exact="Simons")
[<SearchResult: people.person (name=u'Simon')>,
 <SearchResult: people.person (name=u'Simons')>]
>>> person_sqs.filter(name=Exact("Simons", clean=True))
[<SearchResult: people.person (name=u'Simon')>,
 <SearchResult: people.person (name=u'Simons')>]

I only want the result for "Simons"; the "Simon" result should not appear.
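Why "Simon" comes back: in the Haystack versions I'm aware of, the Elasticsearch backend maps a plain CharField to Elasticsearch's built-in `snowball` analyzer, which stems English words at both index and query time. As far as I can tell, `__exact` and `Exact()` only quote the value as a phrase, and that phrase is still run through the same stemming analyzer. The sketch below is not the full Snowball algorithm; it implements only step 1a (plural stripping) of the original Porter stemmer, which is enough to show the collision. The function name `porter_step_1a` is hypothetical.

```python
def porter_step_1a(word: str) -> str:
    """Plural-stripping step (1a) of the original Porter stemmer.

    Only this single step is implemented; a real snowball analyzer
    lowercases first and then applies many more rules after it.
    """
    word = word.lower()          # the analyzer lowercases before stemming
    if word.endswith("sses"):
        return word[:-2]         # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]         # ponies -> poni
    if word.endswith("ss"):
        return word              # caress -> caress
    if word.endswith("s"):
        return word[:-1]         # cats -> cat
    return word


if __name__ == "__main__":
    for name in ("Simon", "Simons"):
        print(name, "->", porter_step_1a(name))
    # Both names reduce to the same indexed term, so a query for one
    # matches documents containing the other.
    print(porter_step_1a("Simon") == porter_step_1a("Simons"))
```

Running it prints `Simon -> simon`, `Simons -> simon` and `True`.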

Best answer

Environment: Python 3, Django 1.10, Elasticsearch 2.4.4.

TL;DR: define a custom tokenizer (not a filter).


Detailed explanation

a) Use an EdgeNgramField:

# search_indexes.py
class PersonIndex(indexes.SearchIndex, indexes.Indexable):

    text = indexes.EdgeNgramField(document=True, use_template=True)
    ...

b) Template:

# templates/search/indexes/people/person_text.txt
{{ object.name }}

c) Create a custom search backend:

# backends.py
from django.conf import settings

from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchEngine,
)


class CustomElasticsearchSearchBackend(ElasticsearchSearchBackend):

    def __init__(self, connection_alias, **connection_options):
        super(CustomElasticsearchSearchBackend, self).__init__(
            connection_alias, **connection_options)

        # Replace Haystack's built-in analysis settings with the ones defined
        # in settings.ELASTICSEARCH_INDEX_SETTINGS; they are applied when the
        # index is created (e.g. by rebuild_index).
        self.DEFAULT_SETTINGS = settings.ELASTICSEARCH_INDEX_SETTINGS


class CustomElasticsearchSearchEngine(ElasticsearchSearchEngine):

    backend = CustomElasticsearchSearchBackend
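Why overriding `DEFAULT_SETTINGS` in `__init__` is enough: as far as I know, Haystack's Elasticsearch backend does not create the index in `__init__`; it reads `self.DEFAULT_SETTINGS` later, in its `setup()` step, when it creates the index. An instance attribute set after `super().__init__()` therefore shadows the class-level default by the time the index is built. The sketch below reproduces only that attribute-lookup pattern in plain Python; `BaseBackend`, `CustomBackend`, `create_index_body` and both settings dicts are hypothetical stand-ins, not Haystack's classes.

```python
# Hypothetical stand-ins that mimic the pattern used in c); NOT Haystack's code.
BUILTIN_SETTINGS = {"settings": {"analysis": {"analyzer": {"builtin": {}}}}}
PROJECT_SETTINGS = {"settings": {"analysis": {"analyzer": {"ngram_analyzer": {}}}}}


class BaseBackend:
    # Class-level default, playing the role of the library's DEFAULT_SETTINGS.
    DEFAULT_SETTINGS = BUILTIN_SETTINGS

    def __init__(self, connection_alias, **connection_options):
        self.connection_alias = connection_alias
        self.connection_options = connection_options

    def create_index_body(self):
        # Read lazily, at index-creation time, so an override made
        # after __init__ has run is still picked up.
        return self.DEFAULT_SETTINGS


class CustomBackend(BaseBackend):
    def __init__(self, connection_alias, **connection_options):
        super().__init__(connection_alias, **connection_options)
        # Instance attribute shadows the inherited class attribute.
        self.DEFAULT_SETTINGS = PROJECT_SETTINGS


if __name__ == "__main__":
    print(BaseBackend("default").create_index_body() is BUILTIN_SETTINGS)
    print(CustomBackend("default").create_index_body() is PROJECT_SETTINGS)
```

Both lines print `True`. Setting a class attribute on the subclass would work the same way; assigning it in `__init__` just defers reading the Django settings until the backend is instantiated.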

d) Define custom tokenizers (not filters!):

# settings.py
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'apps.persons.backends.CustomElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}

ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "custom_ngram_tokenizer",
                    "filter": ["asciifolding", "lowercase"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "custom_edgengram_tokenizer",
                    "filter": ["asciifolding", "lowercase"]
                }
            },
            "tokenizer": {
                "custom_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 12,
                    "token_chars": ["letter", "digit"]
                },
                "custom_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 12,
                    "token_chars": ["letter", "digit"]
                }
            }
        }
    }
}

HAYSTACK_DEFAULT_OPERATOR = 'AND'
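To see roughly what the two custom analyzers in d) emit, here is a pure-Python approximation of the nGram and edgeNGram tokenizers followed by the asciifolding and lowercase filters, using the same `min_gram`/`max_gram` values. It is an illustration only, under stated assumptions: `analyze` is a hypothetical helper, `str.isalnum()` stands in for the `letter`/`digit` character classes, `unicodedata` normalization only approximates asciifolding, and the token order may differ from Elasticsearch's. The authoritative output comes from running the analyzer on a real cluster.

```python
import unicodedata


def _words(text):
    """Split on anything that is not a letter or digit (token_chars)."""
    word = []
    for ch in text:
        if ch.isalnum():
            word.append(ch)
        elif word:
            yield "".join(word)
            word = []
    if word:
        yield "".join(word)


def _fold(token):
    """Rough stand-in for asciifolding: drop combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, min_gram, max_gram, edge):
    """Approximate custom_ngram_tokenizer (edge=False) or
    custom_edgengram_tokenizer (edge=True), then asciifolding + lowercase."""
    tokens = []
    for word in _words(text):
        # Edge n-grams only start at the beginning of a word;
        # plain n-grams may start at any position.
        starts = [0] if edge else range(len(word))
        for start in starts:
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break
                tokens.append(_fold(word[start:start + size]).lower())
    return tokens


if __name__ == "__main__":
    print(analyze("Simons", 2, 12, edge=True))
    print(analyze("Simon", 2, 12, edge=True))
    print(analyze("Simons", 3, 12, edge=False))
```

With the edgeNGram settings, "Simons" yields `['si', 'sim', 'simo', 'simon', 'simons']` and "Simon" yields `['si', 'sim', 'simo', 'simon']`: no stemming is involved, and only the longer name produces the full `simons` gram.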

e) Use AutoQuery (more general):

# views.py
from haystack.inputs import AutoQuery
from haystack.query import SearchQuerySet

search_value = 'Simons'
...
person_sqs = SearchQuerySet().models(Person).filter(
    content=AutoQuery(search_value)
)
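The query in e) is analyzed with the same edge n-gram analyzer, so "Simons" turns into several grams. If every query token has to be present in a document (the behavior the answer is after with `HAYSTACK_DEFAULT_OPERATOR = 'AND'`), the "Simon" document drops out because it has no `simons` gram; with OR semantics it would still match on the shared prefixes. The sketch below models this with plain sets; `edge_grams`, `matches` and the two-document `index` are hypothetical, and Elasticsearch's actual query parsing and scoring are not modeled.

```python
def edge_grams(word, min_gram=2, max_gram=12):
    """Lowercased edge n-grams of one word (cf. custom_edgengram_tokenizer)."""
    word = word.lower()
    return {word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)}


def matches(doc_tokens, query_tokens, operator="AND"):
    """AND: every query token must be indexed; OR: any one is enough."""
    if operator == "AND":
        return query_tokens <= doc_tokens
    return bool(query_tokens & doc_tokens)


# Hypothetical two-document index: name -> set of indexed grams.
index = {name: edge_grams(name) for name in ("Simon", "Simons")}
query = edge_grams("Simons")

if __name__ == "__main__":
    for op in ("AND", "OR"):
        hits = sorted(n for n, grams in index.items() if matches(grams, query, op))
        print(op, hits)
```

This prints `AND ['Simons']` and `OR ['Simon', 'Simons']`. Note the asymmetry in this model: a search for "Simon" under AND would still return "Simons", because every gram of the shorter name is also a gram of the longer one.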

f) Rebuild the index after making these changes:

$ ./manage.py rebuild_index

Regarding "Django Haystack - How to force exact attribute match without stemming?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18201147/
