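I'm trying to add search-as-you-type functionality to a field named email_address in Elasticsearch. My understanding from the docs is that if I create a search_as_you_type field, it should automatically create ngram subfields that are optimized for finding partial matches.
However, it doesn't seem to work the way I expect, and I don't seem to be getting the expected benefit from this special field type.
First, I created an index with the following: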
$ curl -s -H 'Content-Type: application/json' -XPUT http://localhost:9200/mytestindex -d '
{
"mappings": {
"properties": {
"email_address": {"type": "search_as_you_type"}
}
}
}
'
When I request the mapping of the newly created email field, I see the following:
$ curl -s -H 'Content-Type: application/json' http://localhost:9200/mytestindex/_mapping/field/email_address | json_pp
{
"mytestindex" : {
"mappings" : {
"email_address" : {
"full_name" : "email_address",
"mapping" : {
"email_address" : {
"max_shingle_size" : 3,
"type" : "search_as_you_type"
}
}
}
}
}
}
Finally, I populated some sample data:
$ curl -s -H 'Content-Type: application/json' http://localhost:9200/mytestindex/_doc -d '
{"email_address": "sam@example.com"}'
$ curl -s -H 'Content-Type: application/json' http://localhost:9200/mytestindex/_doc -d '
{"email_address": "sally@example.com"}'
$ curl -s -H 'Content-Type: application/json' http://localhost:9200/mytestindex/_doc -d '
{"email_address": "jane@example.com"}'
$ curl -s -H 'Content-Type: application/json' http://localhost:9200/mytestindex/_doc -d '
{"email_address": "samantha@example.com"}'
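The official docs suggest using a multi_match search of type bool_prefix on the following fields: email_address, email_address._2gram, and email_address._3gram. Curious about the subfields, I tested a search that included only the subfields, but couldn't get any results: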
$ curl -s -H 'Content-Type: application/json' http://localhost:9200/mytestindex/_search -d '
{
"query": {
"multi_match": {
"query": "sa",
"type": "bool_prefix",
"fields": [
"email_address._2gram",
"email_address._3gram"
]
}
}
}
' | json_pp
{
"hits" : {
"hits" : [],
"max_score" : null,
"total" : {
"value" : 0,
"relation" : "eq"
}
},
"took" : 4,
"_shards" : {
"skipped" : 0,
"successful" : 1,
"total" : 1,
"failed" : 0
},
"timed_out" : false
}
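I tried partial queries of various lengths (s, sa, sam, etc.), but I never got any results. However, when I search the email_address field itself, I get all the expected results:
$ curl -s -H 'Content-Type: application/json' http://localhost:9200/mytestindex/_search -d '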
{
"query": {
"multi_match": {
"query": "sa",
"type": "bool_prefix",
"fields": [
"email_address"
]
}
}
}
' | json_pp
{
"timed_out" : false,
"hits" : {
"max_score" : 1,
"total" : {
"relation" : "eq",
"value" : 3
},
"hits" : [
{
"_index" : "mytestindex",
"_id" : "gEbkCXUBC6_J-EeLAygM",
"_score" : 1,
"_type" : "_doc",
"_source" : {
"email_address" : "sam@example.com"
}
},
{
"_index" : "mytestindex",
"_source" : {
"email_address" : "sally@example.com"
},
"_score" : 1,
"_type" : "_doc",
"_id" : "gUbkCXUBC6_J-EeLWigu"
},
{
"_index" : "mytestindex",
"_id" : "jUb5CXUBC6_J-EeL1ij1",
"_type" : "_doc",
"_score" : 1,
"_source" : {
"email_address" : "samantha@example.com"
}
}
]
},
"took" : 2,
"_shards" : {
"failed" : 0,
"skipped" : 0,
"successful" : 1,
"total" : 1
}
}
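As a result, I don't understand what benefit the _2gram and _3gram subfields provide. Have I set something up incorrectly? Or am I confused about the actual purpose of these fields?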
Best answer
The search_as_you_type field type is a text-like field that is optimized to provide support for queries that serve an as-you-type completion use case.
Index mapping:
{
"mappings": {
"properties": {
"title": {
"type": "search_as_you_type"
}
}
}
}
Index data:
{"title": "how shingles are actually used"}
Analyze API output for this text:
{
"tokens": [
{
"token": "how",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "shingles",
"start_offset": 4,
"end_offset": 12,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "are",
"start_offset": 13,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "actually",
"start_offset": 17,
"end_offset": 25,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "used",
"start_offset": 26,
"end_offset": 30,
"type": "<ALPHANUM>",
"position": 4
}
]
}
Generating 3-word shingles:
POST /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "shingle",
"min_shingle_size": 3,
"max_shingle_size": 3,
"output_unigrams":false
}
],
"text": "how shingles are actually used"
}
The tokens generated are:
{
"tokens": [
{
"token": "how shingles are",
"start_offset": 0,
"end_offset": 16,
"type": "shingle",
"position": 0
},
{
"token": "shingles are actually",
"start_offset": 4,
"end_offset": 25,
"type": "shingle",
"position": 1
},
{
"token": "are actually used",
"start_offset": 13,
"end_offset": 30,
"type": "shingle",
"position": 2
}
]
}
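The shingle output above can be reproduced with a few lines of Python. This is only a sketch of the idea, assuming plain whitespace tokenization; the helper name `shingles` is hypothetical and is not part of Elasticsearch.

```python
def shingles(tokens, size):
    """Return every run of `size` consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


# Whitespace splitting stands in for the standard tokenizer on this simple text.
tokens = "how shingles are actually used".split()
three_grams = shingles(tokens, 3)
print(three_grams)

# A single token is too short to form a 3-word shingle, so nothing is produced.
print(shingles(["how"], 3))
```

Note that an input with fewer tokens than the shingle size yields an empty list, which matters for short queries.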
Search query:
title._3gram - Wraps the analyzer of my_field with a shingle token filter of shingle size 3
{
"query": {
"multi_match": {
"query": "shingles are actually",
"type": "bool_prefix",
"fields": [
"title._3gram"
]
}
}
}
Search result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"title": "how shingles are actually used"
}
}
]
In your case, given "text": "samantha@example.com", the individual tokens generated are samantha and example.com.
When 2-word shingles are created, the token generated is:
{
"tokens": [
{
"token": "samantha example.com",
"start_offset": 0,
"end_offset": 20,
"type": "shingle",
"position": 0
}
]
}
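To make the difference between the root field and the `._2gram` subfield concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Elasticsearch's actual implementation: `tokenize` is a crude stand-in that only reproduces the tokens listed above (split on `@` and whitespace), and `bool_prefix_match` is a hypothetical helper that treats the last query term as a prefix and any earlier terms as optional exact matches.

```python
import re


def tokenize(text):
    # Crude stand-in for the analyzer: lowercase, split on "@" and whitespace.
    return [t for t in re.split(r"[@\s]+", text.lower()) if t]


def shingles(tokens, size):
    # Every run of `size` consecutive tokens; size=1 returns the tokens unchanged.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def bool_prefix_match(query, doc_text, size=1):
    """Toy bool_prefix: the last query term is a prefix, earlier terms are exact."""
    doc_terms = shingles(tokenize(doc_text), size)
    query_terms = shingles(tokenize(query), size)
    if not query_terms:
        return False  # e.g. "sa" alone cannot form a 2-word shingle
    *exact, last = query_terms
    return any(t in doc_terms for t in exact) or any(d.startswith(last) for d in doc_terms)


emails = ["sam@example.com", "sally@example.com", "jane@example.com", "samantha@example.com"]

root_hits = [e for e in emails if bool_prefix_match("sa", e, size=1)]
two_gram_hits = [e for e in emails if bool_prefix_match("sa", e, size=2)]
phrase_hits = [e for e in emails if bool_prefix_match("samantha ex", e, size=2)]

print(root_hits)      # the three addresses whose first token starts with "sa"
print(two_gram_hits)  # empty: the query produced no 2-word shingle
print(phrase_hits)    # only the address whose 2-word shingle starts with "samantha ex"
```

In this model the shingle subfields only come into play once the query contains at least as many terms as the shingle size.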
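So when you search with sa, it will not match, because no token corresponding to it is generated. When you search on the email_address field, however, it matches because of "type": "bool_prefix". Read this to learn more about the Match bool prefix query.
If you want to search with the sa query and get all the results, you can use the Completion suggester, or even go through the UAX URL Email Tokenizer.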
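As one possible way to follow the UAX URL Email Tokenizer route, the sketch below builds an index definition as a Python dictionary and prints it as JSON. Pairing the `uax_url_email` tokenizer with an `edge_ngram` token filter is an assumption added here for illustration, not something the answer prescribes, and the names `email_autocomplete`, `email_search`, and `email_edge_ngram` are hypothetical.

```python
import json

# Request body for creating an index (a sketch, not a tested production mapping).
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Emits prefixes of each token: "s", "sa", "sam", ...
                "email_edge_ngram": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
            },
            "analyzer": {
                # Index time: keep each address as one token, then index its prefixes.
                "email_autocomplete": {
                    "type": "custom",
                    "tokenizer": "uax_url_email",
                    "filter": ["lowercase", "email_edge_ngram"],
                },
                # Search time: no edge n-grams, so the typed text is matched as-is.
                "email_search": {
                    "type": "custom",
                    "tokenizer": "uax_url_email",
                    "filter": ["lowercase"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "email_address": {
                "type": "text",
                "analyzer": "email_autocomplete",
                "search_analyzer": "email_search",
            }
        }
    },
}

body_json = json.dumps(index_body, indent=2)
print(body_json)
```

The printed JSON could be sent as the body of the index-creation request; with this setup a plain match query for sa would be compared against the indexed prefixes rather than relying on shingles.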
Regarding elasticsearch - confused about `search_as_you_type` ngram subfields, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64270635/