ruby-on-rails - Elasticsearch Ngram Analyzer: searching partial MAC addresses

Reposted. Author: 行者123. Updated: 2023-12-02 23:37:08

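Using Elasticsearch (with Rails), I am trying, so far without success, to index a field that holds MAC addresses using hyphens as separators, and to run search queries against it: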

24-A4-3C-02-37-26



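Searching for the whole MAC address (not_analyzed) works fine, but with the custom analyzer the search does not work as expected.

I have tested many options, including tweaking the min/max gram values, without success.

With the mapping, settings and query below, I get the following results: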
Box.search(q: "24-A4-3C-02-37-26").results.map(&:macaddress)

This gives a strange result:
["24-A4-3C-02-37-xx", "DC-9F-DB-F6-B2-xx", "C4-10-8A-13-53-xx", "C4-10-8A-13-54-xx", "C4-10-8A-13-52-xx"]

If I run it with the last octet removed ("24-A4-3C-02-37"), I get this:
["DC-9F-DB-F6-B2-xx", "C4-10-8A-13-53-xx", "C4-10-8A-13-52-xx"]

That is wrong.

I have checked the analyzer with the _analyze API, and it looks just fine:
curl "localhost:9205/boxes/_analyze?analyzer=ngram_analyzer&pretty=true" -d "24-A4-3C-02-37-26"

which produces:
{
  "tokens" : [ {
    "token" : "24",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "24-",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "24-A",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 3
  }, {
  .........
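The token list above can be reproduced with a few lines of plain Ruby. This is a minimal sketch, not Elasticsearch code: it assumes that, with `token_chars` commented out, the edgeNGram tokenizer treats the whole input as a single word and emits every prefix between `min_gram` and `max_gram` characters long, all starting at offset 0, which is what the `_analyze` output shows. The method name `edge_ngrams` is hypothetical.

```ruby
# Hypothetical helper: simulates an edgeNGram tokenizer with no token_chars,
# i.e. the whole input is one word and every prefix of length
# min_gram..max_gram (capped at the input length) is emitted.
def edge_ngrams(text, min_gram: 2, max_gram: 17)
  upper = [max_gram, text.length].min
  (min_gram..upper).map { |len| text[0, len] }
end

tokens = edge_ngrams("24-A4-3C-02-37-26")
p tokens.first(3) # => ["24", "24-", "24-A"]
p tokens.last     # => "24-A4-3C-02-37-26"
p tokens.length   # => 16 (lengths 2 through 17)
```

Because every n-gram starts at offset 0, a fragment from the middle of an address such as `A4-3C` is never indexed; only prefixes are searchable through `macaddress.ngram`.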

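So I can only guess that something is wrong with the actual query. I even tried replacing the hyphen with its ASCII code or an escape sequence.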
@search_definition[:query] = {
  multi_match: {
    query: options[:q],
    fields: [
      "macaddress.ngram",
      "macaddress.sortable^5",
      ...

My settings look like this:
settings analysis: {
  analyzer: {
    ngram_analyzer: {
      type: 'custom',
      tokenizer: 'my_tokenizer',
    }
  },
  tokenizer: {
    my_tokenizer: {
      type: "edgeNGram",
      min_gram: 2,
      max_gram: 17,
      # token_chars: [ "letter", "digit" ]
    }
  }
} do

  mapping do
    indexes :macaddress, type: 'multi_field', fields: {
      raw: { type: "string" },
      sortable: { type: "string", index: "not_analyzed" },
      ngram: { type: "string", index_analyzer: :ngram_analyzer } #, search_analyzer: 'keyword' }
    }
  end
end
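One possible reason for the odd results, offered as an assumption rather than a confirmed diagnosis: the mapping sets only `index_analyzer` on the `ngram` sub-field, so the query text goes through a different, search-time analyzer. If that is the standard analyzer (my understanding of the Elasticsearch 1.x default), it lowercases the query and splits it on the hyphens, and because `multi_match` combines terms with OR by default, a single matching octet such as `24` is enough for a hit. The Ruby sketch below simulates this with plain arrays and the sample documents from the answer; `standard_like` and `keyword_like` are rough, hypothetical stand-ins for the real analyzers, and scoring is ignored.

```ruby
# Hypothetical simulation, not Elasticsearch: compares indexed edge n-grams
# with query tokens produced by two different search-time analyzers.

def edge_ngrams(text, min_gram: 2, max_gram: 17)
  upper = [max_gram, text.length].min
  (min_gram..upper).map { |len| text[0, len] }
end

# Rough stand-in for the standard analyzer (an approximation only):
# lowercase, then split on anything that is not a letter or a digit.
def standard_like(text)
  text.downcase.split(/[^a-z0-9]+/).reject(&:empty?)
end

# Stand-in for the keyword analyzer: the whole query is a single token.
def keyword_like(text)
  [text]
end

# Sample documents taken from the answer below.
DOCS = {
  1 => "24-A4-3C-02-37-26",
  2 => "24-A4-3C-02-37-54",
  3 => "24-A4-3C-02-38-23",
  4 => "34-A4-3C-02-38-23"
}.freeze

# OR semantics: a document matches if any query token equals any indexed
# (case-sensitive) edge n-gram of its MAC address.
def matching_ids(query_tokens)
  DOCS.select { |_id, mac| (edge_ngrams(mac) & query_tokens).any? }.keys
end

query = "24-A4-3C-02-37-26"
p standard_like(query)               # => ["24", "a4", "3c", "02", "37", "26"]
p matching_ids(standard_like(query)) # => [1, 2, 3]
p matching_ids(keyword_like(query))  # => [1]
```

In this simulation, keeping the whole query as one token (as the commented-out `search_analyzer: 'keyword'` suggests) makes a full address match only its own document, while the prefix `24-A4-3C-02` still matches documents 1 to 3.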

Can anyone suggest how to get this working?

Best Answer

I have tested with the following settings:

PUT test
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "ngram_analyzer" : {
          "type": "custom",
          "tokenizer" : "my_tokenizer"
        }
      },
      "tokenizer" : {
        "my_tokenizer" : {
          "type" : "edgeNGram",
          "min_gram" : "2",
          "max_gram" : "17"
        }
      }
    }
  },
  "mappings": {
    "boxes": {
      "properties": {
        "macaddress": {
          "type": "multi_field",
          "fields": {
            "raw": {
              "type": "string"
            },
            "sortable": {
              "type": "string",
              "index": "not_analyzed"
            },
            "ngram": {
              "type": "string",
              "index_analyzer": "ngram_analyzer"
            }
          }
        }
      }
    }
  }
}

And some sample data:
PUT test/boxes/1
{
  "macaddress": "24-A4-3C-02-37-26"
}
PUT test/boxes/2
{
  "macaddress": "24-A4-3C-02-37-54"
}
PUT test/boxes/3
{
  "macaddress": "24-A4-3C-02-38-23"
}
PUT test/boxes/4
{
  "macaddress": "34-A4-3C-02-38-23"
}

And the search query:
GET test/boxes/_search
{
  "query": {
    "multi_match": {
      "query": "24-A4-3C-02",
      "fields": ["macaddress.ngram",
                 "macaddress.sortable^5"]
    }
  }
}
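On the Rails side, the same request body can be built as a plain Ruby Hash, mirroring the question's `@search_definition[:query]`. This is a minimal sketch using only the `json` standard library; `build_search_definition` is a hypothetical helper name, and the field list is the one from the answer, since the question's own list is elided.

```ruby
require "json"

# Hypothetical helper: builds the same multi_match body as the GET request
# above, using the answer's two fields.
def build_search_definition(q)
  {
    query: {
      multi_match: {
        query: q,
        fields: ["macaddress.ngram", "macaddress.sortable^5"]
      }
    }
  }
end

definition = build_search_definition("24-A4-3C-02")
puts JSON.pretty_generate(definition)
```

With elasticsearch-model, a Hash like this can typically be passed to `Box.__elasticsearch__.search(...)`; that call is left out so the snippet runs without a cluster.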

The result is:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.047079325,
    "hits": [
      {
        "_index": "test",
        "_type": "boxes",
        "_id": "1",
        "_score": 0.047079325,
        "_source": {
          "macaddress": "24-A4-3C-02-37-26"
        }
      },
      {
        "_index": "test",
        "_type": "boxes",
        "_id": "2",
        "_score": 0.047079325,
        "_source": {
          "macaddress": "24-A4-3C-02-37-54"
        }
      },
      {
        "_index": "test",
        "_type": "boxes",
        "_id": "3",
        "_score": 0.047079325,
        "_source": {
          "macaddress": "24-A4-3C-02-38-23"
        }
      }
    ]
  }
}

A similar question about ruby-on-rails - Elasticsearch Ngram Analyzer searching partial MAC addresses can be found on Stack Overflow: https://stackoverflow.com/questions/29521371/
