
elasticsearch - Elasticsearch: faceted query with terms returning unexpected results

Reposted. Author: 行者123. Updated: 2023-12-02 23:05:37

I am trying to run a faceted query against some logs that I have stored in ES. The logs look like this:

{"severity": "informational","message_hash_value": "00016B15", "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1", "host": "192.168.8.225", "version": "1.0", "user": "User_1@test.co", "created_timestamp": "2013-03-01T15:34:00", "message": "User viewed contents", "inserted_timestamp": "2013-03-01T15:34:00"}

The query I am trying to run is:
curl -XGET 'http://127.0.0.1:9200/logs-*/logs/_search' -d '{
"from" : 0,
"size" : 0,
"facets" : {
"user" : {
"terms" : {"field" : "user", "size" : 999999 }
}
}
}'

Note that the "user" field in the logs is an email address. The problem is that the terms facet query returns a list of terms from the user field like the one below.
u'facets': {u'user': {u'_type': u'terms', u'total': 2004, u'terms': [{u'count': 1002,u'term': u'test.co'}, {u'count': 320, u'term': u'user_1'}, {u'count': 295,u'term': u'user_2'}

Note that the list contains the term
{u'count': 1002,u'term': u'test.co'}

which is the domain of the user's email address. Why does Elasticsearch treat the domain as a separate term?

Running a query to check the mapping
curl -XGET 'http://127.0.0.1:9200/logs-*/_mapping?pretty=true'

yields the following for the "user" field:
"user" : {
"type" : "string"
},

Best answer

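This happens because Elasticsearch's default global analyzer tokenizes on "@" (in addition to things like whitespace and punctuation) at index time, so "User_1@test.co" is stored as two separate terms. The analyzer also lowercases tokens, which is why the facet shows user_1 rather than User_1. You can fix this by telling Elasticsearch not to run the analyzer on this field, but you will have to reindex all of your data.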
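To make the effect concrete, here is a minimal Python sketch of what happens to the facet counts. The regular expression is only a rough stand-in for the default analyzer (an assumption for illustration, not Elasticsearch's actual tokenizer), and the names `approximate_analyze` and `terms_facet` are hypothetical.

```python
import re
from collections import Counter

# Rough stand-in for the default analyzer, NOT Elasticsearch's real tokenizer:
# lowercase the text, then keep runs of letters, digits and underscores,
# allowing single dots between such runs. "@" and whitespace end a token.
TOKEN_RE = re.compile(r"[a-z0-9_]+(?:\.[a-z0-9_]+)*")


def approximate_analyze(text):
    """Return the tokens an analyzed string field would roughly index."""
    return TOKEN_RE.findall(text.lower())


def terms_facet(values, analyzed=True):
    """Count documents per indexed term, roughly as a terms facet does."""
    counts = Counter()
    for value in values:
        terms = approximate_analyze(value) if analyzed else [value]
        # A document counts once per distinct term it contains.
        counts.update(set(terms))
    return counts


users = ["User_1@test.co", "User_1@test.co", "User_2@test.co"]
print(terms_facet(users))                  # analyzed: the domain is its own term
print(terms_facet(users, analyzed=False))  # not analyzed: whole addresses
```

Because every address shares the same domain, the domain term is counted once per document, which matches the question's output where test.co has the highest count.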

Create a new index:

curl -XPUT 'http://localhost:9200/logs-new'

In the mapping for this new index, specify that the "user" field should not be analyzed:
curl -XPUT 'http://localhost:9200/logs-new/logs/_mapping' -d '{
"logs" : {
"properties" : {
"user" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}'

Index a document:
curl -XPOST 'http://localhost:9200/logs-new/logs' -d '{
"created_timestamp": "2013-03-01T15:34:00",
"host": "192.168.8.225",
"inserted_timestamp": "2013-03-01T15:34:00",
"message": "User viewed contents",
"message_hash_value": "00016B15",
"severity": "informational",
"user": "User_1@test.co",
"user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1",
"version": "1.0"
}'
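Since all existing documents have to be indexed again, posting them one at a time may be slow. A minimal sketch, assuming the old documents have already been read into a list of dicts, of building a newline-delimited request body for the bulk API (`_bulk`). The helper name `build_bulk_body` and the sample documents are hypothetical, and actually sending the body is left out.

```python
import json


def build_bulk_body(docs, index="logs-new", doc_type="logs"):
    """Build a newline-delimited body for POST /_bulk (hypothetical helper)."""
    lines = []
    for doc in docs:
        # Action line: index this document into the new index and type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API expects every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


docs = [
    {"user": "User_1@test.co", "message": "User viewed contents"},
    {"user": "User_2@test.co", "message": "User viewed contents"},
]
body = build_bulk_body(docs)
print(body)
```

The resulting string could then be posted to the `_bulk` endpoint, for example with curl's `--data-binary` option so that the newlines are preserved.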

The Elasticsearch facet will now show the whole email address:
curl -XGET 'http://localhost:9200/logs-new/logs/_search?pretty' -d '{
"from":0,
"size":0,
"facets" : {
"user" : {
"terms" : {
"field" : "user",
"size" : 999999
}
}
}
}'

Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ ]
},
"facets" : {
"user" : {
"_type" : "terms",
"missing" : 0,
"total" : 1,
"other" : 0,
"terms" : [ {
"term" : "User_1@test.co",
"count" : 1
} ]
}
}
}
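If the response is consumed from a script, the facet can be flattened into a plain mapping of term to count. A small sketch, assuming the response has the shape shown in the result; the function name `facet_counts` is hypothetical and the sample response is trimmed to the facets section.

```python
import json


def facet_counts(response, facet_name):
    """Map each term of a terms facet to its document count."""
    entries = response["facets"][facet_name]["terms"]
    return {entry["term"]: entry["count"] for entry in entries}


# Trimmed copy of the response, keeping only the facets section.
response = json.loads("""
{
  "facets": {
    "user": {
      "_type": "terms",
      "missing": 0,
      "total": 1,
      "other": 0,
      "terms": [ { "term": "User_1@test.co", "count": 1 } ]
    }
  }
}
""")

counts = facet_counts(response, "user")
print(counts)
```

With the not_analyzed mapping in place, every key in the mapping should be a complete email address rather than a fragment such as the domain.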

References:
Core types: http://www.elasticsearch.org/guide/reference/mapping/core-types/
Reindexing with a new mapping: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/tCaXgjfUFVU

Regarding "elasticsearch - Elasticsearch: faceted query with terms returning unexpected results", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15812625/
