tsql - Multi-field wildcard search in ElasticSearch

Reposted. Author: 行者123. Updated: 2023-12-03 01:14:51

Consider the following very basic T-SQL query:

select * from Users
where FirstName like '%dm0e776467@mail.com%'
or LastName like '%dm0e776467@mail.com%'
or Email like '%dm0e776467@mail.com%'
How would I write this in Lucene?
I have tried the following:
  • Bool query with wildcards (does not work at all, no results):
    {
      "query": {
        "bool": {
          "should": [
            { "wildcard": { "firstName": "dm0e776467@mail.com" } },
            { "wildcard": { "lastName": "dm0e776467@mail.com" } },
            { "wildcard": { "email": "dm0e776467@mail.com" } }
          ]
        }
      }
    }
  • multi_match approach (returns anything that contains mail.com):
    {
      "query": {
        "multi_match": {
          "query": "dm0e776467@mail.com",
          "fields": ["firstName", "lastName", "email"]
        }
      }
    }
  • Third attempt (returns the expected result, but returns nothing if I search for just "mail"):
    {
      "query": {
        "query_string": {
          "query": "\"dm0e776467@mail.com\"",
          "fields": ["firstName", "lastName", "email"],
          "default_operator": "or",
          "allow_leading_wildcard": true
        }
      }
    }

It seems to me that there is no way to force Elasticsearch to treat the query input string as a single substring?

    Best answer

    The standard (default) analyzer tokenizes this email address as follows:

    GET _analyze
    {
      "text": "dm0e776467@mail.com",
      "analyzer": "standard"
    }
    which yields
    {
      "tokens" : [
        {
          "token" : "dm0e776467",
          ...
        },
        {
          "token" : "mail.com",
          ...
        }
      ]
    }
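    This explains why multi_match matches anything with a *mail.com suffix and why the wildcard queries fail: a wildcard query is a term-level query, so its pattern is compared with each indexed token, and no single token equals the full address.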
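To make the mismatch concrete, here is a minimal Python sketch. It assumes the two tokens shown in the `_analyze` output and uses `fnmatch` as a rough stand-in for term-level wildcard matching; `wildcard_hits` is a hypothetical helper for illustration, not an Elasticsearch API.

```python
from fnmatch import fnmatchcase

# Tokens the standard analyzer produced for the address (copied from the _analyze output).
INDEXED_TOKENS = ["dm0e776467", "mail.com"]


def wildcard_hits(pattern, tokens):
    """Return the tokens that a wildcard pattern matches in full.

    fnmatchcase is only an approximation: it also gives [...] a special
    meaning, which Elasticsearch wildcard patterns do not.
    """
    return [token for token in tokens if fnmatchcase(token, pattern)]


# The full address matches no single token, so the wildcard query finds nothing.
print(wildcard_hits("dm0e776467@mail.com", INDEXED_TOKENS))
# A pattern ending in mail.com matches the second token.
print(wildcard_hits("*mail.com", INDEXED_TOKENS))
```

The pattern has to match a whole token, which is why the unmodified address fails against a field analyzed with the standard analyzer.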

    I suggest the following modifications to the mapping, based on this answer:
    PUT users
    {
      "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": true,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "email": {
            "type": "text",
            "analyzer": "email"
          },
          "firstName": {
            "type": "text",
            "fields": {
              "as_email": {
                "type": "text",
                "analyzer": "email"
              }
            }
          },
          "lastName": {
            "type": "text",
            "fields": {
              "as_email": {
                "type": "text",
                "analyzer": "email"
              }
            }
          }
        }
      }
    }
    Note that I have used .as_email sub-fields on your firstName and lastName fields; you may not want to force those fields to be mapped as emails by default.
    Then, after indexing a few samples:
    POST _bulk
    {"index":{"_index":"users"}}
    {"firstName":"abc","lastName":"adm0e776467@mail.coms","email":"dm0e776467@mail.com"}
    {"index":{"_index":"users"}}
    {"firstName":"xyz","lastName":"opr","email":"dm0e776467@mail.com"}
    {"index":{"_index":"users"}}
    {"firstName":"zyx","lastName":"dm0e776467@mail.com","email":"qwe"}
    {"index":{"_index":"users"}}
    {"firstName":"abc","lastName":"efg","email":"ijk"}
    The wildcard queries now work as expected:
    GET users/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "wildcard": {
                "email": "dm0e776467@mail.com"
              }
            },
            {
              "wildcard": {
                "lastName.as_email": "dm0e776467@mail.com"
              }
            },
            {
              "wildcard": {
                "firstName.as_email": "dm0e776467@mail.com"
              }
            }
          ]
        }
      }
    }
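If the search body has to be generated for an arbitrary list of fields, it could be assembled along these lines. This is a sketch only: `build_wildcard_query` and its `substring` flag are hypothetical names introduced here, and the result is a plain dict that still has to be sent to the `_search` endpoint by whatever client is in use.

```python
import json


def build_wildcard_query(fields, term, substring=False):
    """Build a bool/should body with one wildcard clause per field.

    With substring=True the term is wrapped in '*' on both sides
    (an assumption about the desired LIKE '%term%' behaviour).
    """
    pattern = f"*{term}*" if substring else term
    return {
        "query": {
            "bool": {
                "should": [{"wildcard": {field: pattern}} for field in fields]
            }
        }
    }


body = build_wildcard_query(
    ["email", "lastName.as_email", "firstName.as_email"],
    "dm0e776467@mail.com",
)
print(json.dumps(body, indent=2))
```

Leading wildcards force Elasticsearch to scan many terms and can be slow, so the `substring` variant is best kept for small indices or used sparingly.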
    Do check how this tokenizer works under the hood to avoid "surprising" query results:
    GET users/_analyze
    {
      "text": "dm0e776467@mail.com",
      "field": "email"
    }
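As a rough offline preview of what that request is likely to return, the filter chain can be approximated in Python. The sketch assumes that the uax_url_email tokenizer hands the whole address to the filter as a single token, and it replaces `\p{L}` with `[^\W\d_]` because Python's `re` module has no `\p{L}` class. `email_tokens` is a hypothetical helper, and the token order may differ from what Elasticsearch reports; the `_analyze` call against a real index remains the authoritative check.

```python
import re

# Patterns from the "email" pattern_capture filter.
# \p{L} is approximated with [^\W\d_] (letters only) for Python's re module.
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)", r"([^-@]+)"]


def email_tokens(text):
    """Approximate the tokens the custom "email" analyzer emits for one input token."""
    tokens = [text]  # preserve_original: true keeps the whole input
    for pattern in PATTERNS:
        for match in re.finditer(pattern, text):
            tokens.append(match.group(1))
    # Emulate the "lowercase" and "unique" filters, keeping first-seen order.
    seen, result = set(), []
    for token in (t.lower() for t in tokens):
        if token not in seen:
            seen.add(token)
            result.append(token)
    return result


print(email_tokens("dm0e776467@mail.com"))
```

Under these assumptions the output includes both the full address and the bare token `mail`, which is why searching for just "mail" can now find the document.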

    Regarding tsql - Multi-field wildcard search in ElasticSearch, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63020741/
