
elasticsearch - ElasticSearch not returning documents for words prefixed or suffixed with a dot or #

Reposted. Author: 行者123. Updated: 2023-12-02 22:40:28

Search query used to find documents with the given skills:

GET resume_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "ids": {
            "types": [],
            "values": [ "176", "177", "178", "179", "180", "181", "182", "183", "184", "185", "186", "187", "188", "189", "190", "191", "192", "193", "194", "195", "196", "197", "198", "199", "200", "201", "202", "203", "204", "205", "206", "207", "208", "209", "210", "211", "212", "213", "214", "215", "216", "217", "218", "219", "220", "221", "222", "223", "224", "225", "226", "227", "228" ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "match": {
                  "skills": {
                    "query": "c#",
                    "type": "phrase",
                    "analyzer": "synonym"
                  }
                }
              },
              {
                "match": {
                  "skills": {
                    "query": "asp.net",
                    "type": "phrase",
                    "analyzer": "synonym"
                  }
                }
              }
            ],
            "minimum_should_match": "1"
          }
        },
        {
          "match": {
            "skills": {
              "query": "c#",
              "type": "phrase",
              "analyzer": "synonym"
            }
          }
        }
      ]
    }
  }
}

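When I replace C# with another skill such as Java, I get results. But if C# or .net appears in the search query, the result is empty, even though these skills are indexed.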
  {
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.40027505,
"hits": [
{
"_index": "resume_index",
"_type": "phrase",
"_id": "198",
"_score": 0.40027505,
"_source": {
"content": "",
"skills": [
"10g",
"c++",
"crystal report",
"ms-access",
"real estate",
"ui",
".net",
"ado",
"c#",
"database",
"java",
"mvc",
"oracle 10g",
"order management",
"software development life cycle",
"system development",
"technica",
"vb",
"web technologies",
"windows xp",
"wpf",
"c",
"client management",
"development and maintenance",
"development life cycle",
"html",
"jquery",
"mysql",
"oracle",
"r",
"software development",
"ssrs",
"testing",
"unit testing",
"wcf",
"web applications",
"asp.net",
"asp.net mvc",
"c programming",
"deployment",
"management",
"project",
"sales",
"windows",
"adobe photoshop",
"developing",
"java script",
"reports",
"script",
"silverlight",
"software development life cycle (sdlc)",
"sql",
"sql server"
]
}
},
{
"_index": "resume_index",
"_type": "phrase",
"_id": "199",
"_score": 0.3792688,
"_source": {
"content": "",
"skills": [
"application maintenance",
"client-server",
"cms",
"design",
"dos",
"e-commerce",
"erp",
"features",
"finance",
"knockout.js",
"mongodb",
"post-implementation",
"stocks",
".net",
"ado",
"application development",
"c#",
"debugging",
"documentation",
"insurance",
"integration",
"java",
"mvc",
"planning",
"software development life cycle",
"svn",
"technica",
"agile",
"android",
"architecture",
"automation testing",
"client management",
"coordination",
"development life cycle",
"functional specification",
"healthcare",
"html",
"iis",
"jquery",
"networking",
"requirement gathering",
"software development",
"testing",
"tfs",
"triaging",
"troubleshooting",
"visual source safe",
"wcf",
"xcode",
"xml",
"asp.net",
"channel",
"css",
"entity framework",
"implementation and testing",
"maintenance support",
"management",
"mobile application",
"oracle forms",
"project",
"sales",
"sdlc",
"technical support",
"windows",
"ajax",
"analytics",
"client interaction",
"code management tool",
"crm",
"customer",
"excellent communication",
"ipad",
"java script",
"linq",
"market",
"ms dos",
"reporting"
]
}
},
{
"_index": "resume_index",
"_type": "phrase",
"_id": "208",
"_score": 0.3556832,
"_source": {
"content": "",
"skills": [
"c++",
"control system",
"design",
"features",
"real estate",
".net",
"ado",
"c#",
"database",
"integration",
"java",
"mvc",
"rdlc",
"stored procedures",
"technica",
"test case",
"agile",
"architecture",
"automation testing",
"banking",
"c",
"debug",
"design patterns",
"excel",
"iis",
"jquery",
"object oriented programming",
"r",
"security",
"software development",
"t-sql",
"testing",
"unit testing",
"wcf",
"web applications",
"xml",
"asp.net",
"c programming",
"css",
"development methodologies",
"entity framework",
"management",
"project",
"rdlc reports",
"sales",
"triggers",
"windows",
"ajax",
"coding",
"customs",
"dbms",
"developing",
"java script",
"javascript",
"linq",
"reporting",
"reports",
"schemas",
"script",
"software engineering",
"sql",
"sql server",
"telerik",
"test case design",
"visual studio",
"web service",
"webform",
"webforms"
]
}
}
]
}
}

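I removed the content field because of the maximum allowed number of characters.
You can see that c# is indexed in the documents, but I still get no results.
The query analysis looks OK to me: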

{
"valid": true,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"explanations": [
{
"index": "resume_index",
"valid": true,
"explanation": "+ConstantScore(_uid:txt#176 _uid:txt#177 _uid:txt#178 _uid:txt#179 _uid:txt#180 _uid:txt#181 _uid:txt#182 _uid:txt#183 _uid:txt#184 _uid:txt#185 _uid:txt#186 _uid:txt#187 _uid:txt#188 _uid:txt#189 _uid:txt#190 _uid:txt#191 _uid:txt#192 _uid:txt#193 _uid:txt#194 _uid:txt#195 _uid:txt#196 _uid:txt#197 _uid:txt#198 _uid:txt#199 _uid:txt#200 _uid:txt#201 _uid:txt#202 _uid:txt#203 _uid:txt#204 _uid:txt#205 _uid:txt#206 _uid:txt#207 _uid:txt#208 _uid:txt#209 _uid:txt#210 _uid:txt#211 _uid:txt#212 _uid:txt#213 _uid:txt#214 _uid:txt#215 _uid:txt#216 _uid:txt#217 _uid:txt#218 _uid:txt#219 _uid:txt#220 _uid:txt#221 _uid:txt#222 _uid:txt#223 _uid:txt#224 _uid:txt#225 _uid:txt#226 _uid:txt#227 _uid:txt#228) +((skills:asp.net skills:c#)~1) +skills:c#"
}
]
}

I can see that the query is well-formed. Is there something I am missing that causes this problem? I am using the whitespace tokenizer.

Best answer

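I can see that query-time analysis seems to work correctly, but how is the field analyzed at index time? The default standard tokenizer discards characters such as # (and a leading .), so c# is indexed as just c and the c# query does not match, as shown below: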

curl -XGET 'http://192.168.12.5:9200/tmdb/_analyze?analyzer=standard' -d 'c#'
{
  "tokens": [
    {
      "token": "c",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
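To make the index-time mismatch concrete, here is a minimal Python sketch that roughly approximates the two tokenizers locally. The regex is an assumption: it is only a crude ASCII stand-in for the standard tokenizer's Unicode word-boundary rules, not Elasticsearch's implementation, and the function names are hypothetical.

```python
import re

# Crude ASCII approximation (assumption) of the standard tokenizer:
# a token is a run of letters/digits, optionally joined by '.' when
# letters/digits appear on both sides (so "asp.net" stays whole).
# Characters such as '#', or a leading '.', are dropped.
_STANDARD_APPROX = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def standard_like_tokens(text: str) -> list[str]:
    """Lowercased tokens, roughly what the standard analyzer would emit."""
    return [m.group(0).lower() for m in _STANDARD_APPROX.finditer(text)]


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only, like the whitespace tokenizer (no lowercasing)."""
    return text.split()


if __name__ == "__main__":
    for skill in ["c#", ".net", "asp.net", "java"]:
        print(skill, standard_like_tokens(skill), whitespace_tokens(skill))
    # c# ['c'] ['c#']
    # .net ['net'] ['.net']
    # asp.net ['asp.net'] ['asp.net']
    # java ['java'] ['java']
```

With standard-style indexing, c# is stored as c, while the explanation output shows the search side looking for the term c#, so that match fails. Because the query also requires c# in a top-level must clause, the whole query returns nothing, even for documents that contain asp.net.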

You could switch to the whitespace analyzer, which tokenizes only on whitespace. You could also use a mapping char filter to map something like c# => csharp.
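Both suggestions can be combined into one custom analyzer. The following minimal Python sketch builds an index-creation body, assuming Elasticsearch 7.x or later mapping syntax (older versions, like the one in the question, use mapping types and the string field type). The names tech_symbols and skills_analyzer are hypothetical. It also includes a local simulation of the analyzer chain so the effect can be checked without a cluster.

```python
import json

# Hypothetical names: "tech_symbols" and "skills_analyzer" are illustrative only.
SKILLS_INDEX_BODY = {
    "settings": {
        "analysis": {
            "char_filter": {
                "tech_symbols": {
                    "type": "mapping",
                    # Char filters run before the lowercase token filter,
                    # so both spellings are listed.
                    "mappings": ["c# => csharp", "C# => csharp"],
                }
            },
            "analyzer": {
                "skills_analyzer": {
                    "type": "custom",
                    "char_filter": ["tech_symbols"],
                    "tokenizer": "whitespace",  # keeps ".net", "asp.net", "c++" intact
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # ES 7+ style mapping without a type name (assumption).
            "skills": {"type": "text", "analyzer": "skills_analyzer"}
        }
    },
}


def simulate_skills_analyzer(text: str) -> list[str]:
    """Local approximation of skills_analyzer: char filter, whitespace split, lowercase."""
    rules = SKILLS_INDEX_BODY["settings"]["analysis"]["char_filter"]["tech_symbols"]["mappings"]
    for rule in rules:
        source, target = (part.strip() for part in rule.split("=>"))
        # Applying the rules one after another is equivalent here because the
        # keys do not overlap; Elasticsearch applies them in one greedy pass.
        text = text.replace(source, target)
    return [token.lower() for token in text.split()]


if __name__ == "__main__":
    print(json.dumps(SKILLS_INDEX_BODY, indent=2))
```

An existing field's analyzer cannot be changed in place, so the index would need to be recreated with a body like this and the documents reindexed. Since the query in the question overrides the search analyzer with synonym, that analyzer would also need the same char filter (or the override would have to be dropped) so that c# becomes csharp on both sides.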

Regarding elasticsearch - ElasticSearch not returning documents for words prefixed or suffixed with a dot or #, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32821121/
