java - ElasticSearch - Java API search returns nothing when the input query has no (*)

Reposted. Author: 行者123. Updated: 2023-12-02 11:02:33

I am using the Java API to fetch documents from Elasticsearch. One of my documents has the code below, and I am trying to search for it with the following patterns.

Code: MS-VMA1615-0D

Input: *VMA1615-0*     -- returns MS-VMA1615-0D
Input: MS-VMA1615-0D   -- returns MS-VMA1615-0D
Input: *VMA1615-0      -- returns MS-VMA1615-0D
Input: *VMA*-0*        -- returns MS-VMA1615-0D

However, the input below returns no results.

Input: VMA1615         -- returns nothing

I expect it to return the code MS-VMA1615-0D.

Here is the Java code I am using:

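import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

private final String INDEX = "products";
private final String TYPE = "doc";

// Build a query_string query against the "code" field.
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = null;
try {
    searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
    // Report the failure instead of silently discarding the message.
    e.printStackTrace();
}
Item item = null;
// A failed request leaves searchResponse null, so guard before reading the hits.
SearchHit[] searchHits = searchResponse != null
        ? searchResponse.getHits().getHits()
        : new SearchHit[0];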

Here are my mapping details:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

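Best answer

To get the behavior you are looking for, you will probably have to change the tokenizer. You are currently using the whitespace tokenizer; replace it with the pattern tokenizer. Your new mapping should look like this: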

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "pattern",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

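After changing the mapping, a query for VMA1615 will return MS-VMA1615-0D.

This is because the pattern tokenizer splits the string "MS-VMA1615-0D" into the tokens "MS", "VMA1615", and "0D". A query that contains any one of these tokens will therefore match the document.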

POST _analyze
{
  "tokenizer": "pattern",
  "text": "MS-VMA1615-0D"
}

This returns:

{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA1615",
      "start_offset": 3,
      "end_offset": 10,
      "type": "word",
      "position": 1
    },
    {
      "token": "0D",
      "start_offset": 11,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
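
The difference between the two tokenizers can be reproduced outside Elasticsearch. The following is only a rough sketch in plain Java, not Elasticsearch code: it assumes the pattern tokenizer's default pattern `\W+` (runs of non-word characters) and approximates the whitespace tokenizer with `\s+`. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PatternTokenizerSketch {
    // Assumption: the pattern tokenizer splits on \W+ by default.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");
    // Approximation of the whitespace tokenizer.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Splits text on the given pattern and drops empty tokens.
    private static List<String> split(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : pattern.split(text)) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static List<String> patternTokenize(String text) {
        return split(NON_WORD, text);
    }

    public static List<String> whitespaceTokenize(String text) {
        return split(WHITESPACE, text);
    }

    public static void main(String[] args) {
        // The hyphens are non-word characters, so the code breaks into three tokens.
        System.out.println(patternTokenize("MS-VMA1615-0D"));    // [MS, VMA1615, 0D]
        // There is no whitespace in the code, so it stays a single token.
        System.out.println(whitespaceTokenize("MS-VMA1615-0D")); // [MS-VMA1615-0D]
    }
}
```

With the whitespace split the only stored token is the whole code, which is why VMA1615 finds nothing unless wildcards are added around it.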

Based on your comment:

That is not how Elasticsearch works. Elasticsearch stores terms and the documents that contain them in an inverted index. In the simplest case the terms of a full-text field are produced by splitting on whitespace, so the text "Hi there I am a technocrat" is split into ["Hi", "there", "I", "am", "a", "technocrat"]. In other words, the terms that get stored depend on how the text is tokenized. After indexing, if I query for "technocrat" in the example above, I get the document back because the inverted index has that term associated with it. In your case, "VMA" is not stored as a term, so a query for it finds nothing.
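
The idea can be sketched with a toy inverted index. This is a simplified illustration in plain Java, not how Elasticsearch or Lucene is implemented; the class `InvertedIndexSketch` and its methods are hypothetical, and tokenization is a bare whitespace split with no filters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Maps each term to the ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenizes on whitespace and records every term for the given document.
    public void index(int docId, String text) {
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Exact term lookup: only terms that were stored can be found.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index(1, "Hi there I am a technocrat");
        idx.index(2, "MS-VMA1615-0D");

        System.out.println(idx.search("technocrat"));    // [1]
        System.out.println(idx.search("MS-VMA1615-0D")); // [2]
        System.out.println(idx.search("VMA"));           // []
    }
}
```

The lookup for "VMA" comes back empty because the only term stored for document 2 is the full code.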

To make "VMA" searchable, use the following mapping:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "-|\\d"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

To check:

POST products/_analyze
{
  "tokenizer": "my_pattern_tokenizer",
  "text": "MS-VMA1615-0D"
}

This produces:

{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA",
      "start_offset": 3,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "D",
      "start_offset": 12,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
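
To see why a query for VMA would now match, the custom analyzer can be approximated in plain Java. This is a hedged sketch rather than Elasticsearch code: it applies the same regex `-|\d`, drops empty tokens, and lowercases to mirror the `lowercase` filter, while `html_strip` and `asciifolding` are left out. The `matches` helper is a hypothetical simplification that treats a query as matching when any of its analyzed terms was indexed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class CustomAnalyzerSketch {
    // Same regex as "my_pattern_tokenizer": split on a hyphen or on any digit.
    private static final Pattern SPLIT = Pattern.compile("-|\\d");

    // Tokenizes, drops the empty strings left between adjacent digits, and lowercases.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : SPLIT.split(text)) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Simplified OR-style match: true if any analyzed query term is among the indexed terms.
    public static boolean matches(String indexedText, String query) {
        List<String> indexed = analyze(indexedText);
        for (String term : analyze(query)) {
            if (indexed.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(analyze("MS-VMA1615-0D"));        // [ms, vma, d]
        System.out.println(matches("MS-VMA1615-0D", "VMA")); // true
        System.out.println(matches("MS-VMA1615-0D", "XYZ")); // false
    }
}
```

Note that this tokenizer discards every digit, so the numeric part of the code is no longer searchable on its own.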

Regarding "java - ElasticSearch - Java API search returns nothing when the input query has no (*)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51212683/
