
java - AppEngine full-text document index: searching with the stemming operator

Reposted. Author: 行者123. Updated: 2023-12-02 09:22:24

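I am evaluating AppEngine's document-index full-text search and have run into a problem with the stemming operator "~". I created an index containing a few test documents, each of which has a title field. Some sample values for that field are: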

"Houses Desks Tables"
"referer image vod event"
"events with cats and dogs and"
"names very interesting days"

I am using Java, and my indexing and query code looks like this:

// Build the document and add it to the index
Document doc = Document.newBuilder().setId(key)
    .addField(Field.newBuilder().setName("title").setText(title))
    .addField(Field.newBuilder().setName("type").setText(type))
    .addField(Field.newBuilder().setName("username").setText(username))
    .build();
DocumentSearchIndexService.getInstance().indexDocument(indexName, doc);

// Query the title field using the stemming operator "~"
IndexSpec indexSpec = IndexSpec.newBuilder().setName(indexName).build();
Index index = SearchServiceFactory.getSearchService().getIndex(indexSpec);
return index.search("title = ~" + searchText);

However, the results only ever match the exact singular or plural form that was indexed:

query cat, return nothing
query dog, return nothing
query name, return nothing
query house, return nothing

query cats, return "events with cats and dogs and"
query dogs, return "events with cats and dogs and"
query names, return "names very interesting days"
query houses, return "Houses Desks Tables"

So I am really lost: I don't know how to get these entries returned, or whether my query is constructed incorrectly.
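For reference, the query string in the snippet above has the form `field = ~term`, with the "~" placed directly in front of the term. The following is only a minimal sketch of a helper that assembles such a string; `StemQuery` and `build` are hypothetical names and not part of the Search API, and the single-word check is an assumption made for illustration.

```java
class StemQuery {
    /**
     * Builds a query of the form "field = ~term".
     * Hypothetical helper for illustration; not part of the App Engine Search API.
     */
    static String build(String field, String term) {
        String t = term.trim();
        // Assumption: the stem operator is applied to a single word,
        // so reject empty input and input containing whitespace.
        if (t.isEmpty() || t.matches(".*\\s.*")) {
            throw new IllegalArgumentException("expected a single word, got: \"" + term + "\"");
        }
        return field + " = ~" + t;
    }

    public static void main(String[] args) {
        System.out.println(build("title", "cat"));      // prints: title = ~cat
        System.out.println(build("title", " houses ")); // prints: title = ~houses
    }
}
```

The resulting string could then be passed to `index.search(...)` exactly as in the snippet above.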

Best answer

Note that stemming is not implemented if you are using the Java development server for Java 8 on the standard environment.

If you are deploying your application to App Engine, use the Utils.java class found here to index your documents correctly.

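I cloned the repository for Google Cloud Platform's java-docs-samples, went to the appengine-java8/search folder, and modified the code of SearchServlet.java to include a query with the stemming operator "~", so that the class looks like this: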

...
  @Override
  public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    PrintWriter out = resp.getWriter();
    Document doc =
        Document.newBuilder()
            .setId("theOnlyPiano")
            .addField(Field.newBuilder().setName("product").setText("cats and dogs"))
            .addField(Field.newBuilder().setName("maker").setText("Yamaha"))
            .addField(Field.newBuilder().setName("price").setNumber(4000))
            .build();
    try {
      Utils.indexADocument(SEARCH_INDEX, doc);
    } catch (InterruptedException e) {
      // ignore
    }
    // [START search_document]
    final int maxRetry = 3;
    int attempts = 0;
    int delay = 2;
    while (true) {
      try {
        String searchText = "cat";
        String queryString = "product = ~" + searchText;
        Results<ScoredDocument> results = getIndex().search(queryString);

        // Iterate over the documents in the results
        for (ScoredDocument document : results) {
          // handle results
          out.print("product: " + document.getOnlyField("product").getText());
          //out.println(", price: " + document.getOnlyField("price").getNumber());
        }
      } catch (SearchException e) {
        if (StatusCode.TRANSIENT_ERROR.equals(e.getOperationResult().getCode())
            && ++attempts < maxRetry) {
          // retry
          try {
            Thread.sleep(delay * 1000);
          } catch (InterruptedException e1) {
            // ignore
          }
          delay *= 2; // easy exponential backoff
          continue;
        } else {
          throw e;
        }
      }
      break;
    }
    // [END search_document]
    // We don't test the search result below, but we're fine if it runs without errors.
    out.println(" Search performed");
    Index index = getIndex();
    // [START simple_search_1]
    index.search("rose water");
    // [END simple_search_1]
    // [START simple_search_2]
    index.search("1776-07-04");
    // [END simple_search_2]
    // [START simple_search_3]
    // search for documents whose product matches the stem of "cat" and that cost less than $5000
    index.search("product = ~cat AND price < 5000");
    // [END simple_search_3]
  }
}
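The retry loop in the servlet above (up to three attempts, doubling the delay after each transient failure) can also be factored out. The following is only a sketch of that pattern in plain Java, not code from the sample repository: `RetryWithBackoff`, `call`, and `TransientException` are hypothetical names, and `TransientException` stands in for a `SearchException` whose status code is `TRANSIENT_ERROR`.

```java
import java.util.function.Supplier;

class RetryWithBackoff {
    /** Hypothetical marker for a retryable failure (stand-in for a transient SearchException). */
    static class TransientException extends RuntimeException {
        TransientException(String message) {
            super(message);
        }
    }

    /**
     * Runs op, retrying on TransientException up to maxAttempts times in total.
     * The wait doubles after every failed attempt.
     */
    static <T> T call(Supplier<T> op, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: propagate the last failure
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag instead of swallowing the interruption
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", ie);
                }
                delay *= 2; // exponential backoff
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new TransientException("try again");
            }
            return "ok after " + calls[0] + " calls";
        }, 3, 10);
        System.out.println(result); // prints: ok after 3 calls
    }
}
```

Unlike the servlet, this sketch does not ignore `InterruptedException`; that is a design choice of the sketch, not something the sample prescribes.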

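With the modified SearchServlet, I was able to verify that the stemming operator "~" works correctly for plurals (for example, cat/cats and dog/dogs). Note, however, that as the documentation mentions, the stemming algorithm has its limitations.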
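To make the idea of plural stemming concrete, here is a toy sketch in plain Java. It is not the algorithm App Engine uses (that algorithm is not described in this answer); `NaiveStemDemo`, `stem`, and `matches` are hypothetical names, and the rule simply strips one trailing "s". It shows why a stemmed query for cat can match a title containing cats, and why such a simple rule misses other word forms.

```java
class NaiveStemDemo {
    /** Toy stemmer: lowercases and strips a single trailing "s". Illustration only. */
    static String stem(String word) {
        String w = word.toLowerCase();
        return (w.length() > 1 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    /** Returns true if any whitespace-separated token of text has the same toy stem as queryTerm. */
    static boolean matches(String queryTerm, String text) {
        String target = stem(queryTerm);
        for (String token : text.split("\\s+")) {
            if (stem(token).equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String title = "events with cats and dogs and";
        System.out.println(matches("cat", title));              // true
        System.out.println(matches("dogs", title));             // true
        System.out.println(matches("house", "Houses Desks Tables")); // true
        System.out.println(matches("caring", "care packages")); // false: the toy rule only handles "s"
    }
}
```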

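Note: if you want to reproduce the steps I followed, don't forget to comment out the test part of the SearchServletTest.java class before deploying the application to App Engine with mvn appengine:deploy. The file should look like this: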

...
  @After
  public void tearDown() {
    helper.tearDown();
  }

  @Test
  public void doGet_successfulyInvoked() throws Exception {
    // servletUnderTest.doGet(mockRequest, mockResponse);
    // String content = responseWriter.toString();
    // assertWithMessage("SearchServlet response").that(content).contains("maker: Yamaha");
    // assertWithMessage("SearchServlet response").that(content).contains("price: 4000.0");
  }
}

Regarding "java - AppEngine full-text document index: searching with the stemming operator", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58601695/
