Elasticsearch 5.5.0 : Finding relevant documents

Reposted. Author: 行者123. Updated: 2023-11-29 02:56:02

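In Elasticsearch 5.5.0, I am working with the "more_like_this" clause but cannot find relevant documents. I have the data below in Elasticsearch; the "description" field holds a large amount of non-indexed data, more than 1 million bytes per document. I have ten thousand documents like this one. How can I find sets of documents that match each other by at least 80%? Sample document: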

{
  "_index": "school",
  "_type": "book",
  "_id": "1",
  "_source": {
    "title": "How to drive safely",
    "description": "LOTS OF WORDS...The book is written to help readers about giving driving safety guidelines. Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. LONG...."
  }
}

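Ultimately, I am looking for lists of IDs of documents that share at least 80% of their content. A possible expected result containing the matching document IDs (any format is fine):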

[[1, 30, 500, 8000], [2, 40, 199], ...]

Do I need to write a batch job that compares each document with every other document and builds the output sets myself?
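For reference, the batch approach described in the question could look roughly like the Python sketch below. It rests on assumptions that are not in the question: "80% match" is interpreted as a Jaccard similarity of at least 0.8 between the sets of 3-word shingles of two descriptions, and matching pairs are merged into groups with a union-find so the output has the [[1, 30, ...], ...] shape shown above. The helper names (shingles, jaccard, group_similar) are hypothetical. Because groups are merged transitively, two members of the same group are not guaranteed to match each other directly. The loop compares every pair, so 10,000 documents mean about 50 million comparisons, which is why the answer below suggests a different strategy.

```python
import itertools
import re


def shingles(text, k=3):
    """Set of k-word shingles (lower-cased, punctuation dropped) for one description."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """len(A & B) / len(A | B); two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(docs, threshold=0.8, k=3):
    """docs maps doc id -> description text.

    Compares every pair (n * (n - 1) / 2 comparisons) and merges pairs whose
    similarity is >= threshold with a union-find. Returns a sorted list of
    groups (each a sorted list of ids); documents with no match are omitted.
    """
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Follow parent links to the group's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in itertools.combinations(docs, 2):
        if jaccard(sigs[a], sigs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for doc_id in docs:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Call it with a dict that maps each _id to its description text; the result is a list of ID groups in the format the question asks for.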

请帮忙。

Best Answer

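The more_like_this query has a parameter called minimum_should_match, which can be set to 80%. However, the max_query_terms parameter also needs to be taken into account here: minimum_should_match applies to the terms that the query selects from the input document, and by default it selects at most 25 of them.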
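A minimal sketch of such a request body, written as a Python dict so it can be serialised to JSON and sent to the search endpoint of the school index (for example POST /school/book/_search). It uses document 1 as the like input; the helper name mlt_query is hypothetical and the parameter values are assumptions to tune. Because minimum_should_match is applied to the selected query terms, "80%" here means 80% of at most max_query_terms terms, not 80% of the text. By default the input document itself is not returned among the hits.

```python
import json


def mlt_query(index, doc_type, doc_id, minimum_should_match="80%", max_query_terms=25):
    """Build a more_like_this search body (5.x syntax) that uses an indexed document as input."""
    return {
        "query": {
            "more_like_this": {
                "fields": ["description"],
                # Reference an existing document instead of pasting its 1 MB text.
                "like": [{"_index": index, "_type": doc_type, "_id": doc_id}],
                # Only the top max_query_terms terms of that document are used (default 25).
                "max_query_terms": max_query_terms,
                # This percentage applies to those selected terms.
                "minimum_should_match": minimum_should_match,
            }
        }
    }


print(json.dumps(mlt_query("school", "book", "1"), indent=2))
```

Raising max_query_terms makes the 80% threshold cover more of each description, at the cost of a larger and slower query.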

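Most importantly, this only works if the content of these fields is indexed, which, according to the question, is not currently the case for description.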
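A sketch of a 5.x mapping in which description is indexed, written as a Python dict (the body for PUT /school when creating a new index). The mapping is an assumption based on the sample document. The term_vector setting is optional: storing term vectors can speed up more_like_this analysis at the cost of disk space. If description is currently mapped with "index": false, that setting cannot be changed on the existing field; a new index with this mapping has to be created and the data reindexed (for example with the _reindex API).

```python
import json

# Hypothetical mapping for the "school" index, type "book" (5.x allows mapping types).
# "description" is an analysed text field; text fields are indexed unless
# "index": false is set, so more_like_this can match against it.
SCHOOL_MAPPING = {
    "mappings": {
        "book": {
            "properties": {
                "title": {"type": "text"},
                # Optional: store term vectors so MLT need not re-analyse _source.
                "description": {"type": "text", "term_vector": "yes"},
            }
        }
    }
}

print(json.dumps(SCHOOL_MAPPING, indent=2))  # body for: PUT /school
```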

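Also, doing this at query time sounds very slow. You may want to rethink your strategy and cluster or group the documents at index time (you will need to build that yourself, because it is a very custom requirement), so that searching becomes faster.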
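One way to do that grouping at index time, a swapped-in technique rather than something the answer prescribes, is MinHash with locality-sensitive hashing (LSH). The sketch below computes, for each description, a list of band keys that you would store in an extra keyword field while indexing (the field name lsh_bands is hypothetical). Documents that share at least one key are candidate near-duplicates: they can be found with a term query on a key, or a terms aggregation on lsh_bands with min_doc_count set to 2, and then confirmed with an exact comparison. The signature length (64) and banding (16 bands of 4 rows) are illustrative assumptions: pairs with shingle-set Jaccard similarity around 0.8 are very likely to share a band, but so are many pairs well below 0.8, so verification is still needed.

```python
import hashlib
import re

NUM_HASHES = 64              # MinHash signature length (illustrative assumption)
BANDS = 16                   # number of LSH bands (illustrative assumption)
ROWS = NUM_HASHES // BANDS   # hash values per band (4)


def shingles(text, k=3):
    """Set of k-word shingles, lower-cased; a short text yields a single shingle."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def _hash(seed, shingle):
    """Stable 64-bit hash of a shingle for a given seed (built-in hash() is salted per process)."""
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(text):
    """MinHash signature: for each seed, the minimum hash over all shingles."""
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(NUM_HASHES)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of equal positions, an estimate of the shingle-set Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_band_keys(text):
    """Band keys to store in a keyword field (hypothetical "lsh_bands") at index time."""
    sig = minhash(text)
    keys = []
    for b in range(BANDS):
        band = sig[b * ROWS:(b + 1) * ROWS]
        digest = hashlib.blake2b(repr(band).encode("utf-8"), digest_size=8).hexdigest()
        # Prefix with the band index so keys from different bands never match.
        keys.append(f"{b}-{digest}")
    return keys
```

At index time each document would carry its keys alongside title and description; the comparison cost then shifts from every pair of documents to the few documents sharing a bucket.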

Regarding Elasticsearch 5.5.0 : Finding relevant documents, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45709102/
