indexing - Partial updates in Solr/Lucene documents


Reposted by: 行者123 Updated: 2023-12-04 14:52:48


We recently started exploring Solr partial index updates.

The APIs for full and partial updates look similar. Instead of

doc.addField("location", "UK")
solrClient.add(doc)

you have to write

doc.addField("location", map("set", "Germany"))
solrClient.add(doc)

What I expected to happen: Solr updates the inverted index for the field "location".

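What actually happens:

Solr loads the stored fields of the document

Solr applies the given update to the document

Solr deletes the document by ID

Solr writes the document to the index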

As a result, all non-stored fields are lost.
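The four steps above can be mimicked with a toy model built on plain Java collections. This is only a sketch of the described behavior, not Solr code: the class ToyAtomicUpdate, its methods, and the "body" field are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Solr code: it only mimics the load / apply / delete / re-add
// sequence described above. All names here are hypothetical.
class ToyAtomicUpdate {

    // Returns the part of a document that can be read back after indexing:
    // only fields whose "stored" flag is true survive.
    static Map<String, Object> storedView(Map<String, Object> original,
                                          Map<String, Boolean> storedFlags) {
        Map<String, Object> view = new HashMap<>();
        for (Map.Entry<String, Object> e : original.entrySet()) {
            if (storedFlags.getOrDefault(e.getKey(), false)) {
                view.put(e.getKey(), e.getValue());
            }
        }
        return view;
    }

    // Applies a "set" modifier following the steps listed above.
    static Map<String, Object> atomicSet(Map<String, Object> original,
                                         Map<String, Boolean> storedFlags,
                                         String field, Object newValue) {
        Map<String, Object> rebuilt = storedView(original, storedFlags); // 1. load stored fields
        rebuilt.put(field, newValue);                                    // 2. apply the update
        // 3. the old document would be deleted by ID at this point
        return rebuilt;                                                  // 4. this is what gets re-indexed
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("location", "UK");
        doc.put("body", "long text that is indexed but not stored");

        Map<String, Boolean> stored = new HashMap<>();
        stored.put("id", true);
        stored.put("location", true);
        stored.put("body", false);

        Map<String, Object> after = atomicSet(doc, stored, "location", "Germany");
        System.out.println(after.containsKey("body")); // false: the non-stored field is gone
        System.out.println(after.get("location"));     // Germany
    }
}
```

The rebuilt document can only contain what could be read back, which is why the non-stored "body" field disappears after the update.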

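I found some old discussions on the mailing list where people say this is expected behavior, that you need to store all fields, and so on. We do not want to store all fields. The "stored" attribute was designed for fields that need to be returned in the response from Solr to the caller. We only need a small amount of meta information in the response, so making all fields stored looks like overkill.

The question is: why does Solr/Lucene go through all these steps to perform a partial update? As I understand it, each field has its own inverted index in its own file, so it should be possible to update fields independently. Judging by what actually happens, Solr/Lucene cannot update the index for a single field, and I can't find the reason why.

Discussions on this topic: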

https://stackoverflow.com/a/34643681/2513573

https://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-td4069948.html

Best answer

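Your observation is correct: that is the behavior. The reason is that a field's indexed content can depend on other fields (for example, through copyField directives) and on how fields are merged (position increments and so on). That is also why a partial update can only be done when the fields are stored: the document is simply loaded, the value of that specific field is manipulated, and the document is indexed again.
The fields do not have their own index files. There is one set of files for the complete index, and the index is append-only: a document is never changed in place (the old document is only marked as deleted, and a new document is then appended to the index). When you run optimize on an index, the index is rewritten without the deleted documents.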
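The append-only behavior can be pictured with a small toy index. This is a sketch in plain Java, not Lucene code: ToyAppendOnlyIndex and its methods are hypothetical names, and real segments and merges are far more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy append-only index, NOT Lucene code. All names are hypothetical.
class ToyAppendOnlyIndex {

    static final class Entry {
        final String id;
        final String value;
        boolean deleted; // entries are only flagged, never edited in place

        Entry(String id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    private List<Entry> entries = new ArrayList<>();

    // An "update" flags any live entry with the same id and appends a new one.
    void addOrUpdate(String id, String value) {
        for (Entry e : entries) {
            if (!e.deleted && e.id.equals(id)) {
                e.deleted = true;
            }
        }
        entries.add(new Entry(id, value));
    }

    // Rewrites the index without the flagged entries, like the optimize step described above.
    void optimize() {
        List<Entry> live = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.deleted) {
                live.add(e);
            }
        }
        entries = live;
    }

    // Number of entries physically present, including flagged ones.
    int physicalSize() {
        return entries.size();
    }

    // Number of entries that are not flagged as deleted.
    int liveCount() {
        int n = 0;
        for (Entry e : entries) {
            if (!e.deleted) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ToyAppendOnlyIndex index = new ToyAppendOnlyIndex();
        index.addOrUpdate("1", "UK");
        index.addOrUpdate("1", "Germany");
        System.out.println(index.physicalSize()); // 2: the old entry is still present, flagged deleted
        System.out.println(index.liveCount());    // 1
        index.optimize();
        System.out.println(index.physicalSize()); // 1
    }
}
```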
There is a way around this: if your fields meet a set of conditions, an in-place update can be performed instead. This is what you are asking for.

In-place updates are very similar to atomic updates; in some sense, this is a subset of atomic updates. In regular atomic updates, the entire document is reindexed internally during the application of the update. However, in this approach, only the fields to be updated are affected and the rest of the documents are not reindexed internally. Hence, the efficiency of updating in-place is unaffected by the size of the documents that are updated (i.e., number of fields, size of fields, etc.). Apart from these internal differences, there is no functional difference between atomic updates and in-place updates.

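However, the requirements may not match your use case: the fields must be non-indexed and numeric (because what gets replaced is the docValue in the background, not anything in the index, which generally cannot be changed since the index is append-only):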

An atomic update operation is performed using this approach only when the fields to be updated meet these three conditions:

are non-indexed (indexed="false"), non-stored (stored="false"), single valued (multiValued="false") numeric docValues (docValues="true") fields;

the version field is also a non-indexed, non-stored single valued docValues field; and,

copy targets of updated fields, if any, are also non-indexed, non-stored single valued numeric docValues fields.
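The three conditions quoted above can be written down as a simple check. This is a sketch under assumptions: ToyFieldDef and ToyInPlaceCheck are hypothetical classes describing a schema field by its flags, not part of Solr, and Solr's own decision logic may consider more than this.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical description of a schema field; this is not a Solr class.
class ToyFieldDef {
    final boolean indexed;
    final boolean stored;
    final boolean multiValued;
    final boolean docValues;
    final boolean numeric;

    ToyFieldDef(boolean indexed, boolean stored, boolean multiValued,
                boolean docValues, boolean numeric) {
        this.indexed = indexed;
        this.stored = stored;
        this.multiValued = multiValued;
        this.docValues = docValues;
        this.numeric = numeric;
    }

    // Non-indexed, non-stored, single-valued docValues field.
    boolean isPlainDocValues() {
        return !indexed && !stored && !multiValued && docValues;
    }
}

class ToyInPlaceCheck {

    // Mirrors the three quoted conditions, one per check.
    static boolean canUpdateInPlace(ToyFieldDef updated, ToyFieldDef version,
                                    List<ToyFieldDef> copyTargets) {
        if (!(updated.isPlainDocValues() && updated.numeric)) {
            return false; // condition 1: the updated field itself
        }
        if (!version.isPlainDocValues()) {
            return false; // condition 2: the version field
        }
        for (ToyFieldDef target : copyTargets) {
            if (!(target.isPlainDocValues() && target.numeric)) {
                return false; // condition 3: every copy target
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyFieldDef popularity = new ToyFieldDef(false, false, false, true, true);
        ToyFieldDef version = new ToyFieldDef(false, false, false, true, true);
        ToyFieldDef location = new ToyFieldDef(true, true, false, false, false);

        List<ToyFieldDef> noCopyTargets = Collections.emptyList();
        System.out.println(canUpdateInPlace(popularity, version, noCopyTargets)); // true
        System.out.println(canUpdateInPlace(location, version, noCopyTargets));   // false
    }
}
```

Under this model an indexed string field such as the question's "location" does not qualify, which matches the answer's caution that the requirements may not fit the use case.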

To use in-place updates, add a modifier to the field that needs to be updated. The content can be updated or incrementally increased.
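As a rough illustration of "updated or incrementally increased", the toy column below applies a "set" or an "inc" modifier to one numeric value per document without touching anything else. It is a sketch in plain Java, not Solr's implementation: ToyDocValuesColumn is a hypothetical class, and the "popularity" scenario is an assumed example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy per-document numeric column, NOT Solr code. All names are hypothetical.
class ToyDocValuesColumn {
    private final Map<String, Long> valuesByDocId = new HashMap<>();

    // Applies a modifier to the value of one document.
    // "set" replaces the value, "inc" adds the operand to it.
    void apply(String docId, String modifier, long operand) {
        switch (modifier) {
            case "set":
                valuesByDocId.put(docId, operand);
                break;
            case "inc":
                // In this toy model a missing value is treated as starting from the operand.
                valuesByDocId.merge(docId, operand, Long::sum);
                break;
            default:
                throw new IllegalArgumentException("unsupported modifier: " + modifier);
        }
    }

    // Returns the current value, or null if the document has none.
    Long get(String docId) {
        return valuesByDocId.get(docId);
    }

    public static void main(String[] args) {
        ToyDocValuesColumn popularity = new ToyDocValuesColumn();
        popularity.apply("1", "set", 10L);
        popularity.apply("1", "inc", 5L);
        System.out.println(popularity.get("1")); // 15
    }
}
```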

Regarding indexing - Partial updates in Solr/Lucene documents, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60061358/
