java - Lucidworks: unknown field when saving in solr format

Reposted by: 可可西里 | Updated: 2023-11-01 14:51:42

I am writing a script with Spark in Java. I need to insert data (from a DataFrame) into a Solr collection using the Lucidworks spark-solr tool (https://github.com/lucidworks/spark-solr).

My schema.xml:

<schema name="MY_NAME" version="1.6">
    <field name="_version_" type="long" indexed="true" stored="true" />
    <field name="_root_" type="string" indexed="true" stored="false" />
    <field name="ignored_id" type="ignored" />
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="age" type="int" indexed="true" stored="true" required="false" multiValued="false" />
    <field name="height" type="tlong" indexed="true" stored="true" required="false" multiValued="false" />
    <field name="name" type="string" indexed="true" stored="true" required="false" multiValued="false" />

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0" />
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" />
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0" />
    <fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

    <uniqueKey>id</uniqueKey>
</schema>

My DataFrame:

DataFrame df = sqlContext.sql("SELECT id, age, height, name FROM TABLE");

df.show() gives:

+--------------------+-----------+------+------+
|                  id|        age|height|  name|
+--------------------+-----------+------+------+
|12345678912345678...|         10|   101| hello|

But when I try to insert into my Solr collection:

df.write()
    .format("solr")
    .option("collection", MY_COLLECTION)
    .option("zkhost", MY_ZKHOST)
    .save();

I get the following error:

Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://MY_IP/solr/MY_COLLECTION_SHARD_REPLICA: ERROR :[doc=123456789123456789] unknown field '_indexed_at_tdt'

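I don't understand where the "_indexed_at_tdt" field comes from.

The DataFrame looks correct: it contains only the 4 fields I want to insert. Yet because of this unknown field "_indexed_at_tdt", I still cannot insert into my Solr collection.

More information: I have an HBase indexer that inserts into the same collection, and it works.

Thanks in advance for your help!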

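Best answer

As you can see here, it seems this field is added automatically by the Lucidworks code.

You just need to add the corresponding field to your schema and it will work: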

<field name="_indexed_at_tdt" type="tdate" indexed="true" stored="true" required="false" multiValued="false" />

Or, if you prefer, make it a dynamic field for *_tdt.
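
The dynamic-field variant could look roughly like the fragment below. It is a sketch under assumptions, not a verified fix for this exact setup. The answer's field uses type="tdate", which the schema.xml in the question does not define, so the fragment also declares that type. The TrieDateField class is chosen to match the Trie-based types the schema already uses, and the precisionStep value is an assumption borrowed from common Solr example schemas.

```xml
<!-- Hypothetical additions, to be placed inside <schema> ... </schema>. -->

<!-- Date type referenced as "tdate"; not present in the question's schema.
     precisionStep="6" is an assumed value, adjust as needed. -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0" />

<!-- Accepts any field whose name ends in _tdt, including _indexed_at_tdt. -->
<dynamicField name="*_tdt" type="tdate" indexed="true" stored="true" />
```

In a SolrCloud setup the changed schema only takes effect once the updated configuration has been uploaded to ZooKeeper and the collection has been reloaded.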

Regarding "java - Lucidworks: unknown field when saving in solr format", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43371613/
