
apache - Pointing Solr (4.8.1) at a directory (Windows 7)

Reposted. Author: 行者123. Last updated: 2023-12-02 22:25:33

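I am building a search system for a directory of documents (file types include MS Word, PDF, .txt, PowerPoint, and so on).

The document directory is stored on the local network.

I have Apache Solr up and running on my machine (the admin panel is viewable and reachable on localhost, port 8983).

Now I need to index the content and titles of the documents in the directory and make them searchable through my Solr server.

Where do I go from here?
-- More specifically --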

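  • Do I need to integrate a separate open-source indexing technology, or can Solr index the documents on its own?
  • How do I tell Solr to search this directory specifically (or, more generally, any directory on my hard drive or local network)?

Best answer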

    You can use Solr Cell (implemented by the ExtractingRequestHandler).

    It is built on top of the Apache Tika project.

    About Solr Cell:

    Key Concepts

    When using the Solr Cell framework, it is helpful to keep the following in mind:

    • Tika will automatically attempt to determine the input document type (Word, PDF, HTML) and extract the content appropriately. If you like, you can explicitly specify a MIME type for Tika with the stream.type parameter.
    • Tika works by producing an XHTML stream that it feeds to a SAX ContentHandler. SAX is a common interface implemented for many
      different XML parsers. For more information, see
      http://www.saxproject.org/quickstart.html.
    • Solr then responds to Tika's SAX events and creates the fields to index.
    • Tika produces metadata such as Title, Subject, and Author according to specifications such as the DublinCore. See
      http://tika.apache.org/1.5/formats.html for the file types supported.
    • Tika adds all the extracted text to the content field. This field is defined as "stored" in schema.xml. It is also copied to the text field with a copyField rule.
    • You can map Tika's metadata fields to Solr fields. You can also boost these fields.
    • You can pass in literals for field values. Literals will override Tika-parsed values, including fields in the Tika metadata object, the Tika content field, and any "captured content" fields.
    • You can apply an XPath expression to the Tika XHTML to restrict the content that is produced.
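
    As a rough illustration of how the concepts above map onto request parameters, the sketch below builds a URL for the extract handler. The parameter names `literal.*`, `fmap.*`, `stream.type`, `xpath`, and `commit` are Solr Cell parameters; the core name `collection1`, the helper `build_extract_url`, and the field names in the usage note are assumptions made for illustration and should be adapted to your own setup.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance on port 8983 with a core named "collection1".
SOLR_EXTRACT_URL = "http://localhost:8983/solr/collection1/update/extract"


def build_extract_url(doc_id, mime_type=None, literals=None,
                      field_map=None, xpath=None, commit=False):
    """Build an /update/extract URL (hypothetical helper, not part of Solr)."""
    # literal.id supplies the unique key for the document being indexed.
    params = [("literal.id", doc_id)]

    # Extra literals override whatever Tika parsed for the same field.
    for field, value in (literals or {}).items():
        params.append(("literal." + field, value))

    # fmap.<tika_field>=<solr_field> maps a Tika field to a Solr field.
    for tika_field, solr_field in (field_map or {}).items():
        params.append(("fmap." + tika_field, solr_field))

    # stream.type tells Tika the MIME type instead of letting it guess.
    if mime_type:
        params.append(("stream.type", mime_type))

    # xpath restricts which part of Tika's XHTML output is indexed.
    if xpath:
        params.append(("xpath", xpath))

    if commit:
        params.append(("commit", "true"))

    return SOLR_EXTRACT_URL + "?" + urlencode(params)
```

    For example, `build_extract_url("doc1", mime_type="application/pdf", field_map={"author": "author_s"})` would request PDF parsing and map Tika's author metadata to a field named `author_s`, which is a hypothetical field that must exist in your schema.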


    The Wiki page on Solr Cell contains a tutorial and configuration information.
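
    Solr does not crawl a directory by itself, so something has to send each file to the extract handler. The following is a minimal sketch of one way to do that, not a definitive implementation. It assumes a core named `collection1`, uses each file's path as its `literal.id`, and sends a generic `application/octet-stream` content type on the assumption that Tika's type detection (helped by `resource.name`) handles the rest. The extension list and all function names are hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlencode

# Assumption: a local Solr instance on port 8983 with a core named "collection1".
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"
SOLR_EXTRACT_URL = SOLR_UPDATE_URL + "/extract"

# Hypothetical list of extensions worth sending to Solr Cell.
EXTENSIONS = {".doc", ".docx", ".pdf", ".txt", ".ppt", ".pptx"}


def iter_documents(root):
    """Yield paths of files under root whose extension is in EXTENSIONS."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in EXTENSIONS:
                yield os.path.join(dirpath, name)


def extract_url(path):
    """Build the extract URL for one file, using its path as the document id."""
    params = {
        "literal.id": path,
        "resource.name": os.path.basename(path),  # file name hint for Tika
    }
    return SOLR_EXTRACT_URL + "?" + urlencode(params)


def index_file(path):
    """POST the raw bytes of one file to Solr Cell and return the HTTP status."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        extract_url(path),
        data=body,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_directory(root):
    """Index every matching file under root, then commit once at the end."""
    count = 0
    for path in iter_documents(root):
        index_file(path)
        count += 1
    with urllib.request.urlopen(SOLR_UPDATE_URL + "?commit=true") as response:
        response.read()
    return count
```

    Calling `index_directory` with the path of the shared document folder would then send every matching file to Solr. Committing once at the end rather than per file is a design choice that keeps the number of commits low for large directories.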

    A similar question about apache - Pointing Solr (4.8.1) at a directory (Windows 7) can be found on Stack Overflow: https://stackoverflow.com/questions/24213696/
