
java - How to index different types of files (pdf, word, html, etc.) using a SolrJ Java application

Reposted. Author: 行者123. Updated: 2023-11-30 05:48:36

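I am new to SolrJ. I need to index zip, pdf and html documents using the SolrJ Java API. Can anyone give me some examples of using SolrJ in a Java application to index different types of documents?

Are there any good links with solid Java examples for indexing the different types of documents available in a folder?

Thanks for your help.

According to the output, it is clear that SolrJ is not indexing the .xml file I am trying. Can anyone comment on what I am doing wrong?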

Code:

 String urlString = "http://localhost:8983/solr/tests";
HttpSolrClient solr = new HttpSolrClient.Builder(urlString).build();

solr.setParser(new XMLResponseParser());

File file = new File("D:/work/devtools/Solr/solr-7.6.0/example/exampledocs/hd.xml");
InputStream fis = new FileInputStream(file);
/* Tika specific */
ContentHandler contenthandler = new BodyContentHandler(10 * 1024 * 1024);
Metadata metadata = new Metadata();
metadata.set(Metadata.RESOURCE_NAME_KEY, "hd.xml");
ParseContext parseContext = new ParseContext();
// Automatically detect the best parser based on the detected document type
AutoDetectParser autodetectParser = new AutoDetectParser();
// OOXMLParser parser = new OOXMLParser();
autodetectParser.parse(fis, contenthandler, metadata, parseContext);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", file.getCanonicalPath());
SolrQuery query = new SolrQuery("*.*");
// query.set("q", "price:599.99");
QueryResponse response = solr.query(query);

Output:

solr query{responseHeader={status=0,QTime=0,params={q=*.*,wt=xml,version=2.2}},response={numFound=0,start=0,docs=[]}}

Best answer

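Links for basic information:
1) https://www.youtube.com/watch?v=rxoS1p1TaFY&t=198s
2) https://lucene.apache.org/solr/ (link to download the latest version)

How to use SolrJ in a Java application:

The Java version should be 1.8. Download the latest version of Solr and extract it.

1) Add the dependency to the pom.xml file: groupId org.apache.solr, artifactId solr-solrj, version 7.6.0.

Start Solr from the solr/bin folder and check the Solr admin console by opening http://localhost:8983/solr/#

2) Basic example code (this code is enough to understand SolrJ):

Create the indexfiles core in Solr and use the following code: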

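import java.io.File;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;

String urlString = "http://localhost:8983/solr/indexfiles";
HttpSolrClient solr = new HttpSolrClient.Builder(urlString).build();

solr.setParser(new XMLResponseParser());
File file = new File("D:/work/devtools/Solr/solr-7.6.0/example/exampledocs/176444.zip");

// Send the raw file to the extracting request handler, which runs Tika on the server.
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");

// req.addFile(file, "application/pdf");//change the content type for different input files
req.addFile(file, "application/zip");
// Use the file name as the unique document id.
String fileName = file.getName();
req.setParam("literal.id", fileName);
// Commit right away so the document is searchable once the request returns.
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList<Object> result = solr.request(req);
int status = (Integer) ((NamedList<?>) result.get("responseHeader")).get("status");

System.out.println("Result: " + result);
System.out.println("Status: " + status);
// The match-all query is *:* (not *.*).
System.out.println("solr query" + solr.query(new SolrQuery("*:*")));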
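The comment in the code above says to change the content type for different input files. A minimal sketch of one way to do that, assuming a small hand-written extension table is good enough: the class name ContentTypes, the method forFileName and the table entries are hypothetical and not part of SolrJ, and unknown extensions fall back to application/octet-stream.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: picks a content type from a file name's extension. */
final class ContentTypes {
    private static final String FALLBACK = "application/octet-stream";
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();

    static {
        // Illustrative table only; extend it for the file types in your folder.
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("html", "text/html");
        BY_EXTENSION.put("htm", "text/html");
        BY_EXTENSION.put("xml", "application/xml");
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("zip", "application/zip");
        BY_EXTENSION.put("doc", "application/msword");
        BY_EXTENSION.put("docx",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
    }

    private ContentTypes() {
    }

    /** Returns the mapped content type, or the generic fallback if unknown. */
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK; // no extension at all
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(extension, FALLBACK);
    }
}
```

With such a helper, the addFile call could become req.addFile(file, ContentTypes.forFileName(file.getName())), so the same request code can be reused in a loop over every file in a folder.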



3) Query from the Solr admin console using http://localhost:8983/solr/indexfiles/select?q=SOLR1000

Just change the search text (q="<text to search>") to any text that appears in the files you indexed.

You can find the query parameter q in the Solr admin console, where you can enter the text to search for if you are not comfortable with Solr queries. By default it is *:*
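If the search text contains spaces or characters such as a colon, it has to be URL-encoded before it is placed in the q parameter. A small sketch using only the JDK, where the class name SelectUrl and the method build are hypothetical names chosen for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a select URL like the ones shown above. */
final class SelectUrl {
    private SelectUrl() {
    }

    /** Joins base URL, core name and the URL-encoded query text. */
    static String build(String solrBaseUrl, String core, String queryText) {
        try {
            // The String-charset overload is used so the code also compiles on Java 1.8.
            String encoded = URLEncoder.encode(queryText, "UTF-8");
            return solrBaseUrl + "/" + core + "/select?q=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }
}
```

For example, SelectUrl.build("http://localhost:8983/solr", "indexfiles", "SOLR1000") yields the URL from step 3, and the default query *:* is encoded as *%3A*.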


NOTE: You do not need to integrate Apache Tika with Apache Solr yourself in order to index zip files and similar formats, because it is available by default in recent versions of Solr.

Note: Do not be confused by the difference between the output from the standalone admin console (which returns the complete data, e.g. hd.xml from Solr's /exampledocs folder indexed with its contents) and the output you get after indexing the same files through SolrJ from a Java application.

Example: SolrJ just indexes the file itself, so from the Solr admin console you see the following output when you run the query
(http://localhost:8983/solr/indexfiles/select?q=*:*)
Output:

{
"id":"hd.xml",
"stream_size":["null"],
"x_parsed_by":["org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.xml.DcXMLParser"],
"stream_content_type":["text/xml"],
"content_type":["application/xml"],
"_version_":1624155471570010112},


But if we index from the command prompt using java -Dc=name -jar post.jar *.xml, the output of the same query (http://localhost:8983/solr/indexfiles/select?q=*:*) contains the data inside the XML file.
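One likely reason for the difference, assuming hd.xml is written in Solr's XML update format (an add element containing doc elements with named field elements), is that post.jar sends the file as an update message, while /update/extract treats it as an opaque document for Tika. The sketch below reads such a file's fields with the JDK's DOM parser; the class name SolrXmlFields is hypothetical and the sample values in the usage note are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: extracts field values from Solr update XML. */
final class SolrXmlFields {
    private SolrXmlFields() {
    }

    /** Returns one map per doc element: field name to its values, in file order. */
    static List<Map<String, List<String>>> parse(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Map<String, List<String>>> docs = new ArrayList<>();
            NodeList docNodes = dom.getElementsByTagName("doc");
            for (int i = 0; i < docNodes.getLength(); i++) {
                Map<String, List<String>> fields = new LinkedHashMap<>();
                NodeList fieldNodes =
                        ((Element) docNodes.item(i)).getElementsByTagName("field");
                for (int j = 0; j < fieldNodes.getLength(); j++) {
                    Element field = (Element) fieldNodes.item(j);
                    // A field name may repeat (multi-valued), so collect a list.
                    fields.computeIfAbsent(field.getAttribute("name"),
                            name -> new ArrayList<>()).add(field.getTextContent());
                }
                docs.add(fields);
            }
            return docs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed Solr update XML", e);
        }
    }
}
```

Each returned map could then be copied into a SolrInputDocument with addField and sent with the client's add and commit calls, which should store the field data much as post.jar does.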

Regarding "java - How to index different types of files (pdf, word, html, etc.) using a SolrJ Java application", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54399386/
