java - Indexing PDF/Word documents in Elasticsearch with the ingest-attachment plugin from Java code

Reposted. Author: 行者123. Updated: 2023-12-02 11:06:23

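I am trying to index my Word/PDF documents. I wrote a Java utility that encodes each file as Base64, and I now want to index the encoded files in Elasticsearch.

The code below encodes my file as Base64 successfully, but I am not sure how to index the result in Elasticsearch.

Here is my Java code: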

public static void main(String args[]) throws IOException {
    String filePath = "D:\\\\1SearchEngine\\testing.pdf";
    String encodedfile = null;
    RestHighLevelClient restHighLevelClient = null;
    File file = new File(filePath);
    try {
        FileInputStream fileInputStreamReader = new FileInputStream(file);
        byte[] bytes = new byte[(int) file.length()];
        fileInputStreamReader.read(bytes);
        encodedfile = new String(Base64.getEncoder().encodeToString(bytes));
        //System.out.println(encodedfile);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    try {
        if (restHighLevelClient != null) {
            restHighLevelClient.close();
        }
    } catch (final Exception e) {
        System.out.println("Error closing ElasticSearch client: ");
    }

    try {
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }

    IndexRequest request = new IndexRequest("attach_local", "doc", "103");
    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("resume", "Karthikeyan");
    jsonMap.put("postDate", new Date());
    jsonMap.put("resume", encodedfile);
    try {
        IndexResponse response = restHighLevelClient.index(request);
    } catch (ElasticsearchException e) {
        if (e.status() == RestStatus.CONFLICT) {

        }
    }
}

I am using Elasticsearch 6.2.3, and I have installed the ingest-attachment plugin, version 6.3.0.

I am using the following dependency for the Elasticsearch client:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>6.1.2</version>
</dependency>

Here are my mapping details:

PUT attach_local
{
  "mappings" : {
    "doc" : {
      "properties" : {
        "attachment" : {
          "properties" : {
            "content" : {
              "type" : "binary"
            },
            "content_length" : {
              "type" : "long"
            },
            "content_type" : {
              "type" : "text"
            },
            "language" : {
              "type" : "text"
            }
          }
        },
        "resume" : {
          "type" : "text"
        }
      }
    }
  }
}

PUT _ingest/pipeline/attach_local
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "resume"
      }
    }
  ]
}

Now I get the following error from Java when indexing the document:

Exception in thread "main" org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: source is missing;2: content type is missing;
    at org.elasticsearch.action.ValidateActions.addValidationError(ValidateActions.java:26)
    at org.elasticsearch.action.index.IndexRequest.validate(IndexRequest.java:153)
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:436)
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:429)
    at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:312)
    at com.es.utility.DocumentIndex.main(DocumentIndex.java:82)

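Best answer

I finally found the solution for indexing PDF/Word documents in Elasticsearch through the Java API. The request was failing because the document map was never attached to the IndexRequest; calling .source(jsonMap) and .setPipeline("attach_local") fixes it: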

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.rest.RestStatus;

public class DocumentIndex {
    public static void main(String[] args) throws IOException {
        String filePath = "D:\\\\1SearchEngine\\testing.pdf";

        // Read the whole file, then Base64-encode it; the attachment processor expects Base64 input.
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encodedfile = Base64.getEncoder().encodeToString(bytes);

        // try-with-resources closes the client once indexing is finished.
        try (RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"),
                        new HttpHost("localhost", 9201, "http")))) {

            Map<String, Object> jsonMap = new HashMap<>();
            jsonMap.put("Name", "Karthikeyan");
            jsonMap.put("postDate", new Date());
            jsonMap.put("resume", encodedfile);

            // The fix: attach the document source and route the request through the ingest pipeline.
            IndexRequest request = new IndexRequest("attach_local", "doc", "104")
                    .source(jsonMap)
                    .setPipeline("attach_local");

            try {
                IndexResponse response = restHighLevelClient.index(request);
            } catch (ElasticsearchException e) {
                if (e.status() == RestStatus.CONFLICT) {
                    // Handle a version conflict here if needed.
                }
            }
        }
    }
}
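
For context, setting the pipeline on the IndexRequest corresponds to the `pipeline` query parameter of the REST index API, and the source map becomes the JSON request body (the part whose absence produced "source is missing"). The following is only a rough sketch of the equivalent raw HTTP request using the JDK alone. The class name `RawIndexSketch` and the helpers `indexUrl`, `jsonBody` and `send` are hypothetical names introduced for illustration; the host, port, index, type, id and pipeline values are taken from the code above, and the hand-built JSON is an assumption that only holds for values needing no escaping.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RawIndexSketch {

    // Builds the index URL; the "pipeline" query parameter selects the ingest pipeline.
    public static String indexUrl(String host, int port, String index, String type,
                                  String id, String pipeline) {
        return "http://" + host + ":" + port + "/" + index + "/" + type + "/" + id
                + "?pipeline=" + pipeline;
    }

    // Builds a tiny JSON body by hand (assumption: the name contains no quotes or
    // backslashes; Base64 text never needs JSON escaping). Use a JSON library otherwise.
    public static String jsonBody(String name, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"Name\":\"" + name + "\",\"resume\":\"" + encoded + "\"}";
    }

    // Sends the document with PUT and returns the HTTP status code.
    public static int send(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = indexUrl("localhost", 9200, "attach_local", "doc", "104", "attach_local");
        String body = jsonBody("Karthikeyan", "sample".getBytes(StandardCharsets.UTF_8));
        System.out.println("PUT " + url);
        System.out.println(body);
        // Calling send(url, body) would perform the request against a running node.
    }
}
```

The sketch only prints the request it would send, so it can be run without a cluster; the high-level client remains the more convenient choice in real code.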

Mapping details:

PUT attach_local
{
  "mappings" : {
    "doc" : {
      "properties" : {
        "attachment" : {
          "properties" : {
            "content" : {
              "type" : "binary"
            },
            "content_length" : {
              "type" : "long"
            },
            "content_type" : {
              "type" : "text"
            },
            "language" : {
              "type" : "text"
            }
          }
        },
        "resume" : {
          "type" : "text"
        }
      }
    }
  }
}

PUT _ingest/pipeline/attach_local
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "resume"
      }
    }
  ]
}
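
The attachment processor in this pipeline reads the `resume` field as Base64-encoded binary, so the file has to be read completely before it is encoded. A single `InputStream.read(byte[])` call, as used in the question's code, is not guaranteed to fill the buffer. Below is a small sketch of the encoding step that relies on `Files.readAllBytes` instead; the class name `Base64FileEncoder` and the method `encodeFile` are hypothetical, and the temporary file merely stands in for a real document.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class Base64FileEncoder {

    // Reads the entire file into memory and returns its Base64 representation.
    public static String encodeFile(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real PDF or Word document.
        Path tmp = Files.createTempFile("sample", ".bin");
        try {
            Files.write(tmp, new byte[] {0x25, 0x50, 0x44, 0x46}); // the bytes of "%PDF"
            System.out.println(encodeFile(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This approach loads the whole file into memory, which is fine for typical resumes but may need rethinking for very large documents.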

Regarding "java - Indexing PDF/Word documents in Elasticsearch with the ingest-attachment plugin from Java code", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50927198/
