
elasticsearch - Setting _id as the update key in Logstash Elasticsearch


My index looks like this:

{
  "_index": "mydata",
  "_type": "_doc",
  "_id": "PuhnbG0B1IIlyY9-ArdR",
  "_score": 1,
  "_source": {
    "age": 9,
    "@version": "1",
    "updated_on": "2019-01-01T00:00:00.000Z",
    "id": 4,
    "name": "Emma",
    "@timestamp": "2019-09-26T07:09:11.947Z"
  }
}

So the Logstash conf I use to update the data is:

input {
  jdbc {
    jdbc_connection_string => "***"
    jdbc_driver_class => "***"
    jdbc_driver_library => "***"
    jdbc_user => "***"
    statement => "SELECT * from agedata WHERE updated_on > :sql_last_value ORDER BY updated_on"
    use_column_value => true
    tracking_column => "updated_on"
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mydata"
    action => "update"
    document_id => "%{_id}"
    doc_as_upsert => true
  }
  stdout { codec => rubydebug }
}

So when I run this after making any update to a row, my expected output is that the existing document with that _id is updated with whatever changed in the row.
But instead Elasticsearch indexes it as a new document, where my _id is treated as a literal string:
"_index": "agesep",
"_type": "_doc",
"_id": "%{_id}"

When I use document_id => "%{id}", I get duplicates instead:
Actual:
{
  "_index": "mydata",
  "_type": "_doc",
  "_id": "BuilbG0B1IIlyY9-4P7t",
  "_score": 1,
  "_source": {
    "id": 1,
    "age": 13,
    "name": "Greg",
    "updated_on": "2019-09-26T08:11:00.000Z",
    "@timestamp": "2019-09-26T08:17:52.974Z",
    "@version": "1"
  }
}

Duplicate:
{
  "_index": "mydata",
  "_type": "_doc",
  "_id": "1",
  "_score": 1,
  "_source": {
    "age": 56,
    "@version": "1",
    "id": 1,
    "name": "Greg",
    "updated_on": "2019-09-26T08:18:00.000Z",
    "@timestamp": "2019-09-26T08:20:14.561Z"
  }
}

How do I make ES reuse the existing _id during an update instead of creating duplicate documents?
My expectation is to update the data in the index based on _id, without creating a new document for every update.

Best Answer

I suggest using id instead of _id:

        document_id => "%{id}"
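In context, the output block would then look like the sketch below (based on the asker's config). The reason `%{_id}` does not work: `_id` is Elasticsearch document metadata, not a field in the Logstash event, so the sprintf reference is never substituted and the literal string `%{_id}` becomes the document id. The `id` column, by contrast, is selected by the SQL statement and is present in every event:

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]
    index         => "mydata"
    action        => "update"
    document_id   => "%{id}"   # the SQL column, present in the event
    doc_as_upsert => true      # create the document if it does not exist yet
  }
}
```

With `doc_as_upsert => true`, the first run inserts each row under its `id` (e.g. `"_id": "1"` for Greg), and subsequent runs update that same document in place rather than letting Elasticsearch auto-generate a new `_id` like `BuilbG0B1IIlyY9-4P7t` each time.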

Regarding "elasticsearch - Setting _id as the update key in Logstash Elasticsearch", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58111721/
