
java - How to specify insertId when streaming inserts into BigQuery with Apache Beam

Reposted. Author: 行者123. Updated: 2023-12-02 15:08:01

BigQuery supports de-duplication for streaming inserts. How can I use this feature through Apache Beam?

https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency

To help ensure data consistency, you can supply insertId for each inserted row. BigQuery remembers this ID for at least one minute. If you try to stream the same set of rows within that time period and the insertId property is set, BigQuery uses the insertId property to de-duplicate your data on a best effort basis. You might have to retry an insert because there's no way to determine the state of a streaming insert under certain error conditions, such as network errors between your system and BigQuery or internal errors within BigQuery. If you retry an insert, use the same insertId for the same set of rows so that BigQuery can attempt to de-duplicate your data. For more information, see troubleshooting streaming inserts.
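
As a rough illustration of the "use the same insertId for the same set of rows" advice quoted above, here is a minimal sketch in plain Java. It assumes each row has some stable business key to derive the ID from. InsertIds and forKey are hypothetical names, not part of Beam or the BigQuery client library, and how the resulting string is attached to a row depends on the API you end up using.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not a Beam or BigQuery API.
final class InsertIds {
    private InsertIds() {}

    /** Derives a stable ID from a business key: the same key always yields the same ID. */
    static String forKey(String businessKey) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(businessKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The key format below is an assumption made up for this example.
        String firstAttempt = forKey("order-42|2019-01-09T10:00:00Z");
        String retry = forKey("order-42|2019-01-09T10:00:00Z");
        System.out.println(firstAttempt.equals(retry)); // prints "true"
    }
}
```

The point of deriving the ID from the row's own data, rather than generating a random value per attempt, is that a retry after a network error produces the same ID, which is what gives BigQuery a chance to de-duplicate on a best effort basis.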

I can't find such a feature in the Javadoc: https://beam.apache.org/releases/javadoc/2.9.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html

In this question, the answerer suggests setting insertId on the TableRow. Is that correct?

https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableRow.html?is-external=true

The BigQuery client library does have this feature.

https://googleapis.github.io/google-cloud-java/google-cloud-clients/apidocs/index.html?com/google/cloud/bigquery/package-summary.html https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-clients/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/InsertAllRequest.java#L134

Best answer

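  • Pub/Sub + Beam/Dataflow + BigQuery: "exactly once" should be guaranteed, so you don't need to worry much about this. For now, the guarantee is stronger when you ask Dataflow to insert into BigQuery using FILE_LOADS rather than STREAMING_INSERTS.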

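  • Kafka + Beam/Dataflow + BigQuery: if a message can be emitted from Kafka more than once (for example, if the producer retried the insert), then you need to take care of de-duplication yourself, either in BigQuery (as you have currently implemented it, according to your comment) or in Dataflow with a .apply(Distinct.create()) transform.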
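
To make the de-duplication idea in the Kafka case concrete, here is a minimal sketch in plain Java of what removing repeated messages amounts to on a small in-memory batch. It is not how Beam implements Distinct: in a pipeline the transform runs on a PCollection, and on unbounded input it works per window. DedupSketch, distinctBy, and the "id:payload" message format are all hypothetical names and assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not a Beam API.
final class DedupSketch {
    private DedupSketch() {}

    /** Keeps the first message seen for each key and preserves arrival order. */
    static <T, K> List<T> distinctBy(List<T> messages, Function<T, K> keyFn) {
        Map<K, T> firstSeen = new LinkedHashMap<>();
        for (T message : messages) {
            // putIfAbsent ignores later messages that share an already-seen key.
            firstSeen.putIfAbsent(keyFn.apply(message), message);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        // Assumed message format "id:payload"; the third message is a producer retry.
        List<String> fromKafka = Arrays.asList("id-1:hello", "id-2:world", "id-1:hello");
        List<String> unique = distinctBy(fromKafka, m -> m.split(":")[0]);
        System.out.println(unique); // prints "[id-1:hello, id-2:world]"
    }
}
```

Keying on a message ID rather than on the whole element is a design choice: it still drops a retried message even if some incidental field, such as a send timestamp, differs between attempts.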

Regarding "java - How to specify insertId when streaming inserts into BigQuery with Apache Beam", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54110596/
