mysql - Inserting large data into a Cloud Spanner table


I want to insert a large amount of data into a Google Cloud Spanner table.

This is what I'm doing with a Node.js application (roughly the steps below, with a code sketch after the list), but it stalls because the txt file is too large (almost 2 GB):

1. Load the txt file
2. Read it line by line
3. Split each line by "|"
4. Build a data object
5. Insert the data into the Cloud Spanner table
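Roughly, the code looks like this (the file name, table, and column names here are placeholders, not the real ones):

```js
// Rough sketch of the current approach: the whole file is read into memory
// before any rows are inserted. Names below are placeholders.
const fs = require('fs');

const lines = fs.readFileSync('data.txt', 'utf8').split('\n'); // 1. load txt file
const rows = lines.map((line) => {
  const [id, name, email] = line.split('|');                   // 2-3. read & split by "|"
  return {AccountId: id, Name: name, Email: email};            // 4. build data object
});
// 5. insert `rows` into the Cloud Spanner table; with a ~2 GB file the
//    process stalls long before this point
```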

MySQL supports inserting data from a .sql file. Does Cloud Spanner also support something like this?

Best Answer

Cloud Spanner does not currently expose a bulk import method. It sounds like you intend to insert each row individually, which is not the optimal approach. The documentation offers best (and bad) practices for efficient bulk loading:

To get optimal write throughput for bulk loads, partition your data by primary key with this pattern:

Each partition contains a range of consecutive rows. Each commit contains data for only a single partition. A good rule of thumb for your number of partitions is 10 times the number of nodes in your Cloud Spanner instance. So if you have N nodes, with a total of 10*N partitions, you can assign rows to partitions by:

- Sorting your data by primary key.
- Dividing it into 10*N separate sections.
- Creating a set of worker tasks that upload the data.

Each worker will write to a single partition. Within the partition, it is recommended that your worker write the rows sequentially. However, writing data randomly within a partition should also provide reasonably high throughput.

As more of your data is uploaded, Cloud Spanner automatically splits and rebalances your data to balance load on the nodes in your instance. During this process, you may experience temporary drops in throughput.

Following this pattern, you should see a maximum overall bulk write throughput of 10-20 MiB per second per node.
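As a rough illustration of this pattern, here is a minimal sketch assuming the Node.js @google-cloud/spanner client, rows already sorted by primary key, and hypothetical project/instance/database/table names:

```js
// Sketch of the bulk-load pattern: 10 partitions per node, one worker per
// partition, each commit touching only its own partition. All names are
// placeholders for illustration.
const {Spanner} = require('@google-cloud/spanner');

const table = new Spanner({projectId: 'my-project'})
  .instance('my-instance')
  .database('my-database')
  .table('Accounts');

async function bulkLoad(sortedRows, nodeCount) {
  const partitionCount = nodeCount * 10;                 // rule of thumb: 10 * N
  const partitionSize = Math.ceil(sortedRows.length / partitionCount);

  const workers = [];
  for (let p = 0; p < partitionCount; p++) {
    const partition = sortedRows.slice(p * partitionSize, (p + 1) * partitionSize);
    workers.push(writePartition(partition));             // one worker per partition
  }
  await Promise.all(workers);
}

async function writePartition(rows) {
  // Write sequentially within the partition, in smaller chunks to stay
  // under Spanner's per-commit mutation limit.
  const BATCH = 500;
  for (let i = 0; i < rows.length; i += BATCH) {
    await table.insert(rows.slice(i, i + BATCH));
  }
}
```

Each table.insert call here is a separate commit, so no commit mixes rows from more than one partition.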

It looks like you are trying to load the entire large file into memory before processing it. For a file this size, you should consider loading and processing it in chunks rather than all at once. I'm not a Node expert, but you should probably try reading it as a stream instead of keeping everything in memory.
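For example, a minimal sketch using Node's readline over a file stream (assuming Node.js 11+ for async iteration, and the same placeholder names as above) could look like this:

```js
// Sketch: stream the file line by line and commit in small batches instead
// of building the whole row array in memory. Names are placeholders.
const fs = require('fs');
const readline = require('readline');
const {Spanner} = require('@google-cloud/spanner');

const table = new Spanner({projectId: 'my-project'})
  .instance('my-instance')
  .database('my-database')
  .table('Accounts');

async function loadFile(path) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),       // reads the 2 GB file in chunks
    crlfDelay: Infinity,
  });

  let batch = [];
  for await (const line of rl) {
    const [id, name, email] = line.split('|');
    batch.push({AccountId: id, Name: name, Email: email});

    if (batch.length === 500) {              // commit small batches as you go
      await table.insert(batch);
      batch = [];
    }
  }
  if (batch.length > 0) {
    await table.insert(batch);               // flush the final partial batch
  }
}

loadFile('data.txt').catch(console.error);
```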

Regarding mysql - Inserting large data into a Cloud Spanner table, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42339544/
