cassandra - DataStax Enterprise: Spark Cassandra Batch Size

Reposted. Author: 行者123. Updated: 2023-12-02 08:30:53

I set the parameter spark.cassandra.output.batch.size.rows in my SparkConf as follows:

val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "host")
  .set("spark.cassandra.auth.username", "cassandra")
  .set("spark.cassandra.auth.password", "cassandra")
  .set("spark.cassandra.output.batch.size.rows", "5120")
  .set("spark.cassandra.output.concurrent.writes", "10")

But when I call

saveToCassandra("data","ten_days")

I keep seeing warnings in system.log:

INFO [FlushWriter:7] 2014-11-20 11:11:16,498 Memtable.java (line 395) Completed flushing /var/lib/cassandra/data/system/hints/system-hints-jb-76-Data.db (5747287 bytes) for commitlog position ReplayPosition(segmentId=1416480663951, position=44882909)
INFO [FlushWriter:7] 2014-11-20 11:11:16,499 Memtable.java (line 355) Writing Memtable-ten_days@1656582530(32979978/329799780 serialized/live bytes, 551793 ops)
WARN [Native-Transport-Requests:761] 2014-11-20 11:11:16,499 BatchStatement.java (line 226) Batch of prepared statements for [data.ten_days] is of size 36825, exceeding specified threshold of 5120 by 31705.
WARN [Native-Transport-Requests:777] 2014-11-20 11:11:16,500 BatchStatement.java (line 226) Batch of prepared statements for [data.ten_days] is of size 36813, exceeding specified threshold of 5120 by 31693.
WARN [Native-Transport-Requests:822] 2014-11-20 11:11:16,501 BatchStatement.java (line 226) Batch of prepared statements for [data.ten_days] is of size 36823, exceeding specified threshold of 5120 by 31703.
WARN [Native-Transport-Requests:835] 2014-11-20 11:11:16,500 BatchStatement.java (line 226) Batch of prepared statements for [data.ten_days] is of size 36817, exceeding specified threshold of 5120 by 31697.
WARN [Native-Transport-Requests:781] 2014-11-20 11:11:16,501 BatchStatement.java (line 226) Batch of prepared statements for [data.ten_days] is of size 36817, exceeding specified threshold of 5120 by 31697.
WARN [Native-Transport-Requests:755] 2014-11-20 11:11:16,501 BatchStatement.java (line 226) Batch of prepared statements for [data.ten_days] is of size 36822, exceeding specified threshold of 5120 by 31702.

I know these are only warnings, but I would like to understand why my setting is not working as expected. I can also see a lot of hints in my cluster. Could the batch size affect the number of hints in the cluster?

Thanks

Best Answer

You have set the batch size in rows, not the batch size in bytes. This means the connector is limiting the number of rows per batch, not the batch's size in memory.

spark.cassandra.output.batch.size.rows: number of rows per single batch; default is 'auto', which means the connector will adjust the number of rows based on the amount of data in each row

spark.cassandra.output.batch.size.bytes: maximum total size of the batch in bytes; defaults to 64 kB.

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md
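The numbers in the warnings line up with this reading: the connector capped each batch at 5120 rows, while the 5120 in the log is Cassandra's warn threshold measured in bytes. A quick sanity check in plain Scala, using the figures from the log above:

```scala
// Figures taken from the system.log warnings above.
val rowsPerBatch = 5120          // spark.cassandra.output.batch.size.rows
val batchBytes = 36825           // serialized batch size reported by BatchStatement
val warnThresholdBytes = 5120    // Cassandra's warn threshold, in BYTES

// The rows are tiny (~7 bytes each), yet a 5120-ROW batch still
// exceeds the 5120-BYTE threshold by a wide margin.
val avgRowBytes = batchBytes.toDouble / rowsPerBatch
println(f"avg row size: $avgRowBytes%.1f bytes")
println(s"over threshold by: ${batchBytes - warnThresholdBytes} bytes")
```

The excess it prints, 31705 bytes, is exactly the figure Cassandra reports in the first warning, which confirms that the two 5120s are different units, not a misapplied setting.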

More to the point, you will most likely be better off using the larger batch size (64 kB) and raising the warning limit in the cassandra.yaml file.
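A sketch of that cassandra.yaml change (assuming a Cassandra version that exposes the batch_size_warn_threshold_in_kb setting; the value is in kilobytes):

```yaml
# cassandra.yaml -- raise the batch-size warning threshold.
# The default of 5 kB is the 5120-byte limit seen in the warnings above.
batch_size_warn_threshold_in_kb: 64
```

This only silences the warning for batches up to 64 kB; it does not change how the connector groups rows. A rolling restart of the nodes is needed for the setting to take effect.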

Edit:

We have recently found that larger batches can destabilize some C* configurations, so lower the value if your system becomes unstable.

Regarding cassandra - DataStax Enterprise: Spark Cassandra Batch Size, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27039398/
