python - Dask DataFrame: Get row count?

Reposted. Author: 行者123. Updated: 2023-12-02 02:24:14

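Simple question: I have a dataframe in dask containing about 300 million records. I need to know the exact number of rows in the dataframe. Is there an easy way to do this?

When I try to run dataframe.x.count().compute(), it looks like it tries to load the entire dataset into RAM, runs out of space, and crashes.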

Best answer

import dask.dataframe as dd

# Use a block size small enough that each partition fits comfortably in memory
ddf = dd.read_csv('*.csv', blocksize="10MB")
# shape[0] is lazy; compute() returns the actual row count
ddf.shape[0].compute()

From the documentation:

blocksize: str, int or None, optional. Number of bytes by which to cut up larger files. Default value is computed based on available physical memory and the number of cores, up to a maximum of 64MB. Can be a number like 64000000 or a string like "64MB". If None, a single block is used for each file.

Regarding python - Dask DataFrame: Get row count?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49309523/
