
python - Understanding Dask distributed behavior for DataFrame operations

Reposted. Author: 行者123. Updated: 2023-12-01 02:52:46

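I want to better understand how dask.distributed works. I have a simple CSV that I read into a Dask dataframe as shown below. This runs fine and returns an integer representing the length of the dataframe, which is the behavior I expect.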

import dask.dataframe as dd
gdf = dd.read_csv(filepath)
len(gdf)
# returns some int value

But as soon as I introduce a Client instance from dask.distributed, I get the following error:

distributed.utils - ERROR - 'LocalFileSystem' object has no attribute 'cwd'

Here is a sample code block:

from dask.distributed import Client
import dask.dataframe as dd
client_db = Client(remote_addr)
gdf = dd.read_csv(filepath)
len(gdf)
# throws the above error

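I'm confused: once a Client is instantiated, does it "inject itself" into all Dask operations? I had assumed I would need to do something like gdf = client_db.persist(gdf) to ask the Client connection to manage operations on that dataframe.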

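Any background on what is happening here would be appreciated. I can see from the traceback that Tornado is involved; Tornado is a Python web framework that supports web sockets, long polling, and so on. I think it is trying to store something, somewhere, but I am on unfamiliar ground here.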

Traceback, if needed:

Traceback (most recent call last):
  File "/.../geopandas_opt/venv/lib/python3.6/site-packages/distributed/utils.py", line 223, in f
    result[0] = yield make_coro()
  File "/.../geopandas_opt/venv/lib/python3.6/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/.../geopandas_opt/venv/lib/python3.6/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/.../geopandas_opt/venv/lib/python3.6/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/.../geopandas_opt/venv/lib/python3.6/site-packages/distributed/client.py", line 1156, in _gather
    traceback)
  File "/.../geopandas_opt/venv/lib/python3.6/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/site-packages/dask/bytes/core.py", line 212, in read_block_from_file
  File "/usr/local/lib/python3.5/site-packages/dask/bytes/core.py", line 314, in __enter__
  File "/usr/local/lib/python3.5/site-packages/dask/bytes/local.py", line 64, in open
  File "/usr/local/lib/python3.5/site-packages/dask/bytes/local.py", line 36, in _trim_filename
AttributeError: 'LocalFileSystem' object has no attribute 'cwd'

Best answer

Yes. When you create a Client, it registers itself as the default global scheduler. You can avoid this behavior with the set_as_default= keyword:

client = Client(..., set_as_default=False)

As for the exception you are seeing, I suspect a version mismatch. You may need to upgrade with conda or pip:

conda install dask distributed

pip install --upgrade dask distributed
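The traceback in the question is consistent with a mismatch: the client-side frames come from a python3.6 virtualenv, while the frames raised remotely come from /usr/local/lib/python3.5. One way to check this from the client is Client.get_versions, sketched below. The in-process cluster is an assumption standing in for Client(remote_addr), where a real mismatch would be more likely to show up.

```python
from dask.distributed import Client

# Assumption: an in-process cluster stands in for Client(remote_addr).
client = Client(processes=False, dashboard_address=None)

# Collect version info from the client, the scheduler, and every worker.
# With check=True, a ValueError is raised if the package versions disagree.
versions = client.get_versions(check=True)

client_pkgs = versions["client"]["packages"]
scheduler_pkgs = versions["scheduler"]["packages"]
print("client dask:", client_pkgs["dask"])
print("scheduler dask:", scheduler_pkgs["dask"])

client.close()
```

If the check fails, upgrading dask and distributed in the environments of the client, the scheduler, and the workers (not only the client) is the usual remedy.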

Regarding "python - Understanding Dask distributed behavior for DataFrame operations", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44555659/
