
python - Dask: TypeError: Column assignment doesn't support type numpy.ndarray, whereas Pandas works fine

Reposted. Author: 行者123. Updated: 2023-12-04 11:24:02

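I am using Dask to read a CSV file of more than 10 million rows and perform some calculations on it. So far it has proven to be 10x faster than Pandas.

I have a piece of code below that works fine with Pandas but raises a TypeError with Dask. I am not sure how to get around the TypeError. It seems that with Dask the select function passes an array back to the DataFrame/column, but with Pandas it does not? I do not want to switch the whole thing back to Pandas and lose the 10x performance benefit.

The code below came out of help from a few other people on Stack Overflow, but I think this question is different enough from the original one to stand on its own.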

Pandas: works
Time taken, excluding AndHeathSolRadFact: 40 seconds

import pandas as pd
import numpy as np

from timeit import default_timer as timer
start = timer()
df = pd.read_csv(r'C:\Users\i5-Desktop\Downloads\Weathergrids.csv')
df['DateTime'] = pd.to_datetime(df['Date'], format='%Y-%d-%m %H:%M')
df['Month'] = df['DateTime'].dt.month
df['Grass_FMC'] = (97.7+4.06*df['RH'])/(df['Temperature']+6)-0.00854*df['RH']+3000/df['Curing']-30


df["AndHeathSolRadFact"] = np.select(
    [
        df['Month'].between(8, 12),
        # Parenthesise the comparison: & binds more tightly than >
        df['Month'].between(1, 2) & (df['CloudCover'] > 30),
    ],  # list of conditions
    [1, 1],  # list of results
    default=0)  # default if no match



print(df.head())
#print(ddf.tail())
end = timer()
print(end - start)


Dask: fails
Time taken, excluding AndHeathSolRadFact: 4 seconds
# Dask DataFrames implement the Pandas API
import dask.dataframe as dd
import dask.multiprocessing
import dask.threaded
import pandas as pd
import numpy as np


from timeit import default_timer as timer
start = timer()
ddf = dd.read_csv(r'C:\Users\i5-Desktop\Downloads\Weathergrids.csv')
ddf['DateTime'] = dd.to_datetime(ddf['Date'], format='%Y-%d-%m %H:%M')
ddf['Month'] = ddf['DateTime'].dt.month
ddf['Grass_FMC'] = (97.7+4.06*ddf['RH'])/(ddf['Temperature']+6)-0.00854*ddf['RH']+3000/ddf['Curing']-30



ddf["AndHeathSolRadFact"] = np.select(
    [
        ddf['Month'].between(8, 12),
        # Parenthesise the comparison: & binds more tightly than >
        ddf['Month'].between(1, 2) & (ddf['CloudCover'] > 30),
    ],  # list of conditions
    [1, 1],  # list of results
    default=0)  # default if no match



print(ddf.head())
#print(ddf.tail())
end = timer()
print(end - start)



Error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-50-86c08f38bce6> in <module>
29 ], #list of conditions
30 [1, 1], #list of results
---> 31 default=0) #default if no match
32
33

~\Anaconda3\lib\site-packages\dask\dataframe\core.py in __setitem__(self, key, value)
3276 df = self.assign(**{k: value for k in key})
3277 else:
-> 3278 df = self.assign(**{key: value})
3279
3280 self.dask = df.dask

~\Anaconda3\lib\site-packages\dask\dataframe\core.py in assign(self, **kwargs)
3510 raise TypeError(
3511 "Column assignment doesn't support type "
-> 3512 "{0}".format(typename(type(v)))
3513 )
3514 if callable(v):

TypeError: Column assignment doesn't support type numpy.ndarray
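
A note on the message above: `np.select` returns a plain `numpy.ndarray` under Pandas as well, not only under Dask. The sketch below uses a small made-up DataFrame (the values are illustrative, not taken from the asker's file) to show that Pandas simply accepts an in-memory array whose length matches the frame, whereas the traceback shows Dask's `assign` rejecting that type.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; these values are assumptions, not the asker's data.
df = pd.DataFrame(
    {"Month": [9, 1, 2, 5], "CloudCover": [1.0, 40.0, 10.0, 50.0]}
)

flags = np.select(
    [
        df["Month"].between(8, 12),
        df["Month"].between(1, 2) & (df["CloudCover"] > 30),
    ],
    [1, 1],
    default=0,
)

# np.select always hands back a NumPy array, regardless of the input type.
print(type(flags).__name__)

# Pandas aligns the array positionally because the whole frame is in memory.
df["AndHeathSolRadFact"] = flags
print(df)
```

A Dask DataFrame is split into partitions that are evaluated lazily, which is presumably why a single in-memory array is not accepted as a column there.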

Sample Weathergrids CSV
Location,Date,Temperature,RH,WindDir,WindSpeed,DroughtFactor,Curing,CloudCover
1075,2019-20-09 04:00,6.8,99.3,143.9,5.6,10.0,93.0,1.0
1075,2019-20-09 05:00,6.4,100.0,93.6,7.2,10.0,93.0,1.0
1075,2019-20-09 06:00,6.7,99.3,130.3,6.9,10.0,93.0,1.0
1075,2019-20-09 07:00,8.6,95.4,68.5,6.3,10.0,93.0,1.0
1075,2019-20-09 08:00,12.2,76.0,86.4,6.1,10.0,93.0,1.0

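Best answer

I just ran into a similar problem and was able to get it working by converting the ndarray into a Dask array. I also had to make sure that the number of partitions matched between the converted array and the Dask DataFrame.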

Regarding python - Dask: TypeError: Column assignment doesn't support type numpy.ndarray, whereas Pandas works fine, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58254236/
