python - Memoizing an expensive computation on pandas DataFrames

Reposted by: 太空宇宙 | Updated: 2023-11-03 15:17:34

I have an expensive computation that runs on pandas DataFrames, and I'd like to memoize it. I'm trying to figure out what I can use as the cache key.

In [16]: id(pd.DataFrame({1: [1,2,3]}))
Out[16]: 52015696

In [17]: id(pd.DataFrame({1: [1,2,3]}))
Out[17]: 52015504

In [18]: id(pd.DataFrame({1: [1,2,3]}))
Out[18]: 52015504

In [19]: id(pd.DataFrame({1: [1,2,3]})) # different results, won't work for my case
Out[19]: 52015440

In [20]: hash(pd.DataFrame({1: [1,2,3]})) # throws
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-3bddc0b20163> in <module>()
----> 1 hash(pd.DataFrame({1: [1,2,3]}))

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __hash__(self)
     52     def __hash__(self):
     53         raise TypeError('{0!r} objects are mutable, thus they cannot be'
---> 54                               ' hashed'.format(self.__class__.__name__))
     55 
     56     def __unicode__(self):

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Is there a way to do this, given that I'm certain I will never mutate the DataFrames being memoized?

Best answer

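If you don't need the index or the column names to take part in the comparison, you can convert the DataFrame's values to a tuple of tuples, which is hashable: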

>>> df1 = pd.DataFrame({1: [1,2,3]})
>>> df2 = pd.DataFrame({1: [1,2,3]})
>>> hash(tuple(tuple(x) for x in df1.values)) == hash(tuple(tuple(x) for x in df2.values))
True
>>> id(df1) == id(df2)
False
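To make the caveat about labels concrete, here is a small sketch (the helper name values_key is hypothetical, not from the answer). Two frames with identical cell values but a different column label and a different index produce the same tuple, and therefore the same hash:

```python
import pandas as pd


def values_key(df: pd.DataFrame) -> tuple:
    """Hashable key built only from the cell values, row by row."""
    return tuple(tuple(row) for row in df.values)


df_a = pd.DataFrame({1: [1, 2, 3]})
# Same values, but a different column label and a different index
df_b = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])

print(values_key(df_a) == values_key(df_b))  # True: labels are ignored
```

So this key is only appropriate when the computation's result depends on the values alone.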

You can also use the map function instead of a generator expression:

tuple(map(tuple, df1.values))

If you also need to compare the index, you can add it as a column. You can also preserve the column names by building a namedtuple:

>>> from collections import namedtuple
>>> from pprint import pprint
>>> df = pd.DataFrame({1: [1,2,3], 2:[3,4,5]})
>>> df['index'] = df.index
>>> df
   1  2  index
0  1  3      0
1  2  4      1
2  3  5      2
>>>
>>> dfr = namedtuple('row', map(lambda x: 'col_' + str(x), df.columns))
>>> res = tuple(map(lambda x: dfr(*x), df.values))
>>> pprint(res)
(row(col_1=1, col_2=3, col_index=0),
 row(col_1=2, col_2=4, col_index=1),
 row(col_1=3, col_2=5, col_index=2))
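If adding an extra index column to the frame is undesirable (it mutates df), a plain tuple of labels can carry the same information. This sketch (the helper labeled_key is hypothetical) combines the column labels, the index, and the values into one hashable key without modifying df. It also sidesteps a constraint of the namedtuple approach: namedtuple field names must be valid Python identifiers, which is why the answer prefixes each label with col_.

```python
import pandas as pd


def labeled_key(df: pd.DataFrame) -> tuple:
    """Hashable key that also covers column labels and the index.

    Unlike adding an 'index' column, this leaves df unchanged.
    """
    return (
        tuple(df.columns),             # column labels
        tuple(df.index),               # row labels
        tuple(map(tuple, df.values)),  # cell values, row by row
    )


df = pd.DataFrame({1: [1, 2, 3], 2: [3, 4, 5]})
same = pd.DataFrame({1: [1, 2, 3], 2: [3, 4, 5]})
shifted = pd.DataFrame({1: [1, 2, 3], 2: [3, 4, 5]}, index=[10, 11, 12])

print(labeled_key(df) == labeled_key(same))     # True
print(labeled_key(df) == labeled_key(shifted))  # False: index differs
```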

Hope this helps.
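A different technique, not used in the answer above, is pandas' own pd.util.hash_pandas_object, which returns one uint64 hash per row (covering the index by default). The sketch below folds those row hashes, plus the column labels, which hash_pandas_object does not include, into a SHA-256 digest; frame_digest is a hypothetical helper name. Unlike the tuple key, this is only a digest, so distinct frames could in principle collide, although with SHA-256 that is vanishingly unlikely. Because the row hashing is vectorized, it is typically much cheaper than building Python tuples for large frames.

```python
import hashlib

import pandas as pd


def frame_digest(df: pd.DataFrame) -> str:
    """Content digest of a DataFrame, usable as a memoization key."""
    # One uint64 per row, combining every column's value and the index
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    h = hashlib.sha256(row_hashes.values.tobytes())
    # Column labels are not part of the row hashes, so mix them in too
    h.update(repr(tuple(df.columns)).encode())
    return h.hexdigest()


a = pd.DataFrame({1: [1, 2, 3]})
b = pd.DataFrame({1: [1, 2, 3]})
c = pd.DataFrame({"x": [1, 2, 3]})

print(frame_digest(a) == frame_digest(b))  # True
print(frame_digest(a) == frame_digest(c))  # False: column label differs
```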

Regarding python - Memoizing an expensive computation on pandas DataFrames, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19786040/
