
python - Subclassing a Pandas DataFrame, updates?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 12:54:37

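To inherit or not to inherit?

What is the latest on the subclassing issue for Pandas? (Most of the other threads are 3-4 years old.)

I am hoping to do something like...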

import pandas as pd

class SomeData(pd.DataFrame):
    # Methods
    pass

ClsInstance = SomeData()

# Create a new column on ClsInstance?

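Best answer

This is how I have done it. I followed the advice I found:

The example below only shows how to construct a new subclass of pandas.DataFrame. If you follow the advice in my first link, you may also consider subclassing pandas.Series so that single-dimension slices of your pandas.DataFrame subclass are accounted for.

Defining SomeData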
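As a rough sketch of the Series suggestion above: pandas exposes the `_constructor_sliced` and `_constructor_expanddim` properties for linking a DataFrame subclass to a Series subclass. The class names `SomeSeries` and `SomeFrame` are hypothetical, and for brevity `_constructor` simply returns the class here.

```python
import pandas as pd


class SomeSeries(pd.Series):
    """Hypothetical Series subclass used for single-column slices."""

    @property
    def _constructor(self):
        # Operations on a SomeSeries should return a SomeSeries.
        return SomeSeries

    @property
    def _constructor_expanddim(self):
        # Expanding to two dimensions (e.g. to_frame) returns a SomeFrame.
        return SomeFrame


class SomeFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass paired with SomeSeries."""

    @property
    def _constructor(self):
        # Operations on a SomeFrame should return a SomeFrame.
        return SomeFrame

    @property
    def _constructor_sliced(self):
        # Selecting a single column returns a SomeSeries.
        return SomeSeries


frame = SomeFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
column = frame["A"]          # one-dimensional slice
subframe = frame[["A"]]      # still two-dimensional
round_trip = column.to_frame()
```

With both classes linked this way, slicing a column and converting it back should keep you inside your own types instead of falling back to plain pandas objects.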

import pandas as pd
import numpy as np

class SomeData(pd.DataFrame):
    # This class variable tells Pandas the name of the attributes
    # that are to be ported over to derivative DataFrames. There
    # is a method named `__finalize__` that grabs these attributes
    # and assigns them to newly created `SomeData`
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        """This is the key to letting Pandas know how to keep
        derivative `SomeData` the same type as yours. It should
        be enough to return the name of the Class. However, in
        some cases, `__finalize__` is not called and `my_attr` is
        not carried over. We can fix that by constructing a callable
        that makes sure to call `__finalize__` every time."""
        def _c(*args, **kwargs):
            return SomeData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        # grab the keyword argument that is supposed to be my_attr
        self.my_attr = kwargs.pop('my_attr', None)
        super().__init__(*args, **kwargs)

    def my_method(self, other):
        return self * np.sign(self - other)

Demonstration

mydata = SomeData(dict(A=[1, 2, 3], B=[4, 5, 6]), my_attr='an attr')

print(mydata, type(mydata), mydata.my_attr, sep='\n' * 2)

   A  B
0  1  4
1  2  5
2  3  6

<class '__main__.SomeData'>

an attr

newdata = mydata.mul(2)

print(newdata, type(newdata), newdata.my_attr, sep='\n' * 2)

   A   B
0  2   8
1  4  10
2  6  12

<class '__main__.SomeData'>

an attr

newerdata = mydata.my_method(newdata)

print(newerdata, type(newerdata), newerdata.my_attr, sep='\n' * 2)

   A  B
0 -1 -4
1 -2 -5
2 -3 -6

<class '__main__.SomeData'>

an attr
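The question also asked about creating a new column on an instance. A minimal sketch, assuming the same `SomeData` definition as above (repeated here without `my_method` so the block runs on its own); the column names `C` and `D` are only illustrative:

```python
import pandas as pd


class SomeData(pd.DataFrame):
    # Attributes listed here are carried over by `__finalize__`.
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        # Callable that always finalizes, as in the class defined above.
        def _c(*args, **kwargs):
            return SomeData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        self.my_attr = kwargs.pop('my_attr', None)
        super().__init__(*args, **kwargs)


mydata = SomeData(dict(A=[1, 2, 3], B=[4, 5, 6]), my_attr='an attr')

# In-place assignment mutates the existing object, so its type is unchanged.
mydata['C'] = mydata['A'] + mydata['B']

# `assign` builds a new object; it goes through `_constructor`, so the
# result should still be a `SomeData` carrying `my_attr`.
with_d = mydata.assign(D=mydata['C'] * 2)
```

Either style works on the subclass; `assign` is the one that exercises `_constructor` and `__finalize__`, since it returns a new object.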

Gotcha

This messes with the method pd.DataFrame.equals

newerdata.equals(newdata)  # Should be `False`
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-304-866170ab179e> in <module>()
----> 1 newerdata.equals(newdata)

~/anaconda3/envs/3.6.ml/lib/python3.6/site-packages/pandas/core/generic.py in equals(self, other)
   1034         the same location are considered equal.
   1035         """
-> 1036         if not isinstance(other, self._constructor):
   1037             return False
   1038         return self._data.equals(other._data)

TypeError: isinstance() arg 2 must be a type or tuple of types

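What happens is that this method expects to find an object of type `type` in the _constructor attribute. Instead, it finds the callable I placed there to work around the __finalize__ issue I ran into.

Workaround

Override the equals method with the following in your class definition.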
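The failure can be reproduced without pandas at all. In this small sketch, `factory` is a hypothetical stand-in for the callable returned by `_constructor`:

```python
def factory(*args, **kwargs):
    """Hypothetical stand-in for the callable returned by `_constructor`."""
    return dict(*args, **kwargs)


# Passing a plain function as the second argument of isinstance fails,
# because isinstance needs a class (or a tuple of classes) there.
try:
    isinstance({}, factory)
    raised = False
except TypeError:
    raised = True

# Passing the actual class works as usual.
works_with_class = isinstance({}, dict)
```

This is why a `_constructor` that returns a function breaks any code path that treats its return value as a class.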

    def equals(self, other):
        try:
            pd.testing.assert_frame_equal(self, other)
            return True
        except AssertionError:
            return False

newerdata.equals(newdata) # Should be `False`

False
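One thing to keep in mind with this workaround: by default `pd.testing.assert_frame_equal` compares floating-point values within a tolerance, whereas `DataFrame.equals` compares them exactly. The sketch below illustrates the difference on plain DataFrames; the helper name `frames_equal` is hypothetical, and passing `check_exact=True` is one possible way to keep the comparison strict, not something the answer itself prescribes.

```python
import pandas as pd


def frames_equal(left, right, exact=True):
    """Hypothetical helper: True if the frames match, False otherwise."""
    try:
        pd.testing.assert_frame_equal(left, right, check_exact=exact)
        return True
    except AssertionError:
        return False


a = pd.DataFrame({"x": [1.0, 2.0]})
b = pd.DataFrame({"x": [1.0 + 1e-9, 2.0]})  # differs by a tiny amount

loose = frames_equal(a, b, exact=False)  # tolerance-based comparison
strict = frames_equal(a, b, exact=True)  # exact comparison
builtin = a.equals(b)                    # plain DataFrame.equals
```

If your subclass holds floating-point data and you want the override to behave like the built-in method, the strict variant is probably the closer match.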

Regarding "python - Subclassing a Pandas DataFrame, updates?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47466255/
