
python - Property setters for a Pandas DataFrame subclass

Reposted. Author: 行者123. Updated: 2023-12-03 16:18:50

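I am trying to set up a subclass of pd.DataFrame that takes two required arguments at initialization (group and timestamp_col). I want to validate both arguments, so I have a setter method for each property. This all works until I call set_index() and get TypeError: 'NoneType' object is not iterable. It appears that no argument is being passed to my setter functions in test_set_index and test_assignment_with_indexed_obj. If I add if g == None: return to my setter functions, the test cases pass, but I don't think that is the proper solution.

How should I implement property validation for these required arguments?

Below is my class: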

import pandas as pd
import numpy as np


class HistDollarGains(pd.DataFrame):
    @property
    def _constructor(self):
        return HistDollarGains._internal_ctor

    _metadata = ["group", "timestamp_col", "_group", "_timestamp_col"]

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs["group"] = None
        kwargs["timestamp_col"] = None
        return cls(*args, **kwargs)

    def __init__(
        self,
        data,
        group,
        timestamp_col,
        index=None,
        columns=None,
        dtype=None,
        copy=True,
    ):
        super(HistDollarGains, self).__init__(
            data=data, index=index, columns=columns, dtype=dtype, copy=copy
        )

        self.group = group
        self.timestamp_col = timestamp_col

    @property
    def group(self):
        return self._group

    @group.setter
    def group(self, g):
        if g == None:
            return

        if isinstance(g, str):
            group_list = [g]
        else:
            group_list = g

        if not set(group_list).issubset(self.columns):
            raise ValueError("Data does not contain " + '[' + ', '.join(group_list) + ']')
        self._group = group_list

    @property
    def timestamp_col(self):
        return self._timestamp_col

    @timestamp_col.setter
    def timestamp_col(self, t):
        if t == None:
            return
        if not t in self.columns:
            raise ValueError("Data does not contain " + '[' + t + ']')
        self._timestamp_col = t

Here are my test cases:

import pytest

import pandas as pd
import numpy as np

from myclass import *


@pytest.fixture(scope="module")
def sample():
    samp = pd.DataFrame(
        [
            {"timestamp": "2020-01-01", "group": "a", "dollar_gains": 100},
            {"timestamp": "2020-01-01", "group": "b", "dollar_gains": 100},
            {"timestamp": "2020-01-01", "group": "c", "dollar_gains": 110},
            {"timestamp": "2020-01-01", "group": "a", "dollar_gains": 110},
            {"timestamp": "2020-01-01", "group": "b", "dollar_gains": 90},
            {"timestamp": "2020-01-01", "group": "d", "dollar_gains": 100},
        ]
    )

    return samp


@pytest.fixture(scope="module")
def sample_obj(sample):
    return HistDollarGains(sample, "group", "timestamp")


def test_constructor_without_args(sample):
    with pytest.raises(TypeError):
        HistDollarGains(sample)


def test_constructor_with_string_group(sample):
    hist_dg = HistDollarGains(sample, "group", "timestamp")
    assert hist_dg.group == ["group"]
    assert hist_dg.timestamp_col == "timestamp"


def test_constructor_with_list_group(sample):
    hist_dg = HistDollarGains(sample, ["group", "timestamp"], "timestamp")


def test_constructor_with_invalid_group(sample):
    with pytest.raises(ValueError):
        HistDollarGains(sample, "invalid_group", np.random.choice(sample.columns))


def test_constructor_with_invalid_timestamp(sample):
    with pytest.raises(ValueError):
        HistDollarGains(sample, np.random.choice(sample.columns), "invalid_timestamp")


def test_assignment_with_indexed_obj(sample_obj):
    b = sample_obj.set_index(sample_obj.group + [sample_obj.timestamp_col])


def test_set_index(sample_obj):
    # print(isinstance(a, pd.DataFrame))
    assert sample_obj.set_index(sample_obj.group + [sample_obj.timestamp_col]).index.names == ['group', 'timestamp']

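Best answer

The set_index() method calls self.copy() internally to create a copy of your DataFrame object (see the pandas source code), and that copy is built with your custom constructor method, _internal_ctor(). Note that self._constructor resolves to _internal_ctor here; _constructor is the common internal hook that nearly all pandas classes use to create new instances during operations such as deep copying or slicing. Your problem actually originates from this function: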

class HistDollarGains(pd.DataFrame):
    ...
    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs["group"] = None
        kwargs["timestamp_col"] = None
        return cls(*args, **kwargs)  # this is equivalent to calling
        # HistDollarGains(data, group=None, timestamp_col=None)

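I guess you copied this code from the GitHub issue. The kwargs["**"] = None lines explicitly tell the constructor to set None for both group and timestamp_col. In the end, the setter/validator receives None as the new value and raises the error.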
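The chain described above can be reproduced without pandas. The following is only a toy stand-in, not pandas code: the Frame class and its copy() method are hypothetical and merely mimic the "copy goes through _constructor, which passes group=None" path, assuming a setter without the if g == None: return guard.

```python
class Frame:
    """Toy stand-in for a DataFrame (hypothetical, not pandas): stores column names only."""

    def __init__(self, columns, group):
        self.columns = list(columns)
        self.group = group  # runs the property setter below

    @property
    def _constructor(self):
        # Mirrors the question's _internal_ctor: the new object is built with group=None.
        return lambda columns: type(self)(columns, group=None)

    def copy(self):
        # Like a pandas-style copy, the new object is created through _constructor.
        return self._constructor(self.columns)

    @property
    def group(self):
        return self._group

    @group.setter
    def group(self, g):
        group_list = [g] if isinstance(g, str) else g
        # With g=None, set(None) raises TypeError before any validation happens.
        if not set(group_list).issubset(self.columns):
            raise ValueError("Data does not contain [" + ", ".join(group_list) + "]")
        self._group = group_list


frame = Frame(["timestamp", "group", "dollar_gains"], "group")

try:
    frame.copy()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Direct construction succeeds because the caller supplies a real group, while copy() fails because the internal constructor substitutes None.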

Therefore, you should set acceptable values for group and timestamp_col.

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs["group"] = []
        kwargs["timestamp_col"] = 'timestamp'  # or whatever name that makes your validator happy
        return cls(*args, **kwargs)

Then you can remove the if g == None: return lines from your validators.
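Putting the pieces together, a minimal sketch of the class with this change applied might look as follows. It assumes the placeholder column name 'timestamp' from the answer and a shortened two-row sample; the variable names hist and indexed are illustrative only, and the behavior of pandas internals can differ between versions.

```python
import pandas as pd


class HistDollarGains(pd.DataFrame):
    # Same _metadata list as in the question.
    _metadata = ["group", "timestamp_col", "_group", "_timestamp_col"]

    @property
    def _constructor(self):
        return HistDollarGains._internal_ctor

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        # Placeholder values that pass validation, as the answer suggests.
        kwargs["group"] = []
        kwargs["timestamp_col"] = "timestamp"  # assumed column name
        return cls(*args, **kwargs)

    def __init__(self, data, group, timestamp_col, index=None, columns=None,
                 dtype=None, copy=True):
        super().__init__(data=data, index=index, columns=columns, dtype=dtype, copy=copy)
        self.group = group
        self.timestamp_col = timestamp_col

    @property
    def group(self):
        return self._group

    @group.setter
    def group(self, g):
        # No None guard: every caller must supply a string or a list of columns.
        group_list = [g] if isinstance(g, str) else list(g)
        if not set(group_list).issubset(self.columns):
            raise ValueError("Data does not contain [" + ", ".join(group_list) + "]")
        self._group = group_list

    @property
    def timestamp_col(self):
        return self._timestamp_col

    @timestamp_col.setter
    def timestamp_col(self, t):
        if t not in self.columns:
            raise ValueError("Data does not contain [" + t + "]")
        self._timestamp_col = t


sample = pd.DataFrame(
    [
        {"timestamp": "2020-01-01", "group": "a", "dollar_gains": 100},
        {"timestamp": "2020-01-01", "group": "b", "dollar_gains": 90},
    ]
)
hist = HistDollarGains(sample, "group", "timestamp")
indexed = hist.set_index(hist.group + [hist.timestamp_col])
print(list(indexed.index.names))
```

One caveat with a hard-coded placeholder: an operation that rebuilds the object from a frame that no longer has a timestamp column (for example, copying the indexed result) may fail the timestamp_col validation again, so the placeholder has to match how the class is actually used.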

Regarding python - Property setters for a Pandas DataFrame subclass, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59655537/
