
Pandas: Chained assignments [duplicate]

Reposted. Author: bug小助手. Updated: 2023-10-25 21:30:43




I have been reading this link on "Returning a view versus a copy". I don't really understand how the chained assignment concept in pandas works, or how using .ix[], .iloc[] or .loc[] affects it.



I get SettingWithCopyWarning warnings for the following lines of code, where data is a pandas DataFrame and amount is the name of a column (Series) in that DataFrame:



```python
data['amount'] = data['amount'].astype(float)

data["amount"].fillna(data.groupby("num")["amount"].transform("mean"), inplace=True)

data["amount"].fillna(mean_avg, inplace=True)
```

Looking at this code, is it obvious that I am doing something suboptimal? If so, can you let me know the replacement code lines?



I am aware of the warning below and would like to think that the warnings in my case are false positives:




The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid assignment. There may be false positives; situations where a chained assignment is inadvertently reported.



EDIT: the code leading to the first copy warning:



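```python
# Assumes: data and lookup are DataFrames, lookup has a 'Date' column plus one
# exchange-rate column per currency, and date/qty hold column names.
def function1(row, date, qty):
    try:
        if row['currency'] == 'A':
            result = row[qty]
        else:
            # Rate for this row's date and currency (first match)
            rate = lookup.loc[lookup['Date'] == row[date], row['currency']]
            result = float(rate.iloc[0]) * float(row[qty])
        return result
    except ValueError:  # e.g. a value that cannot be converted to float
        print("The current row causes an exception:", row.name)
        return float('nan')

# function1 must be defined before apply() calls it
data['amount'] = data.apply(lambda row: function1(row, date, qty), axis=1)
data['amount'] = data['amount'].astype(float)
```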

Recommended answer:

The point of SettingWithCopyWarning is to warn you that you may be doing something that will not update the original DataFrame as you might expect.




Here, data is a DataFrame, possibly of a single dtype (or not). You are then taking a reference to data['amount'], which is a Series, and updating it. This probably works in your case because the update produces data of the same dtype as the existing column.




However, it could create a copy, in which case the update is applied to a copy of data['amount'] that you would never see, and you would be left wondering why data is not updating.
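A minimal sketch of that failure mode, using a toy DataFrame (df, a and b are made-up names, not from the question): boolean indexing always returns a new object, so a chained assignment writes to that temporary copy and the original frame stays unchanged, while a single .loc call updates it. Depending on the pandas version, the chained line emits SettingWithCopyWarning or a chained-assignment warning; the sketch silences it only to keep the output clean.

```python
import warnings

import pandas as pd

# Toy frame; column names are illustrative only
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained assignment: df[df["a"] > 1] builds a new DataFrame, and ["b"] = 0
# then modifies only that temporary object, never df itself.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning
    df[df["a"] > 1]["b"] = 0

unchanged = df["b"].tolist()  # df was not updated

# One .loc call selects the rows and assigns on df directly
df.loc[df["a"] > 1, "b"] = 0
updated = df["b"].tolist()
```

After running it, unchanged is [10, 20, 30] and updated is [10, 0, 0].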




Pandas returns a copy of an object in almost all method calls. The inplace operations are a convenience that works, but in general they do not make it clear that data is being modified, and they could potentially operate on copies.




It is much clearer to do this:




```python
data['amount'] = data['amount'].fillna(data.groupby('num')['amount'].transform('mean'))

data['amount'] = data['amount'].fillna(mean_avg)
```
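A runnable version of those two lines on toy data, as a minimal sketch: the num and amount values are made up, and mean_avg (left unspecified in the question) is assumed here to be the overall column mean. Reassigning the result replaces inplace=True, and transform('mean') returns a Series aligned with data's index, so fillna can use it row by row.

```python
import numpy as np
import pandas as pd

# Toy data; mean_avg stands in for the question's (unspecified) fallback value
data = pd.DataFrame({
    "num": [1, 1, 2, 2, 3],
    "amount": [10.0, np.nan, 4.0, np.nan, np.nan],
})
mean_avg = data["amount"].mean()  # mean of the non-NaN values: 7.0

# Step 1: fill each NaN with the mean of its "num" group.
# transform("mean") returns a Series aligned with data's index.
data["amount"] = data["amount"].fillna(data.groupby("num")["amount"].transform("mean"))

# Step 2: group 3 has no values, so its group mean is NaN; use the overall mean.
data["amount"] = data["amount"].fillna(mean_avg)
```

The resulting column is [10.0, 10.0, 4.0, 4.0, 7.0].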


One further advantage of working on copies: you can chain operations, which is not possible with inplace ones (an inplace call returns None).




e.g.




```python
data['amount'] = data['amount'].fillna(mean_avg) * 2
```
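To make the difference concrete, a small sketch on toy data (mean_avg is a hypothetical fill value): the non-inplace calls return new objects and chain naturally, whereas an inplace call returns None, so nothing can follow it.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"amount": [1.5, np.nan, 3.0]})
mean_avg = 2.0  # hypothetical fill value

# Each non-inplace call returns a new Series, so the calls can be chained
data["amount"] = data["amount"].fillna(mean_avg).mul(2)

# With inplace=True the method returns None, so there is nothing to chain on
s = pd.Series([1.0, np.nan])
ret = s.fillna(0.0, inplace=True)
```

Here data["amount"] becomes [3.0, 4.0, 6.0], ret is None, and s is modified to [1.0, 0.0].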


And just an FYI: inplace operations are neither faster nor more memory efficient. My 2c: they should be banned, but it is too late for that in the API.




You can of course turn this off:




```python
pd.set_option('mode.chained_assignment', None)
```


FYI, pandas runs its entire test suite with this option set to 'raise', so we know if chaining is happening.
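If the warning has to be turned off, a narrower alternative to a global set_option call (and one way to address the concern about changing default settings raised in the comments) is pandas' option_context context manager, which changes an option only inside a with block and restores the previous value afterwards. A minimal sketch; the "raise" block mirrors the test-suite setting mentioned above:

```python
import pandas as pd

default = pd.get_option("mode.chained_assignment")  # "warn" unless changed

# Silence the check only for a block you have verified is a false positive;
# the previous setting is restored automatically when the block exits.
with pd.option_context("mode.chained_assignment", None):
    silenced = pd.get_option("mode.chained_assignment")
    # ... code that triggers the known false positive would go here ...

# The stricter "raise" setting can be scoped the same way,
# e.g. while running your own tests.
with pd.option_context("mode.chained_assignment", "raise"):
    strict = pd.get_option("mode.chained_assignment")

restored = pd.get_option("mode.chained_assignment")
```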



Comments:

Thanks Jeff, so I should ideally remove the inplace parameters for the 2nd and 3rd warnings. Regarding the 1st one, i.e. data['amount'] = data['amount'].astype(float), what would be a replacement that does not produce the copy warning?


You must be doing something before the astype assignment. Can you show more code?
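The question does not show how data was created, so this is only an assumption: a common source of this warning on a plain column assignment is that data was itself produced by filtering another DataFrame. Taking an explicit .copy() of the subset makes data independent, so later assignments to it are unambiguous. A minimal sketch (raw, num and the string amounts are made up):

```python
import pandas as pd

# Hypothetical source frame with amounts stored as strings
raw = pd.DataFrame({"num": [1, 2, 3], "amount": ["1.5", "2", "3.25"]})

# Without .copy(), data could be flagged as a possible copy of raw, and the
# assignment below could trigger SettingWithCopyWarning (when Copy-on-Write
# is not enabled).
data = raw[raw["num"] > 1].copy()

# data owns its values, so this is an ordinary column replacement
data["amount"] = data["amount"].astype(float)
```

The original raw frame keeps its string amounts; only data is converted.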


Sure, I added the code to my question.


Can you show data.info() before this? You should have float64 dtypes already. Secondly, you don't need the apply; you can do something like: data.loc[data['currency'] != 'A', 'amount'] = data['qty'] * data['rate']
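That .loc suggestion assumes a rate column already exists in data. Since the question keeps rates in a separate lookup table (one row per Date, one column per currency, as implied by function1), one way to get that column, not taken from the thread, is to reshape lookup with melt and merge it in. A minimal sketch; the column names Date, currency and qty and the currency codes are assumptions:

```python
import pandas as pd

# Hypothetical inputs mirroring the question's structure
data = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "currency": ["A", "B", "C"],
    "qty": [7.0, 5.0, 2.0],
})
lookup = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-02"],
    "B": [2.0, 3.0],
    "C": [4.0, 5.0],
})

# Reshape lookup to long format: one (Date, currency, rate) row per pair
rates = lookup.melt(id_vars="Date", var_name="currency", value_name="rate")

# Left merge keeps every row of data (and its order); 'A' rows get rate NaN
data = data.merge(rates, on=["Date", "currency"], how="left")

# Currency 'A' needs no conversion; convert the rest with one .loc assignment
data["amount"] = data["qty"].astype(float)
mask = data["currency"] != "A"
data.loc[mask, "amount"] = data["qty"] * data["rate"]
```

The resulting amount column is [7.0, 10.0, 10.0]; the .loc assignment aligns the right-hand Series on the index, so only the masked rows change.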


Thanks @Jeff for this solution: pd.set_option('chained_assignment', None). However, I'm wondering whether this is recommended, since I'm always conservative about changing default warning settings...

