
pandas - How to correctly apply a lambda function to a Pandas DataFrame column

Reposted. Author: 行者123. Updated: 2023-12-03 11:52:56

I have a Pandas DataFrame, sample, with a column named PR, and I am applying a lambda function to it as follows:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)

I then get the following syntax error:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
^
SyntaxError: invalid syntax

What am I doing wrong?

Best answer

You need mask:

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
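mask replaces values where the condition is True; its counterpart where keeps values where the condition is True and, when no replacement is given, fills the rest with NaN. A small sketch on a hypothetical Series s (standing in for sample['PR']) checking that the two spellings agree:

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question's PR column
s = pd.Series([10, 100, 40], name="PR")

# mask: replace with NaN wherever the condition is True
masked = s.mask(s < 90, np.nan)

# where: keep values wherever the condition is True, fill the rest with NaN (the default)
kept = s.where(s >= 90)

print(masked.tolist())      # [nan, 100.0, nan]
print(masked.equals(kept))  # True
```

Because NaN is a float, the int64 column comes back as float64 in both cases.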

Another solution, using loc with boolean indexing:
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
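Unlike mask, which returns a new Series that is assigned back, the loc form writes into the existing frame: the boolean Series picks the rows and 'PR' picks the column. A minimal sketch, assuming the column is cast to float first so that it can hold NaN without relying on implicit dtype upcasting (the name below_90 is only illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; cast to float up front because an int64 column cannot hold NaN
sample = pd.DataFrame({"PR": [10, 100, 40]}).astype(float)

# Boolean row selector: True where the value should be blanked out
below_90 = sample["PR"] < 90
print(below_90.tolist())  # [True, False, True]

# Assign NaN in place, only to the selected cells of column 'PR'
sample.loc[below_90, "PR"] = np.nan
print(sample["PR"].tolist())  # [nan, 100.0, nan]
```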

Sample:
import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR':[10,100,40] })
print (sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print (sample)
      PR
0    NaN
1  100.0
2    NaN

sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print (sample)
      PR
0    NaN
1  100.0
2    NaN

Edit:

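The original lambda fails because a Python conditional expression always has the form value_if_true if condition else value_if_false; the else branch is mandatory. A solution using apply: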
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
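To see that the error comes from Python's grammar rather than from pandas, the two lambda sources can be handed to the built-in compile function. A small sketch (the helper name compiles is hypothetical); it also uses np.nan, since a bare NaN is not a Python built-in name:

```python
import numpy as np
import pandas as pd

# The question's lambda: conditional expression with no else branch
broken = "lambda x: NaN if x < 90"
# The answer's lambda: value_if_true if condition else value_if_false
fixed = "lambda x: np.nan if x < 90 else x"

def compiles(src):
    """Return True if src is syntactically valid as a Python expression."""
    try:
        compile(src, "<lambda>", "eval")
        return True
    except SyntaxError:
        return False

print(compiles(broken))  # False
print(compiles(fixed))   # True

# The fixed lambda applied element by element
sample = pd.DataFrame({"PR": [10, 100, 40]})
sample["PR"] = sample["PR"].apply(lambda x: np.nan if x < 90 else x)
print(sample["PR"].tolist())  # [nan, 100.0, nan]
```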

Timings with len(df) = 300k:
sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
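The figures above come from IPython's %timeit. A minimal sketch for re-running the comparison as a plain script, assuming only the standard timeit module; with_apply and with_mask are hypothetical helper names, and absolute numbers will vary with the machine and pandas version, so none are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the 300k-row frame used for the timings above
sample = pd.DataFrame({"PR": [10, 100, 40]})
sample = pd.concat([sample] * 100000).reset_index(drop=True)

def with_apply():
    # Calls the Python lambda once per element
    return sample["PR"].apply(lambda x: np.nan if x < 90 else x)

def with_mask():
    # One vectorized comparison plus one vectorized replacement
    return sample["PR"].mask(sample["PR"] < 90, np.nan)

# Check both approaches give the same result before comparing speed
assert with_apply().equals(with_mask())

for fn in (with_apply, with_mask):
    # Best of 3 repeats, 3 calls per repeat, reported per call
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1000:.2f} ms per call")
```

The gap comes from apply invoking the Python lambda for every row, while mask operates on the whole column at once.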

Regarding pandas - How to correctly apply a lambda function to a Pandas DataFrame column, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37428218/
