
python - Why is pandas apply lambda slower than a loop here?

Reposted. Author: 行者123. Updated: 2023-11-30 22:24:45

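I have a pandas DataFrame that I want to filter based on whether certain conditions are met. I ran both a loop and .apply() and used %%timeit to compare their speed. The dataset has about 45,000 rows. The for-loop snippet is: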

%%timeit
qualified_actions = []
for row in all_actions.index:
    if all_actions.ix[row,'Lower'] <= all_actions.ix[row, 'Mid'] <= all_actions.ix[row,'Upper']:
        qualified_actions.append(True)
    else:
        qualified_actions.append(False)

1.44 s ± 3.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And for .apply():

%%timeit
qualified_actions = all_actions.apply(lambda row: row['Lower'] <= row['Mid'] <= row['Upper'], axis=1)

6.71 s ± 54.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I thought .apply() was supposed to be much faster than looping over the rows in pandas. Can someone explain why it is slower in this case?

Best answer

apply uses loops under the hood, so if you need better performance, the best and fastest approach is a vectorized alternative.
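To make the "loops under the hood" point concrete, here is a minimal sketch that counts how often pandas invokes the row function. The helper name `count_row_calls` and the small random frame are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd


def count_row_calls(frame):
    """Apply the range check row by row and count the Python-level calls."""
    calls = []

    def in_range(row):
        calls.append(1)  # one entry per invocation made by pandas
        return row['Lower'] <= row['Mid'] <= row['Upper']

    result = frame.apply(in_range, axis=1)
    return result, len(calls)


# Hypothetical small frame with the same column names as in the question.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randint(50, size=(1000, 3)),
                  columns=['Lower', 'Mid', 'Upper'])

result, n_calls = count_row_calls(df)
print(n_calls, len(df))  # the function runs at least once per row
```

With axis=1, each call also receives the row as a Series, so the per-row overhead is likely a large part of why apply came out slower than the explicit loop in the question.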

A vectorized solution with no loops, just two chained conditions:

m1 = all_actions['Lower'] <= all_actions['Mid']
m2 = all_actions['Mid'] <= all_actions['Upper']
qualified_actions = m1 & m2

Thanks to Jon Clements for another solution:

all_actions.Mid.between(all_actions.Lower, all_actions.Upper)
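As a sanity check, the sketch below compares both vectorized forms against a row-by-row reference and then uses the mask to filter the frame. The 200-row random frame is an assumption for illustration, and the reference loop uses .loc because .ix was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with the same column names as in the question.
rng = np.random.RandomState(2017)
all_actions = pd.DataFrame(rng.randint(50, size=(200, 3)),
                           columns=['Lower', 'Mid', 'Upper'])

# Row-by-row reference result, written with .loc instead of the removed .ix.
looped = [
    bool(all_actions.loc[row, 'Lower']
         <= all_actions.loc[row, 'Mid']
         <= all_actions.loc[row, 'Upper'])
    for row in all_actions.index
]

# Vectorized mask built from two chained conditions.
m1 = all_actions['Lower'] <= all_actions['Mid']
m2 = all_actions['Mid'] <= all_actions['Upper']
chained = m1 & m2

# Series.between includes both endpoints by default.
between = all_actions['Mid'].between(all_actions['Lower'], all_actions['Upper'])

# All three approaches should agree element by element.
assert chained.tolist() == looped
assert between.tolist() == looped

# The boolean mask can then be used to filter the frame.
qualified = all_actions[chained]
```

Either mask can be passed straight to the indexing operator, so the list of True/False values built in the original loop is not needed for filtering.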

Timings:

import numpy as np
import pandas as pd

np.random.seed(2017)
N = 45000
all_actions = pd.DataFrame(np.random.randint(50, size=(N, 3)), columns=['Lower', 'Mid', 'Upper'])

#print (all_actions)
In [85]: %%timeit
    ...: qualified_actions = []
    ...: for row in all_actions.index:
    ...:     if all_actions.ix[row,'Lower'] <= all_actions.ix[row, 'Mid'] <= all_actions.ix[row,'Upper']:
    ...:         qualified_actions.append(True)
    ...:     else:
    ...:         qualified_actions.append(False)
    ...:
    ...:
__main__:259: DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
1 loop, best of 3: 579 ms per loop

In [86]: %%timeit
...: (all_actions.apply(lambda row: row['Lower'] <= row['Mid'] <= row['Upper'], axis=1))
...:
1 loop, best of 3: 1.17 s per loop

In [87]: %%timeit
...: ((all_actions['Lower'] <= all_actions['Mid']) & (all_actions['Mid'] <= all_actions['Upper']))
...:
1000 loops, best of 3: 509 µs per loop


In [90]: %%timeit
...: (all_actions.Mid.between(all_actions.Lower, all_actions.Upper))
...:
1000 loops, best of 3: 520 µs per loop
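Outside IPython, a comparison along these lines can be sketched with the standard-library timeit module. The helper `time_approaches` and its `repeats` parameter are hypothetical names introduced here. The explicit loop is left out because .ix no longer exists in current pandas, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def time_approaches(frame, repeats=3):
    """Return the best-of-`repeats` wall-clock seconds for one call of each approach."""
    candidates = {
        'apply': lambda: frame.apply(
            lambda row: row['Lower'] <= row['Mid'] <= row['Upper'], axis=1),
        'mask': lambda: (frame['Lower'] <= frame['Mid']) & (frame['Mid'] <= frame['Upper']),
        'between': lambda: frame['Mid'].between(frame['Lower'], frame['Upper']),
    }
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeats))
        for name, fn in candidates.items()
    }


# Same setup as in the answer: 45000 rows of random integers below 50.
np.random.seed(2017)
N = 45000
all_actions = pd.DataFrame(np.random.randint(50, size=(N, 3)),
                           columns=['Lower', 'Mid', 'Upper'])

timings = time_approaches(all_actions)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Taking the minimum over a few repeats mirrors the "best of 3" figures reported by %%timeit in the session above.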

Regarding "python - Why is pandas apply lambda slower than a loop here?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47749018/
