
python - Pandas: conditional column creation

Reposted. Author: 太空宇宙. Last updated: 2023-11-04 08:29:26

I am trying to create column C from the values in columns A and B, subject to the following condition:

if A < 5000: C = A * B
else: C = A

The following raises a syntax error:

df['C'] = df.apply(lambda x (x['A'] * x['B)'] if x['A'] < 5000 else x = x['A']),axis=1)

How close am I?

Best answer

Use the vectorized numpy.where:

df['C'] = np.where(df['A'] < 5000, df['A'] * df['B'], df['A'])
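
As a minimal sketch of the line above on a small made-up frame (the sample values and the extra column name C_apply are assumptions, not from the question), the block below also shows a corrected form of the original apply call. The attempt in the question fails for three reasons: the lambda has no colon after its parameter, the closing quote in x['B)'] is misplaced, and an assignment (x = ...) is not allowed inside a conditional expression.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen so that both branches are exercised.
df = pd.DataFrame({'A': [1000, 4999, 5000, 8000], 'B': [2, 3, 4, 5]})

# Vectorized: where A < 5000 take A * B, otherwise take A.
df['C'] = np.where(df['A'] < 5000, df['A'] * df['B'], df['A'])

# Row-wise equivalent of the attempt in the question, with the syntax fixed.
df['C_apply'] = df.apply(
    lambda x: x['A'] * x['B'] if x['A'] < 5000 else x['A'], axis=1
)

print(df)
```

Both columns should hold the same values; the row-wise version is mainly useful for checking the logic, since it calls the lambda once per row.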

Performance:

np.random.seed(2019)

N = 1000
data = np.asarray([np.random.rand(N).tolist(), list(range(N))]).T
df = pd.DataFrame(data, columns=['A', 'B'])

In [56]: %timeit df['C'] = np.where(df['A'] < 5000, df['A'] * df['B'], df['A'])
536 µs ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [57]: %timeit df['C'] = df.apply(lambda x: x.A * x.B if x.A > 0.5 else x.A, 1)
30.9 ms ± 597 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

N = 100000
data = np.asarray([np.random.rand(N).tolist(), list(range(N))]).T
df = pd.DataFrame(data, columns=['A', 'B'])

In [59]: %timeit df['C'] = np.where(df['A'] < 5000, df['A'] * df['B'], df['A'])
1.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [60]: %timeit df['C'] = df.apply(lambda x: x.A * x.B if x.A > 0.5 else x.A, 1)
3.32 s ± 374 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
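
The %timeit lines above need IPython. A rough equivalent with the standard-library timeit module might look like the sketch below. The helper names with_where and with_apply, the scaling of A to the range 0 to 10000 (so that both branches of the A < 5000 condition are hit), and the repeat count are assumptions for illustration. Absolute timings depend on the machine and on library versions.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(2019)
N = 1000  # assumed row count for this sketch

# Assumed data: A spread over [0, 10000) so that both branches occur.
df = pd.DataFrame({
    'A': np.random.rand(N) * 10000,
    'B': np.arange(N, dtype=float),
})

def with_where(frame):
    # Vectorized version: one pass over whole columns.
    return np.where(frame['A'] < 5000, frame['A'] * frame['B'], frame['A'])

def with_apply(frame):
    # Row-wise version: the lambda is called once per row.
    return frame.apply(
        lambda x: x['A'] * x['B'] if x['A'] < 5000 else x['A'], axis=1
    )

# Check that both approaches agree before timing them.
result_where = with_where(df)
result_apply = with_apply(df)
assert np.allclose(result_where, result_apply)

repeats = 20
t_where = timeit.timeit(lambda: with_where(df), number=repeats) / repeats
t_apply = timeit.timeit(lambda: with_apply(df), number=repeats) / repeats
print(f"np.where: {t_where * 1e6:.0f} µs per call")
print(f"apply:    {t_apply * 1e3:.2f} ms per call")
```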

Regarding python - Pandas: conditional column creation, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54124065/
