python - Pandas: column that depends on another value

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:00:12

I have a Pandas DataFrame that looks like this:

   col1  col2  col3  col4
0     5     1    11     9
1     2     3    14     7
2     6     5    54     8
3    11     2    67    44
4    23     8     2    23
5    97     5     9     8
6     9     7    45    71

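I want to create a fifth column (col5) that depends on the value of col1 and takes its value from one of the other columns.

This is roughly the logic I want, but I'm running into some problems: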

if col1 < 3:
    col5 == col2
elif col1 < 7 & col1 >= 3:
    col5 == col3
elif col1 >= 7 & col1 < 50:
    col5 == col4

This should produce the following DataFrame:

   col1  col2  col3  col4  col5
0     5     1    11     9    11
1     2     3    14     7     3
2     6     5    54     8    54
3    11     2    67    44    44
4    23     8     2    23    23
5    97     5     9     8     8
6     9     7    45    71    71

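Thanks in advance, and let me know if you have any questions.

Best answer

You can use nested numpy.where calls, adding a final value of 1 for rows where no condition is True (col1 >= 50):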

import numpy as np

df['col5'] = np.where(df['col1'] < 3, df['col2'],
             np.where((df['col1'] < 7) & (df['col1'] >= 3), df['col3'],
             np.where((df['col1'] >= 7) & (df['col1'] < 50), df['col4'], 1)))
print(df)
   col1  col2  col3  col4  col5
0     5     1    11     9    11
1     2     3    14     7     3
2     6     5    54     8    54
3    11     2    67    44    44
4    23     8     2    23    23
5    97     5     9     8     1
6     9     7    45    71    71
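
As an alternative to nesting, the same three-way choice can be written with numpy.select, which takes a list of conditions and a matching list of choices. This is a swapped-in technique, not the answer's own method; the sample data is copied from the printed output above, and the names conditions and choices are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed output above (row 5 has col1 = 97).
df = pd.DataFrame({
    "col1": [5, 2, 6, 11, 23, 97, 9],
    "col2": [1, 3, 5, 2, 8, 5, 7],
    "col3": [11, 14, 54, 67, 2, 9, 45],
    "col4": [9, 7, 8, 44, 23, 8, 71],
})

# Each comparison is parenthesised because & binds more tightly than < and >=.
# Conditions are checked in order; the first True one wins for each row.
conditions = [
    df["col1"] < 3,
    (df["col1"] >= 3) & (df["col1"] < 7),
    (df["col1"] >= 7) & (df["col1"] < 50),
]
choices = [df["col2"], df["col3"], df["col4"]]

# default=1 mirrors the fallback value used in the nested np.where version.
df["col5"] = np.select(conditions, choices, default=1)
print(df)
```

With more than two or three branches, a flat list of conditions tends to be easier to read and extend than deeply nested np.where calls.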

Edit for the changed values:

If every row with col1 >= 7 should take col4:

df['col5'] = np.where(df['col1'] < 3, df['col2'],
             np.where((df['col1'] < 7) & (df['col1'] >= 3), df['col3'], df['col4']))
print(df)
   col1  col2  col3  col4  col5
0     5     1    11     9    11
1     2     3    14     7     3
2     6     5    54     8    54
3    11     2    67    44    44
4    23     8     2    23    23
5    97     5     9     8     8
6     9     7    45    71    71
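
In the two-level version, the inner np.where only decides rows where the outer condition col1 < 3 is False, so the lower bound col1 >= 3 is already implied there. A minimal sketch comparing the explicit and the shortened form, assuming the sample data from the printed output above (the names explicit and short are illustrative):

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed output above.
df = pd.DataFrame({
    "col1": [5, 2, 6, 11, 23, 97, 9],
    "col2": [1, 3, 5, 2, 8, 5, 7],
    "col3": [11, 14, 54, 67, 2, 9, 45],
    "col4": [9, 7, 8, 44, 23, 8, 71],
})

# Explicit bounds, as written in the answer.
explicit = np.where(df["col1"] < 3, df["col2"],
                    np.where((df["col1"] < 7) & (df["col1"] >= 3),
                             df["col3"], df["col4"]))

# Shortened form: the inner result is only used where col1 >= 3 already holds.
short = np.where(df["col1"] < 3, df["col2"],
                 np.where(df["col1"] < 7, df["col3"], df["col4"]))

print(np.array_equal(explicit, short))  # True
```

Keeping the explicit bounds is harmless and arguably more self-documenting; the shortened form just avoids one extra comparison per row.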

Timings for len(df)=7000:

In [441]: %timeit df['col51'] = np.where(df['col1'] <3, df['col2'], np.where((df['col1'] <7) & (df['col1'] >=3 ), df['col3'], df['col4']))
The slowest run took 5.31 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.25 ms per loop

In [442]: %timeit df["col52"] = df.apply(lambda x: col52(x), axis=1)
1 loop, best of 3: 552 ms per loop

In [443]: %timeit df["col53"] = [col53(c1,c2,c3,c4) for c1,c2,c3,c4 in zip(df.col1,df.col2,df.col3,df.col4)]
100 loops, best of 3: 9.87 ms per loop

Timings for len(df)=70k:

In [446]: %timeit df['col51'] = np.where(df['col1'] <3, df['col2'], np.where((df['col1'] <7) & (df['col1'] >=3 ), df['col3'], df['col4']))
100 loops, best of 3: 2.5 ms per loop

In [447]: %timeit df["col52"] = df.apply(lambda x: col52(x), axis=1)
1 loop, best of 3: 5.36 s per loop

In [448]: %timeit df["col53"] = [col53(c1,c2,c3,c4) for c1,c2,c3,c4 in zip(df.col1,df.col2,df.col3,df.col4)]
10 loops, best of 3: 96.3 ms per loop

Timing code:

# change 1000 to 10000 for 70k
df = pd.concat([df]*1000).reset_index(drop=True)

def col52(x):
    if x["col1"] < 3:
        return x["col2"]
    elif x["col1"] >= 3 and x["col1"] < 7:
        return x["col3"]
    elif x["col1"] >= 7 and x["col1"] < 50:
        return x["col4"]

def col53(c1, c2, c3, c4):
    if c1 < 3:
        return c2
    elif c1 >= 3 and c1 < 7:
        return c3
    elif c1 >= 7 and c1 < 50:
        return c4

df['col51'] = np.where(df['col1'] < 3, df['col2'], np.where((df['col1'] < 7) & (df['col1'] >= 3), df['col3'], df['col4']))
df["col52"] = df.apply(lambda x: col52(x), axis=1)
df["col53"] = [col53(c1, c2, c3, c4) for c1, c2, c3, c4 in zip(df.col1, df.col2, df.col3, df.col4)]
print(df)
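
The %timeit magic is only available in IPython. Outside it, a comparison along the same lines could be sketched with the standard-library timeit module. Note that col52 and col53 above have no branch for col1 >= 50, so they return None for the row with col1 = 97, while the np.where version returns col4 there. The helper pick below is hypothetical and falls through to col4 so that all three approaches give the same result; the sample data is assumed from the printed outputs above, and absolute timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "col1": [5, 2, 6, 11, 23, 97, 9],
    "col2": [1, 3, 5, 2, 8, 5, 7],
    "col3": [11, 14, 54, 67, 2, 9, 45],
    "col4": [9, 7, 8, 44, 23, 8, 71],
})
# 7 rows * 1000 = 7000 rows; use 10000 for the 70k case.
df = pd.concat([base] * 1000).reset_index(drop=True)


def pick(c1, c2, c3, c4):
    """Hypothetical helper: like col53, but returns c4 for every c1 >= 7."""
    if c1 < 3:
        return c2
    if c1 < 7:
        return c3
    return c4


def with_where():
    # Vectorised: whole-column comparisons, no Python-level loop.
    return np.where(df["col1"] < 3, df["col2"],
                    np.where(df["col1"] < 7, df["col3"], df["col4"]))


def with_apply():
    # Row-wise apply: builds a Series object for every row.
    return df.apply(
        lambda row: pick(row["col1"], row["col2"], row["col3"], row["col4"]),
        axis=1,
    )


def with_zip():
    # Plain Python loop over the four columns in parallel.
    return [pick(c1, c2, c3, c4)
            for c1, c2, c3, c4 in zip(df.col1, df.col2, df.col3, df.col4)]


for name, func in [("np.where", with_where),
                   ("apply", with_apply),
                   ("zip", with_zip)]:
    # Best of three single runs, reported in milliseconds.
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{name:>8}: {best * 1e3:8.2f} ms")
```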

Regarding python - Pandas: column that depends on another value, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41556020/
