
python - DataFrame difference between consecutive rows within a group, plus a string column describing it

Reposted. Author: 行者123. Updated: 2023-12-05 06:59:54

DataFrame:

col1  col_entity col2
a a1 50
b b1 40
a a2 40
a a3 30
b b2 20
a a4 20
b b3 30
b b4 50

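I need to group the rows by col1, sort each group by col2 from high to low, find the difference between consecutive rows, and then create a column holding a string statement for each group. Expected DataFrame: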

col1  col_entity col2   diff   col_statement
a a1 50 10 difference between a1 and a2 is 10
b a2 40 10 difference between a2 and a3 is 10
a a3 30 10 difference between a3 and a4 is 10
a a4 20 nan **will drop this row**
b b1 40 10 difference between b1 and b4 is 10
a b4 50 10 difference between b4 and b3 is 10
b b3 30 10 difference between b3 and b2 is 10
b b2 20 nan **will drop this row**

Please help me solve this. Thanks in advance.

Best answer

You can do this with a couple of np.where statements:

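  1. Use diff().abs() together with .shift(-1) to get the absolute difference between each row and the next one.
  2. If the extracted letter differs between a row and the next one (a group boundary), return NaN for the diff instead.
  3. Build the col_statement column with np.where(), conditionally constructing a string from the other columns depending on whether diff is NaN.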
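
As a small illustration of step 1, here is a sketch of how diff().abs().shift(-1) behaves on a single descending column. The names `values` and `next_diff` are made up for this example and do not appear in the original answer.

```python
import pandas as pd

# Hypothetical column of values, already sorted from high to low.
values = pd.Series([50, 40, 30, 20])

# diff() compares each row with the previous one; shift(-1) moves the
# result up one row so that each row holds the gap to the NEXT row.
next_diff = values.diff().abs().shift(-1)
print(next_diff.tolist())
```

The last row has no successor, so its entry is NaN. That NaN is what later marks the row to drop.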

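import numpy as np

# Assumes df already has its rows in the order shown in the expected output above.
# The leading letter of col_entity identifies the group; expand=False returns a Series.
group = df['col_entity'].str.extract('([a-z])', expand=False)
# Absolute difference to the next row, or NaN when the next row belongs to another group.
df['diff'] = np.where(group == group.shift(-1),
                      df['col2'].diff().abs().shift(-1), np.nan)
df['col_statement'] = np.where(df['diff'].isnull(),
                               '**will drop this row**',
                               'difference between ' + df['col_entity'] + ' and '
                               + df['col_entity'].shift(-1) + ' is ' + df['diff'].astype(str))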
df
Out[1]:
col1 col_entity col2 diff col_statement
a a1 50 10.0 difference between a1 and a2 is 10.0
b a2 40 10.0 difference between a2 and a3 is 10.0
a a3 30 10.0 difference between a3 and a4 is 10.0
a a4 20 NaN **will drop this row**
b b1 40 10.0 difference between b1 and b4 is 10.0
a b4 50 10.0 difference between b4 and b3 is 10.0
b b3 30 10.0 difference between b3 and b2 is 10.0
b b2 20 NaN **will drop this row**
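
The answer above works on a frame whose rows are already in their final order. Starting from the raw input in the question, one possible alternative is to sort and shift within each group using groupby. This is only a sketch under the assumption that col1 is the grouping key and col2 is numeric; the names `out`, `next_entity` and `next_value` are invented for illustration. With a strictly descending sort, group b starts at b4 (50), which differs from the row order shown in the question's expected table.

```python
import pandas as pd

# Input frame as given in the question.
df = pd.DataFrame({
    "col1": ["a", "b", "a", "a", "b", "a", "b", "b"],
    "col_entity": ["a1", "b1", "a2", "a3", "b2", "a4", "b3", "b4"],
    "col2": [50, 40, 40, 30, 20, 20, 30, 50],
})

# Sort by group, then by col2 from high to low inside each group.
out = df.sort_values(["col1", "col2"], ascending=[True, False]).reset_index(drop=True)

# Look one row ahead inside each group; the last row of a group gets NaN.
next_entity = out.groupby("col1")["col_entity"].shift(-1)
next_value = out.groupby("col1")["col2"].shift(-1)

out["next_entity"] = next_entity
out["diff"] = out["col2"] - next_value

# Drop each group's last row (no successor), then build the statement.
out = out.dropna(subset=["diff"]).copy()
out["diff"] = out["diff"].astype(int)
out["col_statement"] = (
    "difference between " + out["col_entity"] + " and "
    + out["next_entity"] + " is " + out["diff"].astype(str)
)
out = out.drop(columns="next_entity").reset_index(drop=True)
print(out)
```

Because the shift happens per group, no letter extraction is needed to detect group boundaries, and the rows that the answer labels "will drop this row" are removed with dropna instead.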

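Regarding "python - DataFrame difference between consecutive rows within a group, plus a string column describing it", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64277200/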
