
python - Pandas: comparing values with the previous row under a filter condition

Repost · Author: 太空宇宙 · Updated: 2023-11-03 12:53:36

I have a DataFrame containing employee salary information. It has more than 900,000 rows.

Example:

+----+-------------+---------------+----------+
|    | table_num   | name          |   salary |
|----+-------------+---------------+----------|
|  0 | 001234      | John Johnson  |     1200 |
|  1 | 001234      | John Johnson  |     1000 |
|  2 | 001235      | John Johnson  |     1000 |
|  3 | 001235      | John Johnson  |     1200 |
|  4 | 001235      | John Johnson  |     1000 |
|  5 | 001235      | Steve Stevens |     1000 |
|  6 | 001236      | Steve Stevens |     1200 |
|  7 | 001236      | Steve Stevens |     1200 |
|  8 | 001236      | Steve Stevens |     1200 |
+----+-------------+---------------+----------+

Data types:

table_num: string
name: string
salary: float

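I need to add a column that records whether the salary increased or decreased relative to the previous row. I am using the shift() function to compare values between rows.

The main problem is filtering and iterating over every unique employee in the whole dataset.

My script takes about three and a half hours to run.

How can I make it faster?

My script: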

# Keep only the unique combinations of 'table_num' and 'name',
# since the same 'table_num' can belong to different names
# and the same name sometimes appears with a different 'table_num'.

names_df = df[['table_num', 'name']].drop_duplicates()

# Extract each table_num / name pair in turn
for i in range(len(names_df)):  ### Bottleneck of whole script ###
    t = names_df.iloc[i, 0]
    n = names_df.iloc[i, 1]

    # Use shift() and a lambda to check whether two consecutive rows differ
    mask = (df['table_num'] == t) & (df['name'] == n)
    diff_sal = (df.loc[mask, 'salary'] - df.loc[mask, 'salary'].shift(1)).apply(
        lambda x: 1 if x > 0 else (-1 if x < 0 else 0))
    df.loc[diff_sal.index, 'inc'] = diff_sal.values

Sample input data:

import pandas as pd

df = pd.DataFrame({'table_num': ['001234','001234','001235','001235','001235','001235','001236','001236','001236'],
                   'name': ['John Johnson','John Johnson','John Johnson','John Johnson','John Johnson', 'Steve Stevens', 'Steve Stevens', 'Steve Stevens', 'Steve Stevens'],
                   'salary': [1200.,1000.,1000.,1200.,1000.,1000.,1200.,1200.,1200.]})

Sample output:

+----+-------------+---------------+----------+-------+
|    | table_num   | name          |   salary |   inc |
|----+-------------+---------------+----------+-------|
|  0 | 001234      | John Johnson  |     1200 |     0 |
|  1 | 001234      | John Johnson  |     1000 |    -1 |
|  2 | 001235      | John Johnson  |     1000 |     0 |
|  3 | 001235      | John Johnson  |     1200 |     1 |
|  4 | 001235      | John Johnson  |     1000 |    -1 |
|  5 | 001235      | Steve Stevens |     1000 |     0 |
|  6 | 001236      | Steve Stevens |     1200 |     0 |
|  7 | 001236      | Steve Stevens |     1200 |     0 |
|  8 | 001236      | Steve Stevens |     1200 |     0 |
+----+-------------+---------------+----------+-------+

Best answer

Use groupby together with diff:

df['inc'] = df.groupby(['table_num', 'name'])['salary'].diff().fillna(0.0)
df.loc[df['inc'] > 0.0, 'inc'] = 1.0
df.loc[df['inc'] < 0.0, 'inc'] = -1.0
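
For anyone who wants to check the groupby/diff approach end to end, here is a minimal sketch run against the sample input from the question. It collapses the two df.loc assignments into a single np.sign call. That substitution is my own variation, not part of the answer above, and it assumes the rows are already in the order in which the comparison should be made.

```python
import numpy as np
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    'table_num': ['001234', '001234', '001235', '001235', '001235',
                  '001235', '001236', '001236', '001236'],
    'name': ['John Johnson', 'John Johnson', 'John Johnson', 'John Johnson',
             'John Johnson', 'Steve Stevens', 'Steve Stevens', 'Steve Stevens',
             'Steve Stevens'],
    'salary': [1200., 1000., 1000., 1200., 1000., 1000., 1200., 1200., 1200.],
})

# Difference from the previous row within each (table_num, name) group.
# The first row of every group has no predecessor, so diff() gives NaN there.
step = df.groupby(['table_num', 'name'])['salary'].diff()

# Assumption (not in the original answer): use np.sign instead of two
# df.loc assignments. It maps positive -> 1, negative -> -1, zero -> 0.
# NaN is replaced by 0 first so the column can be cast to int.
df['inc'] = np.sign(step.fillna(0.0)).astype(int)

print(df)
```

Both versions replace the Python-level loop over employees with a single grouped operation, so each row is processed once instead of the whole frame being filtered again for every employee.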

Regarding "python - Pandas: comparing values with the previous row under a filter condition", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52072315/
