
python - Conditionally selecting and setting column values

Reposted. Author: 行者123. Updated: 2023-11-28 22:34:49

I have two dataframes. I need to copy the values of the df2.faults column into the df1.faults column, matching rows on their unit and date values.

The two dataframes have different lengths. Unlike df2, df1 may contain duplicate (unit, date) pairs. Here is a sample that mimics my dataset:

    import pandas as pd

    df1 = pd.DataFrame({'unit': ['x']*5 + ['y']*6 + ['z']*5,
                        'date': ['2016-06-14', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-16',
                                 '2016-06-14', '2016-06-14', '2016-06-15', '2016-06-15', '2016-06-16', '2016-06-16',
                                 '2016-06-15', '2016-06-16', '2016-06-16', '2016-06-17', '2016-06-17'],
                        'faults': None})
    df1['date'] = pd.to_datetime(df1['date'])
    print(df1)
              date  faults  unit
    0   2016-06-14    None     x
    1   2016-06-14    None     x
    2   2016-06-15    None     x
    3   2016-06-16    None     x
    4   2016-06-16    None     x
    5   2016-06-14    None     y
    6   2016-06-14    None     y
    7   2016-06-15    None     y
    8   2016-06-15    None     y
    9   2016-06-16    None     y
    10  2016-06-16    None     y
    11  2016-06-15    None     z
    12  2016-06-16    None     z
    13  2016-06-16    None     z
    14  2016-06-17    None     z
    15  2016-06-17    None     z

    df2 = pd.DataFrame({'unit': ['x']*3 + ['y']*3 + ['z']*3,
                        'date': ['2016-06-14', '2016-06-15', '2016-06-16',
                                 '2016-06-14', '2016-06-15', '2016-06-16',
                                 '2016-06-15', '2016-06-16', '2016-06-17'],
                        'faults': [76, 12, 30, 45, 23, 25, 10, 26, 43]})
    df2['date'] = pd.to_datetime(df2['date'])
    print(df2)
             date  faults  unit
    0  2016-06-14      76     x
    1  2016-06-15      12     x
    2  2016-06-16      30     x
    3  2016-06-14      45     y
    4  2016-06-15      23     y
    5  2016-06-16      25     y
    6  2016-06-15      10     z
    7  2016-06-16      26     z
    8  2016-06-17      43     z
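The question relies on (unit, date) being unique in df2 while it may repeat in df1; that is what makes each df1 row map to exactly one df2 value. A quick way to confirm this on the sample data, sketched here with `DataFrame.duplicated` (an addition, not part of the original post):

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({
    'unit': ['x'] * 5 + ['y'] * 6 + ['z'] * 5,
    'date': ['2016-06-14', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-16',
             '2016-06-14', '2016-06-14', '2016-06-15', '2016-06-15', '2016-06-16', '2016-06-16',
             '2016-06-15', '2016-06-16', '2016-06-16', '2016-06-17', '2016-06-17'],
    'faults': None,
})
df1['date'] = pd.to_datetime(df1['date'])

df2 = pd.DataFrame({
    'unit': ['x'] * 3 + ['y'] * 3 + ['z'] * 3,
    'date': ['2016-06-14', '2016-06-15', '2016-06-16',
             '2016-06-14', '2016-06-15', '2016-06-16',
             '2016-06-15', '2016-06-16', '2016-06-17'],
    'faults': [76, 12, 30, 45, 23, 25, 10, 26, 43],
})
df2['date'] = pd.to_datetime(df2['date'])

keys = ['unit', 'date']
# duplicated() flags every repeat of an earlier (unit, date) pair; any() asks "is there at least one?"
df1_has_dups = df1.duplicated(subset=keys).any()
df2_has_dups = df2.duplicated(subset=keys).any()
print(df1_has_dups, df2_has_dups)
```

If df2 ever did contain a repeated pair, "copy the df2 value" would be ambiguous, so it is worth running a check like this before choosing any of the approaches below.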

Desired output, produced with a nested loop:

    for u in pd.unique(df2.unit):
        for d in pd.unique(df2[df2.unit == u].date):
            # .ix was removed in pandas 1.0; .loc does the same mask-plus-label selection here
            df1.loc[(df1.unit == u) & (df1.date == d), 'faults'] = int(df2[(df2.unit == u) & (df2.date == d)]['faults'].iloc[0])
    print(df1)
              date  faults  unit
    0   2016-06-14      76     x
    1   2016-06-14      76     x
    2   2016-06-15      12     x
    3   2016-06-16      30     x
    4   2016-06-16      30     x
    5   2016-06-14      45     y
    6   2016-06-14      45     y
    7   2016-06-15      23     y
    8   2016-06-15      23     y
    9   2016-06-16      25     y
    10  2016-06-16      25     y
    11  2016-06-15      10     z
    12  2016-06-16      26     z
    13  2016-06-16      26     z
    14  2016-06-17      43     z
    15  2016-06-17      43     z

I can't come up with an efficient way to do this! List comprehension, conditional indexing, ...? Am I missing something?

Thanks!

Update

A single-loop solution is:

    for index, row in df2.iterrows():
        # .loc replaces the removed .ix indexer
        df1.loc[(df1.unit == row['unit']) & (df1.date == row['date']), 'faults'] = row['faults']

Is there an even more efficient solution? My dataset is relatively large, and I would like to avoid loops entirely.

Best answer

Simple: use a left merge:

    df1 = pd.merge(df1, df2, how='left', on=['date', 'unit'])
    print(df1)
              date  faults_x  unit  faults_y
    0   2016-06-14      None     x        76
    1   2016-06-14      None     x        76
    2   2016-06-15      None     x        12
    3   2016-06-16      None     x        30
    4   2016-06-16      None     x        30
    5   2016-06-14      None     y        45
    6   2016-06-14      None     y        45
    7   2016-06-15      None     y        23
    8   2016-06-15      None     y        23
    9   2016-06-16      None     y        25
    10  2016-06-16      None     y        25
    11  2016-06-15      None     z        10
    12  2016-06-16      None     z        26
    13  2016-06-16      None     z        26
    14  2016-06-17      None     z        43
    15  2016-06-17      None     z        43
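A self-contained version of the left merge, with one addition that is not in the original answer: `validate='many_to_one'`, which asks pandas to raise `pandas.errors.MergeError` if a (unit, date) pair repeats in df2. Without that check, a duplicated df2 key would silently duplicate the matching df1 rows. This is a sketch on the question's sample data; the extra duplicated row in `bad_df2` is hypothetical.

```python
import pandas as pd
from pandas.errors import MergeError

# Sample frames from the question.
df1 = pd.DataFrame({
    'unit': ['x'] * 5 + ['y'] * 6 + ['z'] * 5,
    'date': ['2016-06-14', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-16',
             '2016-06-14', '2016-06-14', '2016-06-15', '2016-06-15', '2016-06-16', '2016-06-16',
             '2016-06-15', '2016-06-16', '2016-06-16', '2016-06-17', '2016-06-17'],
    'faults': None,
})
df1['date'] = pd.to_datetime(df1['date'])

df2 = pd.DataFrame({
    'unit': ['x'] * 3 + ['y'] * 3 + ['z'] * 3,
    'date': ['2016-06-14', '2016-06-15', '2016-06-16',
             '2016-06-14', '2016-06-15', '2016-06-16',
             '2016-06-15', '2016-06-16', '2016-06-17'],
    'faults': [76, 12, 30, 45, 23, 25, 10, 26, 43],
})
df2['date'] = pd.to_datetime(df2['date'])

# Left merge keeps every df1 row (duplicates included) in df1's order.
# validate='many_to_one' raises MergeError if df2's (unit, date) keys are not unique.
merged = pd.merge(df1, df2, how='left', on=['date', 'unit'], validate='many_to_one')
print(merged['faults_y'].tolist())

# Hypothetical bad input: repeat df2's first row so (x, 2016-06-14) is no longer unique.
bad_df2 = pd.concat([df2, df2.head(1)], ignore_index=True)
try:
    pd.merge(df1, bad_df2, how='left', on=['date', 'unit'], validate='many_to_one')
    raised = False
except MergeError:
    raised = True
print(raised)
```

The validation costs one uniqueness check on the right-hand keys, which is cheap compared to the merge itself, so it is a reasonable default when the data comes from elsewhere.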

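    # Some bookkeeping: drop the empty column and give the filled one its original name.
    # drop(columns=...) avoids the positional axis argument, which recent pandas no longer accepts;
    # rename() returns a new DataFrame, so its result must be assigned back.
    df1 = df1.drop(columns='faults_x')
    df1 = df1.rename(columns={'faults_y': 'faults'})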

    # Final output
    print(df1)
              date  unit  faults
    0   2016-06-14     x      76
    1   2016-06-14     x      76
    2   2016-06-15     x      12
    3   2016-06-16     x      30
    4   2016-06-16     x      30
    5   2016-06-14     y      45
    6   2016-06-14     y      45
    7   2016-06-15     y      23
    8   2016-06-15     y      23
    9   2016-06-16     y      25
    10  2016-06-16     y      25
    11  2016-06-15     z      10
    12  2016-06-16     z      26
    13  2016-06-16     z      26
    14  2016-06-17     z      43
    15  2016-06-17     z      43
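The bookkeeping can also be folded into the merge. The sketch below is an alternative added here, not the answer's own code: it passes `suffixes=('_old', None)`, so pandas leaves df2's `faults` column unsuffixed and renames df1's to `faults_old`, which is then dropped. It is checked against the answer's drop-and-rename steps written as one chain.

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({
    'unit': ['x'] * 5 + ['y'] * 6 + ['z'] * 5,
    'date': ['2016-06-14', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-16',
             '2016-06-14', '2016-06-14', '2016-06-15', '2016-06-15', '2016-06-16', '2016-06-16',
             '2016-06-15', '2016-06-16', '2016-06-16', '2016-06-17', '2016-06-17'],
    'faults': None,
})
df1['date'] = pd.to_datetime(df1['date'])

df2 = pd.DataFrame({
    'unit': ['x'] * 3 + ['y'] * 3 + ['z'] * 3,
    'date': ['2016-06-14', '2016-06-15', '2016-06-16',
             '2016-06-14', '2016-06-15', '2016-06-16',
             '2016-06-15', '2016-06-16', '2016-06-17'],
    'faults': [76, 12, 30, 45, 23, 25, 10, 26, 43],
})
df2['date'] = pd.to_datetime(df2['date'])

# None as the right-hand suffix keeps df2's overlapping column named plain 'faults'.
merged = pd.merge(df1, df2, how='left', on=['date', 'unit'], suffixes=('_old', None))
result = merged.drop(columns='faults_old')

# The answer's bookkeeping as one chain; each step's return value is carried forward.
chained = (pd.merge(df1, df2, how='left', on=['date', 'unit'])
             .drop(columns='faults_x')
             .rename(columns={'faults_y': 'faults'}))

print(result.columns.tolist())
print(result.equals(chained))
```

Chaining like this also sidesteps the easy mistake of calling `rename` without keeping its result.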

Remember your joins and you'll be fine!! :)
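To make that advice concrete: the join type decides what happens to df1 rows with no partner in df2. The sketch below uses a trimmed, hypothetical variant of the sample data with one extra df1 row, (x, 2016-06-20), that df2 does not contain. A left merge keeps that row with a missing `faults` value, an inner merge silently drops it, and `indicator=True` adds a `_merge` column that shows which rows went unmatched.

```python
import pandas as pd

# Hypothetical trimmed data: the last df1 row has no (unit, date) match in df2.
df1 = pd.DataFrame({
    'unit': ['x', 'x', 'y', 'x'],
    'date': pd.to_datetime(['2016-06-14', '2016-06-14', '2016-06-14', '2016-06-20']),
})
df2 = pd.DataFrame({
    'unit': ['x', 'y'],
    'date': pd.to_datetime(['2016-06-14', '2016-06-14']),
    'faults': [76, 45],
})

# Left join: all 4 df1 rows survive; the unmatched one gets NaN in 'faults'.
left_join = pd.merge(df1, df2, how='left', on=['date', 'unit'])
# Inner join: only rows whose key exists in both frames survive.
inner_join = pd.merge(df1, df2, how='inner', on=['date', 'unit'])

# indicator=True labels each row 'both', 'left_only' or 'right_only'.
checked = pd.merge(df1, df2, how='left', on=['date', 'unit'], indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']

print(len(left_join), len(inner_join))
print(unmatched[['unit', 'date']])
```

For the question's use case (fill df1, never lose rows), `how='left'` is the right choice; the NaN rows are then a signal of missing df2 data rather than a silent loss.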

If you want to do it in one go:

    df1 = pd.merge(df1.drop(columns='faults'), df2, how='left', on=['date', 'unit'])
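For comparison, here is a loop-free lookup that does not use `merge` at all. It is a swapped-in technique, not the accepted answer: df2 is indexed by (unit, date) and `Series.reindex` fetches the value for every df1 pair, repeated pairs included. Like the merge, it assumes df2's keys are unique (`reindex` raises on a duplicated index), and unmatched pairs would come back as NaN. The sketch checks it against the one-go merge above.

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({
    'unit': ['x'] * 5 + ['y'] * 6 + ['z'] * 5,
    'date': ['2016-06-14', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-16',
             '2016-06-14', '2016-06-14', '2016-06-15', '2016-06-15', '2016-06-16', '2016-06-16',
             '2016-06-15', '2016-06-16', '2016-06-16', '2016-06-17', '2016-06-17'],
    'faults': None,
})
df1['date'] = pd.to_datetime(df1['date'])

df2 = pd.DataFrame({
    'unit': ['x'] * 3 + ['y'] * 3 + ['z'] * 3,
    'date': ['2016-06-14', '2016-06-15', '2016-06-16',
             '2016-06-14', '2016-06-15', '2016-06-16',
             '2016-06-15', '2016-06-16', '2016-06-17'],
    'faults': [76, 12, 30, 45, 23, 25, 10, 26, 43],
})
df2['date'] = pd.to_datetime(df2['date'])

keys = ['unit', 'date']

# The answer's one-go merge, for comparison.
merged = pd.merge(df1.drop(columns='faults'), df2, how='left', on=['date', 'unit'])

# Alternative: a (unit, date) -> faults lookup Series, queried with df1's pairs in df1's order.
lookup = df2.set_index(keys)['faults']
wanted = pd.MultiIndex.from_frame(df1[keys])
# to_numpy() drops the MultiIndex so the values line up with df1's positional rows.
df1['faults'] = lookup.reindex(wanted).to_numpy()

print(df1['faults'].tolist())
```

This keeps df1's other columns and index untouched, which can be handy when df1 carries a meaningful index that a merge would reset.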

For "python - Conditionally selecting and setting column values", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38729313/
