
python - Pandas - Computing data from one column into another

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:04:37

Consider the following DataFrame:

df = pd.read_json("""{"week":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":2,"7":2,"8":2,"9":2,"10":2,"11":2,"12":3,"13":3,"14":3,"15":3,"16":3,"17":3},"extra_hours":{"0":"01:00:00","1":"00:00:00","2":"01:00:00","3":"01:00:00","4":"00:00:00","5":"01:00:00","6":"01:00:00","7":"01:00:00","8":"01:00:00","9":"01:00:00","10":"00:00:00","11":"01:00:00","12":"01:00:00","13":"02:00:00","14":"01:00:00","15":"02:00:00","16":"00:00:00","17":"00:00:00"},"extra_hours_over":{"0":null,"1":null,"2":null,"3":null,"4":null,"5":null,"6":null,"7":null,"8":null,"9":null,"10":null,"11":null,"12":null,"13":null,"14":null,"15":null,"16":null,"17":null}}""")
df.tail(6)

week extra_hours extra_hours_over
12 3 01:00:00 NaN
13 3 02:00:00 NaN
14 3 01:00:00 NaN
15 3 02:00:00 NaN
16 3 00:00:00 NaN
17 3 00:00:00 NaN

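Now, the maximum amount of extra_hours in each week is 4 hours. This means I have to subtract 30-minute blocks from the extra_hours column and add them to the extra_hours_over column, so that the sum of extra_hours within each week is at most 4h.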

So, given the sample DataFrame, one possible solution (for week 3) looks like this:

    week  extra_hours  extra_hours_over
12 3 01:00:00 00:00:00
13 3 01:30:00 00:30:00
14 3 00:30:00 00:30:00
15 3 01:00:00 01:00:00
16 3 00:00:00 00:00:00
17 3 00:00:00 00:00:00

I need to aggregate the total extra_hours per week, check where the total goes over 4 hours, and then randomly subtract half-hour blocks.
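To make the arithmetic concrete, here is a small sketch of the first two steps only (weekly total, then the number of 30-minute blocks that have to move). It is not part of the original post. The frame `week3` and the constants `CAP_MINUTES` and `BLOCK_MINUTES` are names made up for illustration, and the rows are week 3 of the sample data typed out by hand.

```python
import pandas as pd

# Week 3 of the sample data, written out by hand for illustration.
week3 = pd.DataFrame({
    "week": [3, 3, 3, 3, 3, 3],
    "extra_hours": ["01:00:00", "02:00:00", "01:00:00",
                    "02:00:00", "00:00:00", "00:00:00"],
})

CAP_MINUTES = 240    # weekly limit of 4 hours, as stated in the question
BLOCK_MINUTES = 30   # size of one block that gets moved

# Convert the "HH:MM:SS" strings to whole minutes.
minutes = pd.to_timedelta(week3["extra_hours"]).dt.total_seconds().astype(int) // 60

# Total extra minutes per week.
weekly_total = minutes.groupby(week3["week"]).sum()

# Minutes above the cap (0 for weeks at or below the cap).
excess = (weekly_total - CAP_MINUTES).clip(lower=0)

# Number of 30-minute blocks needed to cover the excess (ceiling division).
blocks_to_move = -(-excess // BLOCK_MINUTES)

print(weekly_total)
print(blocks_to_move)
```

For week 3 the total is 6 hours, so 2 hours (four half-hour blocks) have to move to extra_hours_over, which matches the example table above.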

What is the simplest / most direct way to achieve this?

Best answer

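Here is an attempt at what you seem to be asking. The idea is simple, although the code is fairly verbose:

1) Create some helper variables (minutes, extra_minutes, weekly total).

2) Loop over a temporary dataset that contains only the rows of weeks whose total is > 240 minutes.

3) Inside the loop, use np.random.choice to pick one row per week from which to remove 30 minutes.

4) Apply the change to minutes and extra_minutes.

Code: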

import io

import numpy as np
import pandas as pd

# Wrap the literal in StringIO: newer pandas versions no longer accept a raw JSON string here.
df = pd.read_json(io.StringIO("""{"week":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":2,"7":2,"8":2,"9":2,"10":2,"11":2,"12":3,"13":3,"14":3,"15":3,"16":3,"17":3},"extra_hours":{"0":"01:00:00","1":"00:00:00","2":"01:00:00","3":"01:00:00","4":"00:00:00","5":"01:00:00","6":"01:00:00","7":"01:00:00","8":"01:00:00","9":"01:00:00","10":"00:00:00","11":"01:00:00","12":"01:00:00","13":"02:00:00","14":"01:00:00","15":"02:00:00","16":"00:00:00","17":"00:00:00"},"extra_hours_over":{"0":null,"1":null,"2":null,"3":null,"4":null,"5":null,"6":null,"7":null,"8":null,"9":null,"10":null,"11":null,"12":null,"13":null,"14":null,"15":null,"16":null,"17":null}}"""))

# Helper columns: work in whole minutes so a 30-minute block is plain integer arithmetic.
df['minutes'] = (pd.to_timedelta(df['extra_hours']).dt.total_seconds() // 60).astype(int)
df['extra_minutes'] = 0

df['tot_time'] = df.groupby('week')['minutes'].transform('sum')

# Repeat until no week is above 240 minutes.
while not df[df['tot_time'] > 240].empty:
    # For each week still over the cap, pick one random row that has at least 30 minutes left.
    candidates = df[(df['minutes'] >= 30) & (df['tot_time'] > 240)]
    mask = candidates.groupby('week')['minutes'].apply(lambda x: np.random.choice(x.index)).values
    df.loc[mask, 'minutes'] -= 30
    df.loc[mask, 'extra_minutes'] += 30

    # Recompute the weekly totals inside the loop, otherwise it never terminates.
    df['tot_time'] = df.groupby('week')['minutes'].transform('sum')

# Convert the minute counts back to timedeltas.
df['extra_hours_over'] = pd.to_timedelta(df['extra_minutes'], unit='m')
df['extra_hours'] = pd.to_timedelta(df['minutes'], unit='m')
df.drop(['minutes','extra_minutes'], axis=1).tail(6)

Out[1]:
week extra_hours extra_hours_over tot_time
12 3 00:30:00 00:30:00 240
13 3 01:30:00 00:30:00 240
14 3 00:30:00 00:30:00 240
15 3 01:30:00 00:30:00 240
16 3 00:00:00 00:00:00 240
17 3 00:00:00 00:00:00 240

Note: because I am using np.random.choice, the same observation can be picked more than once, which changes that observation by more than 30 minutes.
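A row losing more than one block is also what the question's own example shows (row 15 gives up a full hour), so this is not necessarily a problem. If a version without the while loop is preferred, one possible sketch is to expand every row into its available 30-minute blocks and draw the required number per week in a single pass, without replacement. This is a different technique from the answer above, not a restatement of it. The function name `move_excess_blocks`, its parameters and the seed are hypothetical, and the input is assumed to already have an integer `minutes` column.

```python
import numpy as np
import pandas as pd


def move_excess_blocks(df, cap=240, block=30, seed=None):
    """Hypothetical helper: move random `block`-minute chunks from 'minutes'
    to 'extra_minutes' so that each week's total is at most `cap`."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    out["extra_minutes"] = 0

    # Iterate over the untouched input; all writes go to the copy.
    for week, grp in df.groupby("week"):
        excess = int(grp["minutes"].sum()) - cap
        if excess <= 0:
            continue  # this week is already within the cap

        n_blocks = -(-excess // block)  # ceiling division

        # One pool entry per whole block a row can give up.
        pool = np.repeat(grp.index.to_numpy(),
                         (grp["minutes"] // block).to_numpy())
        picked = rng.choice(pool, size=min(n_blocks, len(pool)), replace=False)

        # How many blocks each picked row loses.
        counts = pd.Series(picked).value_counts()
        out.loc[counts.index, "minutes"] -= counts.to_numpy() * block
        out.loc[counts.index, "extra_minutes"] += counts.to_numpy() * block

    return out


# Sample data from the question, already converted to minutes by hand.
df = pd.DataFrame({
    "week": [1] * 6 + [2] * 6 + [3] * 6,
    "minutes": [60, 0, 60, 60, 0, 60,
                60, 60, 60, 60, 0, 60,
                60, 120, 60, 120, 0, 0],
})

result = move_excess_blocks(df, seed=0)
print(result.groupby("week")[["minutes", "extra_minutes"]].sum())
```

Because the pool holds one entry per block rather than one per row, a row can never go below zero, and rows with more extra time are proportionally more likely to be picked. Whether that weighting is wanted depends on the use case.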

Regarding python - Pandas - Computing data from one column into another, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57581702/
