
python - What is an efficient Pandas way to reindex this shift schedule?

Reposted. Author: 行者123. Updated: 2023-12-04 15:38:31

I have a Pandas DataFrame that represents a shift schedule for a whole year, like this:


January 2019 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Shift A 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1
Shift B 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2
Shift C 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0
Shift D 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0
February 2019 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 nan nan nan
Shift A 0 0 0 2 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1 nan nan nan
Shift B 0 0 0 0 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2 nan nan nan
Shift C 2 2 2 0 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0 nan nan nan
Shift D 1 1 1 1 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0 nan nan nan


Here 1 stands for the day shift (06:00 - 18:00), 2 stands for the night shift (18:00 - 06:00), and 0 can be ignored. Only one shift team works in any given time period.
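To make the encoding concrete, here is a minimal sketch of how one cell of the table translates into a shift start time. The helper name `shift_start` and the `SHIFT_START` dictionary are made up for illustration and are not part of the question.

```python
import pandas as pd

# Codes from the question: 1 = day shift (starts 06:00), 2 = night shift (starts 18:00).
# SHIFT_START and shift_start are hypothetical names used only in this sketch.
SHIFT_START = {1: pd.Timedelta(hours=6), 2: pd.Timedelta(hours=18)}

def shift_start(month_start, day, code):
    """Return the start timestamp of the shift coded `code` on day `day` of the month."""
    return pd.Timestamp(month_start) + pd.Timedelta(days=day - 1) + SHIFT_START[code]

print(shift_start("2019-01-01", 1, 1))  # Shift A, January 1, day shift
print(shift_start("2019-01-01", 2, 2))  # Shift B, January 2, night shift
```

Cells with the value 0 have no entry in the dictionary, which matches the statement that they can be ignored.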

I need the data in a format where it is indexed by the datetime stamp of the shift currently working, for example:
             DateTime Shift
0 2019-01-01 06:00:00 A
1 2019-01-01 18:00:00 D
2 2019-01-02 06:00:00 A
3 2019-01-02 18:00:00 B
4 2019-01-03 06:00:00 A
5 2019-01-03 18:00:00 B
.
.
.
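Since exactly one team works each 12-hour period, the target frame should have one row per 12-hour slot starting at 06:00 on the first day. As a rough sketch, the expected timestamps could be generated like this (the variable name `slots` is an assumption for illustration), which is handy for checking a result later:

```python
import pandas as pd

# One timestamp every 12 hours, starting with the first day shift of the year.
# `slots` is a hypothetical name; only six periods are generated here.
slots = pd.date_range("2019-01-01 06:00", periods=6, freq=pd.Timedelta(hours=12))
print(slots)
```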

What is the most efficient Pandas way to reindex the data to achieve this, i.e. without for loops?

Best answer

Use:

import pandas as pd

# take the first column (month labels and shift names) by position
first = df.iloc[:, 0]
# parse it as datetimes; shift names cannot be parsed and become NaT
dates = pd.to_datetime(first, errors='coerce')
# True for data (shift) rows, False for month header rows
mask = dates.isna()
# forward-fill the month onto each row; the leading NaT rows belong to the
# month stored in the name of the first column
df.index = dates.ffill().fillna(pd.to_datetime(first.name))
# drop the month header rows and append the shift name to the index
df = df[mask.values].set_index(first.name, append=True)
# convert the day-of-month column names to timedeltas in days; day 1 is 0 days
df.columns = pd.to_timedelta(df.columns.astype(int) - 1, unit='D')
# shift start times: 1 = day shift (06:00), 2 = night shift (18:00)
mapp = {1: pd.Timedelta('06:00:00'), 2: pd.Timedelta('18:00:00')}
# turn 0 into NaN, reshape to long form with stack, map the codes to start
# times and convert the MultiIndex to columns; the explicit dropna() keeps
# this working if stack() retains NaN instead of dropping it
df = (df.mask(df == 0)
        .stack()
        .map(mapp)
        .dropna()
        .rename_axis(('Datetime', 'Shift', 'day'))
        .reset_index(name='td')
     )
# add the day offset and the shift start time to the month start
df['Datetime'] += (df.pop('td') + df.pop('day'))
# sort and create a default index
df = df.sort_values(['Datetime', 'Shift']).reset_index(drop=True)
print(df)
               Datetime    Shift
0   2019-01-01 06:00:00  Shift A
1   2019-01-01 18:00:00  Shift D
2   2019-01-02 06:00:00  Shift A
3   2019-01-02 18:00:00  Shift B
4   2019-01-03 06:00:00  Shift A
..                  ...      ...
113 2019-02-26 18:00:00  Shift D
114 2019-02-27 06:00:00  Shift A
115 2019-02-27 18:00:00  Shift B
116 2019-02-28 06:00:00  Shift A
117 2019-02-28 18:00:00  Shift B

[118 rows x 2 columns]
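Note that the output above keeps the full labels ("Shift A"), while the question asked for just the letter ("A"). The following is a possible variation, not the answer's own method: it reshapes with `melt` instead of `stack` and strips the "Shift " prefix. It runs on a reduced sample (only the first three days of each month, copied from the table in the question). The names `raw`, `is_shift`, `start_offset`, `long_df` and `result` are made up for this sketch, and it assumes that every shift row starts with "Shift" and that month labels such as "January 2019" parse with the format `%B %Y` (English month names).

```python
import pandas as pd

# Reduced sample: days 1-3 of January and February 2019 from the question.
raw = pd.DataFrame(
    [
        ["Shift A", 1, 1, 1],
        ["Shift B", 0, 2, 2],
        ["Shift C", 0, 0, 0],
        ["Shift D", 2, 0, 0],
        ["February 2019", 1, 2, 3],
        ["Shift A", 0, 0, 0],
        ["Shift B", 0, 0, 0],
        ["Shift C", 2, 2, 2],
        ["Shift D", 1, 1, 1],
    ],
    columns=["January 2019", "1", "2", "3"],
)

label_col = raw.columns[0]
labels = raw[label_col]
# Assumption: shift rows are exactly the rows whose label starts with "Shift".
is_shift = labels.str.startswith("Shift")
# Month label per row: header rows carry it, shift rows inherit the last one
# seen, and the leading rows use the name of the first column.
month = labels.where(~is_shift).ffill().fillna(label_col)
month_start = pd.to_datetime(month, format="%B %Y")

# Long form: one row per (shift, day) cell, month header rows removed.
long_df = raw.assign(month_start=month_start)[is_shift].melt(
    id_vars=[label_col, "month_start"], var_name="day", value_name="code"
)

# 1 = day shift (06:00), 2 = night shift (18:00); 0 and NaN are dropped.
start_offset = {1: pd.Timedelta(hours=6), 2: pd.Timedelta(hours=18)}
long_df = long_df[long_df["code"].isin(list(start_offset))]

result = (
    long_df.assign(
        DateTime=long_df["month_start"]
        + pd.to_timedelta(long_df["day"].astype(int) - 1, unit="D")
        + long_df["code"].map(start_offset),
        Shift=long_df[label_col].str.replace("Shift ", "", regex=False),
    )[["DateTime", "Shift"]]
    .sort_values("DateTime")
    .reset_index(drop=True)
)
print(result)
```

Whether `melt` or `stack` is faster on a full year of data is not something this sketch measures; both avoid explicit Python loops over the rows.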

Regarding "python - What is an efficient Pandas way to reindex this shift schedule?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58340866/
