
python - Creating a cross-sectional DataFrame from a time series in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:37:54

Suppose we have a time series indexed by minute, like this:

df=

Time (HH:MM)     Value
01/01/2014 00:00 1
01/01/2014 00:01 2
01/01/2014 00:02 3
01/01/2014 00:03 4
...
01/08/2014 00:00 5000
...

I would like to "group" the dataset by week, like this:

df2=

Week  Val1 Val2 Val3 Val4 ...
1 1 2 3 4 ...
2 5000 ...
3
4
...

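In other words, every 1-minute observation in week 1 (01/01/2014-01/08/2014) becomes a column in df2, so each week should have 10,080 minutes, i.e. 10,080 columns (7 x 24 x 60).

I have tried a few functions, including groupby(), but most of them seem to aggregate the data rather than split it into the individual columns I am looking for.

Edit: The result does not have to be a DataFrame, but I will pass it to a function that takes whole weeks as input, similar to building a histogram of each week's values.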

Best answer

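You need the week of the year (weekofyear in older pandas, isocalendar().week since pandas 1.1) plus cumcount to number the observations within each week. These numbers become the new column names. Then reshape with set_index and unstack: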

1. Solution if df is a DataFrame and Time (HH:MM) is a column:

print (type(df))
<class 'pandas.core.frame.DataFrame'>

print (df.columns)
Index(['Time (HH:MM)', 'Value'], dtype='object')

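import pandas as pd

# Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week (pandas >= 1.1) returns the same ISO week number
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.isocalendar().week.rename('Week')
# number the observations within each week, starting at 1
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')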
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
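
For readers who want to run the first solution from scratch, here is a minimal sketch. The sample frame reproduces the five rows shown in the question. The function name `to_weekly_wide`, its parameters, and the explicit month-first `format` string are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

def to_weekly_wide(df, time_col="Time (HH:MM)", value_col="Value"):
    """Hypothetical helper: one row per ISO week, one column per observation."""
    # assumption: timestamps are month-first strings such as "01/08/2014 00:00"
    ts = pd.to_datetime(df[time_col], format="%m/%d/%Y %H:%M")
    weeks = ts.dt.isocalendar().week.astype("int64").rename("Week")
    # running position of each observation inside its week, starting at 1
    pos = df.groupby(weeks).cumcount() + 1
    return df.set_index([weeks, pos])[value_col].unstack().add_prefix("Val")

sample = pd.DataFrame({
    "Time (HH:MM)": ["01/01/2014 00:00", "01/01/2014 00:01", "01/01/2014 00:02",
                     "01/01/2014 00:03", "01/08/2014 00:00"],
    "Value": [1, 2, 3, 4, 5000],
})
wide = to_weekly_wide(sample)
print(wide)
```

Because the columns are numbered with cumcount, a missing minute shifts all later values of that week one column to the left, which is worth keeping in mind if the series has gaps.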

Another solution with pivot:

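weeks = pd.to_datetime(df['Time (HH:MM)']).dt.isocalendar().week.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
# current pandas expects a DataFrame as the first argument of pivot, so the keys are attached as columns first ('col' is just a temporary helper name)
df = df.assign(Week=weeks, col=countweeks).pivot(index='Week', columns='col', values='Value').rename_axis(columns=None)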
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
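
One caveat with building the column labels as strings before pivoting: the reshaped columns come out sorted, and strings sort lexicographically, so with more than nine observations per week Val10 would land before Val2. A possible workaround, sketched here on a hypothetical 12-row frame, is to pivot on the integer position and add the prefix afterwards. The names `pos` and `wide` are illustrative.

```python
import pandas as pd

# hypothetical data: 12 one-minute observations, all inside the same ISO week
ts = pd.date_range("2014-01-01 00:00", periods=12, freq="min")
df = pd.DataFrame({"Time (HH:MM)": ts, "Value": range(1, 13)})

weeks = df["Time (HH:MM)"].dt.isocalendar().week.astype("int64")
pos = df.groupby(weeks).cumcount() + 1   # integer position within the week

wide = (
    df.assign(Week=weeks, pos=pos)
      .pivot(index="Week", columns="pos", values="Value")
      .add_prefix("Val")                 # prefix added after pivoting, so the sort was numeric
      .rename_axis(columns=None)
)
print(list(wide.columns))
```

The set_index plus unstack variant already keeps integer labels until add_prefix, so it does not need this adjustment.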

If you need to replace NaN with 0, add the parameter fill_value=0 to unstack:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.isocalendar().week.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack(fill_value=0).add_prefix('Val')
print (df)
Val1 Val2 Val3 Val4
Week
1 1 2 3 4
2 5000 0 0 0

In the second solution, use fillna:

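weeks = pd.to_datetime(df['Time (HH:MM)']).dt.isocalendar().week.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
# 'col' is a temporary helper column; fillna(0) leaves the values as floats
df = df.assign(Week=weeks, col=countweeks).pivot(index='Week', columns='col', values='Value').rename_axis(columns=None).fillna(0)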
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 0.0 0.0 0.0

2. Solution if s is a Series and Time (HH:MM) is the index:

print (s)

Time (HH:MM)
01/01/2014 00:00 1
01/01/2014 00:01 2
01/01/2014 00:02 3
01/01/2014 00:03 4
01/08/2014 00:00 5000
Name: Value, dtype: int64

print (type(s))
<class 'pandas.core.series.Series'>

print (s.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
'01/01/2014 00:03', '01/08/2014 00:00'],
dtype='object', name='Time (HH:MM)')

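# DatetimeIndex.isocalendar() returns a DataFrame, so its week column is wrapped in a named Index
weeks = pd.Index(pd.to_datetime(s.index).isocalendar().week.astype('int64'), name='Week')
countweeks = s.groupby(weeks).cumcount() + 1
df = s.to_frame().set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')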
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN

Second solution:

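weeks = pd.Index(pd.to_datetime(s.index).isocalendar().week.astype('int64'), name='Week')
countweeks = s.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
# 'col' is a temporary helper column; current pandas needs a DataFrame for pivot
df = s.to_frame().assign(Week=weeks, col=countweeks).pivot(index='Week', columns='col', values='Value').rename_axis(columns=None)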
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
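
The week number alone repeats every year, so data spanning more than one year would put week 1 of 2014 and week 1 of 2015 in the same row. A possible extension, assuming the index has already been converted to a unique DatetimeIndex, is to group by ISO year and ISO week together. The three-row series below is made up for illustration.

```python
import pandas as pd

# hypothetical series: two observations in ISO week 1 of 2014, one in ISO week 1 of 2015
s = pd.Series(
    [1, 2, 3],
    index=pd.to_datetime(["2014-01-01 00:00", "2014-01-01 00:01", "2015-01-01 00:00"]),
    name="Value",
)

# isocalendar() returns a DataFrame with year, week and day columns, indexed like s
iso = s.index.isocalendar().astype("int64")
pos = s.groupby([iso["year"], iso["week"]]).cumcount() + 1

wide = (
    s.to_frame()
     .set_index([iso["year"], iso["week"], pos])["Value"]
     .unstack()                      # last level (position) becomes the columns
     .add_prefix("Val")
)
print(wide)
```

The result has a two-level row index (year, week), so each calendar week keeps its own row.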

3. Solution if df is a DataFrame and Time (HH:MM) is the index:

print (df)
Value
Time (HH:MM)
01/01/2014 00:00 1
01/01/2014 00:01 2
01/01/2014 00:02 3
01/01/2014 00:03 4
01/08/2014 00:00 5000

print (type(df))
<class 'pandas.core.frame.DataFrame'>

print (df.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
'01/01/2014 00:03', '01/08/2014 00:00'],
dtype='object', name='Time (HH:MM)')

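weeks = pd.Index(pd.to_datetime(df.index).isocalendar().week.astype('int64'), name='Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')

Second solution (applied to the original df, not to the result above):

weeks = pd.Index(pd.to_datetime(df.index).isocalendar().week.astype('int64'), name='Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
# 'col' is a temporary helper column; current pandas needs a DataFrame for pivot
df = df.assign(Week=weeks, col=countweeks).pivot(index='Week', columns='col', values='Value').rename_axis(columns=None)
print (df)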

Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
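
ISO weeks start on Monday, so the first and last weeks of a series are usually partial, whereas the question counts weeks from the first timestamp (01/01/2014 to 01/08/2014). If that is the intent, a sketch along the following lines may be closer. It assigns each observation to a 7-day window measured from the first timestamp and uses the minute offset inside that window as the column, so a missing minute leaves a NaN instead of shifting later values. The helper name `to_fixed_weeks`, the month-first `format` string, and the per-week histogram at the end are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

MINUTES_PER_WEEK = 7 * 24 * 60   # 10,080

def to_fixed_weeks(s):
    """Hypothetical helper: rows are 7-day windows counted from the first timestamp."""
    ts = pd.to_datetime(s.index, format="%m/%d/%Y %H:%M")
    # whole minutes elapsed since the first observation
    minutes = np.asarray((ts - ts.min()) // pd.Timedelta(minutes=1))
    week = minutes // MINUTES_PER_WEEK + 1   # 1-based window number
    pos = minutes % MINUTES_PER_WEEK + 1     # 1-based minute inside the window
    idx = pd.MultiIndex.from_arrays([week, pos], names=["Week", None])
    return pd.Series(s.to_numpy(), index=idx).unstack().add_prefix("Val")

s = pd.Series(
    [1, 2, 3, 4, 5000],
    index=["01/01/2014 00:00", "01/01/2014 00:01", "01/01/2014 00:02",
           "01/01/2014 00:03", "01/08/2014 00:00"],
    name="Value",
)
wide = to_fixed_weeks(s)
print(wide)

# one histogram per week, as mentioned in the question's edit
counts, edges = np.histogram(wide.loc[1].dropna(), bins=4)
print(counts)
```

On the sample data this gives the same two rows as the solutions above, because 01/08/2014 00:00 falls exactly 10,080 minutes after the first observation and therefore opens the second window.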

Regarding "python - Creating a cross-sectional DataFrame from a time series in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46783325/
