python - How to compare date records and add a new column to a DataFrame based on a condition

Reposted. Author: 太空宇宙. Updated: 2023-11-03 23:54:39

I have the following DataFrame of sensor data:

    Data_Digital  Data_Analog                 Time
               1           10  2015-02-01 00:00:00
               1           12  2015-02-01 00:00:05
               1           25  2015-02-01 07:45:07
               1           25  2015-02-01 07:45:08
               1           25  2015-02-01 21:45:10
               0           25  2015-03-04 00:00:08

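I need to compare the "Time" at position 0 with the "Time" at position 1, and likewise for each following pair of consecutive rows. If the time difference between two consecutive rows is greater than 6 hours, they must belong to different classes. If the difference is small (< 6 hours), they must belong to the same class. I need to represent this class in a new DataFrame column.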

The expected output is:

    Data_Digital  Data_Analog                 Time  New_Col_Target
               1           10  2015-02-01 00:00:00               1  # init with 1
               1           12  2015-02-01 00:00:05               1
               1           25  2015-02-01 07:45:07               2  # far from the previous
               1           25  2015-02-01 07:45:08               2
               1           25  2015-02-01 21:45:10               3  # far from the previous
               0           25  2015-03-04 00:00:08               4  # far from the previous

Here is the original DataFrame:

    import pandas as pd

    df = pd.DataFrame({'Data_Digital': [1, 1, 1, 1, 1, 0],
                       'Data_Analog': [10, 12, 25, 25, 25, 25],
                       'Time': ['2015-02-01 00:00:00', '2015-02-01 00:00:05', '2015-02-01 07:45:07',
                                '2015-02-01 07:45:08', '2015-02-01 21:45:10', '2015-03-04 00:00:08']})

    print(df)

I tried this (but it is wrong):

    index = 0
    index2 = 1

    df['New_Col_Target'] = 1

    for i in range(0, len(df) -1):
        for j in range(1, len(df)):

            if(abs(pd.to_datetime(df['Time'].iloc[i]) -
                   pd.to_datetime(df['Time'].iloc[j])) >
               pd.to_timedelta('0 day 06:00:00')):

                # I don't know how to do the assignments
                df['New_Col_Target'].iloc[i] = index
            else:
                # I don't know how to do the assignments
                df['New_Col_Target'].iloc[i] = index2
                index2 += 1

# New process

    Date Init            Date End             Mean_Dig  Mean_Analog
    2015-02-01 00:00:00  2015-02-01 00:00:05         1           11
    2015-02-01 07:45:07  2015-02-01 07:45:08         1           25
    2015-02-01 07:45:08  2015-02-01 07:45:08         1           25
    2015-03-04 00:00:08  2015-03-04 00:00:08         0           25

    df_mean_group_New_Col_Target = pd.DataFrame({
        'Date Init': ['2015-02-01 00:00:00', '2015-02-01 07:45:07', '2015-02-01 07:45:08', '2015-03-04 00:00:08'],
        'Date End': ['2015-02-01 00:00:05', '2015-02-01 07:45:08', '2015-02-01 07:45:08', '2015-03-04 00:00:08'],
        'Mean_Data_Digital': [1, 1, 1, 0],
        'Mean_Data_Analog': [11, 25, 25, 25]})

    print(df_mean_group_New_Col_Target)

Best answer

Use Series.diff, pd.Timedelta, and Series.cumsum:

    df['New_col_target'] = (df['Time'].diff() > pd.Timedelta(hours=6)).cumsum().add(1)

Output

       Data_Digital  Data_Analog                 Time  New_col_target
    0             1           10  2015-02-01 00:00:00               1
    1             1           12  2015-02-01 00:00:05               1
    2             1           25  2015-02-01 07:45:07               2
    3             1           25  2015-02-01 07:45:08               2
    4             1           25  2015-02-01 21:45:10               3
    5             0           25  2015-03-04 00:00:08               4

If your Time column is not already of datetime type, convert it first:

    df['Time'] = pd.to_datetime(df['Time'])
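
To see why the one-liner works, it may help to split it into its intermediate steps. The following is only a sketch on the sample data from the question; the names `gap` and `new_class` are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Data_Digital': [1, 1, 1, 1, 1, 0],
    'Data_Analog': [10, 12, 25, 25, 25, 25],
    'Time': ['2015-02-01 00:00:00', '2015-02-01 00:00:05', '2015-02-01 07:45:07',
             '2015-02-01 07:45:08', '2015-02-01 21:45:10', '2015-03-04 00:00:08'],
})

# The strings must be converted to datetime64 before diff() can subtract them.
df['Time'] = pd.to_datetime(df['Time'])

# Time elapsed since the previous row; the first row has no predecessor, so it is NaT.
gap = df['Time'].diff()

# True where a row is more than 6 hours after the previous one.
# The NaT in the first row compares as False.
new_class = gap > pd.Timedelta(hours=6)

# Running count of the breaks, shifted by 1 so that the first class is 1.
df['New_col_target'] = new_class.cumsum().add(1)

print(df)
```

Each `True` in `new_class` marks the first row of a new class, so the cumulative sum increases exactly at those rows.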

Scenario 2: classes within each Data_Digital group:

Here we have to use groupby:

    m1 = df.groupby('Data_Digital')['Time'].diff().ge(pd.Timedelta(hours=6))
    m2 = df['Data_Digital'].diff().ne(0)

    df['New_col_target'] = (m1|m2).cumsum()

Output

       Data_Digital  Data_Analog                 Time  New_col_target
    0             1           10  2015-02-01 00:00:00               1
    1             1           12  2015-02-01 00:00:05               1
    2             1           25  2015-02-01 07:45:07               2
    3             1           25  2015-02-01 07:45:08               2
    4             1           25  2015-02-01 21:45:10               3
    5             0           25  2015-03-04 00:00:08               4
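
On the sample data both scenarios give the same classes, so the difference is easier to see on other data. The sketch below uses a small hypothetical frame (`demo`, invented for this illustration) in which the readings are only a minute apart but Data_Digital switches twice.

```python
import pandas as pd

# Hypothetical data: rows one minute apart, Data_Digital changes at rows 2 and 4.
demo = pd.DataFrame({
    'Data_Digital': [1, 1, 0, 0, 1],
    'Time': pd.to_datetime(['2015-02-01 00:00:00', '2015-02-01 00:01:00',
                            '2015-02-01 00:02:00', '2015-02-01 00:03:00',
                            '2015-02-01 00:04:00']),
})

# Scenario 1: only the time gap between consecutive rows matters.
by_time = (demo['Time'].diff() > pd.Timedelta(hours=6)).cumsum().add(1)

# Scenario 2: a new class starts on a gap of 6 hours or more within the same
# Data_Digital value (m1), or whenever Data_Digital changes (m2).
m1 = demo.groupby('Data_Digital')['Time'].diff().ge(pd.Timedelta(hours=6))
m2 = demo['Data_Digital'].diff().ne(0)
by_time_and_signal = (m1 | m2).cumsum()

print(by_time.tolist())             # [1, 1, 1, 1, 1]
print(by_time_and_signal.tolist())  # [1, 1, 2, 2, 3]
```

No `.add(1)` is needed in scenario 2: the first row's `diff()` is NaN, and NaN is not equal to 0, so `m2` is already True there and the count starts at 1. Note also that scenario 2 uses `ge` (greater than or equal), while scenario 1 uses a strict `>`.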

Finally, to get the mean for each target

We can use groupby.mean:

    df.groupby('New_col_target', as_index=False)[['Data_Digital', 'Data_Analog']].mean()

Or

    df.groupby('New_col_target', as_index=False).agg({'Data_Digital': 'mean',
                                                      'Data_Analog': 'mean'})

Or, if you have pandas >= 0.25.0 (check pd.__version__), we can use named aggregation:

    df.groupby('New_col_target').agg(
        Digital_mean=('Data_Digital', 'mean'),
        Analog_mean=('Data_Analog', 'mean')
    ).reset_index()

Output

       New_col_target  Data_Digital  Data_Analog
    0               1             1           11
    1               2             1           25
    2               3             1           25
    3               4             0           25

Output with named aggregation

       New_col_target  Digital_mean  Analog_mean
    0               1             1           11
    1               2             1           25
    2               3             1           25
    3               4             0           25
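
The "New process" table in the question also asks for the first and last timestamp of each class. The answer above does not cover that, but named aggregation can probably be extended to do it. This is a sketch under assumptions: it needs pandas >= 0.25.0, and the column names `Date_Init`, `Date_End`, `Mean_Dig` and `Mean_Analog` are adapted from the question's table (with underscores, because keyword arguments cannot contain spaces).

```python
import pandas as pd

df = pd.DataFrame({
    'Data_Digital': [1, 1, 1, 1, 1, 0],
    'Data_Analog': [10, 12, 25, 25, 25, 25],
    'Time': ['2015-02-01 00:00:00', '2015-02-01 00:00:05', '2015-02-01 07:45:07',
             '2015-02-01 07:45:08', '2015-02-01 21:45:10', '2015-03-04 00:00:08'],
})
df['Time'] = pd.to_datetime(df['Time'])
df['New_col_target'] = (df['Time'].diff() > pd.Timedelta(hours=6)).cumsum().add(1)

# One row per class: earliest and latest timestamp plus the mean of each reading.
summary = (
    df.groupby('New_col_target')
      .agg(Date_Init=('Time', 'min'),
           Date_End=('Time', 'max'),
           Mean_Dig=('Data_Digital', 'mean'),
           Mean_Analog=('Data_Analog', 'mean'))
      .reset_index()
)

print(summary)
```

The means come back as floats (for example 11.0 rather than 11). With the sample data, class 3 contains only the 21:45:10 row, so its `Date_Init` and `Date_End` are both that timestamp.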

Regarding "python - How to compare date records and add a new column to a DataFrame based on a condition", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58294340/
