gpt4 book ai didi

python - Q : Python CSV, 能否根据日期时间列中的不同日期自动将 CSV 中的数据行发送到数据框

转载 作者:行者123 更新时间:2023-12-01 02:37:54 25 4
gpt4 key购买 nike

Sample CSV File可以在这里查看

数据描述:

我有大量数据,其中有名为 Time_Stamp 的日期时间列 [例如2017/6/21 01:09:30。每条记录以秒为单位创建行...因此 2017/6/21 01:09:302017/6/21 01:09:31 等等向前。也有不同的日期和不同的时间,例如 2017/6/22 03:38:43

我想要得到什么:

  • Time_Stamp日期2017/6/21 ..(time).. 的行获取到 Dataframe 中,然后按其他日期(例如2017/6/22 ..(时间)..)也放入单独的数据帧中。

我的Python代码:

import csv
from datetime import date,time,datetime
import pandas as pd


Data = pd.read_csv('MainD2.csv', parse_dates=['Time_Stamp'], infer_datetime_format=True)
Data['Date'] = Data.Time_Stamp.dt.date
Data['Time'] = Data.Time_Stamp.dt.time

Data['Time_Stamp'] = pd.to_datetime(Data['Time_Stamp'])
df = pd.DataFrame(Data)

下面是我通过手动获取时间范围和日期尝试的代码: - 我希望这是自动化的,但我不知道该怎么做 -

countIC = 0
countIC2 = 0
countIC3 = 0
def getMask(start,end):
mask = (df['Time_Stamp'] > start) & (df['Time_Stamp'] <= end)
return mask

start = '2017-06-21 01:09:21'
end = '2017-06-21 01:18:54'
timerange1 = Data.loc[mask]

InputCurrent =timerange1.AC_Input_Current.tolist()

for i in InputCurrent:
if float(i) >= 4.9:
countIC += 1
a = (countIC/60)
a1 = (a/60)
else:
countIC2 += 1
b = (countIC2/60)
b1 = (b/60)
countIC3 = (a+b)
print ('Start: ', start, ' - ', 'End:', end ,' \n')
print ('%.2f' %a,'[in Mins]',' or ','%.2f' %a1,'[Hrs]',' of work')
print ('%.2f' %b,'[in Mins]',' or ','%.2f' %b1,'[Hrs]',' of no work done')
print ('Total duration recorded: ','%.2f' %countIC3,'[Mins]')

countICC = 0
countIC22 = 0
countIC33 = 0

def getMask(start,end):
mask1 = (df['Time_Stamp'] > start) & (df['Time_Stamp'] <= end)
return mask1

start = '2017-06-21 19:54:20'
end = '2017-06-21 23:15:04'
timerange2 = Data.loc[mask1]

InputCurrent =timerange2.AC_Input_Current.tolist()

for i in InputCurrent:
if float(i) >= 4.9:
countICC += 1
a = (countICC/60)
a1 = (a/60)
else:
countIC22 += 1
b = (countIC22/60)
b1 = (b/60)
countIC33 = (a+b)
print ('Start: ', start, ' - ', 'End:', end ,' \n')
print ('%.2f' %a,'[in Mins]',' or ','%.2f' %a1,'[Hrs]',' of work')
print ('%.2f' %b,'[in Mins]',' or ','%.2f' %b1,'[Hrs]',' of no work done')
print ('Total duration recorded: ','%.2f' %countIC33,'[Mins]')

上面代码的输出:

Start:  2017-06-21 01:09:21  -  End: 2017-06-21 01:18:54  

3.10 [in Mins] or 0.05 [Hrs] of work done
6.23 [in Mins] or 0.10 [Hrs] of No work done
Total Duration recorded: 9.33 [Mins]

Start: 2017-06-21 19:54:20 - End: 2017-06-21 23:15:04

125.00 [in Mins] or 2.08 [Hrs] of work done
472.38 [in Mins] or 7.87 [Hrs] of No work done
Total Duration recorded: 597.38 [Mins]

最佳答案

首先:你的代码非常像复制粘贴,你定义了函数 getMask 两次,你只需要执行一次。与您的输出字符串相同。

而且你计算工作/非工作时间的方法不是很好。在每次迭代中将最终值除以 60,您也可以在最后进行此计算。

但是,pandas 拥有非常强大的工具来选择、索引和分析数据。尝试下面的代码:

    import csv
from datetime import date,time,datetime
import pandas as pd

def print_output(df):
worktime = len(df.loc[df['AC_Input_Current'] >= 4.9])
noworktime = len(df) - worktime

print ('Start: ', df.index.min(), ' - ', 'End:', df.index.max())
print ('%.2f' %(worktime/60),'[in Mins]',' or ','%.2f' %(worktime/3600),'[Hrs]',' of work')
print ('%.2f' %(noworktime/60),'[in Mins]',' or ','%.2f' %(noworktime/3600),'[Hrs]',' of no work done')
print ('Total duration recorded: ','%.2f' %((noworktime + worktime)/60),'[Mins]\n')

dateparse = lambda x: pd.datetime.strptime(x, '%d/%m/%Y %H:%M:%S')

Data = pd.read_csv('MainD2.csv', parse_dates=['Time_Stamp'], date_parser = dateparse, index_col='Time_Stamp')

print_output(Data['2017-06-23'].between_time('08:00', '23:15:04'))

Data.resample('D').apply(lambda x: print_output(x))

print_output() 打印所选数据的报告。通过日期和 df. Between_time() 方法选择数据。请注意,数据框现在将日期时间作为索引,与您的示例不同。

如果您想制作每日报告,您可以使用df.resample()方法并应用您自己的函数,在本例中为print_output()

这会打印:第一份报告来自于选定时间的手动 print_output(),下面是每日报告。

    Start:  2017-06-23 08:43:31  -  End: 2017-06-23 23:10:15
33.02 [in Mins] or 0.55 [Hrs] of work
210.40 [in Mins] or 3.51 [Hrs] of no work done
Total duration recorded: 243.42 [Mins]

Start: 2017-06-21 01:09:21 - End: 2017-06-21 23:15:04
43.75 [in Mins] or 0.73 [Hrs] of work
161.63 [in Mins] or 2.69 [Hrs] of no work done
Total duration recorded: 205.38 [Mins]

Start: 2017-06-22 00:38:54 - End: 2017-06-22 05:47:29
30.25 [in Mins] or 0.50 [Hrs] of work
271.40 [in Mins] or 4.52 [Hrs] of no work done
Total duration recorded: 301.65 [Mins]

Start: 2017-06-23 00:39:50 - End: 2017-06-23 23:10:15
44.23 [in Mins] or 0.74 [Hrs] of work
263.35 [in Mins] or 4.39 [Hrs] of no work done
Total duration recorded: 307.58 [Mins]

Start: 2017-06-24 00:38:10 - End: 2017-06-24 13:33:49
77.83 [in Mins] or 1.30 [Hrs] of work
120.00 [in Mins] or 2.00 [Hrs] of no work done
Total duration recorded: 197.83 [Mins]

Start: 2017-06-25 09:54:25 - End: 2017-06-25 20:16:05
75.62 [in Mins] or 1.26 [Hrs] of work
131.28 [in Mins] or 2.19 [Hrs] of no work done
Total duration recorded: 206.90 [Mins]

Start: 2017-06-26 00:51:32 - End: 2017-06-26 01:05:55
9.18 [in Mins] or 0.15 [Hrs] of work
4.92 [in Mins] or 0.08 [Hrs] of no work done
Total duration recorded: 14.10 [Mins]

编辑、绘制数据:

首先,我对matplotlib不太熟悉,所以请参阅文档。这只是一个非常基本的介绍。

import matplotlib.pyplot as plt

然后您可以在 print_output() 函数的末尾添加:

df.AC_Input_Current.plot(kind='line')
plt.show()

关于python - Q : Python CSV, 能否根据日期时间列中的不同日期自动将 CSV 中的数据行发送到数据框,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/46030065/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com