
Python: Calculating the hourly average from a CSV?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:01:55

I want to calculate the average for each hour from a CSV file.

Here is my dataset:

Timestamp    Temperature
9/1/2016 0:00:08 53.8
9/1/2016 0:00:38 53.8
9/1/2016 0:01:08 53.8
9/1/2016 0:01:38 53.8
9/1/2016 0:02:08 53.8
9/1/2016 0:02:38 54.1
9/1/2016 0:03:08 54.1
9/1/2016 0:03:38 54.1
9/1/2016 0:04:38 54
9/1/2016 0:05:38 54
9/1/2016 0:06:08 54
9/1/2016 0:06:38 54
9/1/2016 0:07:08 54
9/1/2016 0:07:38 54
9/1/2016 0:08:08 54.1
9/1/2016 0:08:38 54.1
9/1/2016 0:09:38 54.1
9/1/2016 0:10:32 54
9/1/2016 0:11:02 54
9/1/2016 0:11:32 54
9/1/2016 0:00:08 54
9/2/2016 0:00:20 32
9/2/2016 0:00:50 32
9/2/2016 0:01:20 32
9/2/2016 0:01:50 32
9/2/2016 0:02:20 32
9/2/2016 0:02:50 32
9/2/2016 0:03:20 32
9/2/2016 0:03:50 32
9/2/2016 0:04:20 32
9/2/2016 0:04:50 32
9/2/2016 0:05:20 32
9/2/2016 0:05:50 32
9/2/2016 0:06:20 32
9/2/2016 0:06:50 32
9/2/2016 0:07:20 32
9/2/2016 0:07:50 32

Here is my code for calculating the daily average, but I want the hourly average:

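from datetime import datetime
import pandas

def same_day(date_string):
    # Parse the full timestamp, then keep only month and day (drops year and time)
    return datetime.strptime(date_string, "%m/%d/%Y %H:%M:%S").strftime('%m%d')

# header=0 replaces the file's own header row with names= (assumes the file starts with a header row, as in the dataset shown)
df = pandas.read_csv('/home/kk/Desktop/cal_Avg.csv', header=0, index_col=0, usecols=[0, 1], names=['Timestamp', 'Discharge'], converters={'Timestamp': same_day})

# Group by the month-day index and average each day
print(df.groupby(level=0).mean())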

The output I want looks like this:

Timestamp              Temp          *        Avg
9/1/2016 0:00:08 53.8
9/1/2016 0:00:38 53.8 ?avg for this hour
9/1/2016 0:01:08 53.8
9/1/2016 0:01:38 53.8 ?avg for this hour
9/1/2016 0:02:08 53.8
9/1/2016 0:02:38 54.1

Now I want the average for a specific hour and minute.

Desired output:

Here I print only 5 hours of output for the dates 01-09-2016 and 02-09-16:

010900              54.362727         45.497273
010901 54.723276 45.068103
010902 54.746847 45.370270
010903 54.833913 44.931304
010904 54.971053 44.835088
010905 55.519444 44.459259
020901 31.742553 55.640426
020902 31.495556 55.655556
020903 31.304348 55.442609
020904 31.200000 55.437273
020905 31.294382 55.442697

Can this be done for a specific date and a specific hour? How can I achieve it?

Best answer

I think you first need read_csv with the parameter index_col=[0] to read the first column into the index, and parse_dates=[0] to parse that column into a DatetimeIndex:

import pandas as pd

df = pd.read_csv('filename', index_col=[0], parse_dates=[0], usecols=[0,1])
print (df)
Temperature
Timestamp
2016-09-01 00:00:08 53.8
2016-09-01 00:00:38 53.8
2016-09-01 00:01:08 53.8
2016-09-01 00:01:38 53.8
2016-09-01 00:02:08 53.8
2016-09-01 00:02:38 54.1
2016-09-01 00:03:08 54.1
...
...
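
Because a value like 9/1/2016 could be read as either September 1 or January 9, it may be safer to state the timestamp format explicitly instead of relying on inference. The following is a minimal sketch of that idea, not part of the original answer; the in-memory `csv_text` is a hypothetical stand-in for the real file, and the month-first format string is an assumption based on the dataset shown in the question.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the real file
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:00:38,53.8
9/2/2016 0:00:20,32
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: timestamps are month/day/year, as the question's data suggests
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y %H:%M:%S")
df = df.set_index("Timestamp")

print(df.index)
```

With the index parsed this way, the grouping approaches discussed in this answer can be applied to `df` unchanged.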

Then use resample by hour and aggregate with Resampler.mean. Note that hours with no data in the DatetimeIndex come out as NaN:

print (df.resample('H').mean())
Temperature
Timestamp
2016-09-01 00:00:00 53.980952
2016-09-01 01:00:00 NaN
2016-09-01 02:00:00 NaN
2016-09-01 03:00:00 NaN
2016-09-01 04:00:00 NaN
2016-09-01 05:00:00 NaN
2016-09-01 06:00:00 NaN
2016-09-01 07:00:00 NaN
2016-09-01 08:00:00 NaN
2016-09-01 09:00:00 NaN
2016-09-01 10:00:00 NaN
2016-09-01 11:00:00 NaN
2016-09-01 12:00:00 NaN
2016-09-01 13:00:00 NaN
2016-09-01 14:00:00 NaN
2016-09-01 15:00:00 NaN
2016-09-01 16:00:00 NaN
2016-09-01 17:00:00 NaN
2016-09-01 18:00:00 NaN
2016-09-01 19:00:00 NaN
2016-09-01 20:00:00 NaN
2016-09-01 21:00:00 NaN
2016-09-01 22:00:00 NaN
2016-09-01 23:00:00 NaN
2016-09-02 00:00:00 32.000000
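
If the empty hours are not wanted, one option is to drop the NaN rows after resampling. This is a small sketch on made-up sample values (the three readings below are illustrative, not taken from the question's file). It uses the lowercase `'h'` alias, which recent pandas versions prefer over `'H'`.

```python
import pandas as pd

# Illustrative readings: two in hour 00, none in hour 01, one in hour 02
idx = pd.to_datetime(
    ["2016-09-01 00:00:08", "2016-09-01 00:30:00", "2016-09-01 02:10:00"]
)
df = pd.DataFrame({"Temperature": [53.8, 54.2, 32.0]}, index=idx)

hourly = df.resample("h").mean()   # hour 01 is present, with NaN
hourly_no_gaps = hourly.dropna()   # keeps only hours that had data

print(hourly)
print(hourly_no_gaps)
```

Keeping the NaN rows can still be useful when a regular hourly grid is needed, for example before plotting or interpolating.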

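Another solution is to drop the minutes and seconds by converting the index values to an array with hourly precision, then groupby that array: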

print (df.index.values.astype('<M8[h]'))
['2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00']

print (df.groupby([df.index.values.astype('<M8[h]')]).mean())
Temperature
2016-09-01 53.980952
2016-09-02 32.000000
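
A variation on the same idea, swapped in here for illustration, is `DatetimeIndex.floor`, which truncates each timestamp to the start of its hour without going through NumPy dtypes. The sketch below uses made-up readings and also shows how `transform` could attach the hourly mean to every original row, which is closer to the first layout the question asks for. The column name `HourlyAvg` is hypothetical.

```python
import pandas as pd

# Illustrative readings spanning two hours on one day and one hour on the next
idx = pd.to_datetime(
    [
        "2016-09-01 00:00:08",
        "2016-09-01 00:59:59",
        "2016-09-01 01:00:00",
        "2016-09-02 00:00:20",
    ]
)
df = pd.DataFrame({"Temperature": [53.8, 54.0, 54.1, 32.0]}, index=idx)

# Truncate each timestamp to the start of its hour
hour_bucket = df.index.floor("h")

# One row per (date, hour) that actually has data
hourly = df.groupby(hour_bucket).mean()
print(hourly)

# Per-row column holding the mean of the hour each reading belongs to
df["HourlyAvg"] = df.groupby(hour_bucket)["Temperature"].transform("mean")
print(df)
```

Unlike resample, grouping on the floored index produces no rows for hours without data.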

Additionally, if you need the average by month, day, and hour, groupby DatetimeIndex.strftime:

print (df.index.strftime('%m%d%H'))
['090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
'090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
'090100' '090100' '090100' '090100' '090100' '090200' '090200' '090200'
'090200' '090200' '090200' '090200' '090200' '090200' '090200' '090200'
'090200' '090200' '090200' '090200' '090200']

print (df.groupby([df.index.strftime('%m%d%H')]).mean())
Temperature
090100 53.980952
090200 32.000000
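
The labels in the question's desired output (010900, 010901, ...) appear to be day, month, hour rather than month, day, hour. Assuming that reading is right, the format string can simply be reordered. A minimal sketch on made-up readings:

```python
import pandas as pd

# Illustrative readings, not taken from the question's file
idx = pd.to_datetime(
    [
        "2016-09-01 00:00:08",
        "2016-09-01 01:10:00",
        "2016-09-02 01:00:20",
        "2016-09-02 01:30:00",
    ]
)
df = pd.DataFrame({"Temperature": [53.8, 54.7, 31.7, 31.9]}, index=idx)

# Assumption: the asker wants day-month-hour labels such as '010900'
labels = df.index.strftime("%d%m%H")
by_label = df.groupby(labels).mean()

print(by_label)
```

One caveat with string labels is that they sort lexicographically, so with day-first labels the result is not guaranteed to be in chronological order once the data spans several months.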

Or, if you need to group by hour only (pooling the same hour across all dates), groupby DatetimeIndex.hour:

print (df.index.hour)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

print (df.groupby([df.index.hour]).mean())
Temperature
0 44.475676
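
Grouping on the hour alone merges readings from different dates, which is why the single value above sits between the two daily means. A short sketch on made-up readings makes the pooling visible:

```python
import pandas as pd

# Illustrative readings: hour 00 on two different days, plus hour 05 on the first day
idx = pd.to_datetime(
    ["2016-09-01 00:00:08", "2016-09-02 00:00:20", "2016-09-01 05:00:00"]
)
df = pd.DataFrame({"Temperature": [54.0, 32.0, 55.5]}, index=idx)

# Hour 0 averages the readings from both days together
by_hour = df.groupby(df.index.hour).mean()

print(by_hour)
```

This form suits questions like "what is the typical temperature at midnight"; for a separate value per date and hour, one of the earlier groupings is the better fit.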

Regarding "Python: Calculating the hourly average from a CSV?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40256020/
