
python - Normalizing a pandas DataFrame column using an equation and daily averages

Reposted. Author: 行者123. Updated: 2023-11-30 22:27:51

I have a pandas DataFrame, for example:

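import numpy as np
import pandas as pd

# 25 timestamps at 3-hour steps; '3H' is the alias used in the original post
# (newer pandas versions prefer the lowercase '3h')
df = pd.DataFrame({
'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3H'),
'val1' : np.random.random(25)*300,
'val2' : np.random.random(25)*30})

df.set_index('time', inplace=True)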

and a DataFrame of real values:

real_values = pd.DataFrame({
'day' : [18, 19, 20],
'values' : [500, 600, 700]})

I want to normalize the values in the val1 column using the following formula:

new_value = old_value*real_value_that_day/daily_average

That is, each value is multiplied by the ratio of that day's real value to that day's average.
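To make the formula concrete, here is a minimal sketch for a single day. The three sample values and the real value of 500 are made up for illustration and are not taken from the random data in the question.

```python
import pandas as pd

# Hypothetical readings for one day (illustrative values only)
old_values = pd.Series([100.0, 200.0, 300.0])
real_value_that_day = 500.0

# Daily average of the raw readings
daily_average = old_values.mean()

# new_value = old_value * real_value_that_day / daily_average
new_values = old_values * real_value_that_day / daily_average

print(new_values.tolist())  # [250.0, 500.0, 750.0]
print(new_values.mean())    # 500.0
```

A side effect worth noting: after this scaling, the average of the new values for the day equals the real value for that day.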

I tried using .map, but could not work the index.day condition into it. I also tried groupby(df.index.day), but I don't know how to get from there to the final result.

Many thanks.

Best answer

I think you need:

np.random.seed(45)
df = pd.DataFrame({
'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3H'),
'val1' : np.random.random(25)*300,
'val2' : np.random.random(25)*30})

df.set_index('time', inplace=True)

real_values = pd.DataFrame({
'day' : [18, 19, 20],
'values' : [500, 600, 700]})
# Map each row's day of month to its real value, giving a Series aligned with df's index
a = pd.Series(df.index.day, index=df.index).map(real_values.set_index('day')['values'])
print (a.head())
time
2017-07-18 00:00:00 500.0
2017-07-18 03:00:00 500.0
2017-07-18 06:00:00 500.0
2017-07-18 09:00:00 500.0
2017-07-18 12:00:00 500.0
Name: time, dtype: float64
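The mapping step can be checked in isolation. The sketch below rebuilds only the index and the lookup (no random data needed) and lists the days that have no entry in real_values; those rows come out as NaN, which is why the Series has dtype float64. The names idx, lookup and missing_days are introduced here for illustration, and the index is built with periods and a Timedelta step instead of the '3H' alias, which is an assumption made only to avoid alias differences between pandas versions.

```python
import pandas as pd

# Same 25 timestamps as in the question: 3-hour steps from 2017-07-18 00:00
idx = pd.date_range('2017-07-18', periods=25, freq=pd.Timedelta(hours=3))

real_values = pd.DataFrame({
    'day': [18, 19, 20],
    'values': [500, 600, 700]})

# Series indexed by day of month, used as a lookup table by .map
lookup = real_values.set_index('day')['values']
a = pd.Series(idx.day, index=idx).map(lookup)

# Days of month present in the index but absent from real_values
missing_days = a[a.isna()].index.day.unique().tolist()
print(missing_days)  # [21]
```

Here the single timestamp on the 21st has no real value, so any result computed from a is NaN on that row.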
# Multiply df row-wise by Series a, then divide by the per-day mean broadcast back to every row with transform
df1 = df.mul(a, axis=0).div(df.groupby(df.index.day).transform('mean'))
print (df1)
val1 val2
time
2017-07-18 00:00:00 1307.171491 403.372865
2017-07-18 03:00:00 726.330473 851.356196
2017-07-18 06:00:00 371.987469 77.497641
2017-07-18 09:00:00 102.153227 959.768694
2017-07-18 12:00:00 587.453074 233.817177
2017-07-18 15:00:00 624.907891 734.391568
2017-07-18 18:00:00 64.131282 114.951326
2017-07-18 21:00:00 215.865093 624.844533
2017-07-19 00:00:00 120.686108 542.744066
2017-07-19 03:00:00 653.014193 1116.500860
2017-07-19 06:00:00 891.148297 333.591495
2017-07-19 09:00:00 676.652432 610.715673
2017-07-19 12:00:00 1031.182496 743.728715
2017-07-19 15:00:00 489.559748 336.152862
2017-07-19 18:00:00 643.545466 147.084368
2017-07-19 21:00:00 294.211260 969.481959
2017-07-20 00:00:00 1474.421809 404.910284
2017-07-20 03:00:00 1016.785621 1078.311435
2017-07-20 06:00:00 665.498098 589.809072
2017-07-20 09:00:00 437.622829 122.931391
2017-07-20 12:00:00 769.989526 1158.555013
2017-07-20 15:00:00 169.891633 968.620184
2017-07-20 18:00:00 342.854461 159.225353
2017-07-20 21:00:00 722.936022 1117.637269
2017-07-21 00:00:00 NaN NaN
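The result above normalizes both columns, while the question asked only about val1. A minimal sketch restricted to val1 follows, together with a sanity check that each day's mean of the normalized column matches that day's real value. The column name val1_norm, the seeded generator and the 24-row index (which leaves out the lone timestamp on the 21st) are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Reproducible sample data covering exactly three full days (8 rows per day)
rng = np.random.default_rng(0)
idx = pd.date_range('2017-07-18', periods=24, freq=pd.Timedelta(hours=3))
df = pd.DataFrame({'val1': rng.random(24) * 300}, index=idx)

real_values = pd.DataFrame({
    'day': [18, 19, 20],
    'values': [500, 600, 700]})

# Real value for each row, looked up by day of month
a = pd.Series(df.index.day, index=df.index).map(real_values.set_index('day')['values'])

# Per-day mean of val1, repeated on every row of that day
daily_mean = df.groupby(df.index.day)['val1'].transform('mean')

# new_value = old_value * real_value_that_day / daily_average
df['val1_norm'] = df['val1'] * a / daily_mean

# Sanity check: the per-day means should be approximately 500, 600 and 700
check = df.groupby(df.index.day)['val1_norm'].mean()
print(check)
```

Keeping the result in a new column leaves the original val1 available for comparison.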

Details:

print (df.groupby(df.index.day).transform('mean'))
val1 val2
time
2017-07-18 00:00:00 113.490638 14.427688
2017-07-18 03:00:00 113.490638 14.427688
2017-07-18 06:00:00 113.490638 14.427688
2017-07-18 09:00:00 113.490638 14.427688
2017-07-18 12:00:00 113.490638 14.427688
2017-07-18 15:00:00 113.490638 14.427688
2017-07-18 18:00:00 113.490638 14.427688
2017-07-18 21:00:00 113.490638 14.427688
2017-07-19 00:00:00 172.937287 13.491194
2017-07-19 03:00:00 172.937287 13.491194
2017-07-19 06:00:00 172.937287 13.491194
2017-07-19 09:00:00 172.937287 13.491194
2017-07-19 12:00:00 172.937287 13.491194
2017-07-19 15:00:00 172.937287 13.491194
2017-07-19 18:00:00 172.937287 13.491194
2017-07-19 21:00:00 172.937287 13.491194
2017-07-20 00:00:00 139.010896 16.081470
2017-07-20 03:00:00 139.010896 16.081470
2017-07-20 06:00:00 139.010896 16.081470
2017-07-20 09:00:00 139.010896 16.081470
2017-07-20 12:00:00 139.010896 16.081470
2017-07-20 15:00:00 139.010896 16.081470
2017-07-20 18:00:00 139.010896 16.081470
2017-07-20 21:00:00 139.010896 16.081470
2017-07-21 00:00:00 72.827447 2.008148
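The table above shows transform('mean') repeating each group's mean on every row of that group. One caveat that the original answer does not raise: df.index.day is the day of the month, so data spanning more than one month would pool, for example, July 18 and August 18 into the same group. A small sketch of the difference, using made-up values and df.index.normalize() as one possible per-date grouping key:

```python
import pandas as pd

# Hypothetical data: two readings on July 18 and two on August 18
idx = pd.to_datetime(['2017-07-18 00:00', '2017-07-18 12:00',
                      '2017-08-18 00:00', '2017-08-18 12:00'])
df = pd.DataFrame({'val1': [10.0, 30.0, 100.0, 300.0]}, index=idx)

# Grouping by day of month pools both months into one group (day 18)
by_day_of_month = df.groupby(df.index.day)['val1'].transform('mean')

# Grouping by calendar date (timestamps truncated to midnight) keeps them apart
by_date = df.groupby(df.index.normalize())['val1'].transform('mean')

print(by_day_of_month.tolist())  # [110.0, 110.0, 110.0, 110.0]
print(by_date.tolist())          # [20.0, 20.0, 200.0, 200.0]
```

For the three-day sample in the question both keys give the same groups, so this only matters if the real data covers a longer period; the lookup table of real values would then also need to be keyed by date rather than by day of month.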

Regarding python - Normalizing a pandas DataFrame column using an equation and daily averages, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46766751/
