
python - Computing monthly anomalies in pandas

Reposted. Author: 行者123. Updated: 2023-12-04 03:42:07

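Hello StackOverflow community,
I have been interested in computing anomalies for data in pandas 1.2.0, using Python 3.9.1 and NumPy 1.19.5, but have struggled to find the most "Pythonic" and "pandas" way of completing this task (or any way, for that matter). Below I have created some dummy data and put it into a pandas DataFrame. I have also tried to clearly outline my approach to calculating monthly anomalies for the dummy data.
What I am trying to do is take "n" years of monthly values (in this example, 2 years of monthly data = 25 months) and calculate the monthly average across all years (e.g. group all the January values together and calculate their mean). I have been able to do this using pandas.
Next, I would like to take each monthly mean and subtract it from all elements in my DataFrame that fall in that specific month (e.g. subtract the overall January mean from each January value). In the code below you will see some lines that attempt this subtraction, but to no avail.
If anyone has any thoughts or tips on a good way to tackle this problem, I would really appreciate your insight. If you need further clarification, let me know. Thank you for your time and thoughts.
-Marian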

#Import packages
import numpy as np
import pandas as pd
#-------------------------------------------------------------
#Create a pandas dataframe with some data that will represent:
#Column of dates for two years, at monthly resolution
#Column of corresponding values for each date.

#Create two years worth of monthly dates
dates = pd.date_range(start='2018-01-01', end='2020-01-01', freq='MS')

#Create some random data that will act as our data that we want to compute the anomalies of
values = np.random.randint(0,100,size=25)

#Put our dates and values into a dataframe to demonstrate how we have tried to calculate our anomalies
df = pd.DataFrame({'Dates': dates, 'Values': values})
#-------------------------------------------------------------
#Anomalies will be computed by finding the mean value of each month over all years
#And then subtracting each month's mean from every element that falls in that particular month

#Group our df according to the month of each entry and calculate monthly mean for each month
monthly_means = df.groupby(df['Dates'].dt.month).mean()
#-------------------------------------------------------------
#Now, how do we go about subtracting these grouped monthly means from each element that falls
#in the corresponding month.
#For example, if the monthly mean over 2 years for January is 20 and the value is 21 in January 2018, the anomaly would be +1 for January 2018

#Example lines of code I have tried, but have not worked

#ValueError:Unable to coerce to Series, length must be 1: given 12
#anomalies = socal_csv.groupby(socal_csv['Date'].dt.month) - monthly_means

#TypeError: unhashable type: "list"
#anomalies = socal_csv.groupby(socal_csv['Date'].dt.month).transform([np.subtract])

Best answer

You can use pd.merge like this:

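import numpy as np
import pandas as pd

# 25 month-start dates, 2018-01-01 through 2020-01-01
dates = pd.date_range(start='2018-01-01', end='2020-01-01', freq='MS')

# Random dummy values, one per date
values = np.random.randint(0,100,size=25)

df = pd.DataFrame({'Dates': dates, 'Values': values})

# Mean of 'Values' per calendar month; reset_index() turns the month key into a 'Dates' column
monthly_means = df.groupby(df['Dates'].dt.month)['Values'].mean().reset_index()

# Add a month key to each row, attach that month's mean, then take the difference
df['month']=df['Dates'].dt.month
df=df.merge(monthly_means.rename(columns={'Dates':'month','Values':'Mean'}),on='month',how='left')
# Note: this is mean minus value
df['Diff']=df['Mean']-df['Values']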
Output:
 df['Diff']
Out[19]:
0 33.333333
1 19.500000
2 -29.500000
3 -22.500000
4 -24.000000
5 -3.000000
6 10.000000
7 2.500000
8 14.500000
9 -17.500000
10 44.000000
11 31.000000
12 -11.666667
13 -19.500000
14 29.500000
15 22.500000
16 24.000000
17 3.000000
18 -10.000000
19 -2.500000
20 -14.500000
21 17.500000
22 -44.000000
23 -31.000000
24 -21.666667
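Because the values are random, these numbers will differ from run to run. As a small sanity check, the sketch below repeats the merge approach on hypothetical fixed values (they are not from the original post) so the result can be verified by hand. It also makes the sign convention explicit: `Diff` is mean minus value, whereas the anomaly as defined in the question (a January mean of 20 and a value of 21 giving +1) is value minus mean. The `Anomaly` column name is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical fixed data: two Januaries and two Februaries
df = pd.DataFrame({
    'Dates': pd.to_datetime(['2018-01-01', '2018-02-01', '2019-01-01', '2019-02-01']),
    'Values': [21, 50, 19, 30],
})

# Mean per calendar month as a two-column frame: 'month' and 'Mean'
monthly_means = (
    df.groupby(df['Dates'].dt.month)['Values']
      .mean()
      .rename_axis('month')
      .reset_index(name='Mean')
)

# Left merge keeps the original row order and attaches each row's monthly mean
df['month'] = df['Dates'].dt.month
df = df.merge(monthly_means, on='month', how='left')

df['Diff'] = df['Mean'] - df['Values']      # mean minus value, as in the answer
df['Anomaly'] = df['Values'] - df['Mean']   # value minus mean, as in the question's example

print(df[['Dates', 'Values', 'Mean', 'Diff', 'Anomaly']])
```

With a January mean of 20 and a February mean of 40, January 2018 (value 21) gets an `Anomaly` of +1 and a `Diff` of -1.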
If you want the absolute differences, you can use abs().
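The question also tried `groupby(...).transform(...)`. As an alternative to the merge (this is not the method from the answer, only a sketch), `transform('mean')` returns one monthly mean per original row, so the subtraction can be done directly without a helper column. The seed and the column names `Anomaly` and `AbsAnomaly` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only so the run is repeatable
dates = pd.date_range(start='2018-01-01', end='2020-01-01', freq='MS')
df = pd.DataFrame({'Dates': dates, 'Values': rng.integers(0, 100, size=25)})

# One mean per row, aligned with df's index: each row gets its calendar month's mean
month_mean = df.groupby(df['Dates'].dt.month)['Values'].transform('mean')

# Anomaly as defined in the question: value minus that month's mean
df['Anomaly'] = df['Values'] - month_mean

# Absolute size of the anomaly, regardless of sign
df['AbsAnomaly'] = df['Anomaly'].abs()

print(df.head())
```

Within each calendar month the anomalies sum to zero (up to floating-point rounding), which is a quick way to check the result.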

Regarding "python - Computing monthly anomalies in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65839307/
