pandas - 如何在 Pandas 中制作绘制多年月度数据的图表

转载作者：行者123 更新时间：2023-12-05 02:51:43

我有 11 年的每小时臭氧浓度数据。
- 有 11 个 csv 文件包含每天每小时的臭氧浓度。
我能够读取其中的所有文件并将索引从日期转换为日期时间。
对于我的图表:
- 我计算了每天 8 小时的最大平均值，然后对每个月的这些值求平均值。
我的新数据框 (df3) 有:
- 一个日期时间索引，它由 12 年中一年中每个月的最后一天组成。
- 它还有一列包括平均 MDA8 值。
我想为 4 月、5 月和 6 月制作 3 个单独的散点图。 (x 轴 = 年，y 轴 = 该月的平均 MDA8)
- 但是，我对如何调用这些单独的月份并绘制年度数据感到困惑。

最小样本

site,date,start_hour,value,variable,units,quality,prelim,name 
3135,2010-01-01,0,13.0,OZONE,Parts Per Billion ( ppb ),,,Calexico-Ethel Street
3135,2010-01-01,1,5.0,OZONE,Parts Per Billion ( ppb ),,,Calexico-Ethel Street
3135,2010-01-01,2,11.0,OZONE,Parts Per Billion ( ppb ),,,Calexico-Ethel Street
3135,2010-01-01,3,17.0,OZONE,Parts Per Billion ( ppb ),,,Calexico-Ethel Street
3135,2010-01-01,5,16.0,OZONE,Parts Per Billion ( ppb ),,,Calexico-Ethel Street

这是查找类似 CSV 数据的链接 https://www.arb.ca.gov/aqmis2/aqdselect.php?tab=hourly

我在下面附上了一些代码:

import pandas as pd
import os
import glob
import matplotlib.pyplot as plt

path = "C:/Users/blah"
for f in glob.glob(os.path.join(path, "*.csv")):
    df = pd.read_csv(f, header = 0, index_col='date')
    df2 = df.dropna(axis = 0, how = "all", subset = ['start_hour', 'variable'], inplace = True) 
    df = df.iloc[0:]
    df.index = pd.to_datetime(df.index) #converting date to datetime
    df['start_hour'] = pd.to_timedelta(df['start_hour'], unit = 'h')
    df['datetime'] = df.index + df['start_hour']
    df.set_index('datetime', inplace = True)

    df2 = df.value.rolling('8H', min_periods = 6).mean() 
    df2.index -= pd.DateOffset(hours=3)
    df2 = df4.resample('D').max()
    df2.index.name = 'timestamp'

出现以下问题:

    df3 = df2.groupby(pd.Grouper(freq = 'M')).mean()
    df4 = df3[df3.index.month.isin([4,5,6])]
    if df4 == True:
        plt.plot(df3.index, df3.values)
    print(df4)

每当我这样做时，我都会收到一条消息说“ValueError:系列的真值不明确。使用 a.empty、a.bool()、a.item()、a.any() 或 a.全部()。”当我使用 df4.any() == True: 尝试此代码时，它绘制了除 4 月至 6 月之外的所有月份，并将所有值绘制在同一图中。我想要每个月的不同情节。

我还尝试添加以下内容并删除之前的 if 语句:

df5 = df4.index.year.isin([2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
    if df5.all() == True:
        plt.plot(df4.index, df4.values)

但是，这给了我这样的图像:
Apr-Jun MDA8 values

同样，我想为每个月制作一个单独的散点图，尽管这更接近我想要的。任何帮助将不胜感激，谢谢。

编辑另外，我有2020年的数据，只延伸到7月份。我不认为这会影响我的图表，但我只是想提一下。理想情况下，我希望它看起来像这样，但每年和 4 月的各个月份都有不同的点

Scatterplot

最佳答案

df.index -= pd.DateOffset(hours=3) 已被移除，因为它可能存在问题
- 每个月的第一个小时就是上个月
- 每天的头几个小时是前一天

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from datetime import date
from pandas.tseries.offsets import MonthEnd

# set the path to the files
p = Path('/PythonProjects/stack_overflow/data/ozone/')

# list of files
files = list(p.glob('OZONE*.csv'))

# create a dataframe from the files - all years all data
df = pd.concat([pd.read_csv(file) for file in files])

# format the dataframe
df.start_hour = pd.to_timedelta(df['start_hour'], unit = 'h')
df.date = pd.to_datetime(df.date)
df['datetime'] = df.date + df.start_hour
df.drop(columns=['date', 'start_hour'], inplace=True)
df['month'] = df.datetime.dt.month
df['day'] = df.datetime.dt.day
df['year'] = df.datetime.dt.year
df = df[df.month.isin([4, 5, 6])].copy()  # filter the dataframe - only April, May, June
df.set_index('datetime', inplace = True)

# calculate the 8-hour rolling mean
df['r_mean'] = df.value.rolling('8H', min_periods=6).mean()

# determine max value per day
r_mean_daily_max = df.groupby(['year', 'month', 'day'], as_index=False)['r_mean'].max()

# calculate the mean from the daily max
mda8 = r_mean_daily_max.groupby(['year', 'month'], as_index=False)['r_mean'].mean()

# add a new datetime column with the date as the end of the month
mda8['datetime'] = pd.to_datetime(mda8.year.astype(str) + mda8.month.astype(str), format='%Y%m') + MonthEnd(1)

`df.info()` & `.head()` 在任何处理之前

<class 'pandas.core.frame.DataFrame'>
Int64Index: 78204 entries, 0 to 4663
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   site        78204 non-null  int64  
 1   date        78204 non-null  object 
 2   start_hour  78204 non-null  int64  
 3   value       78204 non-null  float64
 4   variable    78204 non-null  object 
 5   units       78204 non-null  object 
 6   quality     4664 non-null   float64
 7   prelim      4664 non-null   object 
 8   name        78204 non-null  object 
dtypes: float64(2), int64(2), object(5)
memory usage: 6.0+ MB

   site        date  start_hour  value variable                      units  quality prelim                   name 
0  3135  2011-01-01           0   14.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street 
1  3135  2011-01-01           1   11.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street 
2  3135  2011-01-01           2   22.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street 
3  3135  2011-01-01           3   25.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street 
4  3135  2011-01-01           5   22.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street

`df.info` & `.head()`处理后

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 20708 entries, 2011-04-01 00:00:00 to 2020-06-30 23:00:00
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   site      20708 non-null  int64  
 1   value     20708 non-null  float64
 2   variable  20708 non-null  object 
 3   units     20708 non-null  object 
 4   quality   2086 non-null   float64
 5   prelim    2086 non-null   object 
 6   name      20708 non-null  object 
 7   month     20708 non-null  int64  
 8   day       20708 non-null  int64  
 9   year      20708 non-null  int64  
 10  r_mean    20475 non-null  float64
dtypes: float64(3), int64(4), object(4)
memory usage: 1.9+ MB

                     site  value variable                      units  quality prelim                   name   month  day  year  r_mean
datetime                                                                                                                              
2011-04-01 00:00:00  3135   13.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street       4    1  2011     NaN
2011-04-01 01:00:00  3135   29.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street       4    1  2011     NaN
2011-04-01 02:00:00  3135   31.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street       4    1  2011     NaN
2011-04-01 03:00:00  3135   28.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street       4    1  2011     NaN
2011-04-01 05:00:00  3135   11.0    OZONE  Parts Per Billion ( ppb )      NaN    NaN  Calexico-Ethel Street       4    1  2011     NaN

`r_mean_daily_max.info()` 和 `.head()`

<class 'pandas.core.frame.DataFrame'>
Int64Index: 910 entries, 0 to 909
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   year    910 non-null    int64  
 1   month   910 non-null    int64  
 2   day     910 non-null    int64  
 3   r_mean  910 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 35.5 KB

   year  month  day  r_mean
0  2011      4    1  44.125
1  2011      4    2  43.500
2  2011      4    3  42.000
3  2011      4    4  49.625
4  2011      4    5  45.500

`mda8.info()` & `.head()`

<class 'pandas.core.frame.DataFrame'>
Int64Index: 30 entries, 0 to 29
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   year      30 non-null     int64         
 1   month     30 non-null     int64         
 2   r_mean    30 non-null     float64       
 3   datetime  30 non-null     datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 1.2 KB

   year  month     r_mean   datetime
0  2011      4  49.808135 2011-04-30
1  2011      5  55.225806 2011-05-31
2  2011      6  58.162302 2011-06-30
3  2012      4  45.865278 2012-04-30
4  2012      5  61.061828 2012-05-31

mda8

地 block 1

sns.lineplot(mda8.datetime, mda8.r_mean, marker='o')
plt.xlim(date(2011, 1, 1), date(2021, 1, 1))

图2

# create color mapping based on all unique values of year
years = mda8.year.unique()
colors = sns.color_palette('husl', n_colors=len(years))  # get a number of colors
cmap = dict(zip(years, colors))  # zip values to colors

for g, d in mda8.groupby('year'):
    sns.lineplot(d.datetime, d.r_mean, marker='o', hue=g, palette=cmap)
    
plt.xlim(date(2011, 1, 1), date(2021, 1, 1))
plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)

地 block 3

sns.barplot(x='month', y='r_mean', data=mda8, hue='year')
plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
plt.title('MDA8: April - June')
plt.ylabel('mda8 (ppb)')
plt.show()

地 block 4

for month in mda8.month.unique():
    data = mda8[mda8.month == month]  # filter and plot the data for a specific month
    plt.figure()  # create a new figure for each month
    sns.lineplot(data.datetime, data.r_mean, marker='o')
    plt.xlim(date(2011, 1, 1), date(2021, 1, 1))
    plt.title(f'Month: {month}')
    plt.ylabel('MDA8: PPB')
    plt.xlabel('Year')

每个月会有一个地 block

地 block 5

for month in mda8.month.unique():
    data = mda8[mda8.month == month]
    sns.lineplot(data.datetime, data.r_mean, marker='o', label=month)
    plt.legend(title='Month')
    plt.xlim(date(2011, 1, 1), date(2021, 1, 1))
    plt.ylabel('MDA8: PPB')
    plt.xlabel('Year')

解决 我想为 4 月、5 月和 6 月制作 3 个单独的散点图。
主要问题是，无法使用日期时间轴绘制数据。
- 目标是在轴上绘制每一天，每个数字代表不同的月份。

线图

有点忙
已使用自定义颜色图，因为标准调色板中没有足够的颜色来为每年提供独特的颜色

# create color mapping based on all unique values of year
years = df.index.year.unique()
colors = sns.color_palette('husl', n_colors=len(years))  # get a number of colors
cmap = dict(zip(years, colors))  # zip values to colors

for k, v in df.groupby('month'):  # group the dateframe by month
    plt.figure(figsize=(16, 10))
    for year in v.index.year.unique():  # withing the month plot each year
        data = v[v.index.year == year]
        sns.lineplot(data.index.day, data.r_mean, err_style=None, hue=year, palette=cmap)
    plt.xlim(0, 33)
    plt.xticks(range(1, 32))
    plt.title(f'Month: {k}')
    plt.xlabel('Day of Month')
    plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
plt.show()

这是四月，其他两个数字看起来与此相似

条形图

for k, v in df.groupby('month'):  # group the dateframe by month
    plt.figure(figsize=(10, 20))

    sns.barplot(x=v.r_mean, y=v.day, ci=None, orient='h', hue=v.index.year)
    plt.title(f'Month: {k}')
    plt.ylabel('Day of Month')
    plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
plt.show()

关于pandas - 如何在 Pandas 中制作绘制多年月度数据的图表，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/63040250/

文章推荐： c# - Unity编辑器捕获 "hold mouse down and move"事件

javascript - 我需要将文本放在一个中，它位于一个 Div 中，该 Div 位于另一个 Div 中，该 Div 位于另一个 Div 中
我需要将文本放在中在一个 Div 中，在另一个 Div 中，在另一个 Div 中。所以这是它的样子: #document Change PIN
html - 两个背景图像。一个在 HTML 中，一个在 BODY 中。在 Firefox 中，主体图像未呈现
奇怪的事情发生了。我有一个基本的 html 代码。 html，头部， body 。(因为我收到了一些反对票，这里是完整的代码) 这是我的CSS: html { backgroun
ios - 将图像从 asset.xcassets 加载到 imageArray 中，并将其动态加载到 UIImageView 中，该 UIImageView 存在于 UICollectionView 中 - swift
我正在尝试将 Assets 中的一组图像加载到 UICollectionview 中存在的 ImageView 中，但每当我运行应用程序时它都会显示错误。而且也没有显示图像。我在ViewDidLoa
linux - 在 BASH 中，我需要根据 perl 脚本的输出更改一些环境变量。在 tcsh 中，我可以使用别名 eval 组合。不能在 bash 中
我需要根据带参数的 perl 脚本的输出更改一些环境变量。在 tcsh 中，我可以使用别名命令来评估 perl 脚本的输出。 tcsh: alias setsdk 'eval `/localhome/
asp.net - Windows 身份验证适用于 IIS，但不适用于 Kestrel/Microsoft.AspNetCore.Authentication.Negotiate(不在 Chrome 中，有时在 Edge 中，始终在 IE 中)？
我使用 Windows 身份验证创建了一个新的 Blazor(服务器端)应用程序，并使用 IIS Express 运行它。它将显示一条消息“Hello Domain\User!”来自右上方的以下 Ra
java - java 中 Kotlin 中的等价物是什么？
这是我的方法 void login(Event event);我想知道 Kotlin 中应该如何最佳答案在 Kotlin 中通配符运算符是 * 。它指示编译器它是未知的，但一旦知道，就不会有其他类
express - 在 Jade 中，为什么有时我可以按原样使用变量而有时必须将它们包含在#{......} 中？
看下面的代码 for story in book if story.title.length < 140 - var story
c - C 中 strstr() 中 for 循环的错误使用
我正在尝试用 C 语言学习字符串处理。我写了一个程序，它存储了一些音乐轨道，并帮助用户检查他/她想到的歌曲是否存在于存储的轨道中。这是通过要求用户输入一串字符来完成的。然后程序使用 strstr()
c - * 在 sscanf 中，* 在 [] 中
我正在学习 sscanf 并遇到如下格式字符串: sscanf("%[^:]:%[^*=]%*[*=]%n",a,b,&c); 我理解 %[^:] 部分意味着扫描直到遇到 ':' 并将其分配给 a。:
python - 在 Python (2.7.3) 中，如果 str(x) 中的任何字符在 str(y) 中(或 str(y) 在 str(x) 中)，我如何编写一个函数来回答？
def char_check(x,y): if (str(x) in y or x.find(y) > -1) or (str(y) in x or y.find(x) > -1):
ansible - 在 Ansible 中，如何将一行移动到一个 block 中？
我有一种情况，我想将文本文件中的现有行包含到一个新 block 中。 line 1 line 2 line in block line 3 line 4 应该变成 line 1 line 2 line
Django 调试工具栏显示在根 URL 中，但不显示在应用程序 URL 中
我有一个新项目，我正在尝试设置 Django 调试工具栏。首先，我尝试了快速设置，它只涉及将 'debug_toolbar' 添加到我的已安装应用程序列表中。有了这个，当我转到我的根 URL 时，调试
r - 在 R 中，Matlab 中 @ 函数句柄的等价物是什么？
在 Matlab 中，如果我有一个函数 f，例如签名是 f(a,b,c)，我可以创建一个只有一个变量 b 的函数，它将使用固定的 a=a1 和 c=c1 调用 f: g = @(b) f(a1, b,
swiftui - SwiftUI 中 ScrollView 中 VStack 元素中的神秘间距或填充
我不明白为什么 ForEach 中的元素之间有多余的垂直间距在 VStack 里面在 ScrollView 里面使用 GeometryReader 时渲染自定义水平分隔线。 Scrol
cookies - 什么应该存储在 session 中，什么应该存储在 cookie 中？
我想知道，是否有关于何时使用 session 和 cookie 的指南或最佳实践？什么应该和什么不应该存储在其中？谢谢! 最佳答案这些文档很好地了解了 session cookie 的安全问题以及
python - Python 中 matplotlib 中 3d 直方图的奇怪行为
我在 scipy/numpy 中有一个 Nx3 矩阵，我想用它制作一个 3 维条形图，其中 X 轴和 Y 轴由矩阵的第一列和第二列的值、高度确定每个条形的是矩阵中的第三列，条形的数量由 N 确定。
c - c 中 sem_init(...) 中 value 参数的不同用法
假设我用两种不同的方式初始化信号量 sem_init(&randomsem,0,1) sem_init(&randomsem,0,0) 现在， sem_wait(&randomsem) 在这两种情况下
c - 实际值存储在 pstr 中，但是该值如何存储在数组 "WORD"中
我怀疑该值如何存储在“WORD”中，因为 PStr 包含实际输出。？既然Pstr中存储的是小写到大写的字母，那么在printf中如何将其给出为“WORD”。有人可以吗？解释一下？ #include
javascript - 数组索引选择像在 numpy 中，但在 javascript 中
我有一个 3x3 数组: var my_array = [[0,1,2], [3,4,5], [6,7,8]]; 并想获得它的第一个 2
javascript - 在 Javascript 中，如何检测浏览器窗口何时在 View 中？
我意识到您可以使用如下方式轻松检查焦点: var hasFocus = true; $(window).blur(function(){ hasFocus = false; }); $(win

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

pandas - 如何在 Pandas 中制作绘制多年月度数据的图表

最小样本

`df.info()` & `.head()` 在任何处理之前

`df.info` & `.head()`处理后

`r_mean_daily_max.info()` 和 `.head()`

`mda8.info()` & `.head()`

mda8

地 block 1

图2

地 block 3

地 block 4

地 block 5

线图

条形图

首页

博学

6Ren·AI

商城

pandas - 如何在 Pandas 中制作绘制多年月度数据的图表

最小样本

df.info() & .head() 在任何处理之前

df.info & .head()处理后

r_mean_daily_max.info() 和 .head()

mda8.info() & .head()

mda8

地 block 1

图2

地 block 3

地 block 4

地 block 5

线图

条形图

`df.info()` & `.head()` 在任何处理之前

`df.info` & `.head()`处理后

`r_mean_daily_max.info()` 和 `.head()`

`mda8.info()` & `.head()`