python - Most frequent value using pandas.DataFrame.resample

Reposted by: 太空狗. Updated: 2023-10-30 02:57:37


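I am using pandas.DataFrame.resample to resample a grouped pandas DataFrame that has a timestamp index.

For one of the columns, I want the resampling to pick the most frequent value. So far I have only succeeded with NumPy functions such as np.max or np.sum.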

import numpy as np
import pandas as pd

#generate test dataframe
data = np.random.randint(0,10,(366,2))
index = pd.date_range(start=pd.Timestamp('1-Dec-2012'), periods=366, freq='D')
test = pd.DataFrame(data, index=index)

#generate group array
group = np.random.randint(0,2,(366,))

#define how dictionary for resample
how_dict = {0: np.max, 1: np.min}

#perform grouping and resample
#(how= is the legacy resample keyword, deprecated since pandas 0.18)
test.groupby(group).resample('48 h',how=how_dict)

The code above works because I used NumPy functions. However, I am not sure how to resample by the most frequent value. I tried defining a custom function such as

def frequent(x):
    (value, counts) = np.unique(x, return_counts=True)
    return value[counts.argmax()]

However, if I now do:

how_dict = {0: np.max, 1: frequent}

I get an empty DataFrame...

df = test.groupby(group).resample('48 h',how=how_dict)
df.shape

Best answer

Your resampling period is too short, so when a group has no rows in some period, your user function raises a ValueError that pandas does not catch gracefully.
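To see the failure in isolation, here is a minimal sketch that calls the question's `frequent` directly on an empty array. It assumes that an empty array is roughly what the function receives for an empty 48-hour bin; the sample arrays are made up for illustration.

```python
import numpy as np

def frequent(x):
    # Same logic as in the question: return the value with the highest count.
    values, counts = np.unique(x, return_counts=True)
    return values[counts.argmax()]

# Non-empty input works as expected.
print(frequent(np.array([3, 1, 3, 2])))

# Empty input: counts is empty, so counts.argmax() raises ValueError.
try:
    frequent(np.array([]))
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The error comes from `argmax` on an empty array, not from `np.unique`, so any guard has to run before that call.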

But it works when there are no empty groups, for example with regular groups:

In [8]: test.groupby(np.arange(366)%2).resample('48h',how=how_dict).head()
Out[8]: 
              0  1
0 2012-12-01  4  8
  2012-12-03  0  3
  2012-12-05  9  5
  2012-12-07  3  4
  2012-12-09  7  3

Or with a larger period:

In [9]: test.groupby(group).resample('122D',how=how_dict)
Out[9]: 
              0  1
0 2012-12-02  9  0
  2013-04-03  9  0
  2013-08-03  9  6
1 2012-12-01  9  3
  2013-04-02  9  7
  2013-08-02  9  1

Edit

A workaround is to handle the empty case:

def frequent(x):
    if len(x) == 0:
        return -1  # sentinel value for an empty bin
    (value, counts) = np.unique(x, return_counts=True)
    return value[counts.argmax()]

This gives:

In [11]: test.groupby(group).resample('48h',how=how_dict).head()
Out[11]: 
               0  1
0 2012-12-01   5  3
  2012-12-03   3  4
  2012-12-05 NaN -1
  2012-12-07   5  0
  2012-12-09   1  4
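The `how=` keyword used throughout this thread was deprecated in pandas 0.18 in favor of calling an aggregation method on the resampler. The following is a minimal sketch of the same workaround in that newer style, assuming a recent pandas version. The small 8-day frame and the `group` array are hypothetical, chosen so that each group has a two-day gap and therefore one empty 48-hour bin.

```python
import numpy as np
import pandas as pd

def frequent(x):
    # Empty bins get the -1 sentinel used in the answer above.
    if len(x) == 0:
        return -1
    values, counts = np.unique(x, return_counts=True)
    return values[counts.argmax()]

# Hypothetical data: 8 daily rows, two integer-named columns as in the question.
index = pd.date_range("2012-12-01", periods=8, freq="D")
test = pd.DataFrame(
    {0: [1, 2, 3, 4, 5, 6, 7, 8], 1: [3, 3, 4, 4, 7, 7, 9, 9]},
    index=index,
)
group = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# Per-column aggregation: max for column 0, most frequent value for column 1.
result = test.groupby(group).resample("48h").agg({0: "max", 1: frequent})
print(result)
```

Returning a sentinel such as -1 keeps the column numeric, but the sentinel must be a value that cannot occur in the real data; returning `np.nan` instead is another option if a float column is acceptable.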

This post, "python - Most frequent value using pandas.DataFrame.resample", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/36459132/
