python - pandas pivot_table: getting averages across columns and rows

Reposted by: 行者123  Updated: 2023-12-01 09:30:57

I have the following df:

code    y_m        date_1        date_2
10     201710      2017-10-01    2017-10-06
10     201710      2017-10-07    2017-10-09
10     201711      2017-11-06    2017-11-08
10     201711      2017-11-02    2017-11-06
20     201710      2017-10-03    2017-10-04
20     201710      2017-10-07    2017-10-08
20     201711      2017-11-06    2017-11-09
20     201711      2017-11-02    2017-11-03

code and y_m are str; date_1 and date_2 are ISODate.

I want to first group by code and y_m, compute date_2 - date_1 as a Timedelta for each row, and average those values within each group to create a new column, avg_days:

code_yr_mon_grp_by = df.groupby(['code', 'y_m'])

code_yr_mon_gr_avg_days = code_yr_mon_grp_by.apply(lambda row: (row['date_2'] - row['date_1']) / np.timedelta64(1, 'D')).mean(level=[0, 1]).reset_index(name='avg_days')

This produces:

code   y_m      avg_days
10    201710     3.5
10    201711     3
20    201710     1
20    201711     2
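For readers on a recent pandas release, the same grouping step can be sketched without groupby.apply. This is only an illustration, not the asker's code: it assumes the date columns have already been parsed with pd.to_datetime, and it rebuilds the sample table by hand.

```python
import pandas as pd

# Sample data copied from the table above; code and y_m are kept as strings.
df = pd.DataFrame({
    "code": ["10"] * 4 + ["20"] * 4,
    "y_m": ["201710", "201710", "201711", "201711"] * 2,
    "date_1": pd.to_datetime([
        "2017-10-01", "2017-10-07", "2017-11-06", "2017-11-02",
        "2017-10-03", "2017-10-07", "2017-11-06", "2017-11-02",
    ]),
    "date_2": pd.to_datetime([
        "2017-10-06", "2017-10-09", "2017-11-08", "2017-11-06",
        "2017-10-04", "2017-10-08", "2017-11-09", "2017-11-03",
    ]),
})

# Per-row duration as a float number of days.
days = (df["date_2"] - df["date_1"]) / pd.Timedelta(days=1)

# Average the durations within each (code, y_m) group.
avg_days = (days.groupby([df["code"], df["y_m"]])
                .mean()
                .reset_index(name="avg_days"))
print(avg_days)
```

Dividing by pd.Timedelta(days=1) plays the same role as dividing by np.timedelta64(1, 'D') in the original snippet: it turns the Timedelta values into plain floats before averaging.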

Then I want to convert this df into a matrix in which the y_m values are transposed into a header row and avg_days supplies the cell values:

     0     1        2            3             
0   -1     0     201710       201711       
1   0     2.375     2.25         2.5           
2   10    3.25      3.5          3                      
3   20    1.5       1            2

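Specifically, -1 is a dummy value: it indicates that no value exists for a given y_m of a particular code, or simply keeps the matrix rectangular. 0 stands for "all" values, i.e. the average over code, over y_m, or over both. For example, cell (1, 1) is the average of avg_days across all y_m and code values; cell (1, 2) is the average of avg_days for 201710 across codes 10 and 20.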
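The "all" row and column described above correspond to what pivot_table calls margins. As a rough sketch (the avg_days frame below is typed in by hand from the intermediate result, and the label "0" is passed as a string), the averages can be checked like this:

```python
import pandas as pd

# Intermediate result from the grouping step, entered by hand.
avg_days = pd.DataFrame({
    "code": ["10", "10", "20", "20"],
    "y_m": ["201710", "201711", "201710", "201711"],
    "avg_days": [3.5, 3.0, 1.0, 2.0],
})

# margins=True appends an extra row and column (named "0" here)
# holding the mean over all codes and over all y_m values.
tab = avg_days.pivot_table(index="code", columns="y_m", values="avg_days",
                           aggfunc="mean", margins=True,
                           margins_name="0", fill_value=-1)

print(tab.loc["0", "0"])       # mean over every code and y_m
print(tab.loc["0", "201710"])  # mean for 201710 across both codes
print(tab.loc["10", "0"])      # mean for code 10 across both months
```

Note that pivot_table places the margin last, which is why the asker's function reorders the index and columns to move it to the front.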

But when I try:

def convert_to_matrix(df, p_tab_idx, p_tab_cols, p_tab_vals, p_tab_agg_func):
    df_tab = (df.pivot_table(index=p_tab_idx,
                         columns=p_tab_cols,
                         values=p_tab_vals,
                         margins=True,
                         aggfunc=p_tab_agg_func,
                         fill_value=-1,
                         margins_name='0'))

    # change order of index and columns values for reindex
    idx = df_tab.index[-1:].tolist() + df_tab.index[:-1].tolist()
    cols = df_tab.columns[-1:].tolist() + df_tab.columns[:-1].tolist()

    df_tab = (df_tab.reindex(index=idx, columns=cols)
          .reset_index()
          .rename(columns={p_tab_idx: -1})
          .rename_axis(None, 1))

    # add columns to first row
    df_tab = df_tab.columns.to_frame().T.append(df_tab).reset_index(drop=True)
    # reset columns names to range
    df_tab.columns = range(len(df_tab.columns))
    # converts column labels from int to str
    df_tab.columns = df_tab.columns.astype(str)

    return df_tab

code_yr_mon_gr_proc_days_p_tab = convert_to_matrix(code_yr_mon_gr_avg_days,
                                                    p_tab_idx='code',
                                                    p_tab_cols='y_m',
                                                    p_tab_vals='avg_days',
                                                    p_tab_agg_func='mean')

I get the error:

builtins.AttributeError: 'Index' object has no attribute 'to_frame'

I would like to know how to fix this and obtain the desired result.

Best answer

If your pandas version is below 0.21.0, where Index.to_frame is not implemented, use:

df_tab = (pd.DataFrame(df_tab.columns, index=df_tab.columns)
            .T
            .append(df_tab)
            .reset_index(drop=True))

instead of:

df_tab = df_tab.columns.to_frame().T.append(df_tab).reset_index(drop=True)
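On the opposite end of the version range, DataFrame.append was removed in pandas 2.0, so neither variant above runs there unchanged. The following is a hedged sketch of how the function from the question might be ported using pd.concat. The shortened parameter names and the hand-built header row are my own choices, not part of the original answer, and the input frame is entered by hand.

```python
import pandas as pd

def convert_to_matrix(df, idx, cols, vals, agg_func):
    tab = df.pivot_table(index=idx, columns=cols, values=vals,
                         aggfunc=agg_func, margins=True,
                         margins_name="0", fill_value=-1)

    # Move the "0" (all) margin from the last position to the first.
    new_idx = tab.index[-1:].tolist() + tab.index[:-1].tolist()
    new_cols = tab.columns[-1:].tolist() + tab.columns[:-1].tolist()
    tab = (tab.reindex(index=new_idx, columns=new_cols)
              .reset_index()
              .rename(columns={idx: -1}))
    tab.columns.name = None

    # Build a one-row frame from the column labels and stack it on top
    # (this replaces columns.to_frame().T.append(...)).
    header = pd.DataFrame([tab.columns.tolist()], columns=tab.columns)
    out = pd.concat([header, tab], ignore_index=True)

    # Replace the column labels with "0", "1", ... as strings.
    out.columns = [str(i) for i in range(out.shape[1])]
    return out

avg_days = pd.DataFrame({
    "code": ["10", "10", "20", "20"],
    "y_m": ["201710", "201711", "201710", "201711"],
    "avg_days": [3.5, 3.0, 1.0, 2.0],
})
matrix = convert_to_matrix(avg_days, "code", "y_m", "avg_days", "mean")
print(matrix)
```

Because code and y_m are strings in this sketch, the header and margin labels appear as '0', '201710' and so on rather than as integers; converting them is left out to keep the example close to the original function.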

This post on python - pandas pivot_table: getting averages across columns and rows is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/49966396/
