
python - Cannot 'merge' 'DataFrameGroupBy'

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:00:46

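I have a DataFrame in which one column holds categorical data and the rest hold float data. I split it in two by data type. Both DataFrames are indexed by timestamp.

I am trying to aggregate statistics for the numeric data, and the most common label for the categorical data, over 5-minute windows. I process each type separately, but I cannot put the two results back together.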

Telemetry=All[FLOATTYPE]
grouped = Telemetry.groupby(Telemetry.index.floor('5T'))
# computing various stats
grouped1 = grouped.agg([ 'mean','std'])
Category=All[CATEGORICALTYPE]
grouped2 = Category.groupby(Category.index.floor('5T'))
grouped2=grouped2.agg(lambda x: x.value_counts().index[0] if len(x.dropna())!=0 else np.nan)

grouped = grouped.merge( grouped2, axis=1)

AttributeError: Cannot access callable attribute 'merge' of 'DataFrameGroupBy' objects, try using the 'apply' method

Is there a way to avoid this problem with a single line such as:

grouped1 = grouped.agg(lambda x: [ 'mean','std'] if x.astype(float) else  (x.value_counts().index[0] if len(x.dropna())!=0 else np.nan) )

Or, alternatively, to merge the two grouped results on their index and avoid the error?

Best answer

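The error occurs because merge is called on grouped, which is still a DataFrameGroupBy object; the aggregated results grouped1 and grouped2 are ordinary DataFrames. Consider using join instead of merge on those results; by default, join aligns the two DataFrames on their index.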

final = grouped1.join(grouped2)
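As a minimal sketch of why this works, the toy example below (hypothetical data and column names, not taken from the question) shows that the object returned by groupby is a DataFrameGroupBy, while the result of agg is a plain DataFrame indexed by the 5-minute bin, so join can align the two results on that shared index.

```python
import pandas as pd

# Hypothetical toy data: one numeric and one categorical column on a DatetimeIndex.
idx = pd.to_datetime(["2024-01-01 00:01", "2024-01-01 00:03",
                      "2024-01-01 00:07", "2024-01-01 00:08"])
df = pd.DataFrame({"NUM": [1.0, 3.0, 10.0, 20.0],
                   "CAT": ["a", "a", "b", "b"]}, index=idx)

bins = df.index.floor("5min")            # 5-minute bin label for every row

gb = df[["NUM"]].groupby(bins)           # DataFrameGroupBy: cannot be merged or joined
num_stats = gb.agg("mean")               # plain DataFrame indexed by bin
cat_mode = df[["CAT"]].groupby(bins).agg(lambda s: s.value_counts().index[0])

final = num_stats.join(cat_mode)         # join aligns on the shared bin index
print(type(gb).__name__)
print(final)
```

Only one aggregation is applied to the numeric column here, so the columns stay flat and the join needs no further preparation.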

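However, you will want to flatten the hierarchical columns that the multi-aggregation groupby on Telemetry produces, to avoid unexpected results from joining frames with different numbers of column levels (pandas warns about this; newer versions may raise an error instead):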

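# '5min' is the current spelling of the 5-minute alias; '5T' is deprecated in recent pandas
grouped1 = Telemetry.groupby(Telemetry.index.floor('5min')).agg(['mean', 'std'])

from itertools import product

# build flat names such as NUM1_mean from the two column levels
newcols = [str(i[0]) + '_' + i[1]
           for i in product(grouped1.columns.levels[0], grouped1.columns.levels[1])]
grouped1.columns = newcols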
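The product-of-levels approach above assumes that the level values come out in the same order as the columns themselves. A hedged alternative, sketched below on hypothetical data, builds the flat names directly from the column tuples, so the names follow the actual column order whatever the levels look like.

```python
import pandas as pd

# Hypothetical toy data; column "B" is deliberately placed before "A".
idx = pd.to_datetime(["2024-01-01 00:01", "2024-01-01 00:03",
                      "2024-01-01 00:07", "2024-01-01 00:09"])
df = pd.DataFrame({"B": [1.0, 3.0, 5.0, 7.0],
                   "A": [10.0, 20.0, 30.0, 40.0]}, index=idx)

stats = df.groupby(df.index.floor("5min")).agg(["mean", "std"])

# Each column label is a tuple such as ("B", "mean"); join its parts with "_".
stats.columns = ["_".join(map(str, col)) for col in stats.columns]
print(stats.columns.tolist())
```

Iterating over the columns themselves also avoids the itertools import.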

This is demonstrated below with a reproducible example.

Data

import numpy as np
import pandas as pd
import datetime as dt
import time

LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
epoch_time = int(time.time())

np.random.seed(1001)
ALL = pd.DataFrame({'NUM1': np.random.randn(50)*100,
                    'NUM2': np.random.uniform(0,1,50),
                    'NUM3': np.random.randint(100, size=50),
                    'CAT1': ["".join(np.random.choice(LETTERS,1)) for _ in range(50)],
                    'CAT2': ["".join(np.random.choice(['pandas', 'r', 'julia', 'sas', 'stata', 'spss'],1)) for _ in range(50)],
                    'CAT3': ["".join(np.random.choice(['postgres', 'mysql', 'sqlite', 'oracle', 'sql server', 'db2'],1)) for _ in range(50)]},
                   index=[dt.datetime.fromtimestamp(np.random.randint(epoch_time - 5000, epoch_time)) for _ in range(50)])

Aggregation

from itertools import product

# NUMERIC COLS --------------------------------------------------
Telemetry = ALL.filter(regex='NUM', axis=1)

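# '5min' is the current spelling of the 5-minute alias; '5T' is deprecated in recent pandas
grouped1 = Telemetry.groupby(Telemetry.index.floor('5min')).agg(['mean', 'std'])

# flatten the (column, statistic) levels into names such as NUM1_mean
newcols = [str(i[0]) + '_' + i[1]
           for i in product(grouped1.columns.levels[0], grouped1.columns.levels[1])]
grouped1.columns = newcols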

# CATEGORY COLS -------------------------------------------------
Category = ALL.filter(regex='CAT', axis=1)

# most common label per 5-minute bin, or NaN if the bin holds only missing values
grouped2 = Category.groupby(Category.index.floor('5min'))\
                   .agg(lambda x: x.value_counts().index[0] if len(x.dropna()) != 0 else np.nan)

final = grouped1.join(grouped2)
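The question also asked whether everything could be done in one aggregation call. One possible sketch, assuming a mixed DataFrame and using hypothetical names (most_common, spec, num_cols, cat_cols) that are not part of the original answer, is to pass agg a dict that maps each column to its own list of functions. Every value is wrapped in a list so that all result columns share the same two-level structure before flattening.

```python
import numpy as np
import pandas as pd

def most_common(s):
    """Return the most frequent non-null value, or NaN if the group has none."""
    counts = s.value_counts()            # value_counts drops NaN by default
    return counts.index[0] if len(counts) else np.nan

# Hypothetical toy data mixing one numeric and one categorical column.
idx = pd.to_datetime(["2024-01-01 00:01", "2024-01-01 00:03", "2024-01-01 00:04",
                      "2024-01-01 00:07", "2024-01-01 00:08"])
df = pd.DataFrame({"NUM1": [1.0, 2.0, 3.0, 10.0, 20.0],
                   "CAT1": ["a", "a", "b", "b", "b"]}, index=idx)

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.columns.difference(num_cols)

# Per-column aggregation spec: statistics for numbers, mode for categories.
spec = {c: ["mean", "std"] for c in num_cols}
spec.update({c: [most_common] for c in cat_cols})

final = df.groupby(df.index.floor("5min")).agg(spec)
final.columns = ["_".join(col) for col in final.columns]   # e.g. NUM1_mean
print(final)
```

The categorical columns end up with a function-name suffix (CAT1_most_common here), which the two-step join approach avoids; which form reads better is a matter of taste.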

Output

print(final)

#                       NUM1_mean    NUM1_std  NUM2_mean  NUM2_std  NUM3_mean   NUM3_std  CAT1    CAT2      CAT3
# 2018-03-13 13:55:00  -17.516103   59.562954   0.530788  0.217159  67.000000  17.568912     I   julia    sqlite
# 2018-03-13 14:00:00   85.189272         NaN   0.842956       NaN  43.000000        NaN     Y     sas    oracle
# 2018-03-13 14:05:00  -16.833329  201.004717   0.737183  0.332332  55.500000  38.890873     M    spss  postgres
# 2018-03-13 14:10:00   84.936984   80.634218   0.754657  0.110415  80.600000  17.213367     V   stata    oracle
# 2018-03-13 14:15:00   99.512503   11.492072   0.521307  0.250584  23.500000  30.405592     E  pandas    sqlite
# 2018-03-13 14:20:00  -90.749756   65.721659   0.459464  0.377603  35.250000  30.192438     G     sas       db2
# 2018-03-13 14:25:00  -56.271685  104.440802   0.496268  0.348611  56.500000  28.310775     K    spss  postgres
# 2018-03-13 14:30:00   55.369341   50.215679   0.600296  0.399855  65.000000  41.004065     P       r       db2
# 2018-03-13 14:40:00  184.546043         NaN   0.892016       NaN  84.000000        NaN     W   julia       db2
# 2018-03-13 14:45:00  -93.886027   61.489475   0.498042  0.286001  48.000000  25.429641     W   stata  postgres
# 2018-03-13 14:50:00  122.819400         NaN   0.168059       NaN  41.000000        NaN     S   stata    sqlite
# 2018-03-13 14:55:00  -34.318532   40.225336   0.756454  0.335583  26.666667  21.385353     F   stata    sqlite
# 2018-03-13 15:00:00   -2.329881         NaN   0.894770       NaN  73.000000        NaN     Y     sas  postgres
# 2018-03-13 15:05:00  -86.408659   31.446422   0.618246  0.158136  52.000000  59.396970     G   julia  postgres
# 2018-03-13 15:10:00  -20.309460  121.773576   0.479996  0.394707  52.000000  42.585209     U     sas    oracle
# 2018-03-13 15:15:00   -5.493293  217.143835   0.478187  0.530773  59.500000  55.861436     E   stata  postgres

Regarding python - Cannot 'merge' 'DataFrameGroupBy', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49264407/
