python - Pandas group by cumsum of lists - Preparation for LSTM

Repost. Author: 行者123. Updated: 2023-12-05 05:07:31

Using the same example as here, but changing column "A" to something that can easily be grouped:

import pandas as pd
import numpy as np
# Get some time series data
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/timeseries.csv")
# Replace column "A" with group labels: 3 rows in group 1, 8 rows in group 2
df["A"] = pd.Series([1] * 3 + [2] * 8)
df.head()

The output is now:

         Date  A       B       C      D      E      F      G
0  2008-03-18  1  164.93  114.73  26.27  19.21  28.87  63.44
1  2008-03-19  1  164.89  114.75  26.22  19.07  27.76  59.98
2  2008-03-20  1  164.63  115.04  25.78  19.01  27.04  59.61
3  2008-03-25  2  163.92  114.85  27.41  19.61  27.84  59.41
4  2008-03-26  2  163.45  114.84  26.86  19.53  28.02  60.09
5  2008-03-27  2  163.46  115.40  27.09  19.72  28.25  59.62
6  2008-03-28  2  163.22  115.56  27.13  19.63  28.24  58.65

Computing the cumulative sum works well when each row is treated as a list (code from the linked question):

# Put your inputs into a single list
input_cols = ["B", "C"]
df['single_input_vector'] = df[input_cols].apply(tuple, axis=1).apply(list)
# Double-encapsulate list so that you can sum it in the next step and keep time steps as separate elements
df['single_input_vector'] = df.single_input_vector.apply(lambda x: [list(x)])
# Use .cumsum() to include previous row vectors in the current row list of vectors
df['cumulative_input_vectors1'] = df["single_input_vector"].cumsum()

But in this case, how can I cumsum the lists grouped by "A"? I expected this to work, but it does not:

df['cumu'] = df.groupby("A")["single_input_vector"].apply(lambda x: list(x)).cumsum()

Instead of [[164.93, 114.73, 26.27], [164.89, 114.75, 26.... I get some rows filled in and the other rows are NaN. This is what I want (columns [B, C] accumulated within each group of column A):

   A  cumu
0  1  [[164.93, 114.73], [164.89, 114.75], [164.63, 115.04]]
0  2  [[163.92, 114.85], [163.45, 114.84], [163.46, 115.40], [163.22, 115.56]]

Also, how can I do this efficiently? My dataset is fairly large (about 2 million rows).

Best answer

It doesn't look like you are doing an arithmetic sum; it is more like a concat along axis=1.
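To illustrate that point, here is a minimal sketch in plain Python (no pandas) of what a running "sum" over lists does. It assumes each row is one time step wrapped in an outer list, as in the question; the names `rows` and `running` are made up for this illustration.

```python
from itertools import accumulate

# One time step per row, double-wrapped as in the question's single_input_vector.
rows = [[[164.93, 114.73]], [[164.89, 114.75]], [[164.63, 115.04]]]

# accumulate() applies "+" by default; on lists that is concatenation,
# so each result holds every time step seen so far.
running = list(accumulate(rows))

print(running[1])   # first two time steps
print(running[-1])  # all three time steps
```

No numbers are added together here: the list just grows by one time step per row.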

First, groupby and concat:

temp_series = df.groupby('A').apply(lambda x: [[a, b] for a, b in zip(x['B'], x['C'])])

0    [[164.93, 114.73], [164.89, 114.75], [164.63, ...
1    [[163.92, 114.85], [163.45, 114.84], [163.46, ...
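Note that the grouped result has one entry per group, not one per original row. That is a likely explanation for the NaN values in the question: assigning a Series to a DataFrame column aligns on index labels, so only rows whose labels happen to match a group key get a value. The sketch below uses a made-up toy frame (`toy`, `per_group`, and the values are assumptions for illustration, not the question's data):

```python
import pandas as pd

# Toy frame standing in for the question's data (values are invented).
toy = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1.0, 2.0, 3.0, 4.0]})

# One list per group; the result is indexed by the group keys 1 and 2,
# not by the original row labels 0..3.
per_group = toy.groupby("A")["B"].apply(list)

# Column assignment aligns on index labels, so only rows labelled 1 and 2
# receive a list; rows 0 and 3 have no matching key and become NaN.
toy["attempt"] = per_group

print(toy["attempt"].isna().tolist())  # [True, False, False, True]
```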

Then convert back to a DataFrame:

df = temp_series.reset_index().rename(columns={0: 'cumsum'})

As a one-liner:

df = df.groupby('A').apply(lambda x: [[a, b] for a, b in zip(x['B'], x['C'])]).reset_index().rename(columns={0: 'cumsum'})
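On the efficiency question (about 2 million rows), one possible alternative is to skip the Python-level list comprehension and split a NumPy array at the group boundaries. This is only a sketch under assumptions: the helper `sequences_per_group` is hypothetical, it returns arrays rather than nested lists, and whether it is actually faster on the real data would need to be measured.

```python
import numpy as np
import pandas as pd


def sequences_per_group(frame, key, cols):
    """Return {group key: array of shape (n_steps, n_features)} (hypothetical helper)."""
    # A stable sort keeps the original (time) order of rows inside each group.
    ordered = frame.sort_values(key, kind="stable")
    # For sorted input, return_index gives the first row position of every group.
    keys, starts = np.unique(ordered[key].to_numpy(), return_index=True)
    values = ordered[cols].to_numpy()
    # Split the value matrix at the start of every group except the first.
    return dict(zip(keys.tolist(), np.split(values, starts[1:])))


# First five rows of the example data shown in the question.
toy = pd.DataFrame({
    "A": [1, 1, 1, 2, 2],
    "B": [164.93, 164.89, 164.63, 163.92, 163.45],
    "C": [114.73, 114.75, 115.04, 114.85, 114.84],
})

seqs = sequences_per_group(toy, "A", ["B", "C"])
print(seqs[1].shape)  # (3, 2): three time steps, two features
```

Each value is already a 2-D array of time steps by features, which may be convenient for building LSTM inputs; call `.tolist()` on an array if nested lists are required.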

Regarding python - Pandas group by cumsum of lists - Preparation for LSTM, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59013978/
