python - Rolling a function over a DataFrame

Reposted. Author: 太空狗. Updated: 2023-10-29 22:30:10

I have the following DataFrame C:

>>> C
              a    b   c
2011-01-01    0    0 NaN
2011-01-02   41   12 NaN
2011-01-03   82   24 NaN
2011-01-04  123   36 NaN
2011-01-05  164   48 NaN
2011-01-06  205   60   2
2011-01-07  246   72   4
2011-01-08  287   84   6
2011-01-09  328   96   8
2011-01-10  369  108  10

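I want to add a new column d by applying a rolling function over a fixed window (here 6), in which the value of c is somehow fixed for each row (or date). One iteration of this rolling function should look like this (pseudocode):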

              a    b   c  d
2011-01-01    0    0 NaN  a + b*2   (a,b from this row, '2' is from 'c' on 2011-01-06)
2011-01-02   41   12 NaN  a + b*2   (a,b from this row, '2' is still from 2011-01-06)
2011-01-03   82   24 NaN  a + b*2
2011-01-04  123   36 NaN  a + b*2
2011-01-05  164   48 NaN  a + b*2
2011-01-06  205   60   2  a + b*2
2011-01-07  246   72   4
2011-01-08  287   84   6
2011-01-09  328   96   8
2011-01-10  369  108  10

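After this "loop", I want to take all 6 of these computed rows in d and run a function call that returns a single value, which should be stored in another column, say e: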

              a    b   c  d                                e
2011-01-01    0    0 NaN  a + b*2 ---|                   NaN
2011-01-02   41   12 NaN  a + b*2    |                   NaN
2011-01-03   82   24 NaN  a + b*2    | These values      NaN
2011-01-04  123   36 NaN  a + b*2    | are input to      NaN
2011-01-05  164   48 NaN  a + b*2    | function          NaN
2011-01-06  205   60   2  a + b*2 ---| yielding            X
2011-01-07  246   72   4               value X in
2011-01-08  287   84   6               column 'e'
2011-01-09  328   96   8
2011-01-10  369  108  10

This process then moves on to the next window (again 6 rows long), for example:

              a    b   c  d        e
2011-01-01    0    0 NaN
2011-01-02   41   12 NaN  a + b*4     (a,b from this row, '4' is from 'c' now from 2011-01-07)
2011-01-03   82   24 NaN  a + b*4     (a,b from this row, '4' is still from 2011-01-07)
2011-01-04  123   36 NaN  a + b*4
2011-01-05  164   48 NaN  a + b*4
2011-01-06  205   60   2  a + b*4  X
2011-01-07  246   72   4  a + b*4
2011-01-08  287   84   6
2011-01-09  328   96   8
2011-01-10  369  108  10

              a    b   c  d                                e
2011-01-01    0    0 NaN                                 NaN
2011-01-02   41   12 NaN  a + b*4 ---|                   NaN
2011-01-03   82   24 NaN  a + b*4    | These values      NaN
2011-01-04  123   36 NaN  a + b*4    | are input to      NaN
2011-01-05  164   48 NaN  a + b*4    | function          NaN
2011-01-06  205   60   2  a + b*4    | yielding            X
2011-01-07  246   72   4  a + b*4 ---| value Y in          Y
2011-01-08  287   84   6               column 'e'
2011-01-09  328   96   8
2011-01-10  369  108  10

I hope this is clear enough.

Thanks, N

Best answer

You can use pd.rolling_apply:

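import numpy as np
import pandas as pd
df = pd.read_table('data', sep=r'\s+')

def foo(x, df):
    # x holds the positions of the rows in the current window (as floats)
    pos = x.astype(int)
    window = df.iloc[pos]
    # print(window)
    # fix c at the value found in the last row of the window
    c = df['c'].iloc[pos[-1]]
    dvals = window['a'] + window['b']*c
    return bar(dvals)

def bar(dvals):
    # print(dvals)
    return dvals.mean()

# pd.rolling_apply is the old top-level API (deprecated since pandas 0.18)
df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,))
print(df)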

which yields

              a    b   c       e
2011-01-01    0    0 NaN     NaN
2011-01-02   41   12 NaN     NaN
2011-01-03   82   24 NaN     NaN
2011-01-04  123   36 NaN     NaN
2011-01-05  164   48 NaN     NaN
2011-01-06  205   60   2   162.5
2011-01-07  246   72   4   311.5
2011-01-08  287   84   6   508.5
2011-01-09  328   96   8   753.5
2011-01-10  369  108  10  1046.5
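
The values in column e can be cross-checked with an explicit loop over the windows. The following is only a sketch: the DataFrame is rebuilt inline from the numbers in the question (an assumption, instead of reading the 'data' file), the names window_size and end are illustrative, and the mean stands in for the real reducing function, as bar does in the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame inline (assumption: same values as 'data').
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2, 4, 6, 8, 10],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

window_size = 6  # illustrative name for the fixed window length
e = np.full(len(df), np.nan)

for end in range(window_size - 1, len(df)):
    # Rows end-5 .. end form the current window.
    window = df.iloc[end - window_size + 1 : end + 1]
    # c is fixed at the value from the last row of the window.
    c = df["c"].iloc[end]
    # d for every row of the window, using that single c.
    d = window["a"] + window["b"] * c
    # Reduce the six d values to one number (mean, as in bar above).
    e[end] = d.mean()

df["e"] = e
print(df)
```

The loop is slower than a rolling apply but makes the "fix c, compute d, reduce to e" steps from the question explicit.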

The args and kwargs parameters were added to rolling_apply in Pandas version 0.14.0.
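
The top-level pd.rolling_apply function was deprecated in pandas 0.18 in favour of the .rolling() method, so on a current installation the same trick has to go through Series.rolling(...).apply. Here is a minimal sketch of that translation, assuming a pandas version whose Rolling.apply accepts raw= and args=. The inline DataFrame and the name positions are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame inline (assumption: same values as 'data').
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2, 4, 6, 8, 10],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

def bar(dvals):
    # Placeholder reducer, as in the answer: one value per window.
    return dvals.mean()

def foo(x, df):
    # With raw=True, x is a float ndarray of row positions.
    pos = x.astype(int)
    window = df.iloc[pos]
    # Fix c at the value from the last row of the window.
    c = df["c"].iloc[pos[-1]]
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

# Roll over the row positions rather than over a data column,
# so foo can look up whole rows of df for each window.
positions = pd.Series(np.arange(len(df), dtype=float), index=df.index)
df["e"] = positions.rolling(6).apply(foo, raw=True, args=(df,))
print(df)
```

Rolling over the positions keeps the original idea intact: the rolling machinery only supplies the window boundaries, and foo does the actual work on df.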

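Because df is a global variable in my example above, it is not really necessary to pass it to foo as an argument. You can simply remove df from the def foo line and omit args=(df,) when calling rolling_apply.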

However, sometimes df may not be defined in a scope that foo can access. In that case there is a simple workaround: make a closure:

def foo(df):
    # foo returns a function that remembers df
    def inner_foo(x):
        pos = x.astype(int)
        window = df.iloc[pos]
        # print(window)
        c = df['c'].iloc[pos[-1]]
        dvals = window['a'] + window['b']*c
        return bar(dvals)
    return inner_foo

df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo(df))
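
The closure approach does not depend on rolling_apply itself. As a hedged sketch, this is how it might look with the newer Series.rolling(...).apply method. The factory name make_foo, the inline DataFrame, and the use of the mean as the reducer are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def make_foo(df):
    # Hypothetical factory: returns a function that captures df.
    def inner_foo(x):
        pos = x.astype(int)          # raw=True passes float positions
        window = df.iloc[pos]
        c = df["c"].iloc[pos[-1]]    # c from the window's last row
        return (window["a"] + window["b"] * c).mean()
    return inner_foo

# Rebuild the question's DataFrame inline (assumption: same values as 'data').
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2, 4, 6, 8, 10],
    },
    index=pd.date_range("2011-01-01", periods=10),
)

positions = pd.Series(np.arange(len(df), dtype=float), index=df.index)
# No args= needed: df travels inside the closure.
df["e"] = positions.rolling(6).apply(make_foo(df), raw=True)
print(df)
```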

Regarding python - rolling a function over a DataFrame, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/28190383/
