
python - Take every 3rd row of a DataFrame and forward fill

Reposted. Author: 行者123. Updated: 2023-11-28 21:47:05

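I have a DataFrame with 'Date' and 'Id' in the index and 'Portfolio' in the columns. The values are the weights of the securities in each portfolio. Within the Date level of the index, I'd like to take every 3rd date and forward fill its security weights onto the dates that follow, up to the next "every third" date.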

Setup

This is a generic DataFrame producer. df is assigned at the end.

import pandas as pd
import numpy as np
from string import ascii_uppercase  # Python 3 name for Python 2's `uppercase`

def generic_portfolio_df(start, end, freq, num_port, num_sec, seed=314):
    np.random.seed(seed)
    portfolios = pd.Index(['Portfolio {}'.format(i) for i in ascii_uppercase[:num_port]],
                          name='Portfolio')
    securities = ['s{:02d}'.format(i) for i in range(num_sec)]
    dates = pd.date_range(start, end, freq=freq)
    weights = pd.DataFrame(np.random.rand(len(dates) * num_sec, num_port),
                           index=pd.MultiIndex.from_product([dates, securities],
                                                            names=['Date', 'Id']),
                           columns=portfolios)
    # Normalize so that each portfolio's weights sum to 1 on every date.
    # transform keeps the original (Date, Id) index on all pandas versions.
    return weights.groupby(level='Date').transform(lambda x: x / x.sum())

# 'BM' is the business-month-end alias; newer pandas releases spell it 'BME'.
df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)

df looks like this:

Portfolio       Portfolio A  Portfolio B  Portfolio C
Date       Id
2014-12-31 s00     0.326164     0.201597     0.085340
           s01     0.278614     0.314448     0.266392
           s02     0.258958     0.089224     0.293570
           s03     0.092760     0.262511     0.084208
           s04     0.043503     0.132221     0.270490
2015-01-30 s00     0.094124     0.041722     0.248013
           s01     0.197860     0.346862     0.265287
           s02     0.232504     0.261939     0.125719
           s03     0.193050     0.286359     0.337316
           s04     0.282462     0.063118     0.023664
2015-02-27 s00     0.266900     0.484163     0.074970
           s01     0.239319     0.083138     0.123289
           s02     0.067958     0.262626     0.262548
           s03     0.181974     0.108668     0.301149
           s04     0.243849     0.061405     0.238044
2015-03-31 s00     0.321438     0.149010     0.125168
           s01     0.217779     0.067209     0.040285
           s02     0.173066     0.293539     0.417372
           s03     0.048929     0.415637     0.216490
           s04     0.238788     0.074605     0.200685
2015-04-30 s00     0.089122     0.135514     0.234565
           s01     0.048235     0.028141     0.327739
           s02     0.026016     0.039664     0.073588
           s03     0.413139     0.397875     0.323671
           s04     0.423487     0.398807     0.040437
2015-05-29 s00     0.135831     0.071604     0.235099
           s01     0.240086     0.242436     0.131698
           s02     0.304451     0.380368     0.101653
           s03     0.213468     0.035276     0.372894
           s04     0.106164     0.270317     0.158656

Question

Within the Date level of the index, I'd like to take every 3rd date and forward fill its security weights onto the dates that follow, up to the next "every third" date.

I'd like it to look like:

Portfolio       Portfolio A  Portfolio B  Portfolio C
Date       Id
2014-12-31 s00     0.326164     0.201597     0.085340
           s01     0.278614     0.314448     0.266392
           s02     0.258958     0.089224     0.293570
           s03     0.092760     0.262511     0.084208
           s04     0.043503     0.132221     0.270490
2015-01-30 s00     0.326164     0.201597     0.085340
           s01     0.278614     0.314448     0.266392
           s02     0.258958     0.089224     0.293570
           s03     0.092760     0.262511     0.084208
           s04     0.043503     0.132221     0.270490
2015-02-27 s00     0.326164     0.201597     0.085340
           s01     0.278614     0.314448     0.266392
           s02     0.258958     0.089224     0.293570
           s03     0.092760     0.262511     0.084208
           s04     0.043503     0.132221     0.270490
2015-03-31 s00     0.321438     0.149010     0.125168
           s01     0.217779     0.067209     0.040285
           s02     0.173066     0.293539     0.417372
           s03     0.048929     0.415637     0.216490
           s04     0.238788     0.074605     0.200685
2015-04-30 s00     0.321438     0.149010     0.125168
           s01     0.217779     0.067209     0.040285
           s02     0.173066     0.293539     0.417372
           s03     0.048929     0.415637     0.216490
           s04     0.238788     0.074605     0.200685
2015-05-29 s00     0.321438     0.149010     0.125168
           s01     0.217779     0.067209     0.040285
           s02     0.173066     0.293539     0.417372
           s03     0.048929     0.415637     0.216490
           s04     0.238788     0.074605     0.200685
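
To make the target concrete, here is a minimal sketch of which date supplies the weights for each row date. It assumes the six business month-end dates shown in the sample output; `positions`, `source_dates` and `mapping` are illustrative names, not part of the original post.

```python
import numpy as np
import pandas as pd

# The six dates that appear in the sample frame (assumed from the printed output).
dates = pd.to_datetime(['2014-12-31', '2015-01-30', '2015-02-27',
                        '2015-03-31', '2015-04-30', '2015-05-29'])

# Position of each date, rounded down to the nearest multiple of 3,
# gives the position of the "every third" date it should copy from.
positions = np.arange(len(dates))
source_dates = dates[(positions // 3) * 3]

# Row date -> date whose weights are carried forward onto it.
mapping = pd.Series(source_dates.to_numpy(), index=dates, name='weights_taken_from')
print(mapping)
```

Under this reading, the first three dates all take their weights from 2014-12-31 and the last three from 2015-03-31, which matches the desired output.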

Conclusion

While I'm still interested in other people's answers, I chose Alexander's answer over my own for the following reason:

%%timeit
df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)
df = df.unstack()
df.iloc[3:] = np.nan
df = df.ffill(limit=3).stack()

100 loops, best of 3: 11.6 ms per loop

%%timeit
df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)
df0 = df.loc[pd.IndexSlice[::3, :], :]
diff = df.index.difference(df0.index)
df.loc[diff, :] = np.nan  # .ix was removed in pandas 1.0; .loc is the label-based replacement
df.groupby(level=1).ffill(limit=3)

100 loops, best of 3: 21 ms per loop

Clearly, using stack and unstack is more efficient: 11.6 ms per loop versus 21 ms for the groupby version.
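
The `%%timeit` cells above only run inside IPython. As a rough sketch, a similar comparison can be made in a plain script with the standard `timeit` module. The helper names `make_df`, `via_unstack` and `via_groupby` are hypothetical, and the test frame uses daily dates and a different random generator than the post's, so the absolute numbers will not match the figures quoted above and will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_df(num_dates=6, num_sec=5, num_port=3, seed=314):
    """Hypothetical stand-in for the post's generator: random values on a (Date, Id) index."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range('2015-01-01', periods=num_dates, freq='D')
    ids = ['s{:02d}'.format(i) for i in range(num_sec)]
    index = pd.MultiIndex.from_product([dates, ids], names=['Date', 'Id'])
    columns = pd.Index(['Portfolio {}'.format(c) for c in 'ABC'[:num_port]],
                       name='Portfolio')
    return pd.DataFrame(rng.random((len(index), num_port)), index=index, columns=columns)


def via_unstack(df, n=3):
    """Blank all but every n-th date in wide form, forward fill, restack."""
    wide = df.unstack('Id')
    blank = np.arange(len(wide)) % n > 0
    wide.loc[blank, :] = np.nan
    return wide.ffill().stack('Id').sort_index()


def via_groupby(df, n=3):
    """Blank the same rows in long form, then forward fill within each Id."""
    dates = df.index.get_level_values('Date')
    keep = dates.isin(dates.unique()[::n])
    out = df.copy()
    out.loc[~keep, :] = np.nan
    return out.groupby(level='Id').ffill().sort_index()


df = make_df()
runs = 20
for name, func in [('unstack/stack', via_unstack), ('groupby', via_groupby)]:
    seconds = timeit.timeit(lambda: func(df), number=runs)
    print('{}: {:.2f} ms per call'.format(name, 1000 * seconds / runs))
```

Both helpers should return the same values, so any difference in the printed timings reflects the cost of the approach rather than a difference in the result.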

Best answer

# Build a Boolean mask over the date rows: True marks a row to blank out,
# False marks every third row (positions 0, 3, ...), which is kept.
n_dates = len(df.unstack())
idx = [i % 3 > 0 for i in range(n_dates)]
>>> idx
[False, True, True, False, True, True]

# Unstack the dataframe so that each date is a single row.
df = df.unstack()

# Overwrite the masked rows with NaN.
df.loc[idx, :] = np.nan

# Forward fill from the retained dates, then restack the dataframe.
df = df.ffill(limit=3).stack()

>>> df.tail()
Portfolio       Portfolio A  Portfolio B  Portfolio C
Date       Id
2015-05-29 s00     0.321438     0.149010     0.125168
           s01     0.217779     0.067209     0.040285
           s02     0.173066     0.293539     0.417372
           s03     0.048929     0.415637     0.216490
           s04     0.238788     0.074605     0.200685
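
As a variation that is not taken from the answer, the same result can be sketched by selecting every n-th date in wide form and reindexing back onto the full set of dates with `method='ffill'`. The helper name `hold_every_nth_date` and the tiny integer-valued frame are hypothetical, chosen so the result is easy to check by eye. The sketch assumes the dates are sorted, which `reindex` needs for forward filling.

```python
import numpy as np
import pandas as pd


def hold_every_nth_date(df, n=3):
    """Keep every n-th date's rows and carry them forward onto the dates in between.

    Assumes `df` has a (Date, Id) MultiIndex with the same Ids on every date.
    """
    wide = df.unstack('Id')   # one row per date, sorted by date
    kept = wide.iloc[::n]     # every n-th date, starting with the first
    # Reindex onto all dates; each date takes the most recent kept row.
    filled = kept.reindex(wide.index, method='ffill')
    return filled.stack('Id').sort_index()


# Tiny illustrative frame: 6 dates x 2 securities, values 0..11.
dates = pd.to_datetime(['2014-12-31', '2015-01-30', '2015-02-27',
                        '2015-03-31', '2015-04-30', '2015-05-29'])
index = pd.MultiIndex.from_product([dates, ['s00', 's01']], names=['Date', 'Id'])
small = pd.DataFrame(np.arange(12).reshape(12, 1), index=index,
                     columns=pd.Index(['Portfolio A'], name='Portfolio'))

result = hold_every_nth_date(small)
print(result)
```

On this frame the first three dates repeat the values 0 and 1 from 2014-12-31, and the last three repeat 6 and 7 from 2015-03-31.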

Regarding python - Take every 3rd row of a DataFrame and forward fill, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37060877/
