
python - Random blocks of data in Pandas

Reposted. Author: 行者123. Updated: 2023-11-28 22:24:57

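I need to get random blocks of data from my data frame df. I tried df.sample(10), but it only produces individual samples, not contiguous blocks. Is there a way to sample random blocks (for example, blocks of 6 consecutive data points)?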

Here is a sample of the data frame:

Year_DoY_Hour
2015-11-20 12:00:00 NaN
2015-11-20 12:30:00 NaN
2015-11-20 13:00:00 NaN
2015-11-20 13:30:00 NaN
2015-11-20 14:00:00 NaN
2015-11-20 14:30:00 NaN
2015-11-20 15:00:00 0.083298
...
2016-04-30 13:00:00 0.055639
2016-04-30 13:30:00 0.030809
2016-04-30 14:00:00 0.079277
2016-04-30 14:30:00 0.040736
2016-04-30 15:00:00 0.066980
2016-04-30 15:30:00 0.076448
2016-04-30 16:00:00 0.066822
2016-04-30 16:30:00 0.073143
2016-04-30 17:00:00 NaN
2016-04-30 17:30:00 NaN
2016-04-30 18:00:00 NaN
2016-04-30 18:30:00 NaN
2016-04-30 19:00:00 NaN
2016-04-30 19:30:00 NaN

So, starting from df, I need to create 3 randomly selected blocks of 6 rows each.

Example:

block 1

2016-04-30 15:00:00 0.066980
2016-04-30 15:30:00 0.076448
2016-04-30 16:00:00 0.066822
2016-04-30 16:30:00 0.073143
2016-04-30 17:00:00 NaN
2016-04-30 17:30:00 NaN

block 2

2016-04-30 09:30:00 0.036728
2016-04-30 10:00:00 0.036108
2016-04-30 10:30:00 0.031045
2016-04-30 11:00:00 0.031762
2016-04-30 11:30:00 0.033714
2016-04-30 12:00:00 0.042499

block 3

2015-11-20 04:30:00 NaN
2015-11-20 05:00:00 NaN
2015-11-20 05:30:00 NaN
2015-11-20 06:00:00 NaN
2015-11-20 06:30:00 NaN
2015-11-20 07:00:00 NaN

The blocks should be in random order, but the data within each block must be sequential. I haven't found any function or anything similar that does this.

Best answer

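You can generate random integer positions between 0 and the length of the data frame, then slice a block of rows starting at each position.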

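import pandas as pd
import numpy as np

# create a fake data frame
# (pd.date_range is used here because current pandas no longer accepts
# start/end/freq arguments in the pd.DatetimeIndex constructor)
index = pd.date_range(start='2015-11-20', end='2016-04-30', freq='30min')
df = pd.DataFrame(np.random.normal(loc=10, size=len(index)), index=index, columns=['vals'])

# set the block size and the number of samples
block_size = 6
num_samples = 3

# draw start positions that leave room for a full block, then slice each block
# (a start too close to the end would give a block shorter than block_size)
starts = np.random.randint(len(df) - block_size + 1, size=num_samples)
samples = [df.iloc[x:x+block_size] for x in starts]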

# check results
samples[0]
vals
2016-01-06 00:30:00 10.313824
2016-01-06 01:00:00 9.445082
2016-01-06 01:30:00 11.952581
2016-01-06 02:00:00 9.496415
2016-01-06 02:30:00 10.404322
2016-01-06 03:00:00 8.506910

samples[1]
vals
2015-12-23 02:00:00 10.472048
2015-12-23 02:30:00 10.276933
2015-12-23 03:00:00 10.013481
2015-12-23 03:30:00 11.293218
2015-12-23 04:00:00 10.258379
2015-12-23 04:30:00 9.543600

samples[2]
vals
2016-01-10 06:00:00 10.809594
2016-01-10 06:30:00 8.953594
2016-01-10 07:00:00 10.254928
2016-01-10 07:30:00 9.911142
2016-01-10 08:00:00 10.377016
2016-01-10 08:30:00 11.907871
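
The slicing idea above can also be wrapped in a small helper. The sketch below is only one possible way to do it and is not part of the original answer: the function name sample_blocks and its overlap and seed parameters are hypothetical, chosen for illustration. It assumes every block should have exactly block_size rows, and it optionally keeps blocks from sharing rows by restricting start positions to multiples of block_size, which is a simplification.

```python
import numpy as np
import pandas as pd


def sample_blocks(df, block_size, num_samples, overlap=True, seed=None):
    """Return a list of num_samples contiguous blocks of block_size rows.

    Hypothetical helper for illustration; not from the original answer.
    """
    rng = np.random.default_rng(seed)
    if overlap:
        # Any start position that leaves room for a full block is allowed,
        # so two blocks may share rows.
        n_starts = len(df) - block_size + 1
        if n_starts < 1:
            raise ValueError("block_size is larger than the data frame")
        starts = rng.integers(0, n_starts, size=num_samples)
    else:
        # Cut the frame into back-to-back blocks and pick distinct ones,
        # so no row can appear in two blocks.
        n_blocks = len(df) // block_size
        if num_samples > n_blocks:
            raise ValueError("not enough rows for that many disjoint blocks")
        starts = rng.choice(n_blocks, size=num_samples, replace=False) * block_size
    return [df.iloc[s:s + block_size] for s in starts]


# Fake data on the same half-hourly grid as the question.
index = pd.date_range(start="2015-11-20", end="2016-04-30", freq="30min")
df = pd.DataFrame({"vals": np.random.normal(loc=10, size=len(index))}, index=index)

blocks = sample_blocks(df, block_size=6, num_samples=3, overlap=False, seed=0)
for block in blocks:
    print(block, end="\n\n")
```

The blocks come back in the order the random start positions were drawn, so they are not sorted by time, while the rows inside each block stay in their original sequence. Passing a seed makes the draw repeatable; leaving it as None gives different blocks on each run.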

Regarding python - Random blocks of data in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45938227/
