python - Split time series data into train, test, and valid sets in Python

Reposted. Author: 行者123. Updated: 2023-11-30 08:56:24

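I am working on a project that merges two time series datasets (say D1 and D2). D1 has a 5-minute interval and D2 has a 1-minute interval, so I converted D1 to a 1-minute interval and combined it with D2. Now I want to split this new dataset, D1D2, into train, test, and valid sets according to these conditions: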

Note: I have searched a lot and tried to find a solution to my problem, but I couldn't find any answer that fits my question, so please don't mark this as a duplicate!

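  1. The valid set should be the last 60 values of the dataset.
  2. The test set should then be the most recent values immediately preceding the valid set.
  3. The remaining (earliest) data then forms the training set.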
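The question does not show how D1 was converted to 1-minute steps before being combined with D2. One common way, sketched below under the assumption that both datasets are pandas DataFrames indexed by timestamps and that repeating the last known 5-minute value (forward-fill) is acceptable, is to upsample D1 with `resample` and then join on the index. The frames `d1`, `d2` and their column names are hypothetical stand-ins, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for D1 (5-minute steps) and D2 (1-minute steps).
idx5 = pd.date_range("2019-11-01 00:00", periods=4, freq="5min")
d1 = pd.DataFrame({"d1_value": [1.0, 2.0, 3.0, 4.0]}, index=idx5)

idx1 = pd.date_range("2019-11-01 00:00", periods=16, freq="1min")
d2 = pd.DataFrame({"d2_value": np.arange(16, dtype=float)}, index=idx1)

# Upsample D1 to a 1-minute grid; each new minute repeats the last known 5-minute value.
d1_1min = d1.resample("1min").ffill()

# Keep only timestamps present in both frames (inner join on the DatetimeIndex).
d1d2 = d1_1min.join(d2, how="inner")
print(d1d2.head(7))
```

Forward-filling keeps each 5-minute reading constant until the next one arrives; if a smooth transition makes more sense for your data, `d1.resample("1min").interpolate()` is an alternative.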

Here is how I am currently splitting:

```python
def split_train_test(dataset, train_size, test_size):
    # dataset: 2D NumPy array whose last column is the target
    train = dataset[:train_size, :]
    # starts at row test_size, so it overlaps train if test_size < train_size
    test = dataset[test_size:, :]
    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape)
    return train, test, train_X, train_y, test_X, test_y
```

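But now I need to split the data into train, test, and valid sets according to the conditions above.

How can I do this? Also, is this the right way to split a time series dataset?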
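For time series, the split should be chronological with no shuffling, so the model is always evaluated on data that comes after what it was trained on. A single chronological cut is a common approach; if you want more than one evaluation window, a walk-forward (expanding-window) scheme is one option, and scikit-learn's `TimeSeriesSplit` implements that idea. Below is a minimal NumPy sketch of the same idea using a hypothetical helper, `walk_forward_splits`; it is not scikit-learn's implementation.

```python
import numpy as np

def walk_forward_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs for an expanding-window split.

    Hypothetical helper: each test window lies strictly after its training
    window, so no future rows leak into training.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = np.arange(0, test_start)  # every row before the test window
        test_idx = np.arange(test_start, test_start + test_size)
        yield train_idx, test_idx

for train_idx, test_idx in walk_forward_splits(n_samples=10, n_splits=3, test_size=2):
    print(train_idx, test_idx)
```

With 10 rows, 3 splits, and 2-row test windows, the training window grows from rows 0-3 to 0-5 to 0-7 while the test window moves forward to rows 4-5, 6-7, and 8-9.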

Best answer

Try this:

```python
valid_set = dataset.iloc[-60:, :]   # last 60 rows
test_set = dataset.iloc[-120:-60]   # the 60 rows right before the valid set
train_set = dataset.iloc[:-120]     # everything earlier
```
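These slices assume `dataset` is a pandas DataFrame in time order and hard-code a 60-row test set (the question does not fix the test size). A quick check on a toy 300-row frame, used here only as a stand-in for D1D2, confirms the three pieces are contiguous, non-overlapping, and together cover every row:

```python
import numpy as np
import pandas as pd

# Toy 1-minute series standing in for D1D2 (300 rows).
idx = pd.date_range("2019-11-01", periods=300, freq="1min")
dataset = pd.DataFrame({"feature": np.arange(300), "target": np.arange(300) * 2}, index=idx)

valid_set = dataset.iloc[-60:, :]   # last 60 rows
test_set = dataset.iloc[-120:-60]   # the 60 rows right before the valid set
train_set = dataset.iloc[:-120]     # everything earlier

print(len(train_set), len(test_set), len(valid_set))  # 180 60 60

# The slices are back to back in time and together cover every row.
assert train_set.index[-1] < test_set.index[0]
assert test_set.index[-1] < valid_set.index[0]
assert len(train_set) + len(test_set) + len(valid_set) == len(dataset)
```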

Generalized:

```python
def split_train_test(dataset, validation_size):
    # the last validation_size rows form the valid set
    valid = dataset.iloc[-validation_size:, :]
    train_test = dataset.iloc[:-validation_size]

    # first 63% of the remaining rows -> train, the rest -> test (order preserved)
    train_length = int(0.63 * len(train_test))

    # split into input and outputs (last column is the target)
    train_X, train_y = train_test.iloc[:train_length, :-1], train_test.iloc[:train_length, -1]
    test_X, test_y = train_test.iloc[train_length:, :-1], train_test.iloc[train_length:, -1]
    valid_X, valid_y = valid.iloc[:, :-1], valid.iloc[:, -1]

    return train_test, valid, train_X, train_y, test_X, test_y, valid_X, valid_y
```
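The generalized function drops the 3D reshape `[samples, timesteps, features]` that the question applied to its inputs. A minimal sketch of reapplying it to the outputs follows, assuming all feature columns are numeric and one timestep per sample (as in the question). `to_3d` is a hypothetical helper, and the 260-row frame is toy data.

```python
import numpy as np
import pandas as pd

def split_train_test(dataset, validation_size):
    # Same logic as the answer's generalized function.
    valid = dataset.iloc[-validation_size:, :]
    train_test = dataset.iloc[:-validation_size]
    train_length = int(0.63 * len(train_test))
    train_X, train_y = train_test.iloc[:train_length, :-1], train_test.iloc[:train_length, -1]
    test_X, test_y = train_test.iloc[train_length:, :-1], train_test.iloc[train_length:, -1]
    valid_X, valid_y = valid.iloc[:, :-1], valid.iloc[:, -1]
    return train_test, valid, train_X, train_y, test_X, test_y, valid_X, valid_y

def to_3d(frame):
    # Hypothetical helper: [samples, features] -> [samples, timesteps=1, features].
    values = frame.to_numpy(dtype="float32")
    return values.reshape((values.shape[0], 1, values.shape[1]))

idx = pd.date_range("2019-11-01", periods=260, freq="1min")
dataset = pd.DataFrame(
    {"f1": np.arange(260.0), "f2": np.arange(260.0) ** 0.5, "target": np.arange(260.0) * 3},
    index=idx,
)
_, _, train_X, train_y, test_X, test_y, valid_X, valid_y = split_train_test(dataset, 60)
train_X3, test_X3, valid_X3 = to_3d(train_X), to_3d(test_X), to_3d(valid_X)
print(train_X3.shape, test_X3.shape, valid_X3.shape)  # (126, 1, 2) (74, 1, 2) (60, 1, 2)
```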

Instead of hard-coding the split percentage in the function as I did, you can pass it in as a parameter.
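Following that suggestion, here is a minimal sketch with the ratio as a keyword argument. `split_by_ratio` is a hypothetical name, and for brevity it returns the three DataFrames rather than the separate X/y pieces. The range check on `validation_size` matters because `iloc[-0:]` returns the whole frame, so a size of 0 would silently put every row in the valid set.

```python
import numpy as np
import pandas as pd

def split_by_ratio(dataset, validation_size, train_ratio=0.63):
    """Chronological train/test/valid split with the ratio as a parameter.

    The last `validation_size` rows become the valid set; the remaining rows
    are cut at `train_ratio` into train (earlier) and test (later). No shuffling.
    """
    if not 0 < validation_size < len(dataset):
        raise ValueError("validation_size must be between 1 and len(dataset) - 1")
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be in (0, 1)")
    valid = dataset.iloc[-validation_size:]
    train_test = dataset.iloc[:-validation_size]
    train_length = int(train_ratio * len(train_test))
    return train_test.iloc[:train_length], train_test.iloc[train_length:], valid

frame = pd.DataFrame({"x": np.arange(100), "y": np.arange(100)})
train, test, valid = split_by_ratio(frame, validation_size=20, train_ratio=0.75)
print(len(train), len(test), len(valid))  # 60 20 20
```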

Regarding "python - Split time series data into train, test, and valid sets in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58974674/
