
python - Rolling window revisited - Adding the window rolling quantity as a parameter - Walk-forward analysis

Reposted. Author: 行者123. Updated: 2023-12-04 14:53:57

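I have been searching the web for a way to create rolling windows so that I can perform, in a generalized way, the cross-validation technique for time series known as walk-forward analysis.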

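However, I have not found any solution that is flexible in terms of both 1) the window size (almost all methods have this; for example, pandas rolling or, somewhat differently, np.roll) and 2) the window rolling quantity, understood as the number of indexes by which we want to roll the window (I have not found any method that incorporates this).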

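I have been trying to optimize the code and keep it concise, with the help of this answer by @coldspeed (I cannot comment there because I do not have the required reputation yet; I hope to get there soon!), but I have not been able to incorporate the window rolling quantity.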

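My thoughts:

  • I have tried using np.roll together with the example below, without success.
  • I have also tried modifying the code below by multiplying by the i-th value, but I cannot fit that into the list comprehension, which I would like to keep.
  • The example below works fine for any window size, but it only "rolls" the window one step forward, and I would like it to generalize to any step.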

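So, is there a way to use both parameters in the list comprehension approach? Or is there another resource I have not found that makes this easier? All help is much appreciated. My sample code is as follows: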
    In [1]: import numpy as np
    In [2]: arr = np.random.random((10,3))

    In [3]: arr

    Out[3]: array([[0.38020065, 0.22656515, 0.25926935],
    [0.13446667, 0.04386083, 0.47210474],
    [0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358],
    [0.46632326, 0.37341677, 0.49950571],
    [0.45753235, 0.55642914, 0.31972887],
    [0.4371343 , 0.08905587, 0.74511753]])

    In [4]: inSamplePercentage = 0.4
    In [5]: outSamplePercentage = 0.3 * inSamplePercentage

    In [6]: windowSizeTrain = round(inSamplePercentage * arr.shape[0])
    In [7]: windowSizeTest = round(outSamplePercentage * arr.shape[0])
    In [8]: windowTrPlusTs = windowSizeTrain + windowSizeTest

    In [9]: sliceListX = [arr[i: i + windowTrPlusTs] for i in range(len(arr) - (windowTrPlusTs-1))]

Given a window length of 5 and a window rolling quantity of 2, I would like to obtain something like the following:
    Out [15]: 

    [array([[0.38020065, 0.22656515, 0.25926935],
    [0.13446667, 0.04386083, 0.47210474],
    [0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102]]),
    array([[0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358]]),
    array([[0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358],
    [0.46632326, 0.37341677, 0.49950571],
    [0.45753235, 0.55642914, 0.31972887]]),
    array([[0.99308981, 0.80017134, 0.64955358],
    [0.46632326, 0.37341677, 0.49950571],
    [0.45753235, 0.55642914, 0.31972887],
    [0.4371343 , 0.08905587, 0.74511753]])]

(This includes the last array even though its length is less than 5.)

Or:
    Out [16]: 

    [array([[0.38020065, 0.22656515, 0.25926935],
    [0.13446667, 0.04386083, 0.47210474],
    [0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102]]),
    array([[0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358]]),
    array([[0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358],
    [0.46632326, 0.37341677, 0.49950571],
    [0.45753235, 0.55642914, 0.31972887]])]

(Only the arrays with length == 5; this can, however, be derived from the list above with a simple mask.)

Edit: I forgot to mention this as well - something could be done if pandas rolling objects supported the iter method.

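Best answer

So, to add my two cents (with all the help of @Ben.T), here is the code for the basic tools of walk-forward analysis, which give a more general picture of how your model or models will perform.

Non-anchored WFA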

    from numpy.lib.stride_tricks import as_strided


    def walkForwardAnal(myArr, windowSize, rollQty):
        # myArr must be a C-contiguous 2-D array; the strides below rely on it.
        ArrRows, ArrCols = myArr.shape
        ArrItems = myArr.itemsize

        # (number of windows, rows per window, columns)
        sliceQtyAndShape = ((ArrRows - windowSize) // rollQty + 1, windowSize, ArrCols)
        print('The final view shape is {}'.format(sliceQtyAndShape))

        # Bytes to skip to reach the next window, the next row and the next item.
        ArrStrides = (rollQty * ArrCols * ArrItems, ArrCols * ArrItems, ArrItems)
        print('The final strides are {}'.format(ArrStrides))

        sliceList = list(as_strided(myArr, shape=sliceQtyAndShape, strides=ArrStrides, writeable=False))

        return sliceList

    # X and y are assumed to be 2-D NumPy arrays with the same number of rows.
    # `wf` is assumed to be the module in which walkForwardAnal is defined.
    wSizeTr = 400
    wSizeTe = 100
    wSizeTot = wSizeTr + wSizeTe
    rQty = 200

    sliceListX = wf.walkForwardAnal(X, wSizeTot, rQty)
    sliceListY = wf.walkForwardAnal(y, wSizeTot, rQty)

    for sliceArrX, sliceArrY in zip(sliceListX, sliceListY):

        ## Consider making a .copy() of each array, so that we don't modify the original one.

        # XArr = sliceArrX.copy() and hence, changing Xtrain, Xtest = XArr[...]
        # YArr = sliceArrY.copy() and hence, changing Ytrain, Ytest = YArr[...]

        # The last wSizeTe rows of each window are the test set; the rest is training.
        Xtrain = sliceArrX[:-wSizeTe,:]
        Xtest = sliceArrX[-wSizeTe:,:]

        Ytrain = sliceArrY[:-wSizeTe,:]
        Ytest = sliceArrY[-wSizeTe:,:]
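
As a quick sanity check of the shape and stride arithmetic above, the following minimal sketch rebuilds the strided view on a small 10 x 3 array (window 5, step 2) and compares it with plain slicing. The helper name `strided_windows` is hypothetical and not part of the original answer. It reads the byte strides from the array itself instead of computing them from `itemsize`, and it assumes a 2-D input. Note that `as_strided` does no bounds checking, so the number of windows has to be computed carefully.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def strided_windows(arr, window_size, step):
    """Return a read-only view of windows of `window_size` rows, `step` rows apart."""
    arr = np.ascontiguousarray(arr)  # the stride arithmetic assumes C order
    n_rows, n_cols = arr.shape
    if window_size > n_rows:
        raise ValueError("window_size is larger than the number of rows")
    n_windows = (n_rows - window_size) // step + 1
    row_stride, col_stride = arr.strides
    return as_strided(
        arr,
        shape=(n_windows, window_size, n_cols),
        strides=(step * row_stride, row_stride, col_stride),
        writeable=False,
    )


arr = np.arange(30, dtype=float).reshape(10, 3)
windows = strided_windows(arr, window_size=5, step=2)

# Reference result built with ordinary slicing.
expected = [arr[i:i + 5] for i in range(0, 10 - 5 + 1, 2)]
same = all(np.array_equal(w, e) for w, e in zip(windows, expected))
print(windows.shape, same)
```

For this input the view has shape (3, 5, 3): the windows start at rows 0, 2 and 4, and a fourth full window would run past the end of the array.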

Anchored WFA

    from sklearn.model_selection import TimeSeriesSplit

    timeSeriesCrossVal = TimeSeriesSplit(n_splits=5)

    for trainIndex, testIndex in timeSeriesCrossVal.split(X):
        ## Check if the training and testing quantities make sense. If not, increase or decrease the n_splits parameter.

        Xtrain = X[trainIndex]
        Xtest = X[testIndex]

        Ytrain = y[trainIndex]
        Ytest = y[testIndex]
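
To see why this counts as "anchored", it can help to print the indices that `TimeSeriesSplit` produces. The sketch below uses a toy array of 10 samples and `n_splits=3` (both chosen only for illustration): every training set starts at index 0 and grows, while the test block moves forward.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(10, 2)  # 10 time-ordered samples (toy data)

splits = [
    (train_idx.tolist(), test_idx.tolist())
    for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X)
]
for train_idx, test_idx in splits:
    print(f"train={train_idx} test={test_idx}")
# train=[0, 1, 2, 3] test=[4, 5]
# train=[0, 1, 2, 3, 4, 5] test=[6, 7]
# train=[0, 1, 2, 3, 4, 5, 6, 7] test=[8, 9]
```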

Then, inside the loop of either approach, you can add the following and carry on with the modeling:

    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Fit on training set only - The targets (y) are already encoded in dummy variables, so no need to standardize them.
    scaler = StandardScaler()
    scaler.fit(Xtrain)

    # Apply transform to both the training set and the test set.
    trainX = scaler.transform(Xtrain)
    testX = scaler.transform(Xtest)

    ## PCA - Principal Component Analysis: apply PCA to the standardized training set. Fit on training set only.
    pca = PCA(.95)  # keep enough components to explain 95% of the variance
    pca.fit(trainX)

    # Apply transform to both the training set and the test set.
    trainX = pca.transform(trainX)
    testX = pca.transform(testX)

    ## Predict and append predictions...
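
Putting the pieces together, a complete non-anchored loop might look roughly like the sketch below. The synthetic data, the window sizes and the choice of `LogisticRegression` as the model are assumptions made purely for illustration; the original answer does not specify a model. The point it shows is that the scaler and the PCA are refit on the training part of every window, so nothing from the test part leaks into preprocessing.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))   # synthetic features (assumption)
y = (X[:, 0] > 0).astype(int)  # synthetic binary target (assumption)

w_train, w_test, step = 30, 10, 10
w_total = w_train + w_test

predictions = []
for start in range(0, X.shape[0] - w_total + 1, step):
    X_win = X[start:start + w_total]
    y_win = y[start:start + w_total]
    X_train, X_test = X_win[:w_train], X_win[w_train:]
    y_train = y_win[:w_train]

    # Fit the preprocessing on the training rows of this window only.
    scaler = StandardScaler().fit(X_train)
    pca = PCA(0.95).fit(scaler.transform(X_train))

    # Hypothetical model choice; any estimator with fit/predict would do.
    model = LogisticRegression()
    model.fit(pca.transform(scaler.transform(X_train)), y_train)
    predictions.extend(model.predict(pca.transform(scaler.transform(X_test))))

print(len(predictions))  # 3 windows x 10 test rows each
```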

A one-liner for the non-anchored case with a generalized window rolling quantity:
    sliceListX = [arr[i: i + wSizeTot] for i in range(0, arr.shape[0] - wSizeTot+1, rQty)]
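
For comparison, here is a small sketch of two variations on that one-liner, on a toy 10 x 3 array with window 5 and step 2. The first uses `numpy.lib.stride_tricks.sliding_window_view`, which assumes NumPy 1.20 or later and is not part of the original answer. The second lets the range run to the end of the array, which also yields shorter trailing windows that can then be filtered by length, in the spirit of the mask mentioned in the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(30).reshape(10, 3)
w_size, step = 5, 2

# Plain slicing, as in the one-liner: only full windows are produced.
slices = [arr[i:i + w_size] for i in range(0, arr.shape[0] - w_size + 1, step)]

# sliding_window_view appends the window axis last, so move it back to
# position 1 to get (n_windows, w_size, n_cols); [::step] keeps every
# `step`-th window.
view = sliding_window_view(arr, w_size, axis=0)[::step].transpose(0, 2, 1)

# Letting the range run to the end also yields shorter trailing windows,
# which can be filtered out again by length.
ragged = [arr[i:i + w_size] for i in range(0, arr.shape[0], step)]
full_only = [w for w in ragged if len(w) == w_size]

print(view.shape, [len(w) for w in ragged])
```

The view is read-only and shares memory with `arr`, so call `.copy()` on a window before modifying it.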

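Regarding "python - Rolling window revisited - Adding the window rolling quantity as a parameter - Walk-forward analysis", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53797035/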
