python - Splitting a single column into multiple sub-columns/arrays in Python

Reposted by: 太空宇宙 · Updated: 2023-11-03 14:05:08

I am trying to implement a decision tree algorithm in Python to predict missing input data.

Suppose I have a column with 99 entries, 20 of which are NaN. I want to break this single array into x subarrays of size y (here, y = 5).

Subarrays in which every cell is filled are assigned to features, while subarrays that contain a NaN are assigned to target.

import numpy as np

# Break the column into subarrays
# (`test` is the column shown in the edit below)
subarray_size = 5
target = []
features = []

# Split the column into consecutive chunks of subarray_size entries
chunks = [test[x : x + subarray_size] for x in range(0, len(test), subarray_size)]

# Assign NaN-containing subarrays to "target" and complete subarrays to "features"
for i in chunks:
    if np.isnan(i).any():
        target.append(i)
    else:
        features.append(i)

The code works up to the end of the for loop. Now that I have features and target, I tried the following code block:

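# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split as tts

X_train, X_test, y_train, y_test = tts(features, target, test_size=0.2)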

This produced the following error:

    202     if len(uniques) > 1:
    203         raise ValueError("Found input variables with inconsistent numbers of"
--> 204                          " samples: %r" % [int(l) for l in lengths])
    205 
    206 

ValueError: Found input variables with inconsistent numbers of samples: [5, 15].

I think the error occurs somewhere during the array manipulation, but I am having trouble fixing it. Any suggestions or insights?
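The length mismatch in the error message can be reproduced without scikit-learn. `train_test_split` checks that all of its inputs have the same number of samples, but the chunking step puts complete chunks in one list and NaN-containing chunks in another, so the two lists have unrelated lengths. The following is a minimal sketch of that effect, assuming only the first 15 values of the posted column; `split_chunks` and `column` are hypothetical names introduced for illustration, not part of the original code.

```python
import math


def split_chunks(values, size):
    """Split values into consecutive chunks of `size` and separate
    complete chunks from chunks that contain at least one NaN."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    complete = [c for c in chunks if not any(math.isnan(v) for v in c)]
    with_nan = [c for c in chunks if any(math.isnan(v) for v in c)]
    return complete, with_nan


nan = float("nan")

# First 15 values of the posted sample column (illustration only)
column = [39.722, 39.722, 33.667, 39.722, 39.722,
          23.32, 25.04, nan, 27.491, 22.99,
          39.722, 23.32, 25.04, nan, 27.491]

features, target = split_chunks(column, 5)

# The two lists are not paired row by row, so their lengths differ
print(len(features), len(target))  # prints "1 2"
```

Here the split yields 1 complete chunk and 2 NaN-containing chunks. Passing two such lists as features and target fails the same consistency check that raised the `ValueError` above, because there is no one-to-one pairing between a feature row and a target row.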

Edit: below is a sample of the "test" column. I am not sure how to put it into table format. Sorry for the poor visuals.

Site2_ThirdIonizationEnergy

39.722
39.722
33.667
39.722
39.722
23.32
25.04
NaN
27.491
22.99
39.722
23.32
25.04
NaN
27.491
22.99
33.667
23.32
33.667
NaN
27.491
22.99
39.722
23.32
25.04
NaN
27.491
22.99
19.174
19.174
19.174
19.174
39.722
39.722
33.667
39.722
39.722
23.32
25.04
NaN
27.491
22.99
39.722
23.32
25.04
NaN
27.491
22.99
33.667
23.32
33.667
NaN
27.491
22.99
39.722
23.32
25.04
NaN
27.491
22.99
39.722
39.722
33.667
39.722
39.722
39.722
33.667
39.722
39.722
23.32
25.04
NaN
27.491
22.99
39.722
23.32
25.04
NaN
27.491
22.99
33.667
23.32
33.667
NaN
27.491
22.99
39.722
23.32
25.04
NaN
27.491
22.99
21.62
21.62
21.62
21.62
39.722
39.722
33.667