
python-3.x - Splitting a pandas DataFrame into 4 parts using stratified sampling

Reposted. Author: 行者123. Updated: 2023-12-05 07:07:58

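I want to split a DataFrame into 4 parts using stratified sampling. Every category in column 'B' should be present in each block. If a category does not have enough records to cover all blocks, the same records should be copied into the remaining blocks.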

import numpy as np
import pandas as pd

# 17 rows; column 'B' holds the categories to stratify on
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar',
          'foo', 'bar', 'foo', 'foo',
          'foo', 'bar', 'foo', 'bar',
          'foo', 'bar', 'foo', 'foo', 'bar'],
    'B': ['one', 'one', 'two', 'three',
          'two', 'two', 'one', 'three',
          'one', 'one', 'two', 'three',
          'two', 'two', 'one', 'three', 'four'],
    'C': np.random.randn(17),
    'D': np.random.randn(17),
})

print(df)

A B C D
0 foo one 0.960627 0.318723
1 bar one 0.269439 -0.945565
2 foo two 0.210376 0.765680
3 bar three -0.375095 -1.617334
4 foo two -1.910716 -0.532117
5 bar two -0.277426 0.019717
6 foo one -0.260074 1.384464
7 foo three 0.072119 -1.077725
8 foo one 0.093446 -0.683513
9 bar one -0.154885 -1.453996
10 foo two -1.258207 1.406615
11 bar three -0.003332 -0.083092
12 foo two 1.250562 0.519337
13 bar two -0.837681 -1.465363
14 foo one -0.403992 -0.133496
15 foo three -0.757623 -0.459532
16 bar four -2.071840 0.802953

The output should look like the following (every category in column 'B' should be present in each block; the index does not matter):

     A      B         C         D
0 foo one 0.200466 -0.394136
2 foo two 0.086008 -0.528286
3 bar three -1.979613 -1.345405
8 foo one -1.195563 -0.832880
15 foo three -0.737060 -0.437047
16 bar four -2.071840 0.802953

A B C D
1 bar one 1.177119 0.693766
4 foo two 0.452803 -0.595433
7 foo three 1.285687 1.107021
12 foo two 1.746976 1.449390
16 bar four -2.071840 0.802953

A B C D
6 foo one -0.095485 0.129541
5 bar two 0.803417 -0.219461
7 foo three 1.285687 1.107021
13 bar two 1.166246 -1.711505
16 bar four -2.071840 0.802953

A B C D
9 bar one 2.001238 -0.283411
10 foo two 0.865580 0.052533
11 bar three -0.437604 -0.652073
14 foo one -0.655985 -0.942792
16 bar four -2.071840 0.802953

Best answer

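This might help: df1, df2, df3, df4 = np.array_split(x_train, 4) (from: Split large Dataframe into smaller equal dataframes). Here x_train stands for the DataFrame to split, which is df in the question. Note that np.array_split cuts the frame into consecutive chunks by row position, so on its own it does not guarantee that every category of 'B' appears in each part.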
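The requirement in the question (every category of 'B' in each block, with rows repeated when a category is too small) could be sketched as below. This is only one possible approach, not the method from the linked answer: rows are shuffled within each category and dealt round-robin across the blocks. The function name stratified_blocks and its parameters by, n_blocks and seed are hypothetical names chosen for illustration, and the sketch assumes column 'B' has no missing values.

```python
import numpy as np
import pandas as pd


def stratified_blocks(df, by, n_blocks=4, seed=0):
    """Hypothetical helper: split df into n_blocks frames so that every
    value of column `by` appears in each frame."""
    rng = np.random.default_rng(seed)
    positions = [[] for _ in range(n_blocks)]
    offset = 0  # rotating start block, keeps block sizes balanced
    # .indices maps each category to the integer positions of its rows.
    # Assumption: `by` has no NaN values (groupby drops NaN keys by default).
    for rows in df.groupby(by, sort=False).indices.values():
        rows = rng.permutation(rows)  # shuffle within the category
        if len(rows) < n_blocks:
            # Too few rows: repeat them cyclically to get one per block.
            rows = np.resize(rows, n_blocks)
        # Deal the rows round-robin, starting at the current offset.
        for i, row in enumerate(rows):
            positions[(offset + i) % n_blocks].append(row)
        offset = (offset + len(rows)) % n_blocks
    return [df.iloc[sorted(p)] for p in positions]


# The sample frame from the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar',
          'foo', 'bar', 'foo', 'foo',
          'foo', 'bar', 'foo', 'bar',
          'foo', 'bar', 'foo', 'foo', 'bar'],
    'B': ['one', 'one', 'two', 'three',
          'two', 'two', 'one', 'three',
          'one', 'one', 'two', 'three',
          'two', 'two', 'one', 'three', 'four'],
    'C': np.random.randn(17),
    'D': np.random.randn(17),
})

blocks = stratified_blocks(df, by='B')
for block in blocks:
    print(block, end='\n\n')
```

Because a category is padded only when it has fewer rows than there are blocks, a row is never repeated inside a single block; repeats only occur across blocks, as with the single 'four' row in the sample data.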

Regarding python-3.x - splitting a pandas DataFrame into 4 parts using stratified sampling, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61982072/
