
python - Difference between shuffle and random_state in train_test_split?

Reposted. Author: 行者123. Updated: 2023-12-04 12:11:41

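I tried both approaches on a small data sample and got the same output. So the question is: what is the difference between the shuffle and random_state parameters of scikit-learn's train_test_split function?

MWE code: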

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)
train_test_split(y, shuffle=False)

Out: [[0, 1, 2], [3, 4]]

train_test_split(y, random_state=0)

Out: [[0, 1, 2], [3, 4]]
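
The two parameters control different things: shuffle decides whether the rows are permuted before splitting, and random_state only seeds that permutation. The sketch below is a hypothetical, simplified pure-Python model of that interaction (toy_train_test_split is an illustrative name, not a scikit-learn function). It borrows scikit-learn's rounding of a fractional test_size up to a whole number of rows, but it uses Python's random module, so its shuffled order will not match scikit-learn's output.

```python
import math
import random


def toy_train_test_split(rows, test_size=0.25, shuffle=True, random_state=None):
    """Hypothetical, simplified model of how shuffle and random_state interact.

    Not scikit-learn's implementation: it uses Python's `random` module,
    so shuffled splits will not match scikit-learn's row order.
    """
    n = len(rows)
    n_test = math.ceil(test_size * n)  # fractional test_size is rounded up
    n_train = n - n_test
    indices = list(range(n))

    if shuffle:
        # random_state is only consulted here; None seeds from the OS/time,
        # so every call can produce a different permutation.
        rng = random.Random(random_state)
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
    else:
        # No shuffling: first n_train rows -> train, the rest -> test.
        # random_state is never read on this path.
        train_idx, test_idx = indices[:n_train], indices[n_train:]

    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


y = list(range(5))
print(toy_train_test_split(y, shuffle=False))                  # ([0, 1, 2], [3, 4])
print(toy_train_test_split(y, shuffle=False, random_state=7))  # ([0, 1, 2], [3, 4])
# Same seed, same shuffled split:
print(toy_train_test_split(y, random_state=0) == toy_train_test_split(y, random_state=0))  # True
```

Keeping the seed inside the shuffle branch makes it explicit that random_state has no effect when shuffle=False.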

Best answer

Sometimes experimenting is the best way to understand how a function works.

Suppose you have a DataFrame like this:

   X  Y
0  A  2
1  A  3
2  A  2
3  B  0
4  B  0
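
To follow along, the example DataFrame can be rebuilt with pandas. This is a minimal sketch that assumes pandas is installed; the column names and values are copied from the table above.

```python
import pandas as pd

# Rebuild the example DataFrame: column X holds labels, column Y integers.
# The default RangeIndex gives the row labels 0..4 shown in the table.
df = pd.DataFrame({"X": ["A", "A", "A", "B", "B"], "Y": [2, 3, 2, 0, 0]})
print(df)
```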

Let's go through the different things you can do with train_test_split:

  • If you run train, test = train_test_split(df, test_size=2/5, shuffle=False, random_state=None), you always get:
    # TRAIN
       X  Y
    0  A  2
    1  A  3
    2  A  2

    # TEST
       X  Y
    3  B  0
    4  B  0

  • If you run train, test = train_test_split(df, test_size=2/5, shuffle=False, random_state=1), or use any other integer random_state, you get exactly the same result:
    # TRAIN
       X  Y
    0  A  2
    1  A  3
    2  A  2

    # TEST
       X  Y
    3  B  0
    4  B  0

    This is because you chose not to shuffle the dataset: the first rows go to the training set and the remaining rows go to the test set, so the function never uses random_state.



  • Now, if you run train, test = train_test_split(df, test_size=2/5, shuffle=True, random_state=None), you get datasets like these:
    # TRAIN
       X  Y
    4  B  0
    0  A  2
    1  A  3

    # TEST
       X  Y
    2  A  2
    3  B  0

    The rows have been shuffled. Note, however, that because no seed is set, running the code again may produce a different split.



  • Finally, if you run train, test = train_test_split(df, test_size=2/5, shuffle=True, random_state=1), or use any other integer random_state, you also get two datasets with shuffled rows:
    # TRAIN
       X  Y
    4  B  0
    0  A  2
    3  B  0

    # TEST
       X  Y
    2  A  2
    1  A  3

    This time, however, running the code again with the same random_state always produces the same output. You have fixed the seed, which is what makes the results reproducible.
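
The four cases above can be checked in one script. This sketch assumes scikit-learn and pandas are installed and rebuilds the same five-row DataFrame. It only asserts properties that should hold regardless of which permutation a given seed produces (row counts, row coverage, and repeatability), rather than the exact shuffled orders shown above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"X": ["A", "A", "A", "B", "B"], "Y": [2, 3, 2, 0, 0]})

# Cases 1 and 2: no shuffling -> first 3 rows train, last 2 test, seed ignored
train_a, test_a = train_test_split(df, test_size=2/5, shuffle=False, random_state=None)
train_b, test_b = train_test_split(df, test_size=2/5, shuffle=False, random_state=1)
print(train_a.index.tolist(), test_a.index.tolist())  # [0, 1, 2] [3, 4]
print(train_a.equals(train_b) and test_a.equals(test_b))  # True

# Case 3: shuffled and unseeded -> the split can change on every run,
# but train and test together always cover all five rows
train_c, test_c = train_test_split(df, test_size=2/5, shuffle=True, random_state=None)
print(sorted(train_c.index.tolist() + test_c.index.tolist()))  # [0, 1, 2, 3, 4]

# Case 4: shuffled with a fixed seed -> identical split on every call
train_d1, test_d1 = train_test_split(df, test_size=2/5, shuffle=True, random_state=1)
train_d2, test_d2 = train_test_split(df, test_size=2/5, shuffle=True, random_state=1)
print(train_d1.equals(train_d2) and test_d1.equals(test_d2))  # True
```

Case 3 deliberately has no expected row order; only its sizes and coverage are stable from run to run.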

Regarding "python - Difference between shuffle and random_state in train_test_split?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58955816/
