
python - Indexing with iloc

Reposted. Author: 行者123. Updated: 2023-11-30 09:37:26

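I'm working through a Kaggle tutorial. From looking at the output and reading the documentation I have a basic idea of what this code does, but I'd like to confirm what is happening here: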

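# Legacy scikit-learn API (sklearn.cross_validation), as used in the tutorial
from sklearn.cross_validation import KFold

# titanic is the tutorial's pandas DataFrame
predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)

predictions = []

for train, test in kf:
    # train and test are arrays of integer row positions
    train_predictors = titanic[predictors].iloc[train, :]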

My main question is about the last line, the one that uses iloc. The rest is just context. Is it simply splitting off the training data?

Best answer

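.iloc[] is the primary way to select rows and columns of a pandas DataFrame by integer position (for a Series there is only the index, so it takes a single positional argument). It is explained well in the Indexing docs.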
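As a minimal sketch of that positional behaviour (the toy DataFrame and its values are made up for illustration and are not from the tutorial), the following selects rows by position even though the index labels are strings:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the index labels are deliberately not 0..n-1.
df = pd.DataFrame(
    {"Age": [22, 38, 26, 35], "Fare": [7.25, 71.28, 7.92, 53.10]},
    index=["a", "b", "c", "d"],
)

rows = np.array([0, 2])      # integer positions, not index labels
subset = df.iloc[rows, :]    # rows at positions 0 and 2, all columns

print(subset.index.tolist())    # ['a', 'c']
print(subset.columns.tolist())  # ['Age', 'Fare']
```

The `:` in the second slot means "every column", which is why the question's `iloc[train, :]` keeps all of the predictor columns.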

In this case, from the scikit-learn docs:

KFold divides all the samples in k groups of samples, called folds (if k = n, this is equivalent to the Leave One Out strategy), of equal sizes (if possible). The prediction function is learned using k - 1 folds, and the fold left out is used for test. Example of 2-fold cross-validation on a dataset with 4 samples:

import numpy as np
from sklearn.cross_validation import KFold

kf = KFold(4, n_folds=2)
for train, test in kf:
    print("%s %s" % (train, test))

Output:

[2 3] [0 1]
[0 1] [2 3]
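The sklearn.cross_validation module used in this example was deprecated in scikit-learn 0.18 and removed in 0.20. A rough equivalent with the current sklearn.model_selection API might look like the sketch below; the placeholder array X is an assumption, needed only because the newer KFold takes the data in split() rather than a sample count in the constructor:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((4, 1))    # 4 placeholder samples; the values are irrelevant
kf = KFold(n_splits=2)  # n_splits replaces the old n_folds argument

# split() yields (train, test) arrays of integer positions, one pair per fold
folds = [(train.tolist(), test.tolist()) for train, test in kf.split(X)]
for train, test in folds:
    print(train, test)
# [2, 3] [0, 1]
# [0, 1] [2, 3]
```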

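In other words, KFold produces arrays of index positions. The for loop over kf yields them one fold at a time, and passing train to .iloc selects the corresponding rows (and all columns) from the titanic[predictors] DataFrame. The result is the training set for that fold, so yes, that line just splits off the training data.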
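To make this concrete, here is a small sketch using the current sklearn.model_selection API. The six-row DataFrame is a hypothetical stand-in for the tutorial's titanic data, and the "Survived" target column is an assumption about what the tutorial predicts:

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for the titanic DataFrame (made-up values).
titanic = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 27.0],
    "Survived": [0, 1, 1, 1, 0, 0],
})
predictors = ["Pclass", "Age"]

kf = KFold(n_splits=3)  # no shuffling, so each test fold is a contiguous block
fold_sizes = []
for train, test in kf.split(titanic):
    # train and test are arrays of row positions, which is what .iloc expects
    train_predictors = titanic[predictors].iloc[train, :]
    train_target = titanic["Survived"].iloc[train]
    test_predictors = titanic[predictors].iloc[test, :]
    fold_sizes.append((len(train_predictors), len(test_predictors)))

print(fold_sizes)  # [(4, 2), (4, 2), (4, 2)]
```

Because KFold works with positions rather than labels, .iloc is the safer choice here than .loc, which would only give the same rows if the DataFrame's index happened to be the default 0..n-1 range.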

This post on "python - Indexing with iloc" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/34200874/
