
R: Checking the variables in the training data

Reposted. Author: 行者123. Updated: 2023-11-30 09:07:26

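I am working with training data that was given to me in an RData file, plus a data frame that I built myself from all the columns I believe are present in the training data.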

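library(data.table)    # fread(), setDF()
library(randomForest)  # predict() method for randomForest objects

args = commandArgs(trailingOnly = TRUE)

# args[1]: path to the saved model, read with readRDS()
model = readRDS(args[1])
m = model[[1]]

# Assumption: the new data path is the second argument (newDataPath was never defined in the original snippet)
newDataPath = args[2]
infile = fread(newDataPath, header = TRUE)
setDF(infile)
# Drop the coordinate columns; drop = FALSE keeps the result a data frame
i = infile[, !colnames(infile) %in% c("chr", "pos", "end"), drop = FALSE]

predictions = predict(m, i)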

However, when I run this I get the error "variables in the training data missing in newdata".

With colnames(i) I can list the variables in newdata, but how do I do the same for the training data, that is, I suppose, for an object of class randomForest?

Best answer

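You can use str to inspect the structure of the model and find where the column names are stored.

I am assuming you are using the randomForest package, but the same idea applies to other models.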

library('randomForest')

model <- randomForest(Species ~ ., data = iris, ntree=5)

str(model)
#> List of 19
#> $ call : language randomForest(formula = Species ~ ., data = iris, ntree = 5)
#> $ type : chr "classification"
#> $ predicted : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
#> $ err.rate : num [1:5, 1:4] 0.0862 0.0753 0.114 0.0714 0.0833 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:4] "OOB" "setosa" "versicolor" "virginica"
#> $ confusion : num [1:3, 1:4] 45 0 0 0 41 8 0 3 35 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:3] "setosa" "versicolor" "virginica"
#> .. ..$ : chr [1:4] "setosa" "versicolor" "virginica" "class.error"
#> $ votes : matrix [1:150, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#> .. ..$ : chr [1:3] "setosa" "versicolor" "virginica"
#> $ oob.times : num [1:150] 1 2 1 1 3 1 2 2 2 2 ...
#> $ classes : chr [1:3] "setosa" "versicolor" "virginica"
#> $ importance : num [1:4, 1] 20.53 4.33 19.17 55.25
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#> .. ..$ : chr "MeanDecreaseGini"
#> $ importanceSD : NULL
#> $ localImportance: NULL
#> $ proximity : NULL
#> $ ntree : num 5
#> $ mtry : num 2
#> $ forest :List of 14
#> ..$ ndbigtree : int [1:5] 9 17 35 11 19
#> ..$ nodestatus: int [1:35, 1:5] 1 1 -1 -1 1 1 -1 -1 -1 0 ...
#> ..$ bestvar : int [1:35, 1:5] 4 4 0 0 2 3 0 0 0 0 ...
#> ..$ treemap : int [1:35, 1:2, 1:5] 2 4 0 0 6 8 0 0 0 0 ...
#> ..$ nodepred : int [1:35, 1:5] 0 0 3 1 0 0 2 2 3 0 ...
#> ..$ xbestsplit: num [1:35, 1:5] 1.65 0.8 0 0 2.25 4.75 0 0 0 0 ...
#> ..$ pid : num [1:3] 1 1 1
#> ..$ cutoff : num [1:3] 0.333 0.333 0.333
#> ..$ ncat : Named int [1:4] 1 1 1 1
#> .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#> ..$ maxcat : int 1
#> ..$ nrnodes : int 35
#> ..$ ntree : num 5
#> ..$ nclass : int 3
#> ..$ xlevels :List of 4
#> .. ..$ Sepal.Length: num 0
#> .. ..$ Sepal.Width : num 0
#> .. ..$ Petal.Length: num 0
#> .. ..$ Petal.Width : num 0
#> $ y : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
#> $ test : NULL
#> $ inbag : NULL
#> $ terms :Classes 'terms', 'formula' language Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
#> .. ..- attr(*, "variables")= language list(Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
#> .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
#> .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. ..$ : chr [1:5] "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" ...
#> .. .. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#> .. ..- attr(*, "term.labels")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#> .. ..- attr(*, "order")= int [1:4] 1 1 1 1
#> .. ..- attr(*, "intercept")= num 0
#> .. ..- attr(*, "response")= int 1
#> .. ..- attr(*, ".Environment")=<environment: 0x7f9bed91f8d8>
#> .. ..- attr(*, "predvars")= language list(Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
#> .. ..- attr(*, "dataClasses")= Named chr [1:5] "factor" "numeric" "numeric" "numeric" ...
#> .. .. ..- attr(*, "names")= chr [1:5] "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" ...
#> - attr(*, "class")= chr [1:2] "randomForest.formula" "randomForest"

attr(model$terms, 'term.labels')
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
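
Once the training variable names are available, comparing them with the new data is a matter of set differences. The following is a minimal sketch, assuming a model fitted through the formula interface with a plain formula such as Species ~ .; the newdata frame and its chr column are hypothetical stand-ins for the asker's input file, not data from the question.

```r
library(randomForest)

set.seed(1)  # only so the toy forest is reproducible
model <- randomForest(Species ~ ., data = iris, ntree = 5)

# Hypothetical new data: one predictor is absent and one extra column is present
newdata <- iris[1:5, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
newdata$chr <- "chr1"

# Predictor names recorded in the fitted model
train_vars <- attr(model$terms, "term.labels")

# Variables the model needs that newdata does not provide
missing_vars <- setdiff(train_vars, colnames(newdata))
# Columns in newdata that the model never saw
extra_vars <- setdiff(colnames(newdata), train_vars)

print(missing_vars)
print(extra_vars)
```

Any name in missing_vars is a column that has to be added to (or renamed in) the new data before predict will accept it.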

attr(model$terms, 'dataClasses')
#>      Species Sepal.Length  Sepal.Width Petal.Length  Petal.Width
#>     "factor"    "numeric"    "numeric"    "numeric"    "numeric"
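
The terms component is only there when the model was fitted with a formula. If the forest was trained through the x/y interface instead, the predictor names still appear as the row names of the importance matrix, as the str output above suggests. The sketch below wraps both cases in a small helper; training_vars is a hypothetical name introduced here for illustration, not part of the randomForest package.

```r
library(randomForest)

# Hypothetical helper: return the predictor names for either interface
training_vars <- function(model) {
  if (!is.null(model$terms)) {
    # Formula interface: names are stored on the terms object
    attr(model$terms, "term.labels")
  } else {
    # x/y interface: fall back to the row names of the importance matrix
    rownames(model$importance)
  }
}

set.seed(1)  # only so the toy forests are reproducible
fit_formula <- randomForest(Species ~ ., data = iris, ntree = 5)
fit_xy <- randomForest(x = iris[, 1:4], y = iris$Species, ntree = 5)

print(is.null(fit_xy$terms))
print(training_vars(fit_formula))
print(training_vars(fit_xy))
```

In the question's script the same helper could be called on m, and its result compared with colnames(i).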

For more on checking the variables in the training data in R, see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/48342105/
