
r - How to fix this error in R: Error in `[.data.frame`(newdata, , object$method$center, drop = FALSE) : undefined columns selected

Reposted. Author: 行者123. Updated: 2023-11-30 08:44:53

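I am trying to recreate a random forest model from a paper, but the code does not seem to work. I have only just started learning R and this is beyond my understanding, but I will do my best to explain.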

The paper's source code can be found here: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0181347.s002&type=supplementary

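The paper provides two datasets, training and test, and two subsets are then created from each (see the head() of the data at the bottom of this post). The data can be found in the paper's supplementary material.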

(It should be possible to copy the data straight into a .csv file.) The code is as follows:

sink("test.txt", split=TRUE)
print("#data process")
data_bin_train<-read.csv("training.csv", head=TRUE)
names(data_bin_train)
data_bin_test<-read.csv("test.csv", head=TRUE)
names(data_bin_test)
dspt_bin_train<-subset(data_bin_train,select=c(-Deamidation))
dspt_bin_test<-subset(data_bin_test,select=c(-Deamidation))
class_bin_train<-subset(data_bin_train, select=c(Deamidation))
class_bin_test<-subset(data_bin_test, select=c(Deamidation))

library("caret")
library("ROCR")
library("pROC")
fitControl <- trainControl(method = "CV",number = 10,returnResamp = "all", verboseIter = FALSE, classProbs = TRUE)
set.seed(2)

This part works fine. The next piece of code is where I get the error:

library("randomForest")
print("#Random Forest binary class via caret (randomForest)")
caret_rf_bin_randomf_cv10 <- train(Deamidation~., data=data_bin_train, method = "rf", preProcess = c("center", "scale"), tuneLength = 10, trControl = fitControl)
caret_rf_bin_randomf_cv10
varImp(caret_rf_bin_randomf_cv10)

rf_bin_Preds <- extractPrediction(list(caret_rf_bin_randomf_cv10),testX=dspt_bin_test[,1:13], testY=class_bin_test[,1])

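Error in `[.data.frame`(newdata, , object$method$center, drop = FALSE) : undefined columns selected

Any help would be great! The paper used R v3.1.1 and caret_6.0-35, whereas I am running newer versions of both, which is where I believe the error comes from. I am not sure how to fix it or, to be honest, what the error even is.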

Thanks

蒂诺马斯

Below are the sessionInfo() output and the head() of the two datasets:

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] randomForest_4.6-14 pROC_1.15.3 ROCR_1.0-7 gplots_3.0.1.1 caret_6.0-84
[6] ggplot2_3.2.1 lattice_0.20-38

loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 pillar_1.4.3 compiler_3.5.3 gower_0.2.1 plyr_1.8.5 bitops_1.0-6
[7] iterators_1.0.12 class_7.3-15 tools_3.5.3 rpart_4.1-13 ipred_0.9-9 lubridate_1.7.4
[13] lifecycle_0.1.0 tibble_2.1.3 nlme_3.1-137 gtable_0.3.0 pkgconfig_2.0.3 rlang_0.4.2
[19] Matrix_1.2-15 foreach_1.4.7 rstudioapi_0.10 prodlim_2019.11.13 e1071_1.7-3 withr_2.1.2
[25] stringr_1.4.0 dplyr_0.8.3 caTools_1.17.1.3 gtools_3.8.1 generics_0.0.2 recipes_0.1.8
[31] stats4_3.5.3 grid_3.5.3 nnet_7.3-12 tidyselect_0.2.5 data.table_1.12.8 glue_1.3.1
[37] R6_2.4.1 survival_2.43-3 gdata_2.18.0 lava_1.6.6 reshape2_1.4.3 purrr_0.3.3
[43] magrittr_1.5 ModelMetrics_1.2.2 scales_1.1.0 codetools_0.2-16 MASS_7.3-51.1 splines_3.5.3
[49] assertthat_0.2.1 timeDate_3043.102 colorspace_1.4-1 KernSmooth_2.23-15 stringi_1.4.3 lazyeval_0.2.2
[55] munsell_0.5.0 crayon_1.3.4

training.txt

PDB   `Residue #` `AA following A… attack_distance Half_life norm_B_factor_C norm_B_factor_CA norm_B_factor_CB norm_B_factor_CG
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 11BG 67 GLY 3.84 1.02 1.46 1.46 1.36 1.38
2 11BG 17 SER 4.81 11.8 0.692 0.706 1.18 1.62
3 11BG 71 CYS 4.11 55.5 0.174 0.481 0.574 0.782
4 11BG 44 THR 3.33 49.9 -1.24 -1.30 -1.35 -1.52
5 11BG 94 CYS 4.97 60 1.41 1.64 1.92 2.15
6 11BG 27 LEU 4.52 119 -0.898 -0.905 -0.820 -0.604

test.txt

PDB   `Residue #` `AA following A… attack_distance Half_life norm_B_factor_C norm_B_factor_CA norm_B_factor_CB norm_B_factor_CG
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1ACC 713 GLY 3.69 1.45 3.17 3.35 3.63 4.06
2 1ACC 719 GLY 4.64 1.04 0.688 0.865 1.42 1.83
3 1ACC 28 PHE 4.81 72.4 1.03 1.06 1.58 1.95
4 1ACC 52 ILE 4.73 279 0.944 1.13 1.29 1.46
5 1ACC 85 HIS 3.60 9.7 0.780 0.800 1.16 1.57
6 1ACC 104 LYS 4.51 55.5 2.22 2.47 2.69 2.91

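Best answer

The problem lies in two features:

"PDB" - should not be used at all, since it is only a sequence accession (an identifier, not a predictor).

"AA.following.Asn" - the test set does not contain all of the levels present in the training set.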

levels(data_bin_train[,3])
#output
[1] "ALA" "ARG" "ASP" "CYS" "GLU" "GLY" "HIS" "ILE" "LEU" "LYS" "MET" "PHE" "PRO" "SER" "THR" "TRP" "TYR" "VAL"

levels(data_bin_test[,3])
#output
[1] "ALA" "ARG" "ASP" "GLU" "GLY" "HIS" "ILE" "LEU" "LYS" "PHE" "PRO" "SER" "THR" "TRP" "TYR" "VAL"
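The difference between the two level sets can also be listed directly instead of compared by eye. This is a minimal sketch that copies the two level vectors from the output above into plain character vectors; the names train_levels and test_levels are introduced here for illustration only.

```r
# Level vectors copied from the levels() output above.
train_levels <- c("ALA", "ARG", "ASP", "CYS", "GLU", "GLY", "HIS", "ILE", "LEU",
                  "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL")
test_levels  <- c("ALA", "ARG", "ASP", "GLU", "GLY", "HIS", "ILE", "LEU",
                  "LYS", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL")

# Levels seen in training but absent from the test set.
missing_in_test <- setdiff(train_levels, test_levels)
print(missing_in_test)
# [1] "CYS" "MET"

# Levels seen in the test set but never in training (none here).
new_in_test <- setdiff(test_levels, train_levels)
print(length(new_in_test))
# [1] 0
```

With the real data, the same check would be setdiff(levels(data_bin_train[,3]), levels(data_bin_test[,3])), assuming column 3 was read in as a factor.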

If these two features are omitted from both sets, the code runs and produces a sensible result:

library(caret)
fitControl <- trainControl(method = "CV",
                           number = 10,
                           returnResamp = "all",
                           verboseIter = FALSE,
                           classProbs = TRUE)
set.seed(2)
caret_rf_bin_randomf_cv10 <- train(Deamidation~.,
                                   data = data_bin_train[,-c(1,3)],
                                   method = "rf",
                                   preProcess = c("center", "scale"),
                                   tuneLength = 10,
                                   trControl = fitControl)

rf_bin_Preds <- extractPrediction(list(caret_rf_bin_randomf_cv10),
                                  testX = data_bin_test[,-c(1,3)],
                                  testY = class_bin_test$Deamidation)
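The call above drops the two features by position with -c(1,3), which depends on the column order of the CSV. A sketch of the same selection by name follows. The toy data frame is hypothetical: its first four columns reuse two rows from the head() output in the question, the Deamidation values are made up for illustration, and the column names assume read.csv has converted the original headers to PDB and AA.following.Asn.

```r
# Hypothetical two-row frame standing in for data_bin_train.
# The Deamidation values are invented for this sketch.
toy <- data.frame(
  PDB              = c("11BG", "11BG"),
  Residue          = c(67, 17),
  AA.following.Asn = c("GLY", "SER"),
  attack_distance  = c(3.84, 4.81),
  Deamidation      = c("Yes", "No")
)

# Drop the two problematic features by name rather than by index.
drop_cols <- c("PDB", "AA.following.Asn")
by_name <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

# Positional version used in the answer, for comparison.
by_pos <- toy[, -c(1, 3)]

print(names(by_name))
# [1] "Residue"         "attack_distance" "Deamidation"
print(identical(by_name, by_pos))
# [1] TRUE
```

Selecting by name gives the same result here and keeps working if the column order of the input file changes.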

head(rf_bin_Preds)
#output
obs pred model dataType object
1 Yes Yes rf Training Object1
2 No No rf Training Object1
3 No No rf Training Object1
4 No No rf Training Object1
5 No No rf Training Object1
6 No No rf Training Object1
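One plausible reading of the original error, not checked against caret's source here, is that the formula interface expands each factor into dummy columns, the centering step records those dummy column names, and the raw test data frame is then indexed by names it does not have. The base R sketch below reproduces that mechanism on hypothetical toy data without caret. The frames train_df and test_df and their values are assumptions made for illustration.

```r
# Toy training data: one numeric and one factor predictor (hypothetical values).
train_df <- data.frame(
  x  = c(1.2, 0.7, 3.1, 2.4),
  aa = factor(c("GLY", "SER", "CYS", "GLY")),
  y  = c(1, 0, 0, 1)
)

# A formula expands the factor `aa` into dummy columns (intercept dropped here).
mm <- model.matrix(y ~ ., data = train_df)[, -1, drop = FALSE]
dummy_names <- colnames(mm)
print(dummy_names)
# [1] "x"     "aaGLY" "aaSER"

# A raw test frame still has the single column `aa`, not `aaGLY` or `aaSER`.
test_df <- data.frame(x = c(0.5, 1.9), aa = factor(c("GLY", "SER")))

# Indexing the raw frame by the dummy names fails; capture the message.
msg <- tryCatch(
  {
    test_df[, dummy_names, drop = FALSE]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # in an English locale: "undefined columns selected"
```

Under this reading, removing the character and factor columns avoids the problem because the remaining numeric column names are the same before and after the formula expansion.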

Regarding "r - How to fix this error in R: Error in `[.data.frame`(newdata, , object$method$center, drop = FALSE) : undefined columns selected", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59681559/
