r - How do I loop over all data sets (and determine their number of columns)?

Reposted. Author: 行者123. Updated: 2023-12-04 11:35:45

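I'd like to loop over the data sets of all available (= installed) packages and
find out whether these data sets have 6 or more columns. Here is my attempt: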

dat.list <- data(package=.packages(all.available=TRUE))$results # list of all installed packages
colnames(dat.list) # "Package" "LibPath" "Item" (= name of data set) "Title" (= description)
idx <- c()
i <- 3
## for(i in nrow(dat.list)) {
nme <- dat.list[[i,"Item"]] # data set as string
data(list=nme, package=dat.list[[i,"Package"]]) # load the data
## => fails with warning: In data(list = nme, package = dat.list[[i, "Package"]]) :
## data set 'BJsales.lead (BJsales)' not found
dat <- eval(as.name(nme)) # assign the data to the variable dat
ncl <- ncol(dat)
if(!is.null(ncl) && ncl >= 6) idx <- c(idx, i)
## }

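It obviously doesn't work, so I fixed a single index (here: 3) to see what fails. How (if not via nme as above) can I determine the name of the data set, so that I can store the data set in a variable and then access its number of columns?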

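Update
Combining the posts by jeremycg and nico, I came up with this (again: not perfect at working out the name of the data set, but it runs all the way through):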
dat.list <- data(package=.packages(all.available=TRUE))$results # list of all installed packages
idx <- c()
for (i in 1:nrow(dat.list))
{
  require(dat.list[i, "Package"], character.only=TRUE)
  raw.name <- dat.list[i, "Item"] # data set (and parenthetical suffix) as raw string
  name <- gsub('\\s.*','', raw.name) # name of data set
  dat <- tryCatch(get(name), error=function(e) e) # assign the data to the variable dat (if not erroneous)
  if(is(dat, "simpleError")) {
    warning("Element ",i," threw an error")
    dat <- NA
  }
  ncl <- ncol(dat)
  if(!is.null(ncl) && ncl >= 6)
    idx <- c(idx, i)
}
dat.list[idx, c("Package", "Item")]

Best answer

I guess you need to load the package to be able to access its data.

So you need to add this at the beginning of the loop:

require(dat.list[[i, "Package"]], character.only = TRUE)

(See this question for why you need the character.only argument)
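As a minimal sketch of what character.only changes: without it, require() treats its argument as the literal package name, so a variable holding the name would not be looked up. The variable name pkg is just an illustration, and datasets is used only because it ships with every R installation.

```r
# Package name stored in a variable (illustrative choice: "datasets" ships with R).
pkg <- "datasets"

# character.only = TRUE tells require() to use the *value* of pkg,
# rather than looking for a package literally called "pkg".
loaded <- require(pkg, character.only = TRUE)
print(loaded)
```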

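Note that you also need to change the loop from:
for(i in nrow(dat.list))

to:
for(i in 1:nrow(dat.list))

because nrow(dat.list) is a single number, so the original loop body would run only once, for the last row.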
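The difference between the two loop headers can be seen on a toy matrix. This is only an illustration: the matrix m and the visited_* vectors are made up here and stand in for dat.list. The sketch uses seq_len(nrow(m)), which visits the same indices as 1:nrow(m) but also behaves sensibly when there are zero rows (1:0 would count down from 1 to 0).

```r
# Toy matrix standing in for dat.list (hypothetical values).
m <- matrix(1:6, nrow = 3)

# Iterating over nrow(m) visits a single value: the row count itself.
visited_wrong <- c()
for (i in nrow(m)) visited_wrong <- c(visited_wrong, i)

# Iterating over seq_len(nrow(m)) visits every row index.
visited_right <- c()
for (i in seq_len(nrow(m))) visited_right <- c(visited_right, i)

print(visited_wrong)
print(visited_right)
```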

There is another problem: some data sets are returned with an extra name in parentheses. For example:
wine.classes (wine)

So we need to strip those off. This is easily done with:
dat.list[,3] <- sapply(strsplit(dat.list[,3], " "), function(x){x[1]})
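As far as I understand the Item column, the part in parentheses is the name of the data file that holds the object, which is why data(list = ...) could not find 'BJsales.lead (BJsales)' in the question. The sketch below splits both parts apart instead of discarding the suffix. The vector items is a made-up sample and the variable names are illustrative.

```r
# Sample Item strings (made up for illustration, in the format data() reports).
items <- c("wine.classes (wine)", "BJsales.lead (BJsales)", "mtcars")

# Object name: everything before the first space.
object_name <- sub(" .*$", "", items)

# File name: the text inside the parentheses if present, else the object name.
file_name <- ifelse(grepl("\\(", items),
                    sub("^.*\\((.*)\\)$", "\\1", items),
                    object_name)

print(data.frame(item = items, object = object_name, file = file_name))
```

Under that reading, file_name is what would be passed to data(list = ...), and object_name is what would be looked up afterwards.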

Finally, dat.list can be indexed with [], there is no need for [[]] (easier to read!).

So, in the end:
# All data sets of all installed packages (one row per data set)
dat.list <- data(package=.packages(all.available=TRUE))$results

# Remove the parenthetical suffix, keeping only the data set name
dat.list[, "Item"] <- sapply(strsplit(dat.list[, "Item"], " "),
                             function(x){x[1]})

idx <- c()
for (i in 1:nrow(dat.list))
{
  require(dat.list[i, "Package"], character.only = TRUE)
  nme <- dat.list[i, "Item"] # data set name as string
  data(list=nme, package=dat.list[i, "Package"]) # load the data

  dat <- eval(as.name(nme)) # assign the data to the variable dat
  ncl <- ncol(dat)
  if(!is.null(ncl) && ncl >= 6)
    idx <- c(idx, i)
}

And:
> dat.list[idx, "Item"]
[1] "Seatbelts" "USJudgeRatings" "WorldPhones" "airquality"
[5] "anscombe" "attitude" "crimtab" "euro.cross"
[9] "infert" "longley" "mtcars" "occupationalStatus"
[13] "state.x77" "swiss" "volcano" "car.test.frame"
[17] "car90" "solder" "stagec" "bladder"
[21] "bladder1" "bladder2" "cancer" "cgd"
[25] "cgd0" "colon" "flchain" "heart"
[29] "jasa" "jasa1" "kidney" "lung"
[33] "mgus" "mgus1" "mgus2" "nwtco"
[37] "ovarian" "pbc" "pbcseq" "rats2"
[41] "transplant" "veteran" "soldat" "patch"
[45] "tooth"
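For comparison, here is a sketch of a variant that does not attach any package and loads each data set into a scratch environment instead of the workspace. It is not the method from the answer above: wide_datasets is a hypothetical helper name, and it assumes that the parenthesised part of Item is the file name to hand to data(). It is run on the datasets package only, to keep the example quick.

```r
# Hypothetical helper: names of data sets in `pkg` with at least `min_cols` columns.
wide_datasets <- function(pkg, min_cols = 6) {
  items <- data(package = pkg)$results[, "Item"]
  objects <- sub(" .*$", "", items)                    # object name
  files <- ifelse(grepl("\\(", items),                 # file name (assumed meaning
                  sub("^.*\\((.*)\\)$", "\\1", items), # of the parenthesised part)
                  objects)
  keep <- logical(length(items))
  for (i in seq_along(items)) {
    env <- new.env()  # scratch environment, so nothing lands in the workspace
    ok <- tryCatch({
      suppressWarnings(data(list = files[i], package = pkg, envir = env))
      exists(objects[i], envir = env, inherits = FALSE)
    }, error = function(e) FALSE)
    if (!ok) next     # skip data sets that could not be loaded
    ncl <- ncol(get(objects[i], envir = env))
    keep[i] <- !is.null(ncl) && ncl >= min_cols
  }
  objects[keep]
}

wide <- wide_datasets("datasets")
print(wide)
```

To cover every installed package, the helper could be applied over .packages(all.available = TRUE) with lapply(), at the cost of a longer run time.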

Regarding "r - How do I loop over all data sets (and determine their number of columns)?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32055864/
