r - Why does as.factor return a character when used inside apply? - 6ren

Reposted by: 行者123. Last updated: 2023-12-03 23:13:03

I want to use apply() to convert variables to factors:

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))

a2 <- apply(a, 2, as.factor)  # try to turn every column into a factor
apply(a2, 2, class)           # check the class of each column

The result is:

         x1          x2          x3 
"character" "character" "character"

I don't understand why this produces character vectors instead of factor vectors.

Best answer

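apply converts your data.frame to a character matrix. Use lapply instead (the output below comes from R < 4.0.0, where data.frame() turned strings into factors by default; on R >= 4.0.0, x2 is reported as "character"):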

lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

In the second command, apply again coerces the result to a character matrix. With lapply:

a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

But for a quick inspection you can simply use str:

str(a)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: num  -1.79 -1.091 1.307 1.142 -0.972 ...
#  $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

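Additional notes, following the comments:

Why does lapply work while apply does not?

The first thing apply does is convert its argument to a matrix. So apply(a) is equivalent to apply(as.matrix(a)). As you can see, str(as.matrix(a)) gives you: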

chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "x1" "x2" "x3"

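There are no factors left, so class returns "character" for every column. lapply works on the columns of the data.frame itself, so it gives you what you want (for each column it does something like class(a$column_name)).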
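As a minimal sketch of the point above, the following rebuilds the data.frame from the question (set.seed is added here only to make the sketch reproducible) and checks what apply actually sees and returns:

```r
set.seed(1)  # assumption: added only for reproducibility
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))

m <- as.matrix(a)        # the kind of object apply() iterates over
typeof(m)                # "character"
is.factor(m[, "x3"])     # FALSE: the factor column is already gone

# Even though as.factor() returns a factor for each column, apply()
# reassembles the pieces into a matrix, which holds one atomic type.
a2 <- apply(a, 2, as.factor)
is.matrix(a2)            # TRUE
typeof(a2)               # "character"
```

A matrix cannot mix column types, which is why the numeric column x1 is turned into strings as well.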

The help page for apply explains why apply combined with as.factor does not work:

In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
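The coercion described in that help text can be seen on a tiny factor. This is only an illustration of as.vector and array on a factor, not a reproduction of apply's internals; the variable f is made up for the example:

```r
f <- factor(c("b", "a", "b"))

v <- as.vector(f)             # the factor's labels as plain strings
v                             # "b" "a" "b"
class(v)                      # "character"

# Giving a factor dimensions via array() goes through the same coercion,
# so the result is a character matrix rather than a "factor matrix".
arr <- array(f, dim = c(3, 1))
typeof(arr)                   # "character"
```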

As for why sapply combined with as.factor does not work, see the help page for sapply:

Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.

You will never get a matrix of factors or a data.frame this way.

How do I convert the output to a data.frame?

Simple: use as.data.frame, as you wrote in your comment:
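A short sketch of the sapply behaviour, using a small made-up data.frame (d is hypothetical, not from the question). Turning simplification off keeps the factors, at the cost of getting a list back:

```r
d <- data.frame(x1 = c(1.5, 2.5, 1.5),
                x2 = c("a", "b", "a"),
                stringsAsFactors = FALSE)

s <- sapply(d, as.factor)                     # simplified to a matrix
is.matrix(s)                                  # TRUE
typeof(s)                                     # "character"

l <- sapply(d, as.factor, simplify = FALSE)   # same as lapply, with names
vapply(l, is.factor, logical(1))              # TRUE for every column
```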

a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
#  $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

But if you want to replace only selected character columns with factors, there is a trick:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: chr  "a" "b" "c" "d" ...
#  $ x2: chr  "A" "B" "C" "D" ...
#  $ x3: chr  "A" "B" "C" "D" ...

columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x3: chr  "A" "B" "C" "D" ...
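One possible variation on this trick, sketched under the assumption that you want to convert every character column without naming them by hand. The integer column x3 and the helper vector is_chr are made up for the example. Note the list-style indexing a3[is_chr] rather than a3[, is_chr]: with a single selected column, the two-index form drops to a plain vector and lapply would then loop over its elements instead of over columns.

```r
a3 <- data.frame(x1 = letters, x2 = LETTERS, x3 = 1:26,
                 stringsAsFactors = FALSE)

# Logical vector marking which columns are character
is_chr <- vapply(a3, is.character, logical(1))

# Replace only those columns; the other columns are left untouched
a3[is_chr] <- lapply(a3[is_chr], as.factor)

str(a3)
```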

You can use the same approach to replace all columns:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
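To make the difference between the two results explicit, here is a small sketch (the name b is made up for the example). It uses the empty-index form a3[] <- ..., a common way of writing the same whole-frame replacement: lapply on its own returns a plain list, while assigning into the existing data.frame keeps it a data.frame.

```r
a3 <- data.frame(x1 = letters, x2 = LETTERS, stringsAsFactors = FALSE)

b <- lapply(a3, as.factor)       # a plain named list of factors
is.data.frame(b)                 # FALSE

a3[] <- lapply(a3, as.factor)    # fill the existing data.frame in place
is.data.frame(a3)                # TRUE
vapply(a3, is.factor, logical(1))
```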

Regarding "r - Why does as.factor return a character when used inside apply?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/2392216/