gpt4 book ai didi

r - 为数值因子水平赋值

转载 作者:行者123 更新时间:2023-12-04 11:55:58 25 4
gpt4 key购买 nike

<分区>

我已经浏览过不同的链接,例如:How to convert a factor to an integer\numeric without a loss of information?

但无法解决问题

我有一个数据框

 SYMBOL             PVALUE1             PVALUE2
1 10-Mar 0.813027629406118 0.78820189558684
2 10-Sep 0.00167287722066533 0.00167287722066533
3 11-Mar 0.21179810441316 0.464576340307205
4 11-Sep 0.00221961024320294 0.00221961024320294
5 12-Sep 0.934667427815304 0.986884425214009
6 15-Sep 0.00167287722066533 0.00167287722066533
7 1-Dec 0.464576340307205 0.0911572830792113
8 1-Mar 0.00818426308604705 0.0252302356363697
9 1-Sep 0.60516237199519 0.570568468332992
10 2-Mar 0.0103975819620539 0.00382292568622066
11 2-Sep 0.00167287722066533 0.00167287722066533

当我尝试 str()

str(df)
'data.frame': 20305 obs. of 3 variables:
$ SYMBOL : Factor w/ 21050 levels "","10-Mar","10-Sep",..: 2 3 4 5 6 7 8 9 10 11 ...
$ PVALUE1: Factor w/ 209 levels "0","0.000109570493049298",..: 169 22 110 24 181 22 139 39 149 44 ...
$ PVALUE2: Factor w/ 216 levels "0","0.000109570493049298",..: 172 20 141 23 201 20 90 61 150 29 ...

我试试 mode()

sapply(df,mode)
SYMBOL PVALUE1 PVALUE2
"numeric" "numeric" "numeric"

当我尝试根据以下条件为两个数字列 (2,3) 分配值时

df$Score <- rowSums(ifelse(df[,-1]==0, 0, 
ifelse(df[, -1]<= 0.05, 2, ifelse(df[,-1]>= 0.065,-2,1))))

I get Warning messages:
1: In Ops.factor(left, right) : ‘<=’ not meaningful for factors
2: In Ops.factor(left, right) : ‘<=’ not meaningful for factors
3: In Ops.factor(left, right) : ‘>=’ not meaningful for factors
4: In Ops.factor(left, right) : ‘>=’ not meaningful for factors

输出是这样的:

SYMBOL             PVALUE1             PVALUE2       Score
1 10-Mar 0.813027629406118 0.78820189558684 NA
2 10-Sep 0.00167287722066533 0.00167287722066533 NA
3 11-Mar 0.21179810441316 0.464576340307205 NA
4 11-Sep 0.00221961024320294 0.00221961024320294 NA
5 12-Sep 0.934667427815304 0.986884425214009 NA
6 15-Sep 0.00167287722066533 0.00167287722066533 NA

如果因子已经是数字,为什么上面的代码不起作用并给出 NA。我应该如何进行。

编辑 dput()

structure(list(SYMBOL = structure(1:6, .Label = c("10-Mar", "10-Sep", 
"11-Mar", "11-Sep", "12-Sep", "15-Sep"), class = "factor"), PVALUE1 = structure(c(4L,
1L, 3L, 2L, 5L, 1L), .Label = c("0.00167287722066533", "0.00221961024320294",
"0.21179810441316", "0.813027629406118", "0.934667427815304"), class = "factor"),
PVALUE2 = structure(c(4L, 1L, 3L, 2L, 5L, 1L), .Label = c("0.00167287722066533",
"0.00221961024320294", "0.464576340307205", "0.78820189558684",
"0.986884425214009"), class = "factor")), .Names = c("SYMBOL",
"PVALUE1", "PVALUE2"), row.names = c(NA, 6L), class = "data.frame")

我也试过这个:

  indx <- sapply(df, is.factor)
df[indx] <- lapply(df[indx], function(x) as.numeric(levels(x))[x])

indx returns

SYMBOL PVALUE1 PVALUE2
TRUE TRUE TRUE
Warning message:
In FUN(X[[3L]], ...) : NAs introduced by coercion

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com