
r - How to replace all values in a column based on an ordered vector in R

Reposted. Author: 行者123. Updated: 2023-12-04 09:30:33

I'm trying to replace all the numeric values in a data frame column with ordered categories. Here is a dummy data frame:

# b holds numeric codes from 0 to 20
df <- data.frame(a = 1:100, b = sample(0:20, size = 100, replace = TRUE), c = 1:100)

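Note that the real data frame is a .dta file imported with haven::read_dta(). The actual data can be found on GSS here. I'm working with the 2018 file and want to replace all the values in b, i.e. 0 to 20, with a set of categories, as follows: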

educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade", "4th grade", "5th grade", "6th grade", "7th grade", "8th grade", "9th grade", "10th grade", "11th grade", "12th grade", "1 year of college", "2 years of college", "3 years of college", "4 years of college", "5 years of college", "6 years of college", "7 years of college", "8 years of college")
educ_fac <- factor(educ_vec, ordered = TRUE, levels = educ_vec)
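A minimal base-R check of why the `levels` argument matters here: with an ordered factor, sorting and comparisons follow the declared level order rather than alphabetical order. The `grades` vector is made up for illustration.

```r
# Same categories as educ_vec above, wrapped over several lines
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade",
              "4th grade", "5th grade", "6th grade", "7th grade", "8th grade",
              "9th grade", "10th grade", "11th grade", "12th grade",
              "1 year of college", "2 years of college", "3 years of college",
              "4 years of college", "5 years of college", "6 years of college",
              "7 years of college", "8 years of college")
educ_fac <- factor(educ_vec, ordered = TRUE, levels = educ_vec)

# As plain strings, "10th grade" sorts before "9th grade"
plain_sorted <- sort(c("9th grade", "10th grade"))
plain_sorted

# As an ordered factor, the level order from educ_vec is used instead
grades <- factor(c("9th grade", "10th grade", "1st grade"),
                 levels = educ_vec, ordered = TRUE)
sort(grades)
grades[2] > grades[1]   # TRUE: "10th grade" comes after "9th grade"
```

This is the ordering that a chain of `ifelse()` calls producing character strings loses.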

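If I use mutate and ifelse for each category, the process is far too long, and it doesn't preserve the order in educ_fac. I've tried several ways to do it in one step, without success. One approach was this: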

gss_df %>% 
  mutate(educ = fct_recode(educ,
    "No formal schooling" = 0,
    "1st grade" = 1,
    "2nd grade" = 2,
    "3rd grade" = 3,
    "4th grade" = 4,
    "5th grade" = 5,
    "6th grade" = 6,
    "7th grade" = 7,
    "8th grade" = 8,
    "9th grade" = 9,
    "10th grade" = 10,
    "11th grade" = 11,
    "12th grade" = 12,
    "1 year of college" = 13,
    "2 years of college" = 14,
    "3 years of college" = 15,
    "4 years of college" = 16,
    "5 years of college" = 17,
    "6 years of college" = 18,
    "7 years of college" = 19,
    "8 years of college" = 20))

Error: `f` must be a factor (or character vector or numeric vector).

Two other approaches along the same lines didn't work either:

gss_df %>% 
  mutate(educ = fct_recode(educ, educ_fac))

Error: `f` must be a factor (or character vector or numeric vector).

gss_df %>% 
  mutate(educ = recode_factor(educ, educ_vec, ordered = TRUE))

Error in UseMethod("recode") : no applicable method for 'recode' applied to an object of class "haven_labelled"
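The last error shows that `educ` is still a `haven_labelled` vector, the class `read_dta()` gives to labelled Stata columns, and neither recoding function accepts it. A minimal sketch of stripping it back to plain numeric codes first. Because haven may not be installed, it fakes a labelled column with `structure()`, assuming the real object stores the codes as doubles with a `labels` attribute; with haven loaded, `haven::zap_labels()` does the same stripping.

```r
# Hypothetical stand-in for a column imported by haven::read_dta():
# double codes carrying a `labels` attribute and the haven_labelled class
educ <- structure(c(12, 16, 0, 20),
                  labels = c("No formal schooling" = 0, "8 years of college" = 20),
                  class = c("haven_labelled", "vctrs_vctr", "double"))

# as.numeric() returns the bare codes, dropping the class and the labels
educ_num <- as.numeric(educ)
educ_num
class(educ_num)
```

Once the column is plain numeric, it can be used as an index or passed to functions that expect ordinary vectors.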

Can anyone help solve this?

Best answer

I couldn't read the dta file for some reason, so below I simulate data to illustrate my suggestion. Start with the educ_vec vector:

educ_vec <- c("No formal schooling", "1st grade", 
"2nd grade", "3rd grade", "4th grade", "5th grade",
"6th grade", "7th grade", "8th grade", "9th grade",
"10th grade", "11th grade", "12th grade", "1 year of college",
"2 years of college", "3 years of college", "4 years of college",
"5 years of college", "6 years of college", "7 years of college",
"8 years of college")

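If you look at educ_vec, it already lists the categories in the order you want, with the label for code 0 first and the label for code 20 last: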

# this is meant for 0
educ_vec[1]
[1] "No formal schooling"
# this is meant for 20
educ_vec[21]
[1] "8 years of college"

If a code is i, its new category is educ_vec[i+1] (R vectors start at index 1, so code 0 sits at position 1). We can use that directly:
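The same `+ 1` lookup can be sketched in base R without dplyr. `code_to_educ()` is a hypothetical helper name, not a function from any package, and it assumes the codes are whole numbers from 0 to 20.

```r
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade",
              "4th grade", "5th grade", "6th grade", "7th grade", "8th grade",
              "9th grade", "10th grade", "11th grade", "12th grade",
              "1 year of college", "2 years of college", "3 years of college",
              "4 years of college", "5 years of college", "6 years of college",
              "7 years of college", "8 years of college")

# Hypothetical helper: maps numeric codes 0-20 to the ordered categories.
# Code i is stored at position i + 1 because R vectors are 1-indexed.
code_to_educ <- function(code) {
  factor(educ_vec[code + 1], levels = educ_vec, ordered = TRUE)
}

# Codes 0, 12 and 20 map to the 1st, 13th and 21st labels
x <- code_to_educ(c(0, 12, 20))
as.character(x)
```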

library(dplyr)

set.seed(100)
gss_df <- data.frame(educ = sample(0:20, 30, replace = TRUE))
gss_df %>%
  mutate(new = factor(educ_vec[educ + 1], ordered = TRUE, levels = educ_vec))

   educ                new
 1    9          9th grade
 2    5          5th grade
 3   15 3 years of college
 4   18 6 years of college
 5   13  1 year of college
 6   11         11th grade
 7    5          5th grade
 8    3          3rd grade
 9    5          5th grade
10    1          1st grade
11    6          6th grade
12    6          6th grade
13   10         10th grade
14   17 5 years of college
15   11         11th grade
16    2          2nd grade
17   18 6 years of college
18    7          7th grade
19   17 5 years of college
20    1          1st grade
21   18 6 years of college
22    3          3rd grade
23    3          3rd grade
24   19 7 years of college
25   15 3 years of college
26   20 8 years of college
27    6          6th grade
28   15 3 years of college
29   10         10th grade
30   19 7 years of college

And yes, it still works when some of the categories don't occur in the data:

gss_df <- data.frame(educ = 0:5) %>%
  mutate(new = factor(educ_vec[educ + 1], ordered = TRUE, levels = educ_vec))
gss_df

  educ                 new
1    0 No formal schooling
2    1           1st grade
3    2           2nd grade
4    3           3rd grade
5    4           4th grade
6    5           5th grade

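You can see that the new column is an ordered factor that keeps all 21 expected categories as levels, even though only six of them occur: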

str(gss_df)
'data.frame': 6 obs. of 2 variables:
$ educ: int 0 1 2 3 4 5
$ new : Ord.factor w/ 21 levels "No formal schooling"<..: 1 2 3 4 5 6
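A small base-R sketch of what keeping the unused levels means in practice: `table()` reports every category, including those with a count of zero, and `droplevels()` removes the unused ones while keeping the order.

```r
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade",
              "4th grade", "5th grade", "6th grade", "7th grade", "8th grade",
              "9th grade", "10th grade", "11th grade", "12th grade",
              "1 year of college", "2 years of college", "3 years of college",
              "4 years of college", "5 years of college", "6 years of college",
              "7 years of college", "8 years of college")

gss_df <- data.frame(educ = 0:5)
gss_df$new <- factor(educ_vec[gss_df$educ + 1], levels = educ_vec, ordered = TRUE)

# All 21 levels appear in the table; the 15 unused ones have a count of 0
counts <- table(gss_df$new)
length(counts)
sum(counts == 0)

# droplevels() keeps only the six categories that occur, still ordered
used <- droplevels(gss_df$new)
levels(used)
```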

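If your codes can fall outside the 0-20 range, e.g. -1, -2 or 21, 22, then educ_vec[educ+1] no longer works reliably: a code of -1 gives index 0, which selects nothing, and lower codes give negative indices, which drop elements instead of selecting them. In that case I suggest looking the codes up by name instead: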

names(educ_vec) <- 0:20
gss_df <- data.frame(educ = c(-1, 0, 20, 21))
# match() finds each code among the names "0".."20"; unmatched codes give NA
# you can also use mutate
gss_df$new <- educ_vec[match(gss_df$educ, names(educ_vec))]
gss_df

  educ                 new
1   -1                <NA>
2    0 No formal schooling
3   20  8 years of college
4   21                <NA>

match() returns NA when a code has no corresponding name in your educ_vec.
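The match() version above yields a character column, while the question asks for an ordered factor. A base-R sketch of both the failure mode and two ways to get an ordered factor; `codes`, `lookup`, `new_match` and `new_labels` are illustrative names. The second approach, `factor()` with `levels = 0:20` and `labels = educ_vec`, is a swapped-in base-R technique rather than the answer's method: it does the lookup, the NA handling and the ordering in a single call.

```r
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade",
              "4th grade", "5th grade", "6th grade", "7th grade", "8th grade",
              "9th grade", "10th grade", "11th grade", "12th grade",
              "1 year of college", "2 years of college", "3 years of college",
              "4 years of college", "5 years of college", "6 years of college",
              "7 years of college", "8 years of college")

# Position lookup only works for codes 0-20
educ_vec[-1 + 1]   # index 0 selects nothing (character(0))
educ_vec[-2 + 1]   # index -1 drops the first element instead of selecting one
educ_vec[21 + 1]   # index 22 is past the end, giving NA

codes <- c(-1, 0, 20, 21)

# The answer's match() lookup, wrapped in an ordered factor
lookup <- setNames(educ_vec, 0:20)
new_match <- factor(unname(lookup[match(codes, names(lookup))]),
                    levels = educ_vec, ordered = TRUE)

# Alternative: factor() maps each code listed in `levels` to the label at
# the same position; codes not listed in `levels` become NA
new_labels <- factor(codes, levels = 0:20, labels = educ_vec, ordered = TRUE)

identical(as.character(new_match), as.character(new_labels))  # TRUE
```

With the real data, the `factor()` call would apply to the column after its labels have been stripped to plain numbers.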

A similar question, "r - How to replace all values in a column based on an ordered vector in R", can be found on Stack Overflow: https://stackoverflow.com/questions/59079478/

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号