
r - How to calculate the maximum years of education of the parent generation in a family

Reposted. Author: 行者123. Updated: 2023-12-03 09:27:57

Suppose I have a data frame like this:

   family relationship meanings            edu
 1      1 A            respondent           12
 2      1 B            respondent's spouse  18
 3      1 C            A's father           10
 4      1 D            A's mother            9
 5      1 E1           A's first son        15
 6      1 F1           E1's spouse          14
 7      1 G11          E1's first son        3
 8      1 G12          E1's second son       1
 9      1 E2           A's second son       13
10      2 A            respondent           21
11      2 B            respondent's spouse  16
12      2 C            A's father           12
13      2 D            A's mother           16
14      2 E1           A's first son        18
15      2 F1           E1's spouse          15
16      2 E2           A's second son       17
17      2 E3           A's third son        16
family is the family number. relationship identifies each member's position within a family. meanings explains the code in the second column, relationship.

[Image: relationship in the first family]

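For each member, I want to calculate the maximum years of education among the parent generation in the same family.
We do not need the spouses' rows in the result.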

The expected result is as follows:
   family id   edu fedu
 1      1 A     12   10
 2      1 C     10   NA
 3      1 E1    15   18
 4      1 E2    13   18
 5      1 G11    3   15
 6      1 G12    1   15
 7      2 A     21   16
 8      2 C     12   NA
 9      2 E1    18   21
10      2 E2    17   21
11      2 E3    16   21

Here is the data:
 d = structure(list(family = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2), relationship = c("A", "B", "C", "D", "E1", "F1", "G11", "G12", "E2", "A", "B", "C", "D", "E1", "F1", "E2", "E3"), meanings = c("respondent", "respondent's spouse", "A's father","A's mother", "A's first son", "E1's spouse", "E1's first son","E1's second son", "A's second son", "respondent", "respondent's spouse","A's father", "A's mother", "A's first son", "E1's spouse", "A's second son","A's third son"), edu = c(12, 18, 10, 9, 15, 14, 3, 1, 13, 21,16, 12, 16, 18, 15, 17, 16)), row.names = c(NA, -17L), class = c("tbl_df", "tbl", "data.frame"))

Best answer

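Here is what I tried. I thought it was necessary to create a generation variable. Looking at the sample picture in your question, C and D are the first generation, A and B are the second, E and F are the third, and G is the fourth. The first mutate() with case_when() creates the generation variable. Then I defined groups by family and generation, and for each group I identified the maximum education duration (max_ed_duration). Since you said you do not need the spouses' information, I removed the rows whose meanings contain "mother" or "spouse". This filter comes after the maximum is computed, so mothers and spouses still count toward their generation's maximum. Then I grouped again by family. Within each family, if generation is 1, fedu is set to NA; otherwise fedu takes the max_ed_duration of the previous generation. Finally, I arranged the data by family and relationship.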

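library(dplyr)

# Assign a generation from the relationship code: C/D = 1, A/B = 2, E/F = 3, G = 4.
# "^[EF]" anchors both letters to the start of the code ("^E|F" would match F anywhere).
mutate(mydf, generation = case_when(relationship %in% c("C", "D") ~ 1,
                                    relationship %in% c("A", "B") ~ 2,
                                    grepl(x = relationship, pattern = "^[EF]") ~ 3,
                                    grepl(x = relationship, pattern = "^G") ~ 4)) %>%
  # Maximum education within each family and generation (spouses still included).
  group_by(family, generation) %>%
  mutate(max_ed_duration = max(edu)) %>%
  # Drop mothers and spouses only after the maximum has been computed.
  filter(!grepl(x = meanings, pattern = "mother|spouse")) %>%
  group_by(family) %>%
  # Look up the previous generation's maximum; generation 1 has no parents in the data.
  mutate(fedu = if_else(generation == 1,
                        NA_real_,
                        max_ed_duration[match(x = generation - 1, table = generation)])) %>%
  arrange(family, relationship)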

#    family relationship meanings          edu generation max_ed_duration  fedu
#     <dbl> <chr>        <chr>           <dbl>      <dbl>           <dbl> <dbl>
#  1      1 A            respondent         12          2              18    10
#  2      1 C            A's father         10          1              10    NA
#  3      1 E1           A's first son      15          3              15    18
#  4      1 E2           A's second son     13          3              15    18
#  5      1 G11          E1's first son      3          4               3    15
#  6      1 G12          E1's second son     1          4               3    15
#  7      2 A            respondent         21          2              21    16
#  8      2 C            A's father         12          1              16    NA
#  9      2 E1           A's first son      18          3              18    21
# 10      2 E2           A's second son     17          3              18    21
# 11      2 E3           A's third son      16          3              18    21

Data
mydf <- structure(list(family = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 
2, 2, 2, 2, 2), relationship = c("A", "B", "C", "D", "E1", "F1",
"G11", "G12", "E2", "A", "B", "C", "D", "E1", "F1", "E2", "E3"
), meanings = c("respondent", "respondent's spouse", "A's father",
"A's mother", "A's first son", "E1's spouse", "E1's first son",
"E1's second son", "A's second son", "respondent", "respondent's spouse",
"A's father", "A's mother", "A's first son", "E1's spouse", "A's second son",
"A's third son"), edu = c(12, 18, 10, 9, 15, 14, 3, 1, 13, 21,
16, 12, 16, 18, 15, 17, 16)), class = "data.frame", row.names = c(NA,
-17L))
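
For readers who prefer to avoid dplyr, the same steps (derive a generation, take the maximum per family and generation, then attach the previous generation's maximum) can be sketched in base R. This is only an illustrative variant, not the answer's method: the names gen_of, gen_max, keep, and res are introduced here, and it assumes the first letter of relationship always identifies the generation, as it does in the sample data.

```r
# Data from the question (same values as mydf above; meanings is omitted).
mydf <- data.frame(
  family = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  relationship = c("A", "B", "C", "D", "E1", "F1", "G11", "G12", "E2",
                   "A", "B", "C", "D", "E1", "F1", "E2", "E3"),
  edu = c(12, 18, 10, 9, 15, 14, 3, 1, 13, 21, 16, 12, 16, 18, 15, 17, 16),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: first letter of the code -> generation.
gen_of <- c(C = 1, D = 1, A = 2, B = 2, E = 3, F = 3, G = 4)
mydf$generation <- unname(gen_of[substr(mydf$relationship, 1, 1)])

# Maximum education per family and generation (mothers and spouses included).
gen_max <- aggregate(edu ~ family + generation, data = mydf, FUN = max)
names(gen_max)[names(gen_max) == "edu"] <- "fedu"

# Shift the key by one so each maximum is matched to the generation below it.
gen_max$generation <- gen_max$generation + 1

# Drop mothers and spouses (codes starting with B, D, or F), then left-join.
keep <- mydf[!grepl("^[BDF]", mydf$relationship), ]
res <- merge(keep, gen_max, by = c("family", "generation"), all.x = TRUE)

# Order and trim to the columns shown in the question's expected result.
res <- res[order(res$family, res$relationship),
           c("family", "relationship", "edu", "fedu")]
rownames(res) <- NULL
print(res)
```

Generation 1 has no matching row in gen_max after the shift, so the left join leaves fedu as NA for those members, which plays the role of the explicit if_else() branch in the dplyr version.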

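Regarding "r - How to calculate the maximum years of education of the parent generation in a family", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59466424/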
