r - How to calculate the maximum years of education among a family's elders

Reposted by: 行者123 Updated: 2023-12-03 09:27:57

Suppose I have a data frame like this:

   family relationship meanings              edu
 1      1 A            respondent             12
 2      1 B            respondent's spouse    18
 3      1 C            A's father             10
 4      1 D            A's mother              9
 5      1 E1           A's first son          15
 6      1 F1           E1's spouse            14
 7      1 G11          E1's first son          3
 8      1 G12          E1's second son         1
 9      1 E2           A's second son         13
10      2 A            respondent             21
11      2 B            respondent's spouse    16
12      2 C            A's father             12
13      2 D            A's mother             16
14      2 E1           A's first son          18
15      2 F1           E1's spouse            15
16      2 E2           A's second son         17
17      2 E3           A's third son          16

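family is the family ID. relationship is a code for each member's position in the family. meanings explains the code in the second column, relationship.

For each person, I want to compute the maximum years of education in the parental generation of the same family.
Rows for spouses are not needed in the output.

The expected result is: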

   family id      edu fedu 
 1      1 A        12 10   
 2      1 C        10 NA   
 3      1 E1       15 18   
 4      1 E2       13 18   
 5      1 G11       3 15   
 6      1 G12       1 15   
 7      2 A        21 16   
 8      2 C        12 NA   
 9      2 E1       18 21   
10      2 E2       17 21   
11      2 E3       16 21

Here is the data:

 d = structure(list(family = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2), relationship = c("A", "B", "C", "D", "E1", "F1", "G11", "G12", "E2", "A", "B", "C", "D", "E1", "F1", "E2", "E3"), meanings = c("respondent", "respondent's spouse", "A's father","A's mother", "A's first son", "E1's spouse", "E1's first son","E1's second son", "A's second son", "respondent", "respondent's spouse","A's father", "A's mother", "A's first son", "E1's spouse", "A's second son","A's third son"), edu = c(12, 18, 10, 9, 15, 14, 3, 1, 13, 21,16, 12, 16, 18, 15, 17, 16)), row.names = c(NA, -17L), class = c("tbl_df", "tbl", "data.frame"))

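Best answer

Here is what I tried. I think it is necessary to create a generation variable. Judging from the example in your question, C and D are the first generation, A and B the second, E and F the third, and G the fourth. The first mutate() with case_when() creates the generation variable. I then grouped the data by family and generation and, for each group, found the maximum years of education (max_ed_duration). Because you said you do not need spouse information, I removed the rows whose meanings contains "mother" or "spouse". This happens after the maximum is computed, so a mother's or spouse's education still counts toward the maximum of their generation. Next, I grouped by family again. Within each family, fedu is NA when generation is 1; otherwise it takes the max_ed_duration of the previous generation. Finally, I sorted the data by family and relationship.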
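The lookup of the previous generation's maximum is the least obvious step, so here is a minimal base-R sketch of it in isolation. The two vectors are hand-copied values for family 1 after the filtering step (they are illustrative inputs, not produced by the code below), and the variable names idx and fedu are chosen only for this example.

```r
# Family 1 after dropping mothers and spouses: A, C, E1, E2, G11, G12.
generation      <- c(2, 1, 3, 3, 4, 4)
max_ed_duration <- c(18, 10, 15, 15, 3, 3)

# For each row, find the position of the first row that belongs to the
# previous generation. Generation 1 has no predecessor, so match() gives NA.
idx <- match(x = generation - 1, table = generation)

# Indexing with NA yields NA, so generation 1 ends up with a missing fedu.
fedu <- max_ed_duration[idx]
fedu
```

Since every remaining row of a generation carries the same max_ed_duration, it does not matter that match() only returns the first matching position.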

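library(dplyr)

# mydf is defined in the Data section below.
mydf %>%
  # Derive the generation from the relationship code.
  mutate(generation = case_when(relationship %in% c("C", "D") ~ 1,
                                relationship %in% c("A", "B") ~ 2,
                                # "^[EF]" anchors both letters; "^E|F" would match an F anywhere.
                                grepl(x = relationship, pattern = "^[EF]") ~ 3,
                                grepl(x = relationship, pattern = "^G") ~ 4)) %>%
  # Maximum education per family and generation (mothers and spouses still included).
  group_by(family, generation) %>%
  mutate(max_ed_duration = max(edu)) %>%
  # Drop the rows for mothers and spouses from the output.
  filter(!grepl(x = meanings, pattern = "mother|spouse")) %>%
  # Within each family, look up the previous generation's maximum.
  group_by(family) %>%
  mutate(fedu = if_else(generation == 1,
                        NA_real_,
                        max_ed_duration[match(x = generation - 1, table = generation)])) %>%
  arrange(family, relationship)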

#   family relationship meanings          edu generation max_ed_duration  fedu
#    <dbl> <chr>        <chr>           <dbl>      <dbl>           <dbl> <dbl>
# 1      1 A            respondent         12          2              18    10
# 2      1 C            A's father         10          1              10    NA
# 3      1 E1           A's first son      15          3              15    18
# 4      1 E2           A's second son     13          3              15    18
# 5      1 G11          E1's first son      3          4               3    15
# 6      1 G12          E1's second son     1          4               3    15
# 7      2 A            respondent         21          2              21    16
# 8      2 C            A's father         12          1              16    NA
# 9      2 E1           A's first son      18          3              18    21
#10      2 E2           A's second son     17          3              18    21
#11      2 E3           A's third son      16          3              18    21
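As a cross-check that does not depend on dplyr, the same result can be sketched in base R with aggregate() and merge(). This is an alternative formulation, not the answer's method. It assumes that the first letter of each relationship code determines the generation, and the names gen_lookup, gen_max and res are introduced only for this example.

```r
# Data from the question, rebuilt compactly (the meanings column is omitted).
d <- data.frame(
  family = rep(c(1, 2), times = c(9, 8)),
  relationship = c("A", "B", "C", "D", "E1", "F1", "G11", "G12", "E2",
                   "A", "B", "C", "D", "E1", "F1", "E2", "E3"),
  edu = c(12, 18, 10, 9, 15, 14, 3, 1, 13, 21, 16, 12, 16, 18, 15, 17, 16),
  stringsAsFactors = FALSE
)

# Assumption: the first letter of the code determines the generation.
gen_lookup <- c(C = 1, D = 1, A = 2, B = 2, E = 3, F = 3, G = 4)
d$generation <- unname(gen_lookup[substr(d$relationship, 1, 1)])

# Maximum education per family and generation, mothers and spouses included.
gen_max <- aggregate(edu ~ family + generation, data = d, FUN = max)
names(gen_max)[names(gen_max) == "edu"] <- "fedu"

# Shift by one so that each row describes the children's generation.
gen_max$generation <- gen_max$generation + 1

# Left join: generation 1 finds no parental row, so its fedu is NA.
res <- merge(d, gen_max, by = c("family", "generation"), all.x = TRUE)

# Drop spouses (B, F*) and mothers (D), then order as in the expected result.
res <- res[!substr(res$relationship, 1, 1) %in% c("B", "D", "F"), ]
res <- res[order(res$family, res$relationship),
           c("family", "relationship", "edu", "fedu")]
rownames(res) <- NULL
res
```

The join replaces the positional match() lookup, which may be easier to reason about if some generation in a family has no remaining rows after filtering.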

Data

mydf <- structure(list(family = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 
2, 2, 2, 2, 2), relationship = c("A", "B", "C", "D", "E1", "F1", 
"G11", "G12", "E2", "A", "B", "C", "D", "E1", "F1", "E2", "E3"
), meanings = c("respondent", "respondent's spouse", "A's father", 
"A's mother", "A's first son", "E1's spouse", "E1's first son", 
"E1's second son", "A's second son", "respondent", "respondent's spouse", 
"A's father", "A's mother", "A's first son", "E1's spouse", "A's second son", 
"A's third son"), edu = c(12, 18, 10, 9, 15, 14, 3, 1, 13, 21, 
16, 12, 16, 18, 15, 17, 16)), class = "data.frame", row.names = c(NA, 
-17L))

Regarding this topic, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59466424/
