
r - Is there a more elegant way to collapse an 88-level variable into a 5-level variable?

Reposted. Author: 行者123. Updated: 2023-12-04 11:02:07

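I have a categorical variable with 88 levels (counties), and I want to aggregate them into five larger geographic regions. Is there a more elegant way to do this than a long chain of ifelse statements (shown below)?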

survey.responses$admin <- ifelse(survey.responses$CNTY == "Lake", "Northeast",
  ifelse(survey.responses$CNTY == "Traverse", "Northwest",
  ifelse(survey.responses$CNTY == "Ramsey", "Central",
  ifelse(survey.responses$CNTY == "Cottonwood", "South", "out of state"))))

Except imagine that CNTY has 88 levels! Any ideas?

Best answer

Two quick methods; I recommend the merge approach for larger sets.

Data

dat <- data.frame(cnty = c("Lake", "Traverse", "Ramsey", "Cottonwood"),
                  stringsAsFactors = FALSE)
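  • Merge/join. I prefer this for several reasons, the most important being that it is easy to maintain the matches in a CSV file and use read.csv to import that CSV into the ref lookup table. I deliberately leave "Lake" out of ref to show what happens when there is no match.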

    ref <- data.frame(cnty = c("Cottonwood", "Ramsey", "Traverse", "SomeOther"),
                      admin = c("South", "Central", "Northwest", "NeverNeverLand"),
                      stringsAsFactors = FALSE)
    out <- merge(dat, ref, by = "cnty", all.x = TRUE)
    out
    #         cnty     admin
    # 1 Cottonwood     South
    # 2       Lake      <NA>
    # 3     Ramsey   Central
    # 4   Traverse Northwest

    The default value for non-matches can then be assigned like this:

    out$admin[is.na(out$admin)] <- "out of state"
    out
    #         cnty        admin
    # 1 Cottonwood        South
    # 2       Lake out of state
    # 3     Ramsey      Central
    # 4   Traverse    Northwest

    If you are using other components of the tidyverse, this can be done with:

    library(dplyr)
    left_join(dat, ref, by = "cnty") %>%
      mutate(admin = if_else(is.na(admin), "out of state", admin))
  • Lookup. This works for small cases, but may not suit yours. (Again, I have commented out "Lake" to show a non-match.)

    c(Cottonwood = "South", # Lake = "Northeast",
      Ramsey = "Central", Traverse = "Northwest")[dat$cnty]
    #        <NA>    Traverse      Ramsey  Cottonwood
    #          NA "Northwest"   "Central"     "South"
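
The two approaches above can be combined. The following is a minimal sketch, not part of the original answer: it keeps the county-to-region table as CSV, reads it with read.csv, and looks regions up with match(), filling non-matches with a default. The CSV is inlined through the `text` argument only to keep the example self-contained (in practice it would be a file you maintain), and the helper name `collapse_county` is hypothetical.

```r
# Lookup table kept as CSV text. Inlining it here is an assumption made for
# the sketch; normally read.csv would be pointed at a maintained file.
ref_csv <- "cnty,admin
Cottonwood,South
Ramsey,Central
Traverse,Northwest"
ref <- read.csv(text = ref_csv, stringsAsFactors = FALSE)

# Hypothetical helper: map county names to regions, with a default
# for counties that do not appear in the lookup table.
collapse_county <- function(cnty, ref, default = "out of state") {
  idx <- match(cnty, ref$cnty)     # NA where the county is not in ref
  admin <- ref$admin[idx]          # NA index yields NA region
  admin[is.na(admin)] <- default   # fill non-matches
  admin
}

dat <- data.frame(cnty = c("Lake", "Traverse", "Ramsey", "Cottonwood"),
                  stringsAsFactors = FALSE)
dat$admin <- collapse_county(dat$cnty, ref)
dat
```

Unlike merge, which sorts the result by the key column by default, match() leaves the rows of dat in their original order, so the new column can be assigned directly.
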
Regarding "r - Is there a more elegant way to collapse an 88-level variable into a 5-level variable?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58740548/
