
regex - Create a new column in a data frame based on partial string matches in another column

Reposted. Author: 行者123. Updated: 2023-12-03 14:03:10

I have a data frame with 2 columns, GL and GLDESC, and want to add a third column called KIND based on some of the data in column GLDESC.

The data frame looks like this:

      GL                             GLDESC
1 515100         Payroll-Indir Salary Labor
2 515900 Payroll-Indir Compensated Absences
3 532300                           Bulk Gas
4 539991                     Area Charge In
5 551000        Repairs & Maint-Spare Parts
6 551100                 Supplies-Operating
7 551300                        Consumables
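To try the answers below on the question's own rows rather than on random data, the sample can be rebuilt as an R data frame. This is a sketch typed in from the printout above; reusing the name DF from the answer and storing GL as integers are assumptions.

```r
# Sample rows from the question, typed in from the printout above.
# stringsAsFactors = FALSE keeps GLDESC as character on R versions before 4.0.
DF <- data.frame(
  GL = c(515100L, 515900L, 532300L, 539991L, 551000L, 551100L, 551300L),
  GLDESC = c("Payroll-Indir Salary Labor",
             "Payroll-Indir Compensated Absences",
             "Bulk Gas",
             "Area Charge In",
             "Repairs & Maint-Spare Parts",
             "Supplies-Operating",
             "Consumables"),
  stringsAsFactors = FALSE
)

# Show the column types and the first few values.
str(DF)
```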

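For each row of the data frame:
  • If GLDESC contains the word Payroll anywhere in the string, then I want KIND to be Payroll.
  • If GLDESC contains the word Gas anywhere in the string, then I want KIND to be Materials.
  • In all other cases, I want KIND to be Other.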
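The three rules can be written directly as a default value followed by two overwrites. This is a minimal sketch, assuming case-sensitive substring matching (fixed = TRUE) and assuming that a description containing both words should count as Payroll, since that rule is listed first; the helper name kind_of is hypothetical.

```r
# Hypothetical helper: apply the question's three rules to a character vector.
kind_of <- function(desc) {
  desc <- as.character(desc)
  kind <- rep("Other", length(desc))                      # rule 3: default
  kind[grepl("Gas", desc, fixed = TRUE)] <- "Materials"   # rule 2
  # Rule 1 is assigned last, so it wins when both words appear.
  kind[grepl("Payroll", desc, fixed = TRUE)] <- "Payroll"
  kind
}

kind_of(c("Payroll-Indir Salary Labor", "Bulk Gas", "Consumables"))
# returns c("Payroll", "Materials", "Other")
```

Because later assignments overwrite earlier ones, the rule with the highest priority is assigned last; the nested ifelse in the answer gets the same effect by testing the highest-priority condition first.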

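I searched Stack Overflow for similar examples but did not find any. I also looked through R for Dummies on switch, grep, apply, and regular expressions, trying to match only part of the GLDESC column and then fill in the KIND column with the account type, but I could not get it to work.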

    Best answer

    Since you only have two conditions, you can use a nested ifelse:

    # random data; it wasn't easy to copy-paste yours
    DF <- data.frame(GL = sample(10),
                     GLDESC = paste(sample(letters, 10),
                                    c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12",
                                      "asdfg", "GAS--2", "fghfgh", "qweee"),
                                    sample(letters, 10), sep = " "))

    # Test "gas" first, then "payroll"; anything else falls through to "Other".
    DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = TRUE), "Materials",
                      ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

    DF
    #    GL         GLDESC      KIND
    # 1   8        e gas l Materials
    # 2   1  c payroll12 y   Payroll
    # 3  10      m GaSer v Materials
    # 4   6       t asdf n     Other
    # 5   2      w qweaa t     Other
    # 6   4 r PayROll-12 q   Payroll
    # 7   9      n asdfg a     Other
    # 8   5     d GAS--2 w Materials
    # 9   7     s fghfgh e     Other
    # 10  3      g qweee k     Other
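grepl("gas", ...) matches substrings, which is why "m GaSer v" is labelled Materials in the output above even though the question asks for the word Gas. If only whole words should count, one option is a word-boundary pattern such as \\bgas\\b with perl = TRUE. This is a sketch under that assumption; the helper name kind_whole_word is hypothetical.

```r
# Hypothetical helper: the answer's nested ifelse, but matching whole words only.
# \\b is a word boundary in PCRE (perl = TRUE), so "GaSer" no longer matches "gas",
# while "GAS--2" still does because "-" is not a word character.
kind_whole_word <- function(desc) {
  desc <- as.character(desc)
  ifelse(grepl("\\bgas\\b", desc, ignore.case = TRUE, perl = TRUE), "Materials",
         ifelse(grepl("\\bpayroll\\b", desc, ignore.case = TRUE, perl = TRUE),
                "Payroll", "Other"))
}

kind_whole_word(c("e gas l", "m GaSer v", "d GAS--2 w", "Payroll-Indir Salary Labor"))
# returns c("Materials", "Other", "Materials", "Payroll")
```

Under this stricter rule "c payroll12 y" would also become Other, because digits count as word characters, so whether \\b is appropriate depends on how the real GLDESC values are written.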

    Edit 10/3/2016 (after this got more attention than expected)

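    One possible solution for handling more patterns is to iterate over all of them and, at each step, compare only the elements that have not matched yet. Because an element is dropped from further comparisons once it matches, the order of patterns sets their priority: each element gets the replacement for the first pattern it matches.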
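    ff = function(x, patterns, replacements = patterns, fill = NA, ...)
    {
      stopifnot(length(patterns) == length(replacements))

      # Start with every element set to the fill value.
      ans = rep_len(as.character(fill), length(x))
      # Indices of elements that have not matched any pattern yet.
      empty = seq_along(x)

      for(i in seq_along(patterns)) {
        # Test the i-th pattern only against the still-unmatched elements.
        greps = grepl(patterns[[i]], x[empty], ...)
        ans[empty[greps]] = replacements[[i]]
        # Drop the newly matched elements from further comparisons.
        empty = empty[!greps]
      }

      return(ans)
    }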

    ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
    # [1] "Materials" "Payroll" "Materials" "Other" "Other" "Payroll" "Other" "Materials" "Other" "Other"

    ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),
       c("pat1a|pat1b", "pat2", "pat3"),
       c("1", "2", "3"), fill = "empty")
    #[1] "1" "1" "3" "empty"

    ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),
       c("pat2", "pat1a|pat1b", "pat3"),
       c("2", "1", "3"), fill = "empty")
    #[1] "2" "1" "3" "empty"
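When the list of patterns grows, it can help to keep the pattern-to-label mapping in one table. The sketch below is an alternative to ff, not the answer's code: instead of shrinking the set of elements in a loop, it tests every pattern against every element and then picks the first matching pattern per element, so it gives the same first-match-wins result at the cost of some extra comparisons. The helper name classify_first_match and the lookup column names pattern and label are hypothetical.

```r
# Hypothetical alternative to ff(): the first matching pattern wins,
# driven by a lookup table with columns 'pattern' and 'label'.
classify_first_match <- function(x, lookup, fill = "Other", ...) {
  x <- as.character(x)
  # hits[i, j] is TRUE when lookup$pattern[j] matches x[i].
  hits <- matrix(
    vapply(lookup$pattern, function(p) grepl(p, x, ...), logical(length(x))),
    nrow = length(x)
  )
  # Column index of the first TRUE in each row, or NA when nothing matched.
  first <- apply(hits, 1, function(row) match(TRUE, row))
  ifelse(is.na(first), fill, lookup$label[first])
}

lookup <- data.frame(pattern = c("payroll", "gas"),
                     label   = c("Payroll", "Materials"),
                     stringsAsFactors = FALSE)

classify_first_match(c("Payroll-Indir Salary Labor", "Bulk Gas", "Consumables"),
                     lookup, ignore.case = TRUE)
# returns c("Payroll", "Materials", "Other")
```

As with ff, reordering the rows of lookup changes which label wins when a string matches more than one pattern.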

    A similar question, "regex - Create a new column in a data frame based on partial string matches in another column", can be found on Stack Overflow: https://stackoverflow.com/questions/19747384/
