
r - Custom aggregation on two columns for named fruits

Reposted. Author: 行者123. Updated: 2023-12-04 09:16:50

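I want to aggregate two columns of a data frame by name in the following, somewhat special way:

  • Drop the parts column from the result by combining the two columns fruits and parts in a special aggregation
  • For Apple, Banana and Strawberry the parts values do not matter and everything is summed per fruit; for Grape and Kiwi the parts values should become the new fruits names
  • The result (bottom) should have 8 aggregated rows instead of 20

At first glance this may sound simple, but after hours of trial and error I have not found a working solution. Here is the example: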

    library(lubridate)  # provides today()

    # 20 rows: one shared date, 5 fruits with 4 parts each, random stock values
    theDF <- data.frame(dates = as.Date(c(today()+20)),
                        fruits = c("Apple","Apple","Apple","Apple","Banana","Banana","Banana","Banana",
                                   "Strawberry","Strawberry","Strawberry","Strawberry","Grape", "Grape",
                                   "Grape","Grape", "Kiwi","Kiwi","Kiwi","Kiwi"),
                        parts = c("Big Green Apple","Apple2","Blue Apple","XYZ Apple4",
                                  "Yellow Banana1","Small Banana","Banana3","Banana4",
                                  "Red Small Strawberry","Red StrawberryY","Big Strawberry",
                                  "StrawberryZ","Green Grape", "Blue Grape", "Blue Grape",
                                  "Blue Grape","Big Kiwi","Small Kiwi","Big Kiwi","Middle Kiwi"),
                        stock = as.vector(sample(1:20)) )

Current data frame:

[image]

Desired output:

[image]

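Best answer

We can use data.table. If the values in the 'parts' column follow a pattern, such as a trailing capital letter or digit that should be removed, we can strip it with sub, use the result together with 'dates' as the grouping variables, and take the sum of 'stock'. The output below is for the data shown at the end of this answer, where the parts values look like "Apple1" or "StrawberryX".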

    library(data.table)
    setDT(theDF)[,.(stock = sum(stock)) , .(dates, fruits = sub("([0-9]|[A-Z])$", "", parts))]
    #        dates      fruits stock
    #1: 2016-06-19       Apple    46
    #2: 2016-06-19      Banana    35
    #3: 2016-06-19  Strawberry    38
    #4: 2016-06-19 Green Grape    12
    #5: 2016-06-19  Blue Grape    21
    #6: 2016-06-19    Big Kiwi    37
    #7: 2016-06-19  Small Kiwi    14
    #8: 2016-06-19 Middle Kiwi     7
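
To see what the pattern does to individual values, here is a small base R check on a few 'parts' values taken from the data at the end of this answer. It is only an illustration of the regular expression, not part of the original answer; the vector names parts and cleaned are chosen here for the example.

```r
# A few 'parts' values from the data block at the end of the answer
parts <- c("Apple1", "Banana4", "StrawberryX", "Green Grape", "Big Kiwi")

# Remove a single trailing digit or capital letter, if there is one;
# values ending in a lowercase letter are returned unchanged
cleaned <- sub("([0-9]|[A-Z])$", "", parts)
print(cleaned)
```

Values such as "Big Green Apple" from the question's own example end in a lowercase letter, so this pattern would leave them as they are. That case needs the manual approach described under Update.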

Alternatively, the same approach can be implemented with dplyr.

    library(dplyr)
    theDF %>%
      group_by(dates, fruits = sub('([0-9]|[A-Z])$', '', parts)) %>%
      summarise(stock = sum(stock))

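Update

If there is no pattern and the choice is based only on manually identified elements of 'fruits', create a vector of those elements, use %chin% to get the logical index in 'i', assign (:=) the values of 'parts' for the rows selected by 'i' to 'fruits', then group by 'dates' and 'fruits' and take the sum of 'stock'.

    # For Grape and Kiwi rows, overwrite 'fruits' with 'parts', then aggregate
    setDT(theDF)[as.character(fruits) %chin% c("Grape", "Kiwi"),
                 fruits := parts][, .(stock = sum(stock)), .(dates, fruits)]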
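
The same manual rule can also be sketched in base R without data.table. This is a minimal illustration rather than the answer's method: the small data frame df and its stock values are made up for the example, only the column names follow the question, and the columns are kept as character so that overwriting 'fruits' needs no factor handling.

```r
# Small illustrative data frame (hypothetical values, not the question's full data)
df <- data.frame(
  dates  = as.Date("2016-06-19"),
  fruits = c("Apple", "Apple", "Grape", "Grape", "Kiwi", "Kiwi"),
  parts  = c("Big Green Apple", "Apple2", "Green Grape", "Blue Grape",
             "Big Kiwi", "Big Kiwi"),
  stock  = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# For Grape and Kiwi the 'parts' value becomes the new fruit name;
# every other fruit keeps its original name
use_parts <- df$fruits %in% c("Grape", "Kiwi")
df$fruits[use_parts] <- df$parts[use_parts]

# Sum 'stock' per date and (new) fruit name; 'parts' is not carried over
result <- aggregate(stock ~ dates + fruits, data = df, FUN = sum)
print(result)
```

Here the two Apple rows collapse into one, the two Grape rows stay separate under their parts names, and the two "Big Kiwi" rows are summed together.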

Data
    theDF <- structure(list(dates = structure(c(16971, 16971, 16971, 16971, 
    16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971,
    16971, 16971, 16971, 16971, 16971, 16971, 16971), class = "Date"),
    fruits = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 5L,
    5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("Apple",
    "Banana", "Grape", "Kiwi", "Strawberry"), class = "factor"),
    parts = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 14L,
    15L, 16L, 16L, 11L, 10L, 10L, 10L, 9L, 13L, 9L, 12L), .Label = c("Apple1",
    "Apple2", "Apple3", "Apple4", "Banana1", "Banana2", "Banana3",
    "Banana4", "Big Kiwi", "Blue Grape", "Green Grape", "Middle Kiwi",
    "Small Kiwi", "StrawberryX", "StrawberryY", "StrawberryZ"
    ), class = "factor"), stock = c(8, 19, 15, 4, 6, 18, 1, 10,
    9, 16, 11, 2, 12, 13, 5, 3, 17, 14, 20, 7)), .Names = c("dates",
    "fruits", "parts", "stock"), row.names = c(NA, -20L), class = "data.frame")

Regarding r - Custom aggregation on two columns for named fruits, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37524605/
