
R - Constructing new variables from sorted data

Reposted. Author: 行者123. Updated: 2023-12-04 09:26:36

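This is an update/follow-up to this question. The answers given there do not meet the new requirements.

I am looking for an efficient way (data.table?) to construct two new measures for each ID.

Measure 1 and measure 2 need to satisfy the following conditions:

Condition 1: find a sequence of three consecutive rows where:

  • the first row has count > 0,
  • the second row has count > 1, and
  • the third row has count == 1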
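Condition 1 can be checked by lining each row up with the two rows that follow it. The sketch below is only an illustration on a toy table (the first four rows of ID 1); the names `toy`, `hits`, `count.2` and `count.3` are made up for the example. It assumes the rows are already sorted by `seqs` within each `ID`.

```r
library(data.table)

# Toy data: the first four rows of ID 1 from the question
toy <- data.table(ID = c(1, 1, 1, 1), seqs = 1:4, count = c(2, 1, 3, 1))

# Lead values are taken within each ID, so a window never spans two IDs
toy[, `:=`(
  count.2 = shift(count, n = 1, type = "lead"),
  count.3 = shift(count, n = 2, type = "lead")
), by = ID]

# Keep only the rows that start a matching three-row sequence;
# rows whose leads are NA (end of group) are dropped by the filter
hits <- toy[count > 0 & count.2 > 1 & count.3 == 1]
hits$seqs  # position of the first row of each matching sequence
```

Here only the row with `seqs == 2` starts a sequence (counts 1, 3, 1), which matches `seq1 = 2` for ID 1 in the desired output.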

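Condition 2 for measure 1:

  • take the elements of product in the third row of the sequence that are:
  • in product in the second row of the sequence, and
  • not in stock in the first row of the sequence.

Condition 2 for measure 2:

  • take the elements of product in the last (third) row of the sequence that are:
  • not in product in the second row of the sequence, and
  • not in stock in the first row of the sequence.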
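Both versions of condition 2 are plain set operations once the comma-separated strings are split into elements. A minimal sketch using base R, with the values taken from the matching sequence of ID 1 (rows 2 to 4); `split_set` is a hypothetical helper name, not something from the question:

```r
# Hypothetical helper: split a comma-separated string into its elements
split_set <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]

third_product  <- split_set("A,C,E")  # product in the third row
second_product <- split_set("C")      # product in the second row
first_stock    <- split_set("A,B")    # stock in the first row

# Measure 1: in the third-row product and in the second-row product,
# but not in the first-row stock
measure1 <- setdiff(intersect(third_product, second_product), first_stock)

# Measure 2: in the third-row product, but in neither the second-row
# product nor the first-row stock
measure2 <- setdiff(third_product, union(second_product, first_stock))
```

For these inputs `measure1` is `"C"` and `measure2` is `"E"`, as in the first row of the desired output.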

Data:

df2 <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
seqs = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
count = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D"))

> df2
   ID seqs count product     stock
1   1    1     2       A         A
2   1    2     1       B       A,B
3   1    3     3       C     A,B,C
4   1    4     1   A,C,E   A,B,C,E
5   1    5     1     A,B   A,B,C,E
6   1    6     2   A,B,C   A,B,C,E
7   1    7     3       D A,B,C,D,E
8   2    1     1       A         A
9   2    2     2       B       A,B
10  2    3     1       A       A,B
11  3    1     3       A         A
12  3    2     1   A,B,C     A,B,C
13  3    3     4       D   A,B,C,D
14  3    4     1       D   A,B,C,D

The desired output looks like this:

   ID seq1 seq2 seq3 measure1 measure2
1:  1    2    3    4        C        E
2:  2    1    2    3
3:  3    2    3    4        D

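How would you code this?

Best answer

To do this, you need to know the following:

  • the shift function, to compare values across rows within a group
  • the separate_rows function, to split your strings and get a normalized view of the data.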
library(data.table)
dt <- data.table(ID = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
seqs = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
count = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D"))

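# Compute the lead values within each ID so that a three-row
# sequence can never span two different IDs.
dt[, `:=`(
  count.2   = shift(count, n = 1, type = "lead"),
  count.3   = shift(count, n = 2, type = "lead"),
  product.2 = shift(product, n = 1, type = "lead"),
  product.3 = shift(product, n = 2, type = "lead")
), by = ID]

# Keep the first row of the first matching sequence for each ID.
dt <- dt[count > 0 & count.2 > 1 & count.3 == 1]
dt <- unique(dt, by = "ID")

library(tidyr)
# Normalize to one row per (product.3, product.2, stock) element combination.
# as.data.table() is a precaution in case separate_rows() does not hand
# back a fully valid data.table.
dt.measure <- as.data.table(separate_rows(dt, product.3, sep = ","))
dt.measure <- as.data.table(separate_rows(dt.measure, product.2, sep = ","))
dt.measure <- as.data.table(separate_rows(dt.measure, stock, sep = ","))

# For each product.3 element, test membership in the other two sets.
flags <- dt.measure[, .(
  in.product.2 = any(product.3 == product.2),
  in.stock     = any(product.3 == stock)
), by = .(ID, product.3)]

# Collapse the qualifying elements; return NA when nothing qualifies.
collapse_or_na <- function(x) if (length(x)) paste(x, collapse = ",") else NA_character_
res <- flags[, .(
  measure.1 = collapse_or_na(product.3[in.product.2 & !in.stock]),
  measure.2 = collapse_or_na(product.3[!in.product.2 & !in.stock])
), by = ID]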

dt <- merge(dt, res, by = "ID")
dt[, .(ID, measure.1, measure.2)]
# ID measure.1 measure.2
# 1: 1 C E
# 2: 2 <NA> <NA>
# 3: 3 D <NA>
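
The answer above does not return the seq1, seq2 and seq3 columns shown in the desired output. As a possible variation (not the answerer's method), the sketch below stays within data.table and base R set functions and keeps the sequence positions. The names `split_set`, `hits` and `out` are made up for the illustration, and it assumes that only the first matching sequence per ID is wanted and that rows are sorted by `seqs` within each `ID`.

```r
library(data.table)

dt <- data.table(
  ID      = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
  seqs    = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
  count   = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
  product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D",
              "A", "B", "A", "A", "A,B,C", "D", "D"),
  stock   = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E",
              "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D")
)

# Hypothetical helper: split a comma-separated string into its elements
split_set <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]

# Attach the values of the next two rows, computed separately for each ID
dt[, `:=`(
  seq2      = shift(seqs, n = 1, type = "lead"),
  seq3      = shift(seqs, n = 2, type = "lead"),
  count.2   = shift(count, n = 1, type = "lead"),
  count.3   = shift(count, n = 2, type = "lead"),
  product.2 = shift(product, n = 1, type = "lead"),
  product.3 = shift(product, n = 2, type = "lead")
), by = ID]

# First row of the first matching three-row sequence for each ID
hits <- dt[count > 0 & count.2 > 1 & count.3 == 1, .SD[1], by = ID]

# hits has one row per ID, so each string can be split and compared as a set
out <- hits[, {
  p3 <- split_set(product.3)
  p2 <- split_set(product.2)
  s1 <- split_set(stock)
  .(seq1 = seqs, seq2 = seq2, seq3 = seq3,
    measure1 = paste(setdiff(intersect(p3, p2), s1), collapse = ","),
    measure2 = paste(setdiff(p3, union(p2, s1)), collapse = ","))
}, by = ID]
out
```

In this variation an ID with no qualifying element gets an empty string rather than NA, which is closer to the blank cells in the desired output. If several elements qualify, they are joined with commas.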

Regarding R - constructing new variables from sorted data, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57529311/
