r - Paired observations in R

Reposted. Author: 行者123. Updated: 2023-12-01 13:57:19

Let's take the following example dataset:

counterparty1 <- c("A","B","B","B","B")
counterparty2 <- c("B","C","A","A","C")
counterparty1_side <- c("buy","sell","buy","sell","sell")
price <- c(1.2,3.7,2.5,1.2,3.7)
sample.data <- data.frame(counterparty1,counterparty2,counterparty1_side,price)

Rows 1 and 4 actually describe the same observation. The only difference is that row 1 says "A" buys the asset (implying that "B" sells), while row 4 says "B" sells the asset (implying that "A" buys).

I would like code that creates the following dataset:

counterparty1 <- c("A","B","B","B","B")
counterparty2 <- c("B","C","A","A","C")
counterparty1_side <- c("buy","sell","buy","sell","sell")
price <- c(1.2,3.7,2.5,1.2,3.7)
transaction_number <- c(1,2,3,1,4)
duplicate <- c(1,0,0,1,0)
clean.data <- data.frame(counterparty1,counterparty2,counterparty1_side,price,transaction_number,duplicate)

My real dataset is of course much larger, so I cannot hard-code this.

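Update: I added row 5, which is identical to row 2, including the order of counterparties 1 and 2. I want the "duplicate" variable to flag only rows 1 and 4 as duplicates (because they mirror each other), not rows 2 and 5.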

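Best answer

Updated answer:

This addresses the OP's follow-up: if the same transaction occurs twice, the repeats should not be treated as duplicates (for example, B selling something to C twice for $3.7K). See the comments and the updated question.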

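library(dplyr)  # cur_group_id() and across() need dplyr >= 1.0.0

sample.data %>%
  # Direction-independent key: buyer first, then seller
  mutate(transaction = if_else(counterparty1_side == "buy",
                               paste0(counterparty1, counterparty2),
                               paste0(counterparty2, counterparty1))) %>%
  # Number exact repeats of a row (1, 2, ...) so they land in separate groups
  group_by(across(everything())) %>%
  mutate(dup_dum = row_number()) %>%
  # Rows sharing a key and a repeat number are mirrored reports of one trade
  group_by(transaction, dup_dum) %>%
  mutate(transaction_number = cur_group_id(),
         duplicate = as.integer(n() > 1)) %>%
  ungroup() %>%
  select(-transaction, -dup_dum)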

#> # A tibble: 5 x 6
#>   counterparty1 counterparty2 counterparty1_s~ price transaction_num~ duplicate
#>   <fct>         <fct>         <fct>            <dbl>            <int>     <int>
#> 1 A             B             buy                1.2                1         1
#> 2 B             C             sell               3.7                3         0
#> 3 B             A             buy                2.5                2         0
#> 4 B             A             sell               1.2                1         1
#> 5 B             C             sell               3.7                4         0
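
For readers who want to see the intermediate steps, here is a minimal base-R sketch of the same idea. It is not the answer author's code: the names `buyer`, `seller`, `key`, `row_id`, `occurrence`, and `pair_id` are illustrative, and it assumes the columns are plain character vectors (`stringsAsFactors = FALSE`). It numbers transactions by first appearance using `match()`, so the numbering follows the target in the question (1, 2, 3, 1, 4) rather than the sorted group IDs shown above.

```r
# Same example data as in the question, with character columns (assumption).
sample.data <- data.frame(
  counterparty1      = c("A", "B", "B", "B", "B"),
  counterparty2      = c("B", "C", "A", "A", "C"),
  counterparty1_side = c("buy", "sell", "buy", "sell", "sell"),
  price              = c(1.2, 3.7, 2.5, 1.2, 3.7),
  stringsAsFactors   = FALSE
)

# Step 1: express every row as "buyer then seller", whoever reported it.
is_buy <- sample.data$counterparty1_side == "buy"
buyer  <- ifelse(is_buy, sample.data$counterparty1, sample.data$counterparty2)
seller <- ifelse(is_buy, sample.data$counterparty2, sample.data$counterparty1)
key    <- paste0(buyer, seller)

# Step 2: number exact repeats of a full row (1, 2, ...), so that
# identical rows are treated as separate transactions.
row_id     <- do.call(paste, c(sample.data, sep = "\r"))
occurrence <- ave(seq_along(row_id), row_id, FUN = seq_along)

# Step 3: rows with the same key and the same repeat number are
# mirrored reports of one transaction.
pair_id <- paste(key, occurrence)

clean.data <- sample.data
clean.data$transaction_number <- match(pair_id, unique(pair_id))
clean.data$duplicate <- as.integer(
  ave(seq_along(pair_id), pair_id, FUN = length) > 1
)

print(clean.data)
```

Using `match(pair_id, unique(pair_id))` is a design choice of this sketch; replacing it with `as.integer(factor(pair_id))` would give sorted IDs instead.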

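Original answer:

This version flags every kind of duplicate: it does not matter whether rows are duplicates only because the counterparty roles are swapped or because they are exact repeats. (See the question's edit history for its first version.)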

library(dplyr)  # cur_group_id() needs dplyr >= 1.0.0

sample.data %>%
  # Direction-independent key: buyer first, then seller
  mutate(transaction = if_else(counterparty1_side == "buy",
                               paste0(counterparty1, counterparty2),
                               paste0(counterparty2, counterparty1))) %>%
  group_by(transaction) %>%
  # Any key that occurs more than once is flagged, exact repeats included
  mutate(transaction_number = cur_group_id(),
         duplicate = as.integer(n() > 1)) %>%
  ungroup() %>%
  select(-transaction)

# # A tibble: 4 x 6
#   counterparty1 counterparty2 counterparty1_side price transaction_number duplicate
#   <fct>         <fct>         <fct>              <dbl>              <int>     <int>
# 1 A             B             buy                  1.2                  1         1
# 2 B             C             sell                 3.7                  3         0
# 3 B             A             buy                  2.5                  2         0
# 4 B             A             sell                 1.2                  1         1
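
One thing to watch: both versions above build the key from the buyer and seller only, so two trades between the same pair at different prices would also be grouped together. If mirrored reports are expected to agree on price (an assumption, not something the question states), the price can be added to the key. The sketch below uses hypothetical data and illustrative names (`trades`, `key_pair`, `key_price`, `flag_repeats`) to show the difference.

```r
# Hypothetical data: rows 1 and 2 mirror each other at the same price;
# row 3 involves the same buyer and seller but at a different price.
trades <- data.frame(
  counterparty1      = c("A", "B", "B"),
  counterparty2      = c("B", "A", "A"),
  counterparty1_side = c("buy", "sell", "sell"),
  price              = c(1.2, 1.2, 9.9),
  stringsAsFactors   = FALSE
)

is_buy <- trades$counterparty1_side == "buy"
buyer  <- ifelse(is_buy, trades$counterparty1, trades$counterparty2)
seller <- ifelse(is_buy, trades$counterparty2, trades$counterparty1)

# Key without price (as in the answer) versus key with price.
key_pair  <- paste(buyer, seller)
key_price <- paste(buyer, seller, trades$price)

# Flag every row whose key occurs more than once.
flag_repeats <- function(key) {
  as.integer(ave(seq_along(key), key, FUN = length) > 1)
}

dup_pair_only  <- flag_repeats(key_pair)   # all three rows flagged
dup_with_price <- flag_repeats(key_price)  # only rows 1 and 2 flagged
```

Note that `paste()` compares prices by their printed form, which is fine for values like these but may need rounding for computed prices.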

For "r - Paired observations in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57011800/
