r - Group by two variables, treating their combinations as unordered (e.g. A - B is the same as B - A)

Reposted. Author: 行者123. Last updated: 2023-12-02 09:05:28

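I got stuck on a coding problem at work. I have a data frame with three variables: var1, var2 and Length. The last one is the shared length between var1 and var2, e.g. a common boundary.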

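Ultimately I want to calculate, for each combination var1 - var2 (with var2 - var1 treated as the same combination), its percentage of the total length of each unique element occurring in var1 and var2. Because this sounds too complicated, I made an example that shows where I am stuck.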

library(tidyverse)

df <- tibble(
  var1 = c("A", "B", "A", "D", "A"),
  var2 = c("B", "A", "D", "A", "B"),
  Length = c(10, 12, 5, 20, 34))



# First I wanted the total length of each variable, irrespective of whether it occurs in var1 or var2.
# I think that I figured this out. Let me know if it is a bit convoluted.

var_unique <- unique(c(unique(df$var1),unique(df$var2)))
names(var_unique) <- var_unique

total_length <- map_df(var_unique, function(x) {
  df %>%
    filter(var1 == x | var2 == x) %>%
    summarise(var_total_length = sum(Length))
}, .id = "var")

total_length
#> # A tibble: 3 x 2
#> var var_total_length
#> <chr> <dbl>
#> 1 A 81
#> 2 B 56
#> 3 D 25

# Second, I need the length of each combination of var1 and var2.
# I would like "A" - "B" to be treated the same as "B" - "A".
# Plain grouping does not work in this case. This is where I am stuck.

# Neither this

df %>% group_by(var1, var2) %>%
  mutate(combination_length = sum(Length))
#> # A tibble: 5 x 4
#> # Groups: var1, var2 [4]
#> var1 var2 Length combination_length
#> <chr> <chr> <dbl> <dbl>
#> 1 A B 10 44
#> 2 B A 12 12
#> 3 A D 5 5
#> 4 D A 20 20
#> 5 A B 34 44

# nor this one does the job, because both look at each ordered combination of var1 and var2.

df %>% group_by(var1, var2) %>%
  summarise(combination_length = sum(Length))
#> # A tibble: 4 x 3
#> # Groups: var1 [3]
#> var1 var2 combination_length
#> <chr> <chr> <dbl>
#> 1 A B 44
#> 2 A D 5
#> 3 B A 12
#> 4 D A 20



# This is the data frame that I would like. Rows 1, 2 and 5 of df should be considered the
# same group, and so should rows 3 and 4.

tibble(
  var1 = c("A", "B", "A", "D", "A"),
  var2 = c("B", "A", "D", "A", "B"),
  Length = c(10, 12, 5, 20, 34),
  combination_length = c(56, 56, 25, 25, 56))
#> # A tibble: 5 x 4
#> var1 var2 Length combination_length
#> <chr> <chr> <dbl> <dbl>
#> 1 A B 10 56
#> 2 B A 12 56
#> 3 A D 5 25
#> 4 D A 20 25
#> 5 A B 34 56



# Ultimately I want to divide each combination by the total length of the variable
# occurring in the combination, to obtain the percentage of each boundary for each unique variable.

Created on 2019-11-27 by the reprex package (v0.3.0)

I think there is a way to make this simpler than what I tried.
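
On the first step (per-variable totals), one possibly simpler route is to stack var1 and var2 into a single column and sum per value. The sketch below is not part of the original post; it assumes tidyr is available, and the column names `position` and `var` are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  var1 = c("A", "B", "A", "D", "A"),
  var2 = c("B", "A", "D", "A", "B"),
  Length = c(10, 12, 5, 20, 34)
)

# Stack var1 and var2 into one column ("var"), so every row contributes its
# Length once to each of its two endpoints, then sum per variable.
# Assumption: var1 and var2 never hold the same value in one row; such a row
# would be counted twice here, unlike with the filter() approach.
total_length <- df %>%
  pivot_longer(c(var1, var2), names_to = "position", values_to = "var") %>%
  group_by(var) %>%
  summarise(var_total_length = sum(Length))

total_length
```

On the example data this should give the same totals as the map_df version (A = 81, B = 56, D = 25).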

Best answer

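We can group by the sorted pair of var1 and var2 inside group_by, which can be done with pmin and pmax: they return the smaller and the larger value of each row, so A - B and B - A end up in the same group.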

library(dplyr)

df %>%
  group_by(group1 = pmin(var1, var2), group2 = pmax(var1, var2)) %>%
  mutate(combination_length = sum(Length)) %>%
  ungroup() %>%
  select(-group1, -group2)

# var1 var2 Length combination_length
# <chr> <chr> <dbl> <dbl>
#1 A B 10 56
#2 B A 12 56
#3 A D 5 25
#4 D A 20 25
#5 A B 34 56
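
The question also mentions a final step: dividing each combination's length by the total length of each variable in it. The answer stops before that, so the following is only a sketch of one way to finish, under one reading of "percentage" (combination length over per-variable total). The names `totals`, `pairs`, `shares`, `total1`, `total2`, `pct_of_group1` and `pct_of_group2` are hypothetical, and tidyr is assumed to be installed.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  var1 = c("A", "B", "A", "D", "A"),
  var2 = c("B", "A", "D", "A", "B"),
  Length = c(10, 12, 5, 20, 34)
)

# Total length per variable, regardless of whether it sits in var1 or var2.
totals <- df %>%
  pivot_longer(c(var1, var2), names_to = "position", values_to = "var") %>%
  group_by(var) %>%
  summarise(var_total_length = sum(Length))

# One row per unordered pair, using the pmin/pmax trick from the answer.
pairs <- df %>%
  group_by(group1 = pmin(var1, var2), group2 = pmax(var1, var2)) %>%
  summarise(combination_length = sum(Length)) %>%
  ungroup()

# Attach the total of each member of the pair and compute both percentages.
shares <- pairs %>%
  left_join(totals, by = c("group1" = "var")) %>%
  rename(total1 = var_total_length) %>%
  left_join(totals, by = c("group2" = "var")) %>%
  rename(total2 = var_total_length) %>%
  mutate(
    pct_of_group1 = 100 * combination_length / total1,
    pct_of_group2 = 100 * combination_length / total2
  )

shares
```

Here `pct_of_group1` is the share of the pair's length in the total of the alphabetically first member, and `pct_of_group2` the share in the total of the second. On the example data the A - B boundary accounts for all of B's length (56 of 56) and 56 of A's 81.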

Regarding "r - Group by two variables, treating their combinations as unordered (e.g. A - B is the same as B - A)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59069132/
