
r - Duplicate rows by creating unique columns

Reposted. Author: 行者123. Updated: 2023-12-01 16:15:59

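I have a situation like this:

one    two    three    type
x      x               chocolate
x                      vanilla
x      x               strawberry

I would like to duplicate rows based on the "x" marks in the columns, like this:

one    two    three    type
x                      chocolate
       x               chocolate
x                      vanilla
x                      strawberry
       x               strawberry

The goal is to have exactly one "x" in each row. If a row contains more than one "x", the whole row is duplicated, and each copy keeps only one of that row's "x" marks.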

Reproducible data:

dat <- data.frame(one   = c("x", "x", "x"),
                  two   = c("x", "", "x"),
                  three = c("", "", ""),
                  type  = c("chocolate", "vanilla", "strawberry"))
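To make the target concrete, here is a minimal base R sketch of the rule stated above: one output row per "x", with every other marker column left blank. It is not one of the posted answers. The names marker_cols, hits, and out are illustrative, and stringsAsFactors = FALSE is an added assumption so that the columns stay character on any R version.

```r
# Sample data from the question (stringsAsFactors = FALSE is an assumption
# added here so the columns are character vectors on every R version).
dat <- data.frame(
  one   = c("x", "x", "x"),
  two   = c("x", "",  "x"),
  three = c("",  "",  ""),
  type  = c("chocolate", "vanilla", "strawberry"),
  stringsAsFactors = FALSE
)

marker_cols <- c("one", "two", "three")  # illustrative name

# One (row, col) pair for every cell that holds an "x".
hits <- which(as.matrix(dat[marker_cols]) == "x", arr.ind = TRUE)

# Sort by source row, then by column, so copies of a row stay together.
hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]

# Blank character matrix with one row per hit, then place a single "x"
# in each row using matrix indexing.
m <- matrix("", nrow = nrow(hits), ncol = length(marker_cols),
            dimnames = list(NULL, marker_cols))
m[cbind(seq_len(nrow(hits)), hits[, "col"])] <- "x"

# Attach the type of the row each hit came from.
out <- data.frame(m, type = dat$type[hits[, "row"]], stringsAsFactors = FALSE)
out
```

Because the result is built from the positions of the marks rather than by reshaping, rows without any "x" simply produce no output row.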

Best answer

Using tidyr::gather

This follows the same idea as MKR's and Anant's solutions.

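library(tidyverse)
dat %>%
  mutate(i = row_number()) %>%          # remember the original row order
  gather("Key", "Val", -type, -i) %>%   # long format: one row per cell
  rowid_to_column %>%                   # unique id so each cell stays on its own row
  spread(Key, Val, fill = "") %>%       # back to wide, one cell value per row
  `[`(rowSums(. == "x") > 0, ) %>%      # keep only the rows that contain an "x"
  arrange(i) %>%                        # restore the original order
  select(type, one, two, three)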

#         type one two three
# 1  chocolate   x
# 2  chocolate       x
# 3    vanilla   x
# 4 strawberry   x
# 5 strawberry       x
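As a side note, tidyr's documentation marks gather() and spread() as superseded in favour of pivot_longer() and pivot_wider(). A rough equivalent of the pipeline above under that API might look like the sketch below. It assumes a recent tidyr in which values_fill accepts a scalar, and it adds stringsAsFactors = FALSE and the result name res; none of these come from the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (stringsAsFactors = FALSE is an assumption).
dat <- data.frame(
  one   = c("x", "x", "x"),
  two   = c("x", "",  "x"),
  three = c("",  "",  ""),
  type  = c("chocolate", "vanilla", "strawberry"),
  stringsAsFactors = FALSE
)

res <- dat %>%
  mutate(i = row_number()) %>%                       # original row order
  pivot_longer(c(one, two, three),
               names_to = "Key", values_to = "Val") %>%
  mutate(rowid = row_number()) %>%                   # one id per cell
  pivot_wider(names_from = Key, values_from = Val,
              values_fill = "") %>%                  # each cell on its own row
  filter(one == "x" | two == "x" | three == "x") %>% # drop rows without an "x"
  arrange(i, rowid) %>%
  select(type, one, two, three)

res
```

The rowid column plays the same role as rowid_to_column in the original: it keeps pivot_wider() from collapsing the cells of one source row back into a single output row.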

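You might be happy with this shorter version, though (the three column would appear only if at least one of its values were filled, and the rows come out shuffled):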

dat %>%
  na_if("") %>%
  gather("Key", "Val", -type, na.rm = TRUE) %>%
  rowid_to_column %>%
  spread(Key, Val, fill = "") %>%
  select(-1)

#         type one two
# 1  chocolate   x
# 2    vanilla   x
# 3 strawberry   x
# 4  chocolate       x
# 5 strawberry       x

Binding data frames

Another way is to split the data into separate data frames and then bind them (this looks like what @Accumulation was trying to explain in his answer):

dat %>%
  map(1:3, ~filter(.y[c(.x, 4)], .y[.x] == "x"), .) %>%
  bind_rows %>%
  modify(as.character) %>%
  `[<-`(is.na(.), value = "") %>%
  select(one, two, three, type)

#   one two three       type
# 1   x            chocolate
# 2   x              vanilla
# 3   x           strawberry
# 4       x        chocolate
# 6       x       strawberry

Using merge

dat %>%
  na_if("") %>%
  bind_rows(.["type"]) %>%
  map(1:3, ~.y[c(.x, 4)], .) %>%
  reduce(merge, all = TRUE) %>%
  `[<-`(is.na(.), value = "") %>%
  `[`(rowSums(. == "x") == 1, ) %>%
  distinct

# type one two three
# 1 chocolate x
# 2 chocolate x
# 3 strawberry x
# 4 strawberry x
# 5 vanilla x

Using unnest

dat %>%
  imap_dfc(~{
    i <- match(.y, names(dat))
    if (.y != "type") map(.x, ~`[<-`(character(3), i, .x)) else .x
  }) %>%
  unnest %>%
  `[`(rowSums(. == "x") == 1, )

# # A tibble: 5 x 4
#   type       one   two   three
#   <chr>      <chr> <chr> <chr>
# 1 chocolate  x
# 2 chocolate        x
# 3 vanilla    x
# 4 strawberry x
# 5 strawberry       x

Using unnest together with diag

dat %>%
  rowwise %>%
  transmute(type,
            cols = list(setNames(data.frame(diag(c(one, two, three) == "x")),
                                 c('one', 'two', 'three')))) %>%
  unnest %>%
  modify_at(2:4, ~c('', 'x')[.x + 1]) %>%
  `[`(rowSums(. == "x") == 1, )

# # A tibble: 5 x 4
#   type       one   two   three
#   <chr>      <chr> <chr> <chr>
# 1 chocolate  x
# 2 chocolate        x
# 3 vanilla    x
# 4 strawberry x
# 5 strawberry       x

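EDIT to address the OP's request in the comments:

We take the first solution and right_join it with the source data to make sure that every row is present, then replace the NA values with empty strings. We also cleaned up the start of the pipeline, since i is no longer needed for arranging.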

dat2 %>%
  gather("Key", "Val", -type) %>%
  rowid_to_column %>%
  spread(Key, Val, fill = "") %>%
  `[`(rowSums(. == "x") > 0, ) %>%
  right_join(dat2["type"]) %>%
  `[<-`(is.na(.), value = "") %>%
  select(type, one, two, three)

#         type one two three
# 1  chocolate   x
# 2  chocolate       x
# 3    vanilla   x
# 4 strawberry   x
# 5 strawberry       x
# 6   hazelnut

Regarding "r - Duplicate rows by creating unique columns", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48652712/
