gpt4 book ai didi

r - 根据条件使用 left_join 合并数据

转载 作者:行者123 更新时间:2023-12-04 15:29:30 24 4
gpt4 key购买 nike

我有两个要合并的数据框。

“数据”看起来像:

    Filled_Ticker2LP publishYear              CO_1_Name
1: SONC 2005 sonic corp
2: SONC 2005 sonic corp
3: <NA> 2005 cascade bancorp inc.
4: JCP 2005 jc penney company inc

“补偿”看起来像:

   tic   fyear conm           
<chr> <int> <chr>
1 JCP 2004 penney (j c) co
2 JCP 2005 penney (j c) co
3 JCP 2006 penney (j c) co
4 JCP 2007 penney (j c) co
5 JCP 2008 penney (j c) co

我想使用 left_join(或来自 data.table 包等的东西)将这两个数据集连接在一起

我目前可以根据年份和符号加入它,例如 SONCJCP

mergedData <- data %>% 
left_join(comp, by = c("Filled_Ticker2LP" = "tic", "publishYear" = "fyear"))

“mergedData”看起来像:

    Filled_Ticker2LP publishYear              CO_1_Name            conm
1: SONC 2005 sonic corp sonic corp
2: SONC 2005 sonic corp sonic corp
3: <NA> 2005 cascade bancorp inc. <NA>
4: JCP 2005 jc penney company inc penney (j c) co

效果很好,但它在 Filled_Ticker2LP 列中有一个 NA(来自 data 数据集。

我想尝试使用我现有的当前方法加入数据,但如果 Filled_Ticker2LP 列中有一个 NA 值,我想更改从加入 ticFilled_Ticker2LP 到匹配公司名称的“匹配链接”。 conmCO_1_Name

也就是说,数据当前无法加入,因为在 Filled_Ticker2LP 列下的 data 数据集中,用于观察 3 它有一个 NA 值。但是,此观察仍然可以与 comp 数据相结合,因为在列 data$CO_1_Name 观察 3 中有 cascade Bancorp。此结果也出现在 conm 列下观察 30 - 53 的 comp 数据中。

我在想一个 if 语句:

if not NA in data$Filled_Ticker2LP then join using Filled_Ticker2LP and tic else join using CO_1_Name and conm.

附加

我还注意到列中有一些空格。

data %>% 
mutate(
CO_1_Name = str_trim(CO_1_Name)
)

数据 1:

comp <- structure(list(tic = c("JCP", "JCP", "JCP", "JCP", "JCP", "JCP", 
"JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP",
"SONC", "SONC", "SONC", "SONC", "SONC", "SONC", "SONC", "SONC",
"SONC", "SONC", "SONC", "SONC", "SONC", "SONC", "CACB", "CACB",
"CACB", "CACB", "CACB", "CACB", "CACB", "CACB", "CACB", "CACB",
"CACB", "CACB", "CACB", "CACB", "CACB", "CACB", "CACB", "CACB",
"CACB", "CACB", "CACB", "CACB", "CACB", "CACB"), fyear = c(2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 2016L, 2017L, 2018L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L,
2018L, 2005L, 2005L, 2006L, 2006L, 2007L, 2007L, 2008L, 2008L,
2009L, 2009L, 2010L, 2010L, 2011L, 2011L, 2012L, 2012L, 2013L,
2013L, 2014L, 2014L, 2015L, 2015L, 2016L, 2016L), conm = c("penney (j c) co",
"penney (j c) co", "penney (j c) co", "penney (j c) co", "penney (j c) co",
"penney (j c) co", "penney (j c) co", "penney (j c) co", "penney (j c) co",
"penney (j c) co", "penney (j c) co", "penney (j c) co", "penney (j c) co",
"penney (j c) co", "penney (j c) co", "sonic corp", "sonic corp",
"sonic corp", "sonic corp", "sonic corp", "sonic corp", "sonic corp",
"sonic corp", "sonic corp", "sonic corp", "sonic corp", "sonic corp",
"sonic corp", "sonic corp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp")), row.names = c(NA, -53L
), class = c("tbl_df", "tbl", "data.frame"))

数据 2:

data <- structure(list(Filled_Ticker2LP = c("SONC", "SONC", NA, "JCP", 
"JCP", "JCP", "SONC", "SONC", "JCP", "JCP", "JCP", "JCP", "SONC",
"JCP", "JCP", "JCP", "SONC", "JCP", "JCP", "SONC", "JCP", "JCP",
"JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP",
"JCP", "JCP", "JCP", "JCP", "SONC"), publishYear = c(2005L, 2005L,
2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2007L, 2007L, 2007L, 2007L, 2007L, 2008L, 2008L, 2008L,
2009L, 2009L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L,
2011L, 2011L, 2012L, 2013L, 2015L, 2015L, 2016L), CO_1_Name = c(" sonic corp",
" sonic corp", " cascade bancorp inc.", " jc penney company inc",
" jc penney company inc", " jc penney company inc", " sonic corp",
" sonic corp", " jc penney company inc", " jc penney company inc",
" jc penney company inc", " jc penney company inc", " sonic corp",
" jc penney company inc", " jc penney company inc", " jc penney company inc",
" sonic corp", " jc penney company inc", " jc penney company inc",
" sonic corp", " jc penney company inc", " jc penney company inc",
" jc penney company inc", " macy's incorporated", " macy's incorporated",
" jc penney company inc", " macy's incorporated", " macy's incorporated",
" jc penney company inc", " apple inc", " apple inc", " macy's incorporated",
" jc penney company inc", " jc penney company inc", " jc penney company inc",
" sonic corp")), .internal.selfref = <pointer: 0x55603dbefe00>, row.names = c(NA,
-36L), class = c("data.table", "data.frame"))

最佳答案

如果我们需要按条件加入,一种选择是根据 NA 在特定列的“数据”中的出现来拆分,然后使用 map2 做连接

library(dplyr)
library(purrr)
library(stringr)
data %>%
mutate(
CO_1_Name = str_trim(CO_1_Name)
) %>%
group_split(grp = is.na(Filled_Ticker2LP), keep = FALSE) %>%
map2_dfr(list(setNames('tic', "Filled_Ticker2LP"),
setNames( "conm", "CO_1_Name")), ~
left_join(.x, comp, by = c(.y, "publishYear" = "fyear")))

关于r - 根据条件使用 left_join 合并数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61510133/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com