
r - Merging two data frames: specifically merge a selection of columns based on two conditions?

Reposted. Author: 行者123. Updated: 2023-12-04 07:24:20

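I have two datasets on the same 2 patients. I want to use the second dataset to add new information to the first one, but I can't seem to get the code right.
My first (incomplete) dataset has the patient ID, the measurement time (T0 or FU1), the year of birth, the date of the CT scan, and two outcomes (legs_mass and total_mass):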

library(tidyverse)
library(dplyr)
library(magrittr)
library(lubridate)

df1 <- structure(list(ID = c(115, 115, 370, 370), time = structure(c(1L,
6L, 1L, 6L), .Label = c("T0", "T1M0", "T1M6", "T1M12", "T2M0",
"FU1"), class = "factor"), year_of_birth = c(1970, 1970, 1961,
1961), date_ct = structure(c(16651, 17842, 16651, 18535), class = "Date"),
legs_mass = c(9.1, NA, NA, NA), total_mass = c(14.5, NA,
NA, NA)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))

# Which gives the following dataframe
df1

# A tibble: 4 x 6
     ID time  year_of_birth date_ct    legs_mass total_mass
  <dbl> <fct>         <dbl> <date>         <dbl>      <dbl>
1   115 T0             1970 2015-08-04       9.1       14.5
2   115 FU1            1970 2018-11-07      NA         NA
3   370 T0             1961 2015-08-04      NA         NA
4   370 FU1            1961 2020-09-30      NA         NA

The second dataset supplies additional values for the legs_mass and total_mass columns:
df2 <- structure(list(ID = c(115, 370), date_ct = structure(c(17842, 
18535), class = "Date"), ctscan_label = c("PXE115_CT_20181107_xxxxx-3.tif",
"PXE370_CT_20200930_xxxxx-403.tif"), legs_mass = c(956.1, 21.3
), total_mass = c(1015.9, 21.3)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))

# Which gives the following dataframe:
df2

# A tibble: 2 x 5
     ID date_ct    ctscan_label                     legs_mass total_mass
  <dbl> <date>     <chr>                                <dbl>      <dbl>
1   115 2018-11-07 PXE115_CT_20181107_xxxxx-3.tif       956.      1016.
2   370 2020-09-30 PXE370_CT_20200930_xxxxx-403.tif      21.3       21.3
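
What I am trying to do is...
  • Add the legs_mass and total_mass values from df2 to df1, matching on the ID number and date_ct.
  • Add the new column from df2 (the one that is not in df1: ctscan_label) to df1, again matching on the date of the CT and the patient ID.

So that the final dataset df3 looks like this: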
df3 <- structure(list(ID = c(115, 115, 370, 370), time = structure(c(1L,
6L, 1L, 6L), .Label = c("T0", "T1M0", "T1M6", "T1M12", "T2M0",
"FU1"), class = "factor"), year_of_birth = c(1970, 1970, 1961,
1961), date_ct = structure(c(16651, 17842, 16651, 18535), class = "Date"),
legs_mass = c(9.1, 956.1, NA, 21.3), total_mass = c(14.5,
1015.9, NA, 21.3)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))

# Corresponding to the following tibble:
# A tibble: 4 x 6
     ID time  year_of_birth date_ct    legs_mass total_mass
  <dbl> <fct>         <dbl> <date>         <dbl>      <dbl>
1   115 T0             1970 2015-08-04       9.1       14.5
2   115 FU1            1970 2018-11-07     956.      1016.
3   370 T0             1961 2015-08-04      NA         NA
4   370 FU1            1961 2020-09-30      21.3       21.3

I have tried merge and rbind from base R, as well as bind_rows from dplyr, but can't seem to get it right.
Any help?

Best answer

You can join the two datasets on ID and date_ct, then use coalesce to keep, for each row, whichever of the two values is not NA.
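As a quick illustration of what coalesce does on its own, here is a minimal sketch. The two vectors are hypothetical stand-ins for the legs_mass columns that come out of the join (they are not part of the original answer); coalesce walks through them position by position and returns the first value that is not NA.

```r
library(dplyr)

# Hypothetical vectors standing in for the suffixed columns after a join
from_df1 <- c(9.1, NA, NA, NA)
from_df2 <- c(NA, 956.1, NA, 21.3)

# For each position, take from_df1 if it is not NA, otherwise from_df2
filled <- coalesce(from_df1, from_df2)
filled
```

The third position stays NA because neither input has a value there, which is the behaviour wanted for patient 370 at T0.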

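library(dplyr)

# Join on both keys; the mass columns present in df1 and df2 get .x/.y suffixes
left_join(df1, df2, by = c("ID", "date_ct")) %>%
  # Keep the df1 value where it exists, otherwise take the df2 value
  mutate(legs_mass = coalesce(legs_mass.x, legs_mass.y),
         total_mass = coalesce(total_mass.x, total_mass.y)) %>%
  # Drop the suffixed helper columns; ctscan_label is dropped only to match
  # the df3 shown in the question (remove `-ctscan_label` to keep it)
  select(-matches('\\.[xy]$'), -ctscan_label)

#      ID time  year_of_birth date_ct    legs_mass total_mass
#   <dbl> <fct>         <dbl> <date>         <dbl>      <dbl>
# 1   115 T0             1970 2015-08-04       9.1       14.5
# 2   115 FU1            1970 2018-11-07     956.      1016.
# 3   370 T0             1961 2015-08-04      NA         NA
# 4   370 FU1            1961 2020-09-30      21.3       21.3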
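
The question also asks for ctscan_label to be carried over, which the expected df3 does not show. A possible alternative, sketched here under the assumption that a recent dplyr is installed (the rows_*() family arrived in dplyr 1.0.0), is to let rows_patch() fill only the NA cells of df1 and then join the extra column separately. This is not the accepted answer's method, and the name patched is made up for the illustration. The data are rebuilt compactly so the block runs on its own.

```r
library(dplyr)

# Same data as df1 and df2 in the question, rebuilt compactly
df1 <- tibble(
  ID = c(115, 115, 370, 370),
  time = factor(c("T0", "FU1", "T0", "FU1"),
                levels = c("T0", "T1M0", "T1M6", "T1M12", "T2M0", "FU1")),
  year_of_birth = c(1970, 1970, 1961, 1961),
  date_ct = as.Date(c("2015-08-04", "2018-11-07", "2015-08-04", "2020-09-30")),
  legs_mass = c(9.1, NA, NA, NA),
  total_mass = c(14.5, NA, NA, NA)
)
df2 <- tibble(
  ID = c(115, 370),
  date_ct = as.Date(c("2018-11-07", "2020-09-30")),
  ctscan_label = c("PXE115_CT_20181107_xxxxx-3.tif",
                   "PXE370_CT_20200930_xxxxx-403.tif"),
  legs_mass = c(956.1, 21.3),
  total_mass = c(1015.9, 21.3)
)

# Step 1: fill only the NA cells of df1 from df2, matching on ID and date_ct.
# rows_patch() requires every column of y to exist in x, so ctscan_label
# is left out of this step.
patched <- rows_patch(df1, select(df2, -ctscan_label), by = c("ID", "date_ct"))

# Step 2: attach the column that exists only in df2
df3 <- left_join(patched, select(df2, ID, date_ct, ctscan_label),
                 by = c("ID", "date_ct"))
df3
```

Compared with the join plus coalesce approach, this avoids the .x/.y suffixes, at the cost of a second join for the label column. By default rows_patch() raises an error if df2 contains a key that is missing from df1, which can be a useful sanity check on patient data.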

Regarding "r - Merging two data frames: specifically merge a selection of columns based on two conditions?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68300310/
