
r - Coalesce columns, update columns based on another df, fill NAs

Reposted. Author: 行者123. Last updated: 2023-12-01 09:15:12

To begin, I want to point out that I found several solutions on SO, but none of them did what I expected.

I have two DFs:

1.

E                           F              G        H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 NA NA NA
chr2_202492077_202492078 NA NA NA
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 NA NA NA
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 NA NA NA
chr2_31167747_31167748 NA NA NA

2.

E                           F               G       H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 NA NA NA
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 NA NA NA
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,

Desired output:

E                           F               G       H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,

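As you can see, DF1 is updated with columns F, G and H from DF2, where column E is my unique index. I tried merge(), but that function does not update my rows; it only adds DF2's columns to DF1. I also tried updating with data.table and tidyverse: my rows were updated, but the other rows turned into NAs... In the end I decided to do a simple lapply() with nested ifelse(), but I don't know how to update all three columns at once, and it is too slow for 50000 rows in each DF...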

What I have done so far:

# Row-by-row attempt: take DF2's F where the keys match and DF1's F is NA
DF1$F <- sapply(1:nrow(DF1), function(i) ifelse(DF1[i,1]==DF2[i,1] & is.na(DF1[i,2]), DF2[i,2], DF1[i,2]))

Best answer

You can do this in base R:

as.data.frame(Map(function(x,y) ifelse(is.na(x),y,x),DF1,DF2))
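Note that Map() pairs the two data frames column by column and row by row, so it relies on DF1 and DF2 listing the same E values in the same order, as they do in the sample data. If that is not guaranteed, a minimal sketch that looks rows up by key with match() could look like this. The toy keys k1..k3 and the reduced column set are made up for illustration:

```r
# Toy data (hypothetical): DF2 holds the same keys as DF1, but shuffled.
DF1 <- data.frame(E = c("k1", "k2", "k3"),
                  F = c(NA, "rs1", NA),
                  G = c(NA, "A,G,", NA),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(E = c("k3", "k1", "k2"),
                  F = c("rs3", NA, NA),
                  G = c("C,T,", NA, NA),
                  stringsAsFactors = FALSE)

# For each row of DF1, the position of the same key in DF2 (NA if absent).
idx <- match(DF1$E, DF2$E)

# Fill only the NA cells of each value column from the matched DF2 row.
for (col in c("F", "G")) {
  is_na <- is.na(DF1[[col]])
  DF1[[col]][is_na] <- DF2[[col]][idx[is_na]]
}
DF1
```

Keys that are missing from DF2 simply stay NA, because indexing with an NA position returns NA.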

With the purrr library you can get a prettier, more compact form (see Soto's answer for an even more compact form using dplyr):

library(purrr)
map2_df(DF1,DF2,~ifelse(is.na(.x),.y,.x))
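Both forms rebuild every column, E included. As an alternative sketch (not from the original answer), the NA cells of the value columns can be overwritten in place with a logical matrix index. This assumes the rows are already aligned and that the value columns share one type (all character here, as in the question), since the lookup goes through as.matrix(). The toy data is hypothetical:

```r
# Toy stand-ins (hypothetical) for DF1 and DF2, rows in the same order.
DF1 <- data.frame(E = c("k1", "k2", "k3"),
                  F = c(NA, "rs1", NA),
                  H = c(NA, "0.1,0.9,", NA),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(E = c("k1", "k2", "k3"),
                  F = c(NA, NA, "rs3"),
                  H = c(NA, NA, "0.5,0.5,"),
                  stringsAsFactors = FALSE)

# Guard: positional filling is only valid when the keys line up row by row.
stopifnot(identical(DF1$E, DF2$E))

value_cols <- setdiff(names(DF1), "E")

# Logical matrix marking the NA cells of DF1's value columns.
na_cells <- is.na(DF1[value_cols])

# Copy DF2's values into exactly those cells; everything else is untouched.
DF1[value_cols][na_cells] <- DF2[value_cols][na_cells]
DF1
```

This is fully vectorized, so it should cope with 50000 rows, although that has not been benchmarked here.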

In both cases (technically, the first gives a data.frame and the second a tibble):

Output

                            E           F      G                           H
1 chr1_100203723_100203724 <NA> <NA> <NA>
2 chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
3 chr1_10032235_10032236 <NA> <NA> <NA>
4 chr1_100327060_100327061 <NA> <NA> <NA>
5 chr1_100346889_100346890 <NA> <NA> <NA>
6 chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
7 chr1_100357190_100357191 <NA> <NA> <NA>
8 chr1_100358057_100358058 <NA> <NA> <NA>
9 chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
10 chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
11 chr2_203760838_203760839 <NA> <NA> <NA>
12 chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
13 chr2_220354644_220354645 <NA> <NA> <NA>
14 chr2_234749403_234749404 <NA> <NA> <NA>
15 chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
16 chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
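For comparison, the merge() behaviour described in the question can be reproduced on a small example: merge() joins on E and keeps both copies of each shared column under the default suffixes .x and .y, so a coalescing step is still needed afterwards. A sketch with made-up two-row data:

```r
# Toy data (hypothetical) with one shared value column.
DF1 <- data.frame(E = c("k1", "k2"), F = c("rs1", NA), stringsAsFactors = FALSE)
DF2 <- data.frame(E = c("k1", "k2"), F = c(NA, "rs2"), stringsAsFactors = FALSE)

# merge() joins on E and keeps both F columns, renamed F.x (DF1) and F.y (DF2).
merged <- merge(DF1, DF2, by = "E", all.x = TRUE)
print(names(merged))

# Collapse the pair: keep DF1's value unless it is NA.
merged$F <- ifelse(is.na(merged$F.x), merged$F.y, merged$F.x)
result <- merged[c("E", "F")]
result
```

Keep in mind that merge() sorts the result by the key by default, so the row order can differ from DF1.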

Data

DF1 <- read.table(text="E                           F              G        H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 NA NA NA
chr2_202492077_202492078 NA NA NA
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 NA NA NA
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 NA NA NA
chr2_31167747_31167748 NA NA NA", header=TRUE, stringsAsFactors=FALSE)


DF2 <- read.table(text="E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 NA NA NA
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 NA NA NA
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,", header=TRUE, stringsAsFactors=FALSE)

Regarding "r - Coalesce columns, update columns based on another df, fill NAs", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46293590/
