R: Add a count column to a data frame by counting occurrences of strings from another data frame in a comma-separated column

Reposted. Author: 行者123. Updated: 2023-12-04 12:01:02

I have a data frame df1:

df1 <- structure(list(Id = c(0, 1, 3, 4), Support = c(17, 15, 10, 18
), Genes = structure(c(3L, 1L, 4L, 2L), .Label = c("BMP2,TGFB1,BMP3,MAPK12,GDF11,MAPK13,CITED1",
"CBLC,TGFA,MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4", "FOS,BCL2,PIK3CD,NFKBIA,TNFRSF10B",
"MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4,PIK3CD"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))

and another data frame df2:

df2 <- structure(list(V1 = structure(c(6L, 7L, 8L, 4L, 3L, 1L, 5L, 2L, 
9L), .Label = c("BCL2", "BMP3", "CBLC", "CDC23", "CITED1", "FOS",
"MAPK13", "SPRY4", "TGFA"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))

How can I add a new column to df1 that counts, for each row, how many of the strings in df2 occur in the Genes column, to get the desired output below?

Id | Support | Genes            | Counts
----------------------------------------
0  | 17      | FOS,BCL2,...     | 2
1  | 15      | BMP2,TGFB1,...   | 3
3  | 10      | MAPK12,YWHAE,... | 2
4  | 18      | CBLC,TGFA,...    | 4

Best answer

Here is another option using the stringr library. It loops over the Genes column of the first data frame and uses the strings in df2 as the patterns.

# Convert the factor columns to character vectors
df1$Genes <- as.character(df1$Genes)
df2$V1 <- as.character(df2$V1)

library(stringr)
# For each Genes string, count the matches of every df2 pattern and sum them
df1$Counts <- sapply(df1$Genes, function(x) {
  sum(str_count(x, df2$V1))
})

df1
  Id Support                                      Genes Counts
1  0      17           FOS,BCL2,PIK3CD,NFKBIA,TNFRSF10B      2
2  1      15 BMP2,TGFB1,BMP3,MAPK12,GDF11,MAPK13,CITED1      3
3  3      10     MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4,PIK3CD      2
4  4      18  CBLC,TGFA,MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4      4
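
One caveat with the pattern-based approach: str_count treats each df2 string as a regular expression and counts matches anywhere in the text, so a short symbol would also be counted inside a longer one (for instance, a pattern FOS would match within a symbol such as FOSB). That does not affect the sample data here, but if whole-symbol matching is what you need, a base R sketch along the following lines may be safer. It is not part of the original answer; it assumes the symbols are separated by plain commas with no surrounding spaces, and it rebuilds the example data as character vectors so that it runs on its own.

```r
# Example data, rebuilt as character columns (same values as df1 and df2 above)
df1 <- data.frame(
  Id = c(0, 1, 3, 4),
  Support = c(17, 15, 10, 18),
  Genes = c("FOS,BCL2,PIK3CD,NFKBIA,TNFRSF10B",
            "BMP2,TGFB1,BMP3,MAPK12,GDF11,MAPK13,CITED1",
            "MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4,PIK3CD",
            "CBLC,TGFA,MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V1 = c("FOS", "MAPK13", "SPRY4", "CDC23", "CBLC",
         "BCL2", "CITED1", "BMP3", "TGFA"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into its individual gene symbols
gene_tokens <- strsplit(df1$Genes, ",", fixed = TRUE)

# For each row, count the symbols that exactly equal an entry of df2$V1
df1$Counts <- vapply(gene_tokens, function(g) sum(g %in% df2$V1), integer(1))

df1$Counts
```

On the sample data this gives the same counts as the stringr version (2, 3, 2, 4); the two approaches only diverge when one symbol is a substring of another or when a symbol contains regex metacharacters.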

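Regarding "R: Add a count column to a data frame by counting occurrences of strings from another data frame in a comma-separated column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54837433/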
