
r - How to separate a column in dplyr based on a regular expression

Reposted. Author: 行者123. Updated: 2023-12-04 10:40:16

I have the following data frame:

df <- structure(list(X2 = c("BB_137.HVMSC", "BB_138.combined.HVMSC", 
"BB_139.combined.HVMSC", "BB_140.combined.HVMSC", "BB_141.HVMSC",
"BB_142.combined.HMSC-bm")), .Names = "X2", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))

It looks like this:
> df
# A tibble: 6 x 1
X2
<chr>
1 BB_137.HVMSC
2 BB_138.combined.HVMSC
3 BB_139.combined.HVMSC
4 BB_140.combined.HVMSC
5 BB_141.HVMSC
6 BB_142.combined.HMSC-bm

What I want to do is split it into two columns (using . as the separator), keeping only the last field as the second column:
              col1 col2
BB_137 HVMSC
BB_138.combined HVMSC
BB_139.combined HVMSC
BB_140.combined HVMSC
BB_141 HVMSC
BB_142.combined HMSC-bm

What is the right way to do this?

My attempt was this:
> df %>% separate(X2, into = c("sid","status", "tiss"), sep = "[.]") 
# A tibble: 6 x 3
sid status tiss
* <chr> <chr> <chr>
1 BB_137 HVMSC <NA>
2 BB_138 combined HVMSC
3 BB_139 combined HVMSC
4 BB_140 combined HVMSC
5 BB_141 HVMSC <NA>
6 BB_142 combined HMSC-bm

Warning message: Too few values at 2 locations: 1, 5

Best answer

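We can use a negative lookahead as the separator in the separate function. The pattern matches a dot that is not followed by any later dot, so each string is split only at its last dot.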

library(tidyr)
separate(data = df, col = X2, into = c("col1", "col2"), sep = "(\\.)(?!.*\\.)")

# col1 col2
# <chr> <chr>
#1 BB_137 HVMSC
#2 BB_138.combined HVMSC
#3 BB_139.combined HVMSC
#4 BB_140.combined HVMSC
#5 BB_141 HVMSC
#6 BB_142.combined HMSC-bm
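
To see why this pattern splits only at the final dot, here is a small base R check. It is a sketch for illustration, not part of the original answer: the vector x and the names pattern and last_dot are made up here, and the capture group around the dot is dropped because it is not needed for matching.

```r
# Illustrative input with the same shape as the X2 column above
x <- c("BB_137.HVMSC", "BB_138.combined.HVMSC", "BB_142.combined.HMSC-bm")

# A dot that is NOT followed by any later dot, i.e. the last dot.
# Base R's default regex engine has no lookahead, hence perl = TRUE.
pattern <- "\\.(?!.*\\.)"

# Position of the (single) match in each string
last_dot <- as.integer(regexpr(pattern, x, perl = TRUE))

# Split by hand at that position
col1 <- substr(x, 1, last_dot - 1)
col2 <- substr(x, last_dot + 1, nchar(x))
data.frame(col1, col2)
```

The earlier dots are skipped because the lookahead fails whenever another dot appears later in the string.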

The regular expression is taken from this answer.
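
If the lookahead feels hard to read, one possible alternative is tidyr's extract function with two capture groups. This is a sketch that was not in the original answer; the shortened df and the name res are assumptions made for the example.

```r
library(tidyr)

# Shortened, illustrative version of the data frame from the question
df <- data.frame(
  X2 = c("BB_137.HVMSC", "BB_138.combined.HVMSC", "BB_142.combined.HMSC-bm"),
  stringsAsFactors = FALSE
)

# Group 1 is greedy, so it runs up to the last dot;
# group 2 is whatever follows and cannot contain a dot.
res <- tidyr::extract(df, X2, into = c("col1", "col2"),
                      regex = "^(.*)\\.([^.]*)$")
res
```

Unlike separate, extract describes the pieces to keep rather than the separator, so a row that contains no dot at all would give NA in both columns instead of a warning.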

Regarding r - How to separate a column in dplyr based on a regular expression, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46147639/
