
r - Break one string into multiple strings on different rows

Reposted. Author: 行者123. Updated: 2023-12-02 08:52:11

I have a data frame that contains one long string per row, each associated with a "Sample":

Sample  Data
1 000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N
2 000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N

I would like to write a simple method that splits this string into 5 sections, in the following format:

Sample X
CCT6 - Characters 1-33
GAT1 - Characters 34-68
IMD3 - Characters 69-99
PDR3 - Characters 100-130
RIM15 - Characters 131-168

giving output like the following for each sample:

Sample 1
CCT6 - 000000000000000000000000000N01000
GAT1 - 000000000N0N000000000N00N0000NN00N0
IMD3 - N000000100000N00N0N0000000NNNN0
PDR3 - 1111111111111111111111111111111
RIM15 - 0000000000000000000N000000N0000000000N

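I have been able to use the substr function to split the long string into individual sections, but I would like to automate this so that I get all 5 sections in a single output. Ideally, this output would also be a data frame.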

Best answer

This is what read.fwf is for (see ?read.fwf).

First, some data that look like the data in your question:

x <- data.frame(Sample = c(1, 2), Data = c("000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N", 
"000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N"),
stringsAsFactors = FALSE)

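Now use read.fwf, specifying the width of each field and its name, and that every field should be read as character. The widths follow from the character ranges in the question (1-33 is 33 characters, 34-68 is 35, and so on) and sum to 168. We wrap the text column of the sample data in a textConnection so that it can be treated as a connection, which is what read.* and other functions usually understand.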

(strs <- read.fwf(textConnection(x$Data), widths = c(33, 35, 31, 31, 38), colClasses = "character", col.names = c("CCT6", "GAT1", "IMD3", "PDR3", "RIM15")))


CCT6 GAT1 IMD3 PDR3 RIM15
1 000000000000000000000000000N01000 000000000N0N000000000N00N0000NN00N0 N000000100000N00N0N0000000NNNN0 1111111111111111111111111111111 0000000000000000000N000000N0000000000N
2 000000000000000000000000000N01000 000000000N0N000000000N00N0000NN00N0 N000000100000N00N0N0000000NNNN0 1111111111111111111111111111111 0000000000000000000N000000N0000000000N
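
If you would rather not go through a text connection, roughly the same split can be sketched with base R's vectorized substring(). The start and end positions below are the ones listed in the question. The helper name split_fixed and the way x is rebuilt from its pieces are illustrative assumptions, not part of the original answer.

```r
# Rebuild example data like the question's (two identical samples).
# The pieces are pasted together so the field boundaries are easy to see.
pieces <- c(CCT6  = "000000000000000000000000000N01000",
            GAT1  = "000000000N0N000000000N00N0000NN00N0",
            IMD3  = "N000000100000N00N0N0000000NNNN0",
            PDR3  = "1111111111111111111111111111111",
            RIM15 = "0000000000000000000N000000N0000000000N")
x <- data.frame(Sample = c(1, 2),
                Data = rep(paste(pieces, collapse = ""), 2),
                stringsAsFactors = FALSE)

# Field boundaries from the question (1-based, inclusive).
starts <- c(1, 34, 69, 100, 131)
ends   <- c(33, 68, 99, 130, 168)

# Hypothetical helper: cut every string in `s` at the given positions and
# return a character data frame with one column per field.
split_fixed <- function(s, starts, ends, col_names) {
  cols <- Map(function(from, to) substring(s, from, to), starts, ends)
  names(cols) <- col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

parts <- split_fixed(x$Data, starts, ends, names(pieces))

# Keep the sample identifier next to the pieces.
result <- cbind(Sample = x$Sample, parts)
str(result)
```

Because substring() recycles over the whole Data column, no explicit loop over samples is needed, and the Sample column stays attached to the split fields.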

Now loop over the rows and print each one as in your example:

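# Print one block per row; the row index i stands in for the sample number
for (i in seq_len(nrow(strs))) {
  writeLines(paste("Sample", i))
  # unlist() turns the one-row data frame into a plain character vector
  writeLines(paste(names(strs), unlist(strs[i, ]), sep = " - "))
}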

This gives, for example:

Sample 2
CCT6 - 000000000000000000000000000N01000
GAT1 - 000000000N0N000000000N00N0000NN00N0
IMD3 - N000000100000N00N0N0000000NNNN0
PDR3 - 1111111111111111111111111111111
RIM15 - 0000000000000000000N000000N0000000000N
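
The question also asks for the result as a data frame with the pieces on separate rows. One possible sketch is to stack the wide read.fwf result into a long table with one row per sample and gene. The toy strings, the toy widths, and the column names Gene and Sequence are assumptions chosen to keep the example short. With the real data you would use the widths c(33, 35, 31, 31, 38) and all five names.

```r
# Toy stand-in for the question's data: two samples, three short fields.
x <- data.frame(Sample = c(1, 2),
                Data = c("000N0111N", "0N0001N11"),
                stringsAsFactors = FALSE)
widths <- c(3, 2, 4)                  # assumed widths for the toy fields
genes  <- c("CCT6", "GAT1", "IMD3")

# Same read.fwf approach as in the answer, on the toy data.
con <- textConnection(x$Data)
wide <- read.fwf(con, widths = widths,
                 colClasses = "character", col.names = genes)
close(con)

# unlist() walks the data frame column by column, so each gene name is
# repeated once per sample and the sample ids are recycled across genes.
long <- data.frame(
  Sample   = rep(x$Sample, times = length(genes)),
  Gene     = rep(genes, each = nrow(wide)),
  Sequence = unlist(wide, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Group the rows by sample, keeping the gene order within each sample.
long <- long[order(long$Sample), ]
rownames(long) <- NULL
print(long)
```

The long layout mirrors the printed blocks above: each sample contributes one row per section, which is convenient for filtering or merging later.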

For "r - Break one string into multiple strings on different rows", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7735227/
