
r - A faster way to convert a vector of text into a numeric matrix/data.frame in R?

Reposted. Author: 行者123. Updated: 2023-12-04 04:59:17

I'm using R to parse some server logs, which yield a character vector like this:

myLog <- c("[1,2,3]","[4,5,6]","[7,8,9]")

What I'd like to produce from it is a matrix that looks like this:
myMatrix <- matrix(c(c(1,2,3),c(4,5,6),c(7,8,9)),nrow=3,byrow=T)

The values come from querying a varchar database field, so I don't think I can use any file-reading tricks.

I tend to have a lot of these rows at once, several million.

I have been doing the following, which is slow:
splitDat <- sapply(inputVector, function(y) {
  y1 <- gsub("\\[", "", y)
  y2 <- gsub("\\]", "", y1)
  y3 <- strsplit(y2, split = ", ")
  y4 <- unlist(y3)
})

Is there a more efficient way? A one-line regular expression?

Best answer

You could try vectorizing this with the stringi package:

library(stringi)
# One row per log entry; "\\d" extracts single digits, which is enough for this sample
matrix(as.numeric(unlist(stri_extract_all_regex(myLog, pattern = "\\d"))),
       nrow = length(myLog), byrow = TRUE)

#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9
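The pattern "\\d" picks up one digit at a time, so it only fits logs whose values are single digits. As a minimal sketch of how the same idea might extend to multi-digit, negative, or decimal values, the base R version below swaps in regmatches() and gregexpr() instead of stringi. The helper name parse_bracketed and the sample vector myLog2 are hypothetical and not part of the original answer, and the sketch assumes every entry holds the same number of values.

```r
# Hypothetical helper (not from the original answer): parse strings such as
# "[1,2,3]" or "[10,-2,3.5]" into a numeric matrix with one row per string.
parse_bracketed <- function(x) {
  # Match optionally signed integers or decimals, e.g. 7, -12, 3.14
  num_pattern <- "-?[0-9]+(\\.[0-9]+)?"
  tokens <- regmatches(x, gregexpr(num_pattern, x))

  # Assumption: every element of x contains the same number of values
  stopifnot(length(unique(lengths(tokens))) <= 1L)

  matrix(as.numeric(unlist(tokens)), nrow = length(x), byrow = TRUE)
}

# Made-up sample data for illustration only
myLog2 <- c("[10,-2,3.5]", "[400,5,6]", "[7,80,-9]")
parsed <- parse_bracketed(myLog2)
```

With stringi installed, the same pattern could presumably be passed to stri_extract_all_regex() in place of the regmatches() call; this has not been timed here.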

Benchmark
library(stringi)
library(gsubfn)
library(microbenchmark)

set.seed(123)
myLog <- c("[1,2,3]","[4,5,6]","[7,8,9]")
myLog <- sample(myLog, 1e4, replace = TRUE)

# Every expression should return one row per log entry, hence nrow = length(myLog)
Res <- microbenchmark(
  David = matrix(as.numeric(unlist(stri_extract_all_regex(myLog, pattern = "\\d"))), nrow = length(myLog), byrow = TRUE),
  Thela = matrix(as.numeric(unlist(strsplit(myLog, "\\[|\\]|,"))), nrow = length(myLog), byrow = TRUE)[, -1],
  BD1 = matrix(as.numeric(scan(text = gsub("\\D", " ", myLog), what = "")), nrow = length(myLog), byrow = TRUE),
  BD2 = matrix(as.numeric(scan(text = gsub("[],[]", " ", myLog), what = "")), nrow = length(myLog), byrow = TRUE),
  GG1 = read.table(text = gsub("\\D", " ", myLog)),
  GG2 = read.pattern(text = myLog, pat = "\\d")
)

Res
# Unit: milliseconds
#  expr       min        lq      mean    median        uq       max neval
# David  12.01351  12.90111  16.41127  13.98826  15.62786 101.65117   100
# Thela  25.49944  27.09937  29.83234  28.32153  30.24141  80.79836   100
#   BD1  92.39541  94.81445 101.20524  98.07333 102.41877 172.60835   100
#   BD2  91.91578  94.66958 104.02773  96.94019 103.99383 206.37865   100
#   GG1  91.28813  94.29219  98.63825  96.57544 100.57172 140.97998   100
#   GG2 470.43382 514.58552 551.94922 540.86479 570.88711 815.75789   100

boxplot(Res)
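Timings like these are only comparable if the expressions return the same matrix, so it may be worth checking that before benchmarking. The sketch below does so for two of the base R approaches (the ones labelled Thela and BD1) on the three-element sample. The names m_split, m_scan, expected, and all_agree are illustrative and not from the original post, and scan() is called here with its default numeric mode and quiet = TRUE rather than what = "" as in the benchmark.

```r
myLog <- c("[1,2,3]", "[4,5,6]", "[7,8,9]")

# strsplit-based approach: splitting on "[", "]" and "," leaves an empty
# leading token per entry, which becomes an NA column dropped by [, -1]
m_split <- matrix(as.numeric(unlist(strsplit(myLog, "\\[|\\]|,"))),
                  nrow = length(myLog), byrow = TRUE)[, -1]

# scan-based approach: replace every non-digit with a space, then let scan()
# read the numbers; quiet = TRUE suppresses the "Read N items" message
m_scan <- matrix(scan(text = gsub("\\D", " ", myLog), quiet = TRUE),
                 nrow = length(myLog), byrow = TRUE)

# The matrix the question asks for
expected <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)

# TRUE only if both approaches reproduce the expected matrix
all_agree <- isTRUE(all.equal(m_split, expected)) &&
  isTRUE(all.equal(m_scan, expected))
```

The same comparison can be repeated on the sampled 1e4-element vector, or on the stringi and gsubfn results, before trusting the relative timings.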

This post is based on a similar question found on Stack Overflow, "r - A faster way to convert a vector of text into a numeric matrix/data.frame in R?": https://stackoverflow.com/questions/27433853/
