r - Importing messy data with R

Reposted. Author: 行者123. Updated: 2023-12-04 14:15:37

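Does anyone have ideas on how to import the following data into R in a proper form? I tried the strsplit function as: test <- strsplit(test,"[[:space:]]+"), where test is the name of the file containing the messy data below. Somehow I end up with only one character variable. I would like to have eight different variables in a proper form. Could you please help me?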

Black Eagles    01/12 - 12/11   1500 W  7.0 420 48  Away +3
Blue State 02/18 - 04/21 1293 L 8.0 490 48 Home +1
Hawks 01/13 - 02/17 1028 L 4.0 46 460 Away
New Apple 09/23 - 11/23 563 L 3.0 470 47 Home +2
Black White 07/05 - 09/26 713 L 5.2 500 45 Home +4
PBO 10/24 - 10/30 1495 L 1.9 47 410 Away
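
One plausible reading of the problem, assuming `test` really held the file name rather than the file's contents: `strsplit` then splits the name itself, which yields a single character element. The sketch below illustrates that, and also shows why splitting on whitespace alone is not enough once the lines are read in, because team names may contain spaces. The file name and the inline `lines` vector are stand-ins for the real file.

```r
# Assumption: `test` held the file name, so strsplit split the name, not the data.
test <- "myfile.txt"                       # hypothetical file name
strsplit(test, "[[:space:]]+")             # a list holding the single string "myfile.txt"

# Two rows copied from the data above, standing in for readLines() on the file.
lines <- c("Black Eagles    01/12 - 12/11   1500 W  7.0 420 48  Away +3",
           "Hawks 01/13 - 02/17 1028 L 4.0 46 460 Away")

# Splitting on whitespace breaks "Black Eagles" into two tokens,
# so rows no longer line up field by field.
tokens <- strsplit(lines, "[[:space:]]+")
lengths(tokens)
```

The differing token counts are why an approach that anchors on the " - " between the two dates is more reliable here than a plain whitespace split.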

Best answer

How about this?

> nicelyFormatted
     [,1]           [,2]    [,3]    [,4]   [,5] [,6]  [,7]  [,8]  [,9]   [,10]
[1,] "Black Eagles" "01/12" "12/11" "1500" "W"  "7.0" "420" "48"  "Away" "+3"
[2,] "Blue State"   "02/18" "04/21" "1293" "L"  "8.0" "490" "48"  "Home" "+1"
[3,] "Hawks"        "01/13" "02/17" "1028" "L"  "4.0" "46"  "460" "Away" NA
[4,] "New Apple"    "09/23" "11/23" "563"  "L"  "3.0" "470" "47"  "Home" "+2"
[5,] "Black White"  "07/05" "09/26" "713"  "L"  "5.2" "500" "45"  "Home" "+4"
[6,] "PBO"          "10/24" "10/30" "1495" "L"  "1.9" "47"  "410" "Away" NA
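
If separate, typed variables are wanted rather than a character matrix, one possible follow-up is to convert the matrix into a data frame. This is a minimal sketch, not part of the original answer: the two-row matrix `m` stands in for the full result, and the column names are hypothetical, since the source does not say what each field means.

```r
# A two-row stand-in for the character matrix shown above.
m <- rbind(
  c("Black Eagles", "01/12", "12/11", "1500", "W", "7.0", "420", "48", "Away", "+3"),
  c("Hawks", "01/13", "02/17", "1028", "L", "4.0", "46", "460", "Away", NA)
)

df <- as.data.frame(m, stringsAsFactors = FALSE)

# Hypothetical column names; rename them to whatever the fields actually mean.
names(df) <- c("team", "from", "to", "v4", "result", "v6", "v7", "v8", "venue", "v10")

# Convert the columns that look numeric; "+3" parses as 3 and NA stays NA.
num_cols <- c("v4", "v6", "v7", "v8", "v10")
df[num_cols] <- lapply(df[num_cols], as.numeric)

str(df)
```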

Here is the code used to produce the table above:
library(stringr)

# Open a connection to the file
pathToFile <- path.expand("~/path/to/file/myfile.txt")
f <- file(pathToFile, "rb")

# Read in the lines, then close the connection
rawText <- readLines(f)
close(f)


# Find the dashes
dsh <- str_locate_all(rawText, " - ")

# Splice, using the dashes as a guide.
# lapply always returns a list, even when every row has the same number of fields.
spliced <- lapply(seq_along(rawText), function(i)
  spliceOnDash(rawText[[i]], dsh[[c(i, 1)]], dsh[[c(i, 2)]])
)

# make it purtty
nicelyFormatted <- formatNicely(spliced)
nicelyFormatted


#-------------------#
#     FUNCTIONS     #
#-------------------#
# Note: define these functions before running the code above.


spliceOnDash <- function(strn, start, end) {

  # Split around the dates: each date is 5 characters ("mm/dd"),
  # sitting directly before and after the " - " match at start..end
  pre   <- substr(strn, 1, start - 6)
  dates <- substr(strn, start - 5, end + 5)
  post  <- substr(strn, end + 6, str_length(strn))

  # Clean up
  pre <- str_trim(pre)

  # Collapse every run of whitespace into a single space
  post <- str_replace_all(str_trim(post), "[[:space:]]+", " ")

  # Splice on space
  post <- str_split(post, " ")

  # If dates are one field, remove this next line
  dates <- str_split(dates, " - ")

  # Return
  c(unlist(pre), unlist(dates), unlist(post))
}

# Function to clean up the list into a nice table
formatNicely <- function(spliced) {
  lngst <- max(sapply(spliced, length))
  # Pad shorter rows with NA so every row has lngst fields
  t(sapply(spliced, function(x)
    if (length(x) < lngst) c(x, rep(NA, lngst - length(x))) else x))
}
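
As an alternative sketch, and not the method used in the answer above, the same split can be done in one pass with a regular expression in base R. It assumes every line contains two `mm/dd` dates separated by " - "; a line that does not match would come back as NA fields. The names `lines`, `pattern`, `rows`, and `result` are illustrative only.

```r
# Two rows copied from the question, standing in for readLines() on the file.
lines <- c("Black Eagles    01/12 - 12/11   1500 W  7.0 420 48  Away +3",
           "Hawks 01/13 - 02/17 1028 L 4.0 46 460 Away")

# Capture groups: name, first date, second date, everything after the dates.
pattern <- "^(.*?)\\s+(\\d{2}/\\d{2})\\s+-\\s+(\\d{2}/\\d{2})\\s+(.*)$"
parts <- regmatches(lines, regexec(pattern, lines, perl = TRUE))

rows <- lapply(parts, function(p) {
  # p[1] is the full match; p[2:4] are the name and dates; p[5] is the remainder
  c(p[2:4], strsplit(trimws(p[5]), "[[:space:]]+")[[1]])
})

# Pad shorter rows with NA and bind everything into a character matrix.
width <- max(lengths(rows))
result <- t(vapply(rows,
                   function(x) c(x, rep(NA_character_, width - length(x))),
                   character(width)))
result
```

Anchoring on the date pattern rather than on fixed character offsets means the split does not depend on each date being exactly five characters from the dash, at the cost of a denser pattern.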

Regarding r - Importing messy data with R, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13355148/