gpt4 book ai didi

r - fread中的空白未被识别为NA

转载 作者:行者123 更新时间:2023-12-04 02:28:57 24 4
gpt4 key购买 nike

我有一个大文件,必须在R中导入。为此,我使用了freadfread将数字字段中的空格识别为NA,但未将字符和integer64字段中的空格识别为NA。
fread将空格识别为字符字段的空单元格,并将空格识别为0的integer64字段。

当我使用read.table导入相同的数据时,它将所有空格识别为NA。

请找到可复制的示例,

library(data.table)
x1 <- c("","","")
x2 <- c("1006678566","","1011160152")
x3 <- c("","ac","")
x4 <- c("","2","3")
df <- cbind.data.frame(x1,x2,x3,x4)
write.csv(df,"tr.csv")

tr1 <- fread("tr.csv", header=T, fill = T,
sep= ",", na.strings = c("",NA), data.table = F,
stringsAsFactors = FALSE)

tr2 <- read.table("tr.csv", fill = TRUE, header=T,
sep= ",", na.strings = c(""," ", NA),
stringsAsFactors = FALSE)

import by fread

详细输出:
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
Using 4 threads (omp_get_max_threads()=4, nth=4)
NAstrings = [<<>>, <<NA>>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as integer
[02] Opening the file
Opening file tr.csv
File opened, size = 409 bytes.
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
Positioned on line 1 starting: <<"","x1","x2","x3","x4","x5","x>>
[06] Detect separator, quoting rule, and ncolumns
Using supplied sep ','
sep=',' with 7 fields using quote rule 0
Detected 7 columns on line 1. This line is either column names or first data row. Line starts as: <<"","x1","x2","x3","x4","x5","x>>
Quote rule picked = 0
fill=true and the most number of columns found is 7
[07] Detect column types, good nrow estimate and whether first row is column names
'header' changed by user from 'auto' to true
Number of sampling jump points = 1 because (407 bytes from row 1 to eof) / (2 * 407 jump0size) == 0
Type codes (jump 000) : 56A255A Quote rule 0
All rows were sampled since file is small so we know nrow=16 exactly
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 56A255A
[10] Allocate memory for the datatable
Allocating 7 column slots (7 - 0 dropped) with 16 rows
[11] Read the data
jumps=[0..1), chunk_size=1048576, total_size=373
Read 16 rows x 7 columns from 409 bytes file in 00:00.042 wall clock time
[12] Finalizing the datatable
Type counts:
1 : bool8 '2'
3 : int32 '5'
1 : int64 '6'
2 : string 'A'
=============================
0.009s ( 22%) Memory map 0.000GB file
0.029s ( 68%) sep=',' ncol=7 and header detection
0.002s ( 5%) Column type detection using 16 sample rows
0.001s ( 2%) Allocation of 16 rows x 7 cols (0.000GB) of which 16 (100%) rows used
0.001s ( 2%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 16 rows) using 1 threads
+ 0.000s ( 0%) Parse to row-major thread buffers (grown 0 times)
+ 0.000s ( 0%) Transpose
+ 0.001s ( 2%) Waiting
0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
0.042s Total

请帮我解决这个问题。

谢谢!

最佳答案

如果要避免在读取文件后进行其他操作,可以尝试使用quote = FALSE写入CSV时。这样可以防止在值周围使用引号" ",并且所有缺少的值现在都应阅读为NA。它看起来应该像这样-

# also turned off row names to prevent an additional column when reading the file.
write.csv(df, "tr.csv", quote = FALSE, row.names = FALSE)
输出-
tr1 <- fread("tr.csv", header=T, fill = T,
sep= ",", na.strings = c("",NA), data.table = F,
stringsAsFactors = FALSE)
tr1

x1 x2 x3 x4
1 NA 1006678566 <NA> NA
2 NA NA ac 2
3 NA 1011160152 <NA> 3

tr2 <- read.table("tr.csv", fill = TRUE, header=T,
sep= ",", na.strings = c(""," ", NA),
stringsAsFactors = FALSE)
tr2

x1 x2 x3 x4
1 NA 1006678566 <NA> NA
2 NA NA ac 2
3 NA 1011160152 <NA> 3

关于r - fread中的空白未被识别为NA,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51019041/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com