
r - Is there a function in R for automatically converting columns (of a data frame or table) to their original vector type?

Reposted. Author: 行者123. Updated: 2023-12-04 18:39:17

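I am concerned about the vector types in my data and how they come about. Some columns are originally integer or numeric, but they show up as character.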

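If I read the data frame with read.csv(), it guesses the type of each vector and converts it automatically. I could not find the same behaviour in fread() from data.table. A sample of the data is attached here: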

structure(list(V1 = c("1", "2", "3", "4", "5", "6"), ID = c("109", 
"110", "111", "112", "113", "114"), SignalIntensity = c(7.58043495940162,
11.2698560261255, 8.60063586764357, 9.54355755391806, 10.1812351379984,
8.11689493952339), SNR = c(1.34218273720186, 9.75097840763912,
1.80485348504829, 3.20137685049428, 4.64599368338536, 1.42263609838542
)), .Names = c("V1", "ID", "SignalIntensity", "SNR"), row.names = c(NA,
6L), class = "data.frame")

When I read the data frame with read.csv():
str(df)

'data.frame': 20469 obs. of 4 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ ID : int 109 110 111 112 113 114 116 117 118 119 ...
$ SignalIntensity: num 6.18 10.17 7.29 8.9 9.59 ...
$ SNR : num 0.845 4.384 1.073 2.319 3.713 ...

Reading the same data frame with fread() and read.table():
'data.frame':   20469 obs. of  4 variables:
$ V1 : chr "1" "2" "3" "4" ...
$ ID : chr "109" "110" "111" "112" ...
$ SignalIntensity: num 6.18 10.17 7.29 8.9 9.59 ...
$ SNR : num 0.845 4.384 1.073 2.319 3.713 ...


read.table()
'data.frame': 20470 obs. of 2 variables:
$ V1: int NA 1 2 3 4 5 6 7 8 9 ...
$ V2: chr ",\"ID\",\"SignalIntensity\",\"SNR\"" ",\"109\",6.18230893141024,0.845357691456258" ",\"110\",10.1727771385494,4.38370775906105" ",\"111\",7.29227469267823,1.07257511609212" ...

I would like to know what causes the data to lose its original vector types. Is there any automatic conversion other than read.csv()?
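On the two-column read.table() result above: read.csv() is read.table() with sep = "," and header = TRUE among its defaults, while plain read.table() splits on whitespace. The sketch below assumes the file is comma-separated with a header row, as the str() output suggests; `csv_text` is a hypothetical in-memory stand-in for the first few lines of the file, with shortened values.

```r
# Hypothetical stand-in for the first lines of the file (values shortened).
csv_text <- '"","ID","SignalIntensity","SNR"
"1","109",6.18,0.845
"2","110",10.17,4.384
"3","111",7.29,1.073'

# read.table() splits on whitespace by default, so the comma separator
# and the header row have to be requested explicitly.
df <- read.table(text = csv_text, sep = ",", header = TRUE)

# Inspect the guessed column types.
str(df)
```

With the separator set, read.table() applies the same type guessing as read.csv(), so the quoted ID values should come back as integers rather than character.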

Edit: fread(....,verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000949 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 4 columns
First row with 4 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 20470
Subtracted 1 for last eol and any trailing empty lines, leaving 20469 data rows
Type codes ( first 5 rows): 4433
Type codes (+ middle 5 rows): 4433
Type codes (+ last 5 rows): 4433
Type codes: 4433 (after applying colClasses and integer64)
Type codes: 4433 (after applying drop or select (if supplied)
Allocating 4 column slots (4 - 0 dropped)
0.001s ( 2%) Memory map (rerun may be quicker)
0.000s ( 1%) sep and header detection
0.004s ( 12%) Count rows (wc -l)
0.001s ( 2%) Column type detection (first, middle and last 5 rows)
0.000s ( 0%) Allocation of 20469x4 result (xMB) in RAM
0.025s ( 82%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.000s ( 0%) Changing na.strings to NA
0.030s Total

Best answer

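It seems there is a bug(?) in fread when setting colClasses (I'll wait for @Arun's response on that). In the meantime, you can work around it by applying type.convert after reading the data, reassigning the columns by reference: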

# df must be a data.table here (e.g. the result of fread()); := updates by reference
indx <- which(sapply(df, is.character))
df[, (indx) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = indx]
str(df)
# Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
# $ V1 : int 1 2 3 4 5 6
# $ ID : int 109 110 111 112 113 114
# $ SignalIntensity: num 7.58 11.27 8.6 9.54 10.18 ...
# $ SNR : num 1.34 9.75 1.8 3.2 4.65 ...
# - attr(*, ".internal.selfref")=<externalptr>
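The same idea can be sketched in base R for a plain data.frame, without data.table. The toy `df` below is an assumption that only mirrors the shape of the question's sample (numbers stored as text). `as.is = TRUE` is passed so that any text that is not numeric stays character instead of becoming a factor.

```r
# Toy data frame mirroring the question's sample: numeric values stored as text.
df <- data.frame(
  V1  = c("1", "2", "3"),
  ID  = c("109", "110", "111"),
  SNR = c(1.34, 9.75, 1.80),
  stringsAsFactors = FALSE
)

# Find the character columns and let type.convert() re-guess their types.
# Columns that are already numeric are left untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], type.convert, as.is = TRUE)

# Inspect the converted column types.
str(df)
```

Unlike the `:=` version, this copies the converted columns rather than updating them by reference, which may matter for very large tables.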

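Regarding "r - Is there a function in R for automatically converting columns (of a data frame or table) to their original vector type?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30192001/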
