
read.table automatic column names

Reposted. Author: 行者123. Updated: 2023-12-02 21:19:17

I noticed that when I read a large CSV file with

output <- read.table( ..., header = TRUE, sep = ",")

the resulting data frame has a number of blank columns. Their names follow this pattern:

 colnames(output)
"Factor.1" "Factor.2" "etc" "Stuff" "X" "X.1" "X.2" "X.3" "X.4" "X.5"
"X.6" "X.7" "X.8" "X.9" "X.10" "X.11" "X.12" "X.13"
"X.14" "X.15" "X.16" "X.17" "X.18" "X.19" "X.20" "X.21"
"X.22" "X.23" "X.24" "X.25" "X.26" "X.27" "X.28" "X.29"
"X.30" "X.31" "X.32" "X.33"

I noticed that ?read.table says:

col.names: a vector of optional names for the variables. The default is to use "V" followed by the column number.

Why am I getting X instead of V?

Edit: this is what the CSV file looks like:

Date,Duration,Count,Factor 1,Factor 2,Factor 3,Hour,Day,Month,Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 0:00,9.99,10,GC,LS,FT,0,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 1:00,9.63125,8,GC,LS,FT,1,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 2:00,7.388888889,3,GC,LS,FT,2,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 3:00,7.087037037,9,GC,LS,FT,3,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

...

Best answer

Here is the relevant snippet from read.table():

if (header) {
.External(C_readtablehead, file, 1L, comment.char,
blank.lines.skip, quote, sep, skipNul)  # skip over the header line
if (missing(col.names))
col.names <- first
else if (length(first) != length(col.names))
warning("header and 'col.names' are of different lengths")
}

The important line is if (missing(col.names)) col.names <- first. From there we can go back and find first, which in this case is defined as

first <- scan(textConnection(file), what = "", sep = ",", 
nlines = 1, quiet = TRUE, skip = 0, strip.white = TRUE)

which gives

#  [1] "Date"     "Duration" "Count"    "Factor 1" "Factor 2" "Factor 3" "Hour"     "Day"      "Month"   
# [10] "Year" "" "" "" "" "" "" "" ""
# [19] "" "" "" "" "" "" "" "" ""
# [28] "" "" "" "" "" "" "" "" ""
# [37] "" "" "" "" "" "" "" ""

Then, because check.names = TRUE by default, make.names() is called on col.names, which produces your names:

make.names(first, unique = TRUE)
# [1] "Date" "Duration" "Count" "Factor.1" "Factor.2" "Factor.3" "Hour" "Day" "Month"
# [10] "Year" "X" "X.1" "X.2" "X.3" "X.4" "X.5" "X.6" "X.7"
# [19] "X.8" "X.9" "X.10" "X.11" "X.12" "X.13" "X.14" "X.15" "X.16"
# [28] "X.17" "X.18" "X.19" "X.20" "X.21" "X.22" "X.23" "X.24" "X.25"
# [37] "X.26" "X.27" "X.28" "X.29" "X.30" "X.31" "X.32" "X.33"
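To see the two make.names() rules from the output above in isolation, here is a minimal sketch on a hand-made vector. The values in raw are made up for illustration, loosely modeled on the question's header; only make.names() itself is base R.

```r
# Hypothetical header fields, modeled on the question's CSV
raw <- c("Year", "Factor 1", "", "", "")

# Without unique = TRUE, names are only made syntactically valid:
# the space becomes "." and each empty string becomes "X"
plain <- make.names(raw)
print(plain)
# "Year" "Factor.1" "X" "X" "X"

# With unique = TRUE (what read.table uses), duplicates get .1, .2, ... suffixes
fixed <- make.names(raw, unique = TRUE)
print(fixed)
# "Year" "Factor.1" "X" "X.1" "X.2"
```

The X comes from the empty string being an invalid name, not from any column-numbering scheme; the numbers are simply the de-duplication suffixes.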

We get X rather than the V described in the documentation because the V names are only assigned in the branch that follows if (header):

else if (missing(col.names)) 
col.names <- paste0("V", 1L:cols)

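But with header = TRUE we never reach that branch. Instead, make.names() turns each empty name into "X" (it prepends "X" to names that are not syntactically valid), and unique = TRUE then appends .1, .2, and so on to the duplicates. There is more going on than this explanation covers; the best way to understand it fully is to read through the read.table source (it is fairly involved).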
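One way to check this explanation is to read the same small input three ways. The sketch below assumes a tiny made-up CSV (txt) with two trailing empty columns, in the style of the question's file. With header = TRUE the names should come from the first line and pass through make.names(); with check.names = FALSE the empty header fields should be kept as-is; with header = FALSE the documented "V" plus column number default should apply.

```r
# Hypothetical CSV: two named columns plus two trailing empty columns
txt <- "Date,Count,,\n1/1/2012 0:00,10,,\n1/1/2012 1:00,8,,"

# header = TRUE: names come from the first line, then make.names() fixes them
with_header <- read.table(text = txt, header = TRUE, sep = ",")
print(names(with_header))
# "Date" "Count" "X" "X.1"

# header = TRUE, check.names = FALSE: the raw (empty) header fields survive
raw_names <- read.table(text = txt, header = TRUE, sep = ",", check.names = FALSE)
print(names(raw_names))
# "Date" "Count" "" ""

# header = FALSE: the "V" + column number default from ?read.table applies
no_header <- read.table(text = txt, header = FALSE, sep = ",")
print(names(no_header))
# "V1" "V2" "V3" "V4"
```

Comparing the second and third calls isolates the two code paths: the X names only appear when the header branch runs and check.names is left at its default.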

Data:

file <- "Date,Duration,Count,Factor 1,Factor 2,Factor 3,Hour,Day,Month,Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 0:00,9.99,10,GC,LS,FT,0,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 1:00,9.63125,8,GC,LS,FT,1,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 2:00,7.388888889,3,GC,LS,FT,2,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 3:00,7.087037037,9,GC,LS,FT,3,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"

A similar question about read.table automatic column names can be found on Stack Overflow: https://stackoverflow.com/questions/28619973/
