
R read.table() on PC vs Mac

Reposted. Author: 可可西里. Updated: 2023-11-01 10:44:05

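If you are a Mac user and run the code below, you get a data frame with 173,962 rows. If you are a Windows user, the data set has only 8,999 rows. Can anyone tell me why? And how can I read the data into R on my PC?

Here is my data: .txt file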

d <- read.table("Stream4_1.13.16t.txt", sep = "", skip = 10, quote = "", fill = TRUE, header = FALSE)

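I am working with detection data created by unusual, specialized software (a passive integrated transponder, or PIT, system) that sometimes "scrambles" a line of data and produces odd characters resembling the Wingdings font. My file is a space-delimited text file. I have a hunch that these characters are causing the read problem, but why would a Mac behave differently?

To check whether I needed to change the encoding, I ran the following commands: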

d <- read.table("Stream4_1.13.16t.txt", sep = "", fileEncoding = "UTF-8", skip = 10, quote = "", fill = TRUE, header = FALSE)
d <- read.table("Stream4_1.13.16t.txt", sep = "", fileEncoding = "latin1", skip = 10, quote = "", fill = TRUE, header = FALSE)

and got this: Warning message: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : invalid input found on input connection 'Stream4_1.13.16t.txt'
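
One way to test the hunch about stray characters is to inspect the raw bytes instead of guessing encodings. The sketch below assumes a possible cause that is not confirmed for this file: on Windows, a file opened in text mode can be treated as finished at the first Ctrl-Z (0x1A) byte, which would cut the read short. The demo file and its contents are hypothetical stand-ins for the real log.

```r
# Hypothetical demo file standing in for the real log: three records, with a
# stray SUB byte (0x1A) embedded in the second one.
demo_path <- tempfile(fileext = ".txt")
writeBin(
  c(charToRaw("D line one\nD li"), as.raw(0x1a), charToRaw("ne two\nD line three\n")),
  demo_path
)

# Read every byte and tabulate control characters other than tab, LF and CR.
bytes <- readBin(demo_path, what = "raw", n = file.size(demo_path))
codes <- as.integer(bytes)
is_ctrl <- codes < 32L & !(codes %in% c(9L, 10L, 13L))
ctrl_counts <- table(sprintf("0x%02X", codes[is_ctrl]))
print(ctrl_counts)

# Open the connection in binary mode so that no byte is interpreted as an
# end-of-file marker, then split into lines as usual.
con <- file(demo_path, open = "rb")
raw_lines <- readLines(con, warn = FALSE)
close(con)

# Drop control characters before any further parsing.
clean_lines <- gsub("[[:cntrl:]]", "", raw_lines)
print(length(clean_lines))

unlink(demo_path)
```

If the byte table for the real file turns out to be empty, this explanation does not apply and the difference between platforms lies elsewhere.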

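Best answer

Honestly, I don't know the answer (although my suggestion below may be of some value), and I am only using the answer box because this is too long for a comment and needs formatting. (I am the SO respondent who told you this file was easy to read on a Mac.) Here is the top of the file (using the data downloaded from your earlier question):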

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2016.01.13 16:42:02 =~=~=~=~=~=~=~=~=~=~=~=
u uh
NEW 2016-01-08 16:27:25.52 171953
Total 171953

>up
Upload #1
Reader: S4 Site: AA
--------- upload 1 start ---------
E 2024-01-13 15:30:52.11 reader not responding
D 2016-01-08 16:27:34.83 00:00:00.38 HA 900_226000745066 A2 4 77
D 2016-01-08 16:27:34.41 00:00:00.00 HA 900_226000745066 A4 1 218
D 2016-01-08 16:27:34.59 00:00:00.62 HA 900_226000745066 A4 6 1
D 2016-01-08 16:27:34.41 00:00:00.00 HA 900_226000745066 A2 1 35
D 2016-01-08 16:27:34.59 00:00:00.62 HA 900_226000745066 A2 6 1
# snipped a bunch of lines
D 2016-01-08 16:29:26.86 00:00:00.86 HA 900_226000745060 A2 8 85
D 2016-01-08 16:29:28.21 00:00:00.32 HA 900_226000745060 A2 4 3
D 2016-01-08 16:29:28.70 00:00:00.12 HA 900_226000745060 A2 2 1
E 2016-01-08 16:36:00.00 date/time changed. Was 2016-01-08 16:29:37.80
D 2016-01-08 16:36:06.95 00:00:00.26 HA 900_226000745018 A2 3 136
D 2016-01-08 16:36:07.63 00:00:00.12 HA 900_226000745060 A2 2 3
D 2016-01-08 16:36:09.41 00:00:00.25 HA 900_226000745060 A2 3 13
D 2016-01-08 16:36:17.41 00:00:00.48 HA 900_226000745060 A2 5 65
D 2016-01-08 22:09:05.68 00:00:00.00 HA 00000 0 900 226000745000 2260007450?0 A4 1 0
D 2016-01-10 18:16:56.61 00:00:00.00 HA 00000 0 900 22600070 900 2260007450;1 A4 1 1
D 2016-01-08 16:36:30.62 00:00:00.19 HA 900_226000745060 A2 3 7
D 2016-01-08 16:36:31.04 00:00:00.31 HA 900_226000745060 A2 4 1
D 2016-01-08 16:36:33.02 00:00:00.12 HA 900_226000745066 A2 2 13
D 2016-01-08 16:36:41.07 00:00:00.13 HA 900_226000745066 A2 2 65
D 2016-01-10 15:38:39.74 00:00:00.00 HA 00000 0 900 22600074 900 2260007450?6 A2 1 1
D 2016-01-08 16:36:42.19 00:00:00.00 HA 900_226000745066 A2 1 1
D 2016-01-08 16:36:54.03 00:00:00.73 HA 900_226000745060 A2 7 101
D 2016-01-08 16:36:55.14 00:00:00.24 HA 900_226000745060 A2 3 2
D 2016-01-08 16:36:55.56 00:00:00.00 HA 900_226000745060 A2 1 1
D 2016-01-08 16:36:57.96 00:00:00.00 HA 900_226000745060 A2 1 19

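Trying to read this file with read.table is a forlorn hope. The lines beginning with "D " are the ones that carry data, and they are in fixed-width format. A more sensible approach (once you have sorted out the encoding and font mix-up) is to use readLines to create a txt_obj, select the data lines with grepl("^D ", txt_obj), and then parse them with read.fwf.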
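
The approach described above can be sketched as follows. This is a minimal illustration, not a tested solution for the full file: the sample lines are copied from the excerpt, the column widths are read off the excerpt as displayed (the real file may pad columns differently), and the column names and the "well-formed" pattern are hypothetical.

```r
# Sample lines copied from the excerpt; in practice txt_obj would come from
# readLines() on the real file.
txt_obj <- c(
  "Reader: S4 Site: AA",
  "--------- upload 1 start ---------",
  "D 2016-01-08 16:27:34.83 00:00:00.38 HA 900_226000745066 A2 4 77",
  "D 2016-01-08 16:27:34.41 00:00:00.00 HA 900_226000745066 A4 1 218",
  "E 2016-01-08 16:36:00.00 date/time changed. Was 2016-01-08 16:29:37.80",
  "D 2016-01-08 22:09:05.68 00:00:00.00 HA 00000 0 900 226000745000 2260007450?0 A4 1 0"
)

# Keep only detection records (lines starting with "D ").
d_lines <- txt_obj[grepl("^D ", txt_obj)]

# Assumed shape of an intact record; scrambled records fail this pattern and
# are set aside for manual inspection rather than silently parsed.
pattern <- paste0(
  "^D [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]{2} ",
  "[0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]{2} [A-Z]{2} [0-9]{3}_[0-9]{12} ",
  "[A-Z][0-9] [0-9]+ [0-9]+$"
)
well_formed <- grepl(pattern, d_lines)
scrambled <- d_lines[!well_formed]

# Assumed widths: negative values skip the single separating space; the last
# field takes whatever remains of the line.
widths <- c(1, -1, 10, -1, 11, -1, 11, -1, 2, -1, 16, -1, 2, -1, 20)
con <- textConnection(d_lines[well_formed])
detections <- read.fwf(
  con,
  widths = widths,
  col.names = c("type", "date", "time", "duration", "code", "tag", "antenna", "rest"),
  colClasses = "character",
  strip.white = TRUE
)
close(con)

print(detections)
```

Reading every column as character avoids surprises such as a lone "T" or "F" being converted to a logical value; numeric columns can be converted afterwards once the layout is confirmed.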

Regarding R read.table() on PC vs Mac, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34933858/
