r - Do I need to preprocess the file with awk, or can it be done directly in R?

Reposted. Author: 行者123. Last updated: 2023-12-04 09:42:39

I used to process csv files with awk. This is my first script:

tail -n +2 shifted_final.csv | awk -F, 'BEGIN {old=$2} {if($2!=old){print $0; old=$2;}}' | less

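This script looks for runs of repeated values in the 2nd column (that is, row n has the same value as rows n+1, n+2, ...) and prints only the first occurrence of each run. For example, given the following input: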
ord,orig,pred,as,o-p
1,0,0,1.0,0
2,0,0,1.0,0
3,0,0,1.0,0
4,0,0,0.0,0
5,0,0,0.0,0
6,0,0,0.0,0
7,0,0,0.0,0
8,0,0,0.0,0
9,0,0,0.0,0
10,0,0,0.0,0
11,0,0,0.0,0
12,0,0,0.0,0
13,0,0,0.0,0
14,0,0,0.0,0
15,0,0,0.0,0
16,0,0,0.0,0
17,0,0,0.0,0
18,0,0,0.0,0
19,0,0,0.0,0
20,0,0,0.0,0
21,0,0,0.0,0
22,0,0,0.0,0
23,4,0,0.0,4
24,402,0,1.0,402
25,0,0,1.0,0

the output will be:
1,0,0,1.0,0
23,4,0,0.0,4
24,402,0,1.0,402
25,0,0,1.0,0

Edit:
I had some difficulty adding the second script:

The second script does the same, but prints the last occurrence of each run of repeated values:
tail -n +2 shifted_final.csv | awk -F, 'NR>1 && $2!=old {print line} {old=$2; line=$0} END {if (NR) print line}' | less

Its output will be:
22,0,0,0.0,0
23,4,0,0.0,4
24,402,0,1.0,402
25,0,0,1.0,0

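I think R is a powerful language that should be able to handle tasks like this, but I have only found questions about calling awk scripts from R and the like. How can this be done in R?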

Best answer

Following the update to your question, here is a more general solution, thanks to @nicola:

Idx.first <- c(TRUE, tbl$orig[-1] != tbl$orig[-nrow(tbl)])
##
R> tbl[Idx.first,]
#    ord orig pred as o.p
# 1    1    0    0  1   0
# 23  23    4    0  0   4
# 24  24  402    0  1 402
# 25  25    0    0  1   0

If you want the last occurrence of a value in a run, rather than the first, append TRUE to @nicola's indexing expression instead of prepending it:
Idx.last <- c(tbl$orig[-1] != tbl$orig[-nrow(tbl)], TRUE)
##
R> tbl[Idx.last,]
#    ord orig pred as o.p
# 22  22    0    0  0   0
# 23  23    4    0  0   4
# 24  24  402    0  1 402
# 25  25    0    0  1   0

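In either case, tbl$orig[-1] != tbl$orig[-nrow(tbl)] compares the 2nd through nth values of column 2 with the 1st through (n-1)th values of column 2. The result is a logical vector in which each TRUE element marks a change between consecutive values. Because the comparison has length n-1, pushing an extra TRUE onto the front (case 1) selects the first occurrence in each run, while adding an extra TRUE to the back (case 2) selects the last occurrence in each run.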
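
To make the shifted comparison concrete, here is a minimal sketch on a short toy vector. The vector x and the names changed, first_of_run, and last_of_run are made up for illustration and are not part of the original answer; x simply stands in for tbl$orig.

```r
# Toy vector standing in for tbl$orig (illustrative values only)
x <- c(0, 0, 0, 4, 402, 402, 0)
n <- length(x)

# Compare elements 2..n with elements 1..(n-1); the result has length n - 1
changed <- x[-1] != x[-n]

# Pad with TRUE at the front to flag the start of each run,
# or at the back to flag the end of each run
first_of_run <- c(TRUE, changed)
last_of_run  <- c(changed, TRUE)

which(first_of_run)  # 1 4 5 7
which(last_of_run)   # 3 4 6 7
```

The run of 402 at positions 5 and 6 shows the difference: position 5 is flagged by the first index and position 6 by the second. This sketch assumes the vector has at least one element and no NA values, since a comparison involving NA yields NA rather than TRUE or FALSE.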

Data:
tbl <- read.table(text = "ord,orig,pred,as,o-p
1,0,0,1.0,0
2,0,0,1.0,0
3,0,0,1.0,0
4,0,0,0.0,0
5,0,0,0.0,0
6,0,0,0.0,0
7,0,0,0.0,0
8,0,0,0.0,0
9,0,0,0.0,0
10,0,0,0.0,0
11,0,0,0.0,0
12,0,0,0.0,0
13,0,0,0.0,0
14,0,0,0.0,0
15,0,0,0.0,0
16,0,0,0.0,0
17,0,0,0.0,0
18,0,0,0.0,0
19,0,0,0.0,0
20,0,0,0.0,0
21,0,0,0.0,0
22,0,0,0.0,0
23,4,0,0.0,4
24,402,0,1.0,402
25,0,0,1.0,0",
header = TRUE,
sep = ",")
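
For the original use case of a file on disk, the same indexing should work after reading the csv directly, so the tail and awk steps are not needed. The sketch below is an assumption-laden illustration: it writes a small temporary file as a stand-in for shifted_final.csv, and the names csv_path, tbl_file, first_rows, and last_rows are hypothetical.

```r
# Write a small stand-in for shifted_final.csv to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ord,orig,pred,as,o-p",
             "1,0,0,1.0,0",
             "2,0,0,1.0,0",
             "3,4,0,0.0,4",
             "4,402,0,1.0,402",
             "5,0,0,1.0,0"),
           csv_path)

# read.csv treats the first row as the header by default,
# so there is no need to strip it with `tail -n +2` beforehand
tbl_file <- read.csv(csv_path)

# TRUE wherever the orig column differs from the previous row
changed <- tbl_file$orig[-1] != tbl_file$orig[-nrow(tbl_file)]

first_rows <- tbl_file[c(TRUE, changed), ]  # first row of each run
last_rows  <- tbl_file[c(changed, TRUE), ]  # last row of each run

first_rows$ord  # 1 3 4 5
last_rows$ord   # 2 3 4 5
```

With a real file, csv_path would be replaced by the path to the csv. Note that read.csv renames the o-p column to o.p by default, because a hyphen is not valid in a syntactic R name.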

Regarding "r - Do I need to preprocess the file with awk, or can it be done directly in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33742758/
