
replace - Using gawk to find a specific column and replace the following column with a specific value

Reposted. Author: 行者123. Updated: 2023-12-02 13:02:21

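I am trying to find every place where my data has a repeated line and delete the repeated line. In addition, I am looking for places where the second column has the value 90, and I want to replace the second column of the following line with a specific number that I designate.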

My data looks like this:

 #      Type    Response        Acc     RT      Offset    
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
7 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 31 0 0 0.0000 70221

I would like my data to look like this:

 #      Type    Response        Acc     RT      Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 5 0 0 0.0000 70221

My code:

 BEGIN {
priorline = "";
ERROROFFSET = 50;
ERRORVALUE[10] = 1;
ERRORVALUE[11] = 2;
ERRORVALUE[12] = 3;
ERRORVALUE[30] = 4;
ERRORVALUE[31] = 5;
ERRORVALUE[32] = 6;

ORS = "\n";
}

NR == 1 {
print;
getline;
priorline = $0;
}

NF == 6 {

brandnewline = $0
mytype = $2
$0 = priorline
priorField2 = $2;

if (mytype !~ priorField2) {
print;
priorline = brandnewline;
}

if (priorField2 == "90") {
mytype = ERRORVALUE[mytype];
}
}

END {print brandnewline}


##Here the parameters of the brandnewline is set to the current line and then the
##priorline is set to the line we just worked on and the brandnewline is
##set to be the next new line we are working on. (i.e. line 1 = brandnewline, now
##we set priorline = brandnewline, thus priorline is line 1 and brandnewline takes
##on line 2) Next, the same parameters were set with column 2, mytype being the
##current column 2 value and priorField2 being the same value as mytype moves to
##the next column 2 value. Finally, we wrote an if statement where, if the value
##in column 2 of the current line !~ (does not equal) value of column two of the
##previous line, then the current line will be printed, otherwise it will just be
##skipped over. The second if statement recognizes the lines in which the value
##90 appeared and replaces the value in column 2 with a previously defined
##ERRORVALUE set for each specific type (type 10=1, 11=2, 12=3, 30=4, 31=5, 32=6).

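I have been able to remove the duplicate lines successfully. However, I cannot get the next part of my code to work, which is to replace the value in the actual column with the values I specified in BEGIN as ERRORVALUE (10=1, 11=2, 12=3, 30=4, 31=5, 32=6). Essentially, I just want to replace that value in the line with my ERRORVALUE.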

If anyone can help me with this, I would be very grateful.

Best answer

One challenge is that you can't just compare a line with the previous line, because the ID numbers will differ.

awk '
BEGIN {
ERRORVALUE[10] = 1
# ... etc
}

# print the header
NR == 1 {print; next}

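# keep a row only if fields 2-6 differ from the last kept row; the ID in field 1 always differs
NR == 2 || $0 !~ prev_regex {
# \s and \w are gawk extensions; \s* also accepts rows without leading whitespace
prev_regex = sprintf("^\\s*\\w+\\s+%s\\s+%s\\s+%s\\s+%s\\s+%s",$2,$3,$4,$5,$6)
# remap the type only when the previous kept row was a 90 and a mapping exists
if (was90 && ($2 in ERRORVALUE)) $2 = ERRORVALUE[$2]
print
was90 = ($2 == 90)
}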
'
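
The point about ID numbers can also be handled without a regex. The following is only a sketch, assuming duplicates are always consecutive and differ only in the ID in field 1: it builds a key from fields 2 through NF and compares it with the key of the previous row. The function name dedupe_rows is hypothetical, and the sketch sticks to POSIX awk features, so it does not rely on gawk's \s and \w.

```shell
# Hypothetical helper: drop a row when fields 2..NF repeat the previous row.
dedupe_rows() {
  awk '
    NR == 1 { print; next }            # pass the header through unchanged
    {
      key = ""
      for (i = 2; i <= NF; i++)        # skip field 1, the row ID
        key = key SUBSEP $i
      if (NR == 2 || key != prev_key)  # first data row, or differs from the previous row
        print
      prev_key = key
    }
  '
}

# Rows 6 and 7 differ only in the ID, so row 7 is dropped.
printf '%s\n' '# Type Response Acc RT Offset' \
  '6 41 0 0 0.0000 64537' \
  '7 41 0 0 0.0000 64537' \
  '8 70 0 0 0.0000 65106' | dedupe_rows
```

Comparing a joined key is an exact string match, so characters such as the "." in 0.0000 are not treated as regex wildcards.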

For rows whose second column is changed, the gawk program in this answer breaks the line formatting:

 #      Type    Response        Acc     RT      Offset
1 70 0 0 0.0000 57850
2 31 0 0 0.0000 59371
3 41 0 0 0.0000 60909
4 70 0 0 0.0000 61478
5 31 0 0 0.0000 62999
6 41 0 0 0.0000 64537
8 70 0 0 0.0000 65106
9 11 0 0 0.0000 66627
10 21 0 0 0.0000 68165
11 90 0 0 0.0000 68700
12 5 0 0 0.0000 70221
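
The spacing change comes from how awk handles field assignment. A small sketch, assuming the original file pads its columns with several spaces (the padding below is made up for illustration): changing a variable that holds a copy of $2 leaves the record alone, while assigning to $2 itself changes the row and makes awk rebuild $0 with OFS, a single space by default. The first case looks like what happens in the question's code, where mytype = ERRORVALUE[mytype] updates only the variable.

```shell
# Changing a copy of $2 does not touch the record.
printf '%s\n' '12    31    0    0    0.0000    70221' |
  awk '{ mytype = $2; mytype = 5; print }'
# prints: 12    31    0    0    0.0000    70221

# Assigning to $2 changes the row and rebuilds $0 with single spaces.
printf '%s\n' '12    31    0    0    0.0000    70221' |
  awk '{ $2 = 5; print }'
# prints: 12 5 0 0 0.0000 70221
```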

If that is a problem, you can pipe the output of gawk through column -t, or, if you know the line format is fixed, use printf() in the awk program.
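
As a rough sketch of the printf() route, the program below combines the duplicate check, the remapping after a 90 row, and fixed-width output. The ERRORVALUE table is the one listed in the question. The function name clean_log, the variable newtype, and the column widths in fmt are assumptions for illustration, and the widths would need adjusting to the real file. Unlike the answer's program, it compares fields directly instead of using a regex.

```shell
# Hypothetical helper: dedupe rows, remap Type after a 90 row, print fixed-width columns.
clean_log() {
  awk '
    BEGIN {
      # mapping taken from the question
      ERRORVALUE[10] = 1; ERRORVALUE[11] = 2; ERRORVALUE[12] = 3
      ERRORVALUE[30] = 4; ERRORVALUE[31] = 5; ERRORVALUE[32] = 6
      fmt = "%-4s%-6s%-10s%-5s%-8s%s\n"    # assumed widths, not from the source
    }
    NR == 1 { print; next }                # header passes through unchanged
    {
      key = $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 SUBSEP $6
      if (NR > 2 && key == prev_key) next  # same as the last kept row apart from the ID
      prev_key = key
      newtype = $2
      # remap only when the previous kept row was a 90 and a mapping exists
      if (was90 && (newtype in ERRORVALUE)) newtype = ERRORVALUE[newtype]
      was90 = ($2 == 90)
      printf fmt, $1, newtype, $3, $4, $5, $6
    }
  '
}
```

It reads standard input, so it could be run as clean_log < data.txt, where data.txt is a placeholder file name. Because the row is printed with printf and $2 is never assigned, the output spacing is controlled entirely by fmt.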

Regarding replace - Using gawk to find a specific column and replace the following column with a specific value, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9707995/
