I am taking the following data and trying to output the average with awk.
$ cat chores.csv
Chore Name,Assigned to,estimate,done?
Laundry,Chelsey,45,N
Wash Windows,Sam,60,Y
Mop kitchen,Sam,20,N
Clean cookware,Chelsey,30,N
Unload dishwasher,Chelsey,10,N
Dust living room,Chelsey,20,N
Wash the dog,Sam,40,N
Here is the script I wrote:
#!/bin/awk
BEGIN {
NR>1
FS=","
} $3 > 0 {
i++ ; tot+=$3
avg=tot/i
}
END{
printf "\nAverage: %.2f\n ", avg
}
When I run it, I get incorrect output:
awk -f avg.awk chores.csv
Average: 28.12
The answer should be 32.14
Add debugging prints, especially at the end: print "tot=" tot "\ti=" i. In the worst case, add debugging prints in the main loop as well. Good luck.
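Following that suggestion, here is a minimal sketch of the end-of-run debugging print, assuming a POSIX awk such as gawk or mawk. The question's data is supplied inline by a hypothetical helper function, chores, so the example does not depend on a chores.csv file being present.

```shell
# Hypothetical helper: prints the sample data from the question.
chores() {
  printf '%s\n' \
    'Chore Name,Assigned to,estimate,done?' \
    'Laundry,Chelsey,45,N' \
    'Wash Windows,Sam,60,Y' \
    'Mop kitchen,Sam,20,N' \
    'Clean cookware,Chelsey,30,N' \
    'Unload dishwasher,Chelsey,10,N' \
    'Dust living room,Chelsey,20,N' \
    'Wash the dog,Sam,40,N'
}

# Same pattern as the question's script, but END prints the running
# total and the row count instead of the average.
debug=$(chores | awk -F, '$3 > 0 { i++; tot += $3 } END { print "tot=" tot "\ti=" i }')
printf '%s\n' "$debug"
```

With the question's data this prints tot=225 and i=8: eight rows counted although only seven carry estimates, which points straight at the header line.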
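Only the condition immediately before a block decides whether that block runs. When you put NR>1 in the BEGIN block instead of before { i++; tot+=$3 }, it no longer does anything: inside BEGIN it is just an expression, evaluated once while NR is still 0, and its value is thrown away.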
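A minimal sketch of that difference, assuming any POSIX awk: in BEGIN no input has been read yet, so NR is 0, while as a pattern NR > 1 is tested against every record. The three input lines are made up for illustration.

```shell
# In BEGIN, NR is 0; a BEGIN-only program does not read any input.
in_begin=$(awk 'BEGIN { print NR }')

# As a pattern, NR > 1 is checked for each record, so line 1 is skipped.
as_pattern=$(printf '%s\n' header row1 row2 | awk 'NR > 1 { print }')

printf 'NR in BEGIN: %s\n' "$in_begin"
printf 'Lines kept by the pattern:\n%s\n' "$as_pattern"
```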
BTW, consider computing avg in the END block, so you do it only once instead of over and over for every line.
Once you get an answer to the question you asked, make sure to check that i is non-zero before you divide by it, e.g. print (i ? tot/i : 0) or similar.
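A minimal sketch of that guard, assuming a hypothetical input that contains only the header line, so no row matches and i is never set. Without the guard, tot/i would divide by zero (gawk, for example, stops with a fatal error); with it, the average falls back to 0.

```shell
# Hypothetical edge case: the header line and no data rows.
guarded=$(printf '%s\n' 'Chore Name,Assigned to,estimate,done?' |
  awk -F, 'NR > 1 && $3 > 0 { i++; tot += $3 }
           END { printf "Average: %.2f\n", (i ? tot / i : 0) }')
printf '%s\n' "$guarded"
```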
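You are counting the header line, even though it doesn't contain the number you want. Its third field, estimate, isn't numeric, so $3 > 0 becomes a string comparison ("estimate" > "0"), which is true. The header therefore adds 0 to tot but 1 to i, giving 225/8 = 28.125 instead of 225/7 ≈ 32.14.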
Change to:
#!/bin/awk -f
BEGIN {
    FS=","
}
NR > 1 && $3 > 0 {    # NR > 1 check moved here
    i++;
    tot += $3
}
END {
    avg=tot/i
    printf "\nAverage: %.2f\n ", avg
}
This also removes the NR > 1 from the BEGIN block, where it does nothing, and calculates the average only once, in the END block, instead of for every row, since you only print it at the end anyway. That makes the code a bit cleaner.
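To check that explanation, here is a minimal sketch, assuming a POSIX awk and a hypothetical helper, chores, that prints the question's data. It first tests whether the header passes the original $3 > 0 pattern, then runs the corrected pattern on the same input.

```shell
# Hypothetical helper: prints the sample data from the question.
chores() {
  printf '%s\n' \
    'Chore Name,Assigned to,estimate,done?' \
    'Laundry,Chelsey,45,N' \
    'Wash Windows,Sam,60,Y' \
    'Mop kitchen,Sam,20,N' \
    'Clean cookware,Chelsey,30,N' \
    'Unload dishwasher,Chelsey,10,N' \
    'Dust living room,Chelsey,20,N' \
    'Wash the dog,Sam,40,N'
}

# "estimate" is not numeric, so $3 > 0 compares strings ("estimate" > "0").
header_passes=$(chores | awk -F, 'NR == 1 { if ($3 > 0) print "yes"; else print "no" }')

# The corrected pattern skips line 1 before testing the estimate.
fixed=$(chores | awk -F, 'NR > 1 && $3 > 0 { i++; tot += $3 }
  END { printf "Average: %.2f\n", tot / i }')

printf 'Header passes $3 > 0: %s\n' "$header_passes"
printf '%s\n' "$fixed"
```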
Your attempt at screening out the header line clearly isn't working. One obvious possibility is something along these lines (untested, but simple enough that I'd expect it to work):
$3 ~ /^[0-9]+$/ {
    i++;
    tot+=$3;
    avg=tot/i;
}
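A minimal sketch of that regex filter in action, assuming a POSIX awk and a hypothetical helper, chores, that prints the question's data. One design point worth knowing: /^[0-9]+$/ only accepts whole, non-negative numbers, so an estimate such as 12.5 would be silently skipped; the second command uses a made-up row to show that.

```shell
# Hypothetical helper: prints the sample data from the question.
chores() {
  printf '%s\n' \
    'Chore Name,Assigned to,estimate,done?' \
    'Laundry,Chelsey,45,N' \
    'Wash Windows,Sam,60,Y' \
    'Mop kitchen,Sam,20,N' \
    'Clean cookware,Chelsey,30,N' \
    'Unload dishwasher,Chelsey,10,N' \
    'Dust living room,Chelsey,20,N' \
    'Wash the dog,Sam,40,N'
}

# Only rows whose third field is all digits are counted, so the
# header ("estimate") is skipped without any NR check.
regex_avg=$(chores | awk -F, '$3 ~ /^[0-9]+$/ { i++; tot += $3 }
  END { printf "Average: %.2f\n", tot / i }')

# Caveat: a decimal estimate does not match the regex either.
decimal_matches=$(printf '%s\n' 'Hypothetical chore,Sam,12.5,N' |
  awk -F, '$3 ~ /^[0-9]+$/ { n++ } END { print n + 0 }')

printf '%s\n' "$regex_avg"
printf 'Rows with 12.5 that match: %s\n' "$decimal_matches"
```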
Personally, I'd probably compute avg only once, in the END clause, though. It's not clear to me what the NR>1 is intended to do. Maybe you intended it to be part of a pattern instead of an action? And even with a trivial awk script, it's worth the trouble to indent decently, so the script would look something like this:
#!/bin/awk -f
BEGIN {
    FS=","
}
$3 ~ /^[0-9]+$/ {
    i++ ;
    tot+=$3
}
END {
    avg=tot/i
    printf "\nAverage: %.2f\n ", avg
}
Here is a simplified version of the script from another answer:
awk -F, '$3 > 0 { i++; tot += $3 } END { printf "\nAverage: %.2f\n ", tot/(i-1) }' chores.csv
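In gawk (as in other POSIX awks) you don't need to skip the first line for the sum: its third field, "estimate", is treated as 0 when added to tot. It does still pass $3 > 0 (a string comparison), so it is counted in i, which is why the divisor is i-1. The division can also be done directly in the printf command.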
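To see why the divisor is i-1 rather than i, here is a minimal sketch that prints i next to the one-liner's result, assuming a POSIX awk and a hypothetical helper, chores, that prints the question's data. Note that this approach relies on exactly one non-data line (the header) passing $3 > 0.

```shell
# Hypothetical helper: prints the sample data from the question.
chores() {
  printf '%s\n' \
    'Chore Name,Assigned to,estimate,done?' \
    'Laundry,Chelsey,45,N' \
    'Wash Windows,Sam,60,Y' \
    'Mop kitchen,Sam,20,N' \
    'Clean cookware,Chelsey,30,N' \
    'Unload dishwasher,Chelsey,10,N' \
    'Dust living room,Chelsey,20,N' \
    'Wash the dog,Sam,40,N'
}

# i includes the header, so only i - 1 rows actually contributed estimates.
counts=$(chores | awk -F, '$3 > 0 { i++; tot += $3 }
  END { printf "i=%d avg=%.2f\n", i, tot / (i - 1) }')
printf '%s\n' "$counts"
```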
Also, IMHO the estimate column can be zero but can't be negative, so you can skip the check, and you don't need the counter (variable i) either. So the script can become even simpler:
awk -F, '{ tot += $3 } END { printf "\nAverage: %.2f\n ", tot/(NR-1) }' chores.csv
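The two one-liners agree on the question's data but not in general. A minimal sketch, using hypothetical data (the question's rows plus two chores estimated at 0 minutes), shows that the counter version leaves 0 estimates out of the average while the NR version includes them. It assumes a POSIX awk, where NR in END still holds the number of the last record read.

```shell
# Hypothetical data: the question's rows plus two 0-minute chores.
chores_with_zeros() {
  printf '%s\n' \
    'Chore Name,Assigned to,estimate,done?' \
    'Laundry,Chelsey,45,N' \
    'Wash Windows,Sam,60,Y' \
    'Mop kitchen,Sam,20,N' \
    'Clean cookware,Chelsey,30,N' \
    'Unload dishwasher,Chelsey,10,N' \
    'Dust living room,Chelsey,20,N' \
    'Wash the dog,Sam,40,N' \
    'Feed the cat,Sam,0,N' \
    'Water plants,Chelsey,0,N'
}

# Counter version: 0 fails $3 > 0, so those rows are not counted.
with_counter=$(chores_with_zeros | awk -F, '$3 > 0 { i++; tot += $3 }
  END { printf "%.2f\n", tot / (i - 1) }')

# NR version: every line after the header counts, zeros included.
with_nr=$(chores_with_zeros | awk -F, '{ tot += $3 }
  END { printf "%.2f\n", tot / (NR - 1) }')

printf 'counter: %s  NR: %s\n' "$with_counter" "$with_nr"
```

Which result is right depends on whether a 0 estimate should count as a real chore.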
@CharlesDuffy: Although "apparently" is often used to mean something like "probably", I'm using its real meaning, so this sentence is essentially equivalent to: "It is apparent that your attempt at screening out the header line isn't working."
@CharlesDuffy: I guess--I've edited to strengthen the statement a bit. Not sure it makes a big difference, but I guess it doesn't hurt anything, anyway.
Those versions aren't equivalent. The second script includes 0s in the average while the first one didn't. Granted, there are no 0s in the example data, but that doesn't mean they can't occur.
@MaksVerver, correct. The OP didn't mention zeros, so I just added an example along those lines :)