我有一个文件test1.log
04/15/2016 02:22:46 PM - kneaddata.knead_data - INFO: Running kneaddata v0.5.1
04/15/2016 02:22:46 PM - kneaddata.utilities - INFO: Decompressing gzipped file ...
Input Reads: 69766650 Surviving: 55798391 (79.98%) Dropped: 13968259 (20.02%)
TrimmomaticSE: Completed successfully
04/15/2016 02:32:04 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/liaoming/kneaddata_v0.5.1/WGC066610D/WGC066610D_kneaddata.trimmed.fastq
04/15/2016 05:32:31 PM - kneaddata.utilities - DEBUG: 55798391 reads; of these:
55798391 (100.00%) were unpaired; of these:
55775635 (99.96%) aligned 0 times
17313 (0.03%) aligned exactly 1 time
5443 (0.01%) aligned >1 times
0.04% overall alignment rate
以及其他格式相同但内容不同的文件,例如test2.log、test3.log到test60.log
我想从这些文件中提取两个数字。例如 test1.log
,这两个数字将为 55798391 55775635。
所以最终生成的文件counts.txt
将是这样的:
测试1 55798391 55775635
测试2 51000000 40000000
......
测试60 5000000 30000000
awk
来救援!
$ awk 'FNR==9{f=$1} FNR==10{print FILENAME,f,$1}' test{1..60}.log
如果不在同一目录中,则在循环内调用或创建文件列表并通过管道传输到 xargs awk
$ for i in {1..60}; do awk ... test$i/test$i.log; done
$ for i in {1..60}; do echo test$i/test$i.log; done | xargs awk ...
我是一名优秀的程序员,十分优秀!