
bash - How to merge two files column-wise using awk?

Reposted. Author: 行者123. Last updated: 2023-12-02 07:58:06

I have the following two text files:

file1

-7.7
-7.4
-7.3
-7.3
-7.3

file2

4.823
5.472
5.856
4.770
4.425

I want to merge them side by side, separated by a comma:

file3

-7.7,4.823
-7.4,5.472
-7.3,5.856
-7.3,4.770
-7.3,4.425

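I know this can easily be done with paste -d ',' file1 file2 > file3, but I want a solution that lets me control each iteration, since my dataset is large and I also need to add other columns to the output file. For example: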

A,-7.7,4.823,3
A,-7.4,5.472,2
B,-7.3,5.856,3
A,-7.3,4.770,1
B,-7.3,4.425,1

Here is what I have so far:

awk 'NR==FNR {a[$count]=$1; count+=1; next} {print a[$count] "," $1; count+=1;}' file1 file2 > file3

Output:

-7.3,4.823
-7.3,5.472
-7.3,5.856
-7.3,4.770
-7.3,4.425
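
A likely cause of the repeated -7.3, with a minimal corrected sketch. In awk, variables are written without a `$`; `$count` means "the field whose number is `count`", not the value of `count`. Because the files have a single column, `$count` is empty for every `count` of 2 or more, so most assignments and every lookup hit the same empty key and only the last stored value (-7.3) survives. The sketch below keys the array on `FNR` (the line number within the current file) instead. The scratch directory and the `workdir` variable are assumptions added only to make the example self-contained.

```shell
# Build the two sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' -7.7 -7.4 -7.3 -7.3 -7.3 > "$workdir/file1"
printf '%s\n' 4.823 5.472 5.856 4.770 4.425 > "$workdir/file2"

# First rule: NR==FNR holds only while file1 is being read,
# so each file1 value is stored under its line number.
# Second rule: runs for file2, where FNR restarts at 1,
# so a[FNR] is the file1 value from the same line.
awk 'NR==FNR { a[FNR] = $1; next }
     { print a[FNR] "," $1 }' "$workdir/file1" "$workdir/file2" > "$workdir/file3"

cat "$workdir/file3"
```

Extra columns can be added inside the `print` statement. Note that this idiom holds all of file1 in memory, which may matter for very large inputs.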

I am new to bash and awk, so I would appreciate a detailed reply :)

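Edit:
Suppose I have a directory with pairs of files, ending with two extensions: .ext1 and .ext2. Those files have parameters included in their names; for example, file_0_par1_par2.ext1 has its pair, file_0_par1_par2.ext2. Each file contains 5 values. I have a function to extract the serial number and the parameters from a file's name. My goal is to write, to a single csv file (file_out.csv), the values present in the files along with the parameters extracted from their names.
Code: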

for file1 in *.ext1 ; do
    for file2 in *.ext2 ; do
        # for each file ending with .ext2, verify if it is file1's corresponding pair
        # I know this is extremely time inefficient, since it's a O(n^2) operation, but I couldn't find another alternative
        if [[ "${file1%.*}" == "${file2%.*}" ]] ; then
            # extract file_number, and par1, par2 based on some conditions, then append to the csv file
            paste -d ',' "$file1" "$file2" | while IFS="," read -r var1 var2;
            do
                echo "$par1,$par2,$var1,$var2,$file_number" >> "file_out.csv"
            done
        fi
    done
done

Best answer

An efficient way to do what the updated question describes (pairing each .ext1 file with its .ext2 counterpart and writing the values, plus the parameters taken from the file name, to file_out.csv) would be the following (untested):

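for file1 in *.ext1; do
    base="${file1%.*}"            # file name without its extension
    file2="${base}.ext2"          # the pair is derived directly, no inner loop needed
    [ -f "$file2" ] || continue   # skip files that have no .ext2 counterpart
    paste -d ',' "$file1" "$file2" |
    awk -v base="$base" '
        # file_0_par1_par2 splits into b[1]="file", b[2]="0", b[3]="par1", b[4]="par2"
        BEGIN { split(base, b, /_/); FS = OFS = "," }
        { print b[3], b[4], $1, $2, b[2] }
    '
done > 'file_out.csv'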

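Deriving the pair with base="${file1%.*}"; file2="${base}.ext2" replaces the N^2 comparisons done by for file2 in *.ext2 ; do if [[ "${file1%.*}" == "${file2%.*}" ]] ; then with N direct lookups (given N pairs of files). Piping into | awk '...' is in turn about an order of magnitude more efficient than | while IFS="," read -r var1 var2; do echo ...; done (see "Why is using a shell loop to process text considered bad practice?"). Together, you can expect a huge performance improvement over your existing script.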
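
To see the pairing loop on concrete data, here is a sketch that builds two hypothetical file pairs in a scratch directory and runs it end to end. The file names (file_0_a_b, file_1_c_d), the two values per file, and the `workdir` variable are made up for illustration; the real files hold 5 values each and may follow a different naming pattern.

```shell
# Create two hypothetical pairs: serial 0 with parameters a,b and serial 1 with parameters c,d.
workdir=$(mktemp -d)
printf '%s\n' -7.7 -7.4   > "$workdir/file_0_a_b.ext1"
printf '%s\n' 4.823 5.472 > "$workdir/file_0_a_b.ext2"
printf '%s\n' -7.3 -7.3   > "$workdir/file_1_c_d.ext1"
printf '%s\n' 5.856 4.770 > "$workdir/file_1_c_d.ext2"

# Run in a subshell so the caller's working directory is left unchanged.
(
    cd "$workdir" || exit 1
    for file1 in *.ext1; do
        base="${file1%.*}"            # e.g. file_0_a_b
        file2="${base}.ext2"          # matching pair, derived from the name
        [ -f "$file2" ] || continue   # skip unpaired files
        paste -d ',' "$file1" "$file2" |
        awk -v base="$base" '
            # Split the base name on "_" and emit: par1,par2,value1,value2,serial
            BEGIN { split(base, b, /_/); FS = OFS = "," }
            { print b[3], b[4], $1, $2, b[2] }
        '
    done > file_out.csv
)

cat "$workdir/file_out.csv"
```

With these inputs the first output line should be a,b,-7.7,4.823,0. The redirection sits after `done`, so file_out.csv is opened once for the whole loop rather than once per line.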

Regarding "bash - How to merge two files column-wise using awk?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60875025/
