
linux - Transpose the first and third rows of a CSV file into columns in a shell script

Reposted. Author: 行者123. Updated: 2023-12-02 18:46:02

I need to transpose the first and third rows of a CSV file, adding some keywords around each value.
For example, I have a file containing data like this:

Col1,Col2,Col3,Col4,Col5,Col6,Col7,Test1_AA_Col8,Test1_AA_Col9,Test1_AA_Col10,Test1_BB_col11,Test1_BB_col12,Test1_BB_col13,Test2_AA_col14,Test2_AA_col15,Test2_AA_cl16,Test2_BB_col17,Test2_BB_col18,Test2_BB_col19,Test3_AA_col20,Test3_AA_col21,Test3_AA_col22,Test3_BB_col23,Test3_BB_col24,Test3_BB_col25,Test4_AA_col26,Test4_AA_cl27,Test4_AA_col28,Test4_BB_col29,Test4_BB_col30,Test4_BB_col31

object,object,object,object,object,object,object,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64
String,String,String,String,String,String,String,"Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)","Decimal(35,15)"

I tried to transpose the first and third rows, and my output looks like this:
CAST(Col1 AS String) AS Col1,
CAST(Col2 AS String) AS Col2,
CAST(Col3 AS String) AS Col3,
CAST(Col4 AS String) AS Col4,
CAST(Col5 AS String) AS Col5,
CAST(Col6 AS String) AS Col6,
CAST(Col7 AS String) AS Col7,
CAST(Test1_AA_Col8 AS Decimal(35,15)) AS Test1_AA_Col8,
CAST(Test1_AA_Col9 AS Decimal(35,15)) AS Test1_AA_Col9,
CAST(Test1_AA_Col10 AS Decimal(35,15)) AS Test1_AA_Col10,
CAST(Test1_BB_col11 AS Decimal(35,15)) AS Test1_BB_col11,
CAST(Test1_BB_col12 AS Decimal(35,15)) AS Test1_BB_col12,
CAST(Test1_BB_col13 AS Decimal(35,15)) AS Test1_BB_col13,
CAST(Test2_AA_col14 AS Decimal(35,15)) AS Test2_AA_col14,
CAST(Test2_AA_col15 AS Decimal(35,15)) AS Test2_AA_col15,
CAST(Test2_AA_cl16 AS Decimal(35,15)) AS Test2_AA_cl16,
CAST(Test2_BB_col17 AS Decimal(35,15)) AS Test2_BB_col17,
CAST(Test2_BB_col18 AS Decimal(35,15)) AS Test2_BB_col18,
CAST(Test2_BB_col19 AS Decimal(35,15)) AS Test2_BB_col19,
CAST(Test3_AA_col20 AS Decimal(35,15)) AS Test3_AA_col20,
CAST(Test3_AA_col21 AS Decimal(35,15)) AS Test3_AA_col21,
CAST(Test3_AA_col22 AS Decimal(35,15)) AS Test3_AA_col22,
CAST(Test3_BB_col23 AS Decimal(35,15)) AS Test3_BB_col23,
CAST(Test3_BB_col24 AS Decimal(35,15)) AS Test3_BB_col24,
CAST(Test3_BB_col25 AS Decimal(35,15)) AS Test3_BB_col25,
CAST(Test4_AA_col26 AS Decimal(35,15)) AS Test4_AA_col26,
CAST(Test4_AA_cl27 AS Decimal(35,15)) AS Test4_AA_cl27,
CAST(Test4_AA_col28 AS Decimal(35,15)) AS Test4_AA_col28,
CAST(Test4_BB_col29 AS Decimal(35,15)) AS Test4_BB_col29,
CAST(Test4_BB_col30 AS Decimal(35,15)) AS Test4_BB_col30,
) AS Test4_BB_col31

Here is the code I wrote:
hadoop fs -text $f | sed -n '1p;3p' | awk -F, '{for (i=1; i<=NF; i++) a[i,NR]=$i; max=(max<NF?NF:max)} END {for (i=1; i<=max; i++) {for (j=1; j<=NR;j++) printf "%s%s" , a[i,j], (i==NR?RS:FS) }}'
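Note that the separator expression in the code above, `(i==NR?RS:FS)`, compares the column index `i` against `NR`; inside the `END` block, `NR` is the total number of lines read (here 2), which likely explains why the last `CAST` line comes out malformed. A corrected sketch of just the transpose step, with made-up inline rows standing in for `hadoop fs -text $f`; it only pairs the two rows and does not yet add the `CAST(...)` wrapper:

```shell
# Hypothetical 3-column sample standing in for `hadoop fs -text $f`.
out=$(printf '%s\n' \
        'Col1,Col2,Col3' \
        'object,object,float64' \
        'String,String,Bigint' |
      sed -n '1p;3p' |
      awk -F, '{ for (i=1; i<=NF; i++) a[i,NR]=$i     # cell (column, row)
                 max=(max<NF?NF:max); rows=NR }
               END { for (i=1; i<=max; i++)           # walk columns...
                       for (j=1; j<=rows; j++)        # ...then rows
                         printf "%s%s", a[i,j], (j==rows?RS:FS) }')
printf '%s\n' "$out"
```

The only behavioral change is `(j==rows?RS:FS)`: a newline after the last row of each transposed column, a comma otherwise.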

This is what I want to achieve:
CAST(Col1 AS String) AS Col1,
CAST(Col2 AS String) AS Col2,
CAST(Col3 AS String) AS Col3,
CAST(Col4 AS String) AS Col4,
CAST(Col5 AS String) AS Col5,
CAST(Col6 AS String) AS Col6,
CAST(Col7 AS String) AS Col7,
CAST(Test1_AA_Col8 AS Decimal(35,15)) AS Test1_AA_Col8,
CAST(Test1_AA_Col9 AS Decimal(35,15)) AS Test1_AA_Col9,
CAST(Test1_AA_Col10 AS Decimal(35,15)) AS Test1_AA_Col10,
CAST(Test1_BB_col11 AS Decimal(35,15)) AS Test1_BB_col11,
CAST(Test1_BB_col12 AS Decimal(35,15)) AS Test1_BB_col12,
CAST(Test1_BB_col13 AS Decimal(35,15)) AS Test1_BB_col13,
CAST(Test2_AA_col14 AS Decimal(35,15)) AS Test2_AA_col14,
CAST(Test2_AA_col15 AS Decimal(35,15)) AS Test2_AA_col15,
CAST(Test2_AA_cl16 AS Decimal(35,15)) AS Test2_AA_cl16,
CAST(Test2_BB_col17 AS Decimal(35,15)) AS Test2_BB_col17,
CAST(Test2_BB_col18 AS Decimal(35,15)) AS Test2_BB_col18,
CAST(Test2_BB_col19 AS Decimal(35,15)) AS Test2_BB_col19,
CAST(Test3_AA_col20 AS Decimal(35,15)) AS Test3_AA_col20,
CAST(Test3_AA_col21 AS Decimal(35,15)) AS Test3_AA_col21,
CAST(Test3_AA_col22 AS Decimal(35,15)) AS Test3_AA_col22,
CAST(Test3_BB_col23 AS Decimal(35,15)) AS Test3_BB_col23,
CAST(Test3_BB_col24 AS Decimal(35,15)) AS Test3_BB_col24,
CAST(Test3_BB_col25 AS Decimal(35,15)) AS Test3_BB_col25,
CAST(Test4_AA_col26 AS Decimal(35,15)) AS Test4_AA_col26,
CAST(Test4_AA_cl27 AS Decimal(35,15)) AS Test4_AA_cl27,
CAST(Test4_AA_col28 AS Decimal(35,15)) AS Test4_AA_col28,
CAST(Test4_BB_col29 AS Decimal(35,15)) AS Test4_BB_col29,
CAST(Test4_BB_col30 AS Decimal(35,15)) AS Test4_BB_col30,
CAST(Test4_BB_col31 AS Decimal(35,15)) AS Test4_BB_col31

Best answer

Using GNU awk, going from the given sample to the expected output (note that FPAT is used instead of FS):

$ awk '
BEGIN {
    # FS=","                        # would suffice if there were no quoted fields
    FPAT="([^,]*)|(\"[^\"]+\")"
}
NR==1 {
    for(i=1;i<=NF;i++)
        a[i]=$i
}
NR==3 {
    for(i=1;i<=NF;i++) {
        gsub(/^"|"$/,"",$i)
        printf "CAST(%s AS %s) AS %s%s\n",a[i],$i,a[i],(i==NF?"":",")
    }
    exit
}' file

Output:
CAST(Col1 AS String) AS Col1,
CAST(Col2 AS String) AS Col2,
CAST(Col3 AS Bigint) AS Col3,
CAST(Col4 AS Int) AS Col4,
CAST(Col5 AS String) AS Col5

As a one-liner for your pipeline:
$ ... | awk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")"}NR==1{for(i=1;i<=NF;i++)a[i]=$i}NR==3{for(i=1;i<=NF;i++){gsub(/^"|"$/,"",$i);printf "CAST(%s AS %s) AS %s%s\n",a[i],$i,a[i],(i==NF?"":",")}exit}'

Updated output, with a quoted field in the sample:
CAST(Col1 AS String) AS Col1,
CAST(Col2 AS String) AS Col2,
CAST(Col3 AS Bigint) AS Col3,
CAST(Col4 AS Int) AS Col4,
CAST(Col5 AS String) AS Col5,
CAST(Col6 AS Decimal (35,2)) AS Col6

Update: a less awk-ish version without FPAT. It has a rudimentary parser to handle commas inside (single, paired) double quotes (escaped quotes `\"` are not handled correctly):
awk '
function parse(str,   i,j,n,q) {
    for(i=1;i<=length(str);i++) {
        if(i==length(str)||(substr(str,i+1,1)=="," && q==0)) {
            b[++n]=substr(str,j+1,i-j)
            j=i+1
        }
        if(substr(str,i+1,1)=="\"")
            q=(!q)
    }
    return n
}
BEGIN {
    FS=","
}
NR==1 {
    for(i=1;i<=NF;i++)
        a[i]=$i
}
NR==3 {
    n=parse($0)
    for(i=1;i<=n;i++) {
        gsub(/^"|"$/,"",b[i])
        printf "CAST(%s AS %s) AS %s%s\n",a[i],b[i],a[i],(i==n?"":",")
    }
    exit
}' file
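As a quick check of the parser, here is a self-contained run on a made-up two-column sample (row 2 is ignored; the quoted type field contains a comma):

```shell
# Hypothetical sample: row 1 = names, row 3 = types, one quoted field with a comma.
printf '%s\n' \
  'Col1,Col2' \
  'object,float64' \
  'String,"Decimal(35,15)"' > /tmp/sample2.csv

out=$(awk '
function parse(str,   i,j,n,q) {         # i,j,n,q are locals (awk convention)
    for(i=1;i<=length(str);i++) {
        if(i==length(str)||(substr(str,i+1,1)=="," && q==0)) {
            b[++n]=substr(str,j+1,i-j)   # cut a field at end-of-line or an unquoted comma
            j=i+1
        }
        if(substr(str,i+1,1)=="\"")
            q=(!q)                       # toggle the in-quotes state
    }
    return n
}
BEGIN { FS="," }
NR==1 { for(i=1;i<=NF;i++) a[i]=$i }
NR==3 {
    n=parse($0)
    for(i=1;i<=n;i++) {
        gsub(/^"|"$/,"",b[i])
        printf "CAST(%s AS %s) AS %s%s\n", a[i], b[i], a[i], (i==n?"":",")
    }
    exit
}' /tmp/sample2.csv)
printf '%s\n' "$out"
```

This version uses only POSIX awk features (`length`, `substr`, `gsub`, user-defined functions), so it also runs under mawk or BusyBox awk.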

Regarding "linux - Transpose the first and third rows of a CSV file into columns in a shell script", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58796521/
