bash - How to split one column into other columns based on multiple conditions?

Reposted. Author: 行者123. Updated: 2023-11-29 09:30:17

I have to create an awk script that performs the following transformation:

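  1. The column order is random
  2. There is no fixed structure. This is actually the big problem
  3. The FLAG column must be split into FLAG1 and FLAG2
  4. FLAG1 and FLAG2 are filled in under the following conditions: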

    if the VAL is ":" then NUM is null
    if the VAL is ":" and FLAG "c" then NUM is null and FLAG1 is "c"
    if the VAL is ":" and FLAG "u" then NUM is null and FLAG2 is "u"
    if the VAL is "14,385" and FLAG "d" then NUM is "14385" and FLAG(both) is null
    if the VAL is "14,385" and FLAG "du" then NUM is "14385" and FLAG2 is "u"
    if the VAL is ":" and FLAG "cd" then NUM is null and FLAG1 is "c"
    if the VAL is ":" and FLAG "bc" then NUM is null and FLAG1 is "c" and FLAG2 is "b"
    if the VAL is ":" and FLAG "z" then NUM is 0 and FLAG2 is "z"
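Requirement 1 (random column order) can be handled by reading the header row into a name-to-position map and then addressing fields by name. The following is only a minimal sketch of that idea: the function name `pick_columns` is hypothetical, and it assumes a simplified input with no quotes and no commas inside values, which the real file does not satisfy.

```shell
# Sketch: look fields up by header name so that column order does not matter.
# Assumption: plain comma-separated input, no quoting, no commas inside values.
pick_columns() {
  awk -F ',' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header: name -> position
    { print $col["PRIM"], $col["VAL"], $col["FLAG"] }         # address fields by name
  '
}

# The same program handles two different column orders.
printf 'PRIM,VAL,FLAG\nTPP,:,c\n' | pick_columns
printf 'FLAG,PRIM,VAL\nc,TPP,:\n' | pick_columns
```

Both calls print `TPP : c`, because the positions are taken from each file's own header rather than hard-coded.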

The CSV input file is:

"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP", "T5-78", "HT", ":", c
"TCP", "T5-78", "HT", "12,385", c
"TZP", "T5-78", "HT", ":", z
"TNP", "T5-78", "HT", ":", z
"TNP", "T5-78", "HT", ":", cd
"TNP", "T5-78", "HT", ":", du
"TNP", "T5-78", "HT", "12,524,652", dfg

The output .dat file should look like this:

PRIM    TRD GTR NUM FLAG1   FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT 12385 c null
TZP T5-78 HT 0 null z
TNP T5-78 HT 0 null z
TNP T5-78 HT null c null
TNP T5-78 HT null null u
TNP T5-78 HT 12524652 null dfg

The code I tried does not work properly: only the first 3 requirements are met, and the 4th does not work.

BEGIN {
FS=","; OFS="\t";
a["PRIM"]=1;a["TRD"]=1;a["GTR"]=1;a["VAL"]=1;a["FLAG"]=1;
}
NR==1 {

{ $a["VAL"] = "NUMB" ; $a["FLAG"] = "FLAG1" ; $5 = "FLAG2" ; print ; next }
$a["VAL"]=="12,385" && $a["FLAG"] == "d" { $a["VAL"] = "14385" ; $a["FLAG"] = $5 = "" }
$a["VAL"]=="12,385" && $a["FLAG"] == "du" { $a["VAL"] = "14385" ; $a["FLAG"] = "" ; $9 = "u" }
$a["VAL"] != ":" { print ; next }
$a["FLAG"] == "z" { $a["VAL"] = "0" ; $a["FLAG"] = "" ; $5 = "z" }
$a["FLAG"] != "z" { $a["VAL"] = "" }

$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
}
{ print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
NR==1?"FLAG1"OFS"FLAG2":($a["FLAG"]?""OFS$a["FLAG"]:$a["FLAG"]);

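This is the latest code that I think could work. The problem I cannot solve now is that the last value (FLAG2) is printed on a second line. I tried setting OFS, but that did not fix it. Can you tell me what is going wrong here?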

BEGIN {
FS=",";
OFS="\t";
a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}

NR==1 {
$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;

a["FLAG1"] = i;
a["FLAG2"]=i;
a["FLAG1"] = a["FLAG"]; # just for testing and it is ok
a["FLAG2"] = a["FLAG"]; # just for testing and it is ok

}

{

print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];

The output looks like this:

PRIM    TRD GTR NUM FLAG1   FLAG2
TPP T5-78 HT null c
null
TCP T5-78 HT 12385 c
null
TZP T5-78 HT 0 null
z

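After so many suggestions, this is my latest version, but it still does not work... Now, when I add if statements to satisfy the requirements above, nothing happens. I believe the if statements are either incorrect or not in the right place. The NR>1 part is a disaster when it prints the values. Can you tell me what is wrong with my script? I have to admit I started using awk 3 days ago and so far it has been painful... The problem is that I was supposed to have finished this script last week.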

BEGIN {
FS=",";
OFS="\t";

a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}

NR==1 {

$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;

#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];

a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}

{
#initialisation of the new flags
a["FLAG1"]=="";
a["FLAG2"]=="";
}

#MY IF STATEMENTS GO HERE - TEST MODE

a["FLAG"] == "cd" {a["FLAG1"]= "c"}
a["FLAG"] == "du" {a["FLAG2"]= "u"}

{
#print header
print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"], NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
}

#print content
NR>1
{
for(j=1;j<=NF;j++)
#if($i in a)
a[$j]=j;

#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];

a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}
#MY IF STATEMENTS GO HERE - TEST MODE

a["FLAG"] == "cd" {a["FLAG1"]= "c"}
a["FLAG"] == "du" {a["FLAG2"]= "u"}

{
print $a["PRIM"],$a["TRD"],$a["GTR"],$a["VAL"], $a["FLAG1"], $a["FLAG2"]
}
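Two awk behaviors are relevant to the last script above and may explain part of what is observed. First, a pattern such as `NR>1` on a line of its own is a complete rule whose default action prints the record, and a brace block starting on the next line is a separate rule that runs for every record. Second, `a["FLAG"]` holds the column number, so comparing it with `"cd"` never matches; the field value is `$a["FLAG"]`. (Likewise, `a["FLAG1"]=="";` is a comparison, not an assignment.) The sketch below isolates the first two points with tiny made-up inputs; the function names are hypothetical.

```shell
# Pitfall 1: "NR>1" alone on a line is a complete rule (default action: print the record).
# The brace block on the next line is a separate rule that runs for every record.
demo_pattern_newline() {
  printf 'header\nrow\n' | awk '
NR>1
{ print "block ran for line " NR }
'
}

# Pitfall 2: a["FLAG"] holds the column number; $a["FLAG"] is the field value.
demo_index_vs_field() {
  printf 'FLAG\ncd\n' | awk '
NR == 1 { for (i = 1; i <= NF; i++) a[$i] = i; next }
{
  wrong = (a["FLAG"] == "cd")     # compares the index 1 with "cd": false
  right = ($a["FLAG"] == "cd")    # compares the field value with "cd": true
  print "wrong=" wrong, "right=" right
}'
}

demo_pattern_newline
demo_index_vs_field
```

The first function prints `block ran for line 1`, `row`, `block ran for line 2`, showing that the block is not restricted to `NR>1`. The second prints `wrong=0 right=1`.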

Best answer

This requires every input field to be enclosed in double quotes.
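The reason for the quoting requirement is the field separator used in this answer: the regular expression `",[ \t]*"` splits only where a closing quote, a comma, optional blanks, and an opening quote occur together, so a comma inside a quoted value such as "12,385" is left alone. A minimal sketch of just that splitting step follows; the helper name `split_quoted` is hypothetical.

```shell
# Sketch: split fully quoted CSV on the sequence  ",<blanks>"  and then
# strip the leading quote of the first field and the trailing quote of the last.
split_quoted() {
  awk -F '",[ \t]*"' '{
    sub(/^"/, "", $1); sub(/"$/, "", $NF)
    for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i   # show each field
  }'
}

echo '"TCP", "T5-78", "HT", "12,385", "c"' | split_quoted
```

This prints five fields, with field 4 shown as `4=[12,385]`. If the last field were unquoted, as in the question's sample, the separator would not match before it and the last two columns would stay glued together.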

$ echo '"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP", "T5-78", "HT", ":", "c"
"TCP", "T5-78", "HT", "12,385", "c"
"TZP", "T5-78", "HT", ":", "z"
"TNP", "T5-78", "HT", ":", "z"
"TNP", "T5-78", "HT", ":", "cd"
"TNP", "T5-78", "HT", ":", "du"
"TNP", "T5-78", "HT", "12,524,652", "dfg"' |
awk -F '",[ \t]*"' '
{ sub(/^"/, "", $1); sub(/"$/, "", $NF)}
NR == 1 {
for (i=1; i<=NF; i++) col[$i] = i
print "PRIM TRD GTR NUM FLAG1 FLAG2"
next
}
{
f = $col["FLAG"]
v = $col["VAL"]; gsub(/,/, "", v)
num = "null"; flag1 = "null"; flag2 = "null"
}
v == ":" && f == "c" {flag1 = "c"}
v == ":" && f == "u" {flag2 = "u"}
v == "14385" && f == "d" {num = v}
v == "14385" && f == "du" {num = v; flag2 = "u"}
v == ":" && f == "cd" {flag1 = "c"}
v == ":" && f == "bc" {flag1 = "c"; flag2 = "b"}
v == ":" && f == "z" {num = 0; flag2 = "z"}
{print $col["PRIM"],$col["TRD"],$col["GTR"],num,flag1,flag2}
'
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT null null null
TZP T5-78 HT 0 null z
TNP T5-78 HT 0 null z
TNP T5-78 HT null c null
TNP T5-78 HT null null null
TNP T5-78 HT null null null

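My output does not look like yours. Check your specification and make sure the sample input is sufficient to cover all of its cases.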
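The script in this answer prints with awk's default output separator (a single space), while the question sets OFS to a tab for its .dat file. One way to get tab-separated output, shown here only as a sketch, is to pass `-v OFS='\t'` and force awk to rebuild each record. The helper name `to_tsv` and the file name `output.dat` are assumptions for illustration.

```shell
# Sketch: re-emit whitespace-separated rows as tab-separated rows.
# Assigning $1 to itself makes awk rebuild the record using OFS.
to_tsv() {
  awk -v OFS='\t' '{ $1 = $1; print }'
}

printf 'PRIM TRD GTR NUM FLAG1 FLAG2\nTPP T5-78 HT null c null\n' | to_tsv
# To write a file instead:  ... | to_tsv > output.dat   (file name is an assumption)
```

Setting OFS directly in the answer's own awk invocation would have the same effect, since its final `print` already uses comma-separated arguments.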

Regarding "bash - How to split one column into other columns based on multiple conditions?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23278665/
