
file - Print a text block from an awk script to a file [banner like]

Reposted. Author: 行者123. Updated: 2023-12-02 04:29:04

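I have an awk script that does some processing and sends its output to a file. In the BEGIN block of my awk program, how can I first write a banner-like message to that file, the way a bash heredoc would?

I know I could use several print commands, but is there a way to do it with a single print command while keeping the multi-line text, newlines and all?

So the output should look like this: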

#########################################
# generated by some author #
# ENVIRON["VAR"]
#########################################

The other catch with the pretty formatting is that ENVIRON["VAR"] should be expanded in the middle of the string.

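Best answer

The simple way is to write the banner as a multi-line, heredoc-style shell string and save it in an awk variable with -v (the shell expands $VAR inside the double quotes):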

VAR="whatever"
awk -v var="\
#########################################
# generated by some author #
# $VAR
#########################################" '
BEGIN{ print var }
'
#########################################
# generated by some author #
# whatever
#########################################
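The question asks for the banner to land in the output file before anything else. Here is a minimal sketch of that, under a few assumptions: the file comes from mktemp, the names `banner` and `outfile` and the `processed:` prefix are purely illustrative, and the banner is handed to awk through the environment (ENVIRON) rather than -v, so that awk does not interpret any backslash sequences the text might contain.

```shell
#!/usr/bin/env bash
# Sketch only: the names below (banner, outfile) are illustrative assumptions.
VAR="whatever"
outfile=$(mktemp)

# A real here-document, captured in a shell variable; the shell expands $VAR.
banner=$(cat <<EOF
#########################################
# generated by some author #
# $VAR
#########################################
EOF
)

# The prefix assignment exports banner to this awk process only.
# BEGIN writes the banner first; the main rule then adds the processed input.
printf '%s\n' "line one" "line two" |
banner="$banner" awk -v outfile="$outfile" '
BEGIN { print ENVIRON["banner"] > outfile }
{ print "processed: " $0 > outfile }
'

result=$(cat "$outfile")
printf '%s\n' "$result"
rm -f "$outfile"
```

Within one awk run, `> outfile` truncates the file only when it is first opened, so the later print statements append after the banner.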

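Alternatively, and this may be more than you want, below is the command I use to get something better than here documents in awk. I have found it absolutely invaluable when adding template text to multiple files.

It is a shell script that takes as input an awk script with a slightly extended syntax (to support here documents), calls gawk to convert that extended syntax into ordinary awk print statements, and then calls gawk again to execute the resulting script.

I call it "epawk", for "extended print" awk. The tool is shown below along with several examples of how to use it. When you invoke it instead of calling awk directly, you can write scripts that contain pre-formatted blocks of text to print, just as you would with a here-doc (any white space before each # is a tab character, which <<- strips):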

$ export VAR="whatever"
$ epawk 'BEGIN {
print <<-!
#########################################
# generated by some author #
# "ENVIRON["VAR"]"
#########################################
!
}'
#########################################
# generated by some author #
# whatever
#########################################

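It works by creating an awk script from your awk script and then executing it. If you just want to see the generated script, give epawk the -X argument and it will print the generated script instead of executing it, e.g.: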

$ epawk -X 'BEGIN {
print <<-!
#########################################
# generated by some author #
# "ENVIRON["VAR"]"
#########################################
!
}'
BEGIN {
print "#########################################"
print "# generated by some author #"
print "# "ENVIRON["VAR"]""
print "#########################################"
}
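The two-pass idea can be sketched in a few lines. This is a stripped-down illustration, not the tool itself: `expand_sketch` is a hypothetical name, it handles only `print << TERMINATOR` with no -, + or = modifier and nothing after the terminator, and it uses plain awk with `[ \t]` instead of gawk character classes.

```shell
#!/usr/bin/env bash
# Sketch only: expand_sketch is a hypothetical, stripped-down stand-in for
# the expansion pass. It ignores the -, + and = modifiers and anything
# placed after the terminator.
expand_sketch() {
    awk '
    !inBlock && /^[ \t]*print[ \t]*<</ {
        terminator = $0
        sub(/^[ \t]*print[ \t]*<<[ \t]*/, "", terminator)
        inBlock = 1
        next
    }
    inBlock && $0 == terminator { inBlock = 0; next }
    inBlock { print "print \"" $0 "\""; next }
    { print }
    '
}

script='BEGIN {
print << END
# generated by some author #
# "ENVIRON["VAR"]"
END
}'

# Pass 1: turn the extended syntax into plain print statements.
generated=$(printf '%s\n' "$script" | expand_sketch)
printf '%s\n' "$generated"

# Pass 2: run the generated script with ordinary awk.
output=$(VAR="whatever" awk "$generated")
printf '%s\n' "$output"
```

Each block line becomes the body of a double-quoted awk string, which is why `"ENVIRON["VAR"]"` works: the inner quotes close and reopen the string, leaving ENVIRON["VAR"] as an expression concatenated between two literals.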

The script:

$ cat epawk
#!/usr/bin/env bash
# The above must be the first line of this script as bash or zsh is
# required for the shell array reference syntax used in this script.

##########################################################
# Extended Print AWK
#
# Allows printing of pre-formatted blocks of multi-line text in awk scripts.
#
# Before invoking the tool, do the following IN ORDER:
#
# 1) Start each block of pre-formatted text in your script with
#       print << TERMINATOR
#    on its own line and end it with
#       TERMINATOR
#    on its own line. TERMINATOR can be any sequence of non-blank characters
#    you like. Spaces are allowed around the symbols but are not required.
#    If << is followed by -, e.g.:
#       print <<- TERMINATOR
#    then all leading tabs are removed from the block of pre-formatted
#    text (just like shell here documents). If it's followed by + instead, e.g.:
#       print <<+ TERMINATOR
#    then however many leading tabs are common across all non-blank lines
#    in the current pre-formatted block are removed.
#    If << is followed by =, e.g.
#       print <<= TERMINATOR
#    then whatever leading white space (tabs or blanks) occurs before the
#    "print" command will be removed from all non-blank lines in
#    the current pre-formatted block.
#    By default no leading spaces are removed. Anything you place after
#    the TERMINATOR will be reproduced as-is after every line in the
#    post-processed script, so this for example:
#       print << HERE |"cat>&2"
#          foo
#       HERE
#    would cause "foo" to be printed to stderr.
#
# 2) Within each block of pre-formatted text only:
#    a) Put a backslash character before every backslash (\ -> \\).
#    b) Put a backslash character before every double quote (" -> \").
#    c) Enclose awk variables in double quotes without leading
#       backslashes (awkVar -> "awkVar").
#    d) Enclose awk record and field references ($0, $1, $2, etc.)
#       in double quotes without leading backslashes ($1 -> "$1").
#
# 3) If the script is specified on the command line instead of via
#    "-f script" then replace all single quote characters (') in or out
#    of the pre-formatted blocks with their ANSI octal escape sequence (\047)
#    or the sequence '\'' (tick backslash tick tick). This is normal and is
#    required because command-line awk scripts cannot contain single quote
#    characters as those delimit the script. Do not use hex \x27, see
#    http://awk.freeshell.org/PrintASingleQuote.
#
# Then just use it like you would gawk with the small caveat that only
# "-W <option>", not "--<option>", is supported for long options so you
# can use "-W re-interval" but not "--re-interval" for example.
#
# To just see the post-processed script and not execute it, call this
# script with the "-X" option.
#
# See the bottom of this file for usage examples.
##########################################################

expand_prints() {

    gawk '

    !inBlock {
        if ( match($0,/^[[:blank:]]*print[[:blank:]]*<</) ) {

            # save any blanks before the print in case
            # skipType "=" is used.
            leadBlanks = $0
            sub(/[^[:blank:]].*$/,"",leadBlanks)

            $0 = substr($0,RSTART+RLENGTH)

            if      ( sub(/^[-]/,"") ) { skipType = "-" }
            else if ( sub(/^[+]/,"") ) { skipType = "+" }
            else if ( sub(/^[=]/,"") ) { skipType = "=" }
            else                       { skipType = ""  }

            gsub(/(^[[:blank:]]+|[[:blank:]]+$)/,"")

            if (/[[:blank:]]/) {
                terminator = $0
                sub(/[[:blank:]].*/,"",terminator)

                postprint = $0
                sub(/[^[:blank:]]+[[:blank:]]+/,"",postprint)
            }
            else {
                terminator = $0
                postprint = ""
            }

            startBlock()

            next
        }
    }

    inBlock {

        stripped=$0
        gsub(/(^[[:blank:]]+|[[:blank:]]+$)/,"",stripped)

        if ( stripped"" == terminator"" ) {
            endBlock()
        }
        else {
            updBlock()
        }

        next
    }

    { print }

    function startBlock() { inBlock=1; numLines=0 }

    function updBlock() { block[++numLines] = $0 }

    function endBlock(  i,numSkip,indent) {

        # indent is used as a dynamic regexp below; the ^ anchor keeps
        # the removal at the start of each line.

        if (skipType == "") {
            # do not skip any leading tabs
            indent = ""
        }
        else if (skipType == "-") {
            # skip all leading tabs
            indent = "^[\t]+"
        }
        else if (skipType == "+") {

            # skip however many leading tabs are common across
            # all non-blank lines in the current pre-formatted block

            for (i=1;i<=numLines;i++) {

                if (block[i] ~ /[^[:blank:]]/) {

                    match(block[i],/^[\t]+/)

                    if ( (numSkip == "") || (numSkip > RLENGTH) ) {
                        numSkip = RLENGTH
                    }
                }
            }

            indent = "^"
            for (i=1;i<=numSkip;i++) {
                indent = indent "\t"
            }
        }
        else if (skipType == "=") {
            # skip whatever pattern of blanks existed
            # before the "print" statement
            indent = "^" leadBlanks
        }

        for (i=1;i<=numLines;i++) {
            sub(indent,"",block[i])
            print "print \"" block[i] "\"\t" postprint
        }

        inBlock=0
    }

    ' "$@"

}

unset awkArgs
unset scriptFiles
expandOnly=0
while getopts "v:F:W:f:X" arg
do
    case $arg in
        f )     scriptFiles+=( "$OPTARG" ) ;;
        [vFW] ) awkArgs+=( "-$arg" "$OPTARG" ) ;;
        X )     expandOnly=1 ;;
        * )     exit 1 ;;
    esac
done
shift $(( OPTIND - 1 ))

if [ -z "${scriptFiles[*]}" -a "$#" -gt "0" ]
then
    # The script cannot contain literal 's because in cases like this:
    #    'BEGIN{ ...abc'def... }'
    # the args parsed here (and later again by gawk) would be:
    #    $1 = BEGIN{ ...abc
    #    $2 = def... }
    # Replace 's with \047 or '\'' if you need them:
    #    'BEGIN{ ...abc\047def... }'
    #    'BEGIN{ ...abc'\''def... }'
    scriptText="$1"
    shift
fi

# Remaining symbols in "$@" must be data file names and/or variable
# assignments that do not use the "-v name=value" syntax.

if [ -n "${scriptFiles[*]}" ]
then
    if (( expandOnly == 1 ))
    then
        expand_prints "${scriptFiles[@]}"
    else
        gawk "${awkArgs[@]}" "$(expand_prints "${scriptFiles[@]}")" "$@"
    fi

elif [ -n "$scriptText" ]
then
    if (( expandOnly == 1 ))
    then
        printf '%s\n' "$scriptText" | expand_prints
    else
        gawk "${awkArgs[@]}" "$(printf '%s\n' "$scriptText" | expand_prints)" "$@"
    fi
else
    # ${0##*/} is the name this script was invoked as, without its path.
    printf '%s: ERROR: no awk script specified.\n' "${0##*/}" >&2
    exit 1
fi
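Rules 2a and 2b in the header comment (escape every backslash, then every double quote) are easy to get wrong by hand when pasting existing text into a block. As a hedged illustration, the sketch below applies them with sed. `escape_block` is a hypothetical helper and not part of epawk, and it deliberately leaves rules 2c and 2d (quoting awk variables and field references) to the author, since those cannot be guessed mechanically.

```shell
#!/usr/bin/env bash
# Sketch only: escape_block is a hypothetical helper, not part of epawk.
# It doubles backslashes first, then escapes double quotes, so the
# backslashes added for the quotes are not doubled again.
escape_block() {
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

raw='path: C:\temp says "hi"'
escaped=$(printf '%s\n' "$raw" | escape_block)
printf '%s\n' "$escaped"

# Round trip: placed inside an awk string literal, the escaped text
# prints as the original text.
roundtrip=$(awk "BEGIN { print \"$escaped\" }")
printf '%s\n' "$roundtrip"
```

Without the escaping, awk would read the `\t` in `C:\temp` as a tab and the first inner double quote would end the string literal early.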

Usage examples:

$ cat data.txt
abc def"ghi

.

#######
$ cat script.awk
{
awkVar="bar"

print "----------------"

print << HERE
backslash: \\

quoted text: \"text\"

single quote as ANSI sequence: \047

literal single quote (ONLY works when script is in a file): '

awk variable: "awkVar"

awk field: "$2"
HERE

print "----------------"

print <<-!
backslash: \\

quoted text: \"text\"

single quote as ANSI sequence: \047

literal single quote (ONLY works when script is in a file): '

awk variable: "awkVar"

awk field: "$2"
!

print "----------------"

print <<+ whatever
backslash: \\

quoted text: \"text\"

single quote as ANSI sequence: \047

literal single quote (ONLY works when script is in a file): '

awk variable: "awkVar"

awk field: "$2"
whatever

print "----------------"
}

.

$ epawk -f script.awk data.txt
----------------
backslash: \

quoted text: "text"

single quote as ANSI sequence: '

literal single quote (ONLY works when script is in a file): '

awk variable: bar

awk field: def"ghi
----------------
backslash: \

quoted text: "text"

single quote as ANSI sequence: '

literal single quote (ONLY works when script is in a file): '

awk variable: bar

awk field: def"ghi
----------------
backslash: \

quoted text: "text"

single quote as ANSI sequence: '

literal single quote (ONLY works when script is in a file): '

awk variable: bar

awk field: def"ghi
----------------

.

$ epawk -F\" '{
print <<!
ANSI-tick-surrounded quote-separated field 2 (will work): \047"$2"\047
!
}' data.txt
ANSI-tick-surrounded quote-separated field 2 (will work): 'ghi'

.

$ epawk -F\" '{
print <<!
Shell-escaped-tick-surrounded quote-separated field 2 (will work): '\''"$2"'\''
!
}' data.txt
Shell-escaped-tick-surrounded quote-separated field 2 (will work): 'ghi'

.

$ epawk -F\" '{
print <<!
Literal-tick-surrounded quote-separated field 2 (will not work): '"$2"'
!
}' data.txt
Literal-tick-surrounded quote-separated field 2 (will not work):
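Why the literal ticks fail comes down to shell quoting rather than anything in epawk. The sketch below isolates the effect; `show_arg` is a hypothetical helper used only to reveal what the shell passes along, and the text `field 2:` is illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: show_arg is a hypothetical helper that prints exactly what
# the shell hands to a command as its first argument.
show_arg() { printf '%s\n' "$1"; }

set --   # clear the positional parameters, so "$2" expands to nothing

# The literal ticks end and restart the single-quoted string, so "$2" is
# expanded by the shell (to nothing here) before awk ever sees the script.
literal=$(show_arg 'field 2: '"$2"' end')

# \047 and '\'' both deliver a tick to awk.
octal=$(awk 'BEGIN { print "field 2: \047x\047" }')
shell_escaped=$(awk 'BEGIN { print "field 2: '\''x'\''" }')

printf '%s\n' "$literal" "$octal" "$shell_escaped"
```

The first line printed has nothing between `field 2: ` and ` end`, which mirrors the empty result in the "will not work" example; the other two lines both show the ticks around x.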

.

$ epawk -X 'BEGIN{
print <<!
 foo
 bar
!
}'
BEGIN{
print " foo"
print " bar"
}

.

$ cat file
a
b
c

.

$ epawk '{
print <<+! |"cat>o2"
numLines="NR"
numFields="NF", $0="$0", $1="$1"
!
}' file

.

$ cat o2
numLines=1
numFields=1, $0=a, $1=a
numLines=2
numFields=1, $0=b, $1=b
numLines=3
numFields=1, $0=c, $1=c

.

$ epawk 'BEGIN{

cmd = "sort"
print <<+! |& cmd
d
b
a
c
!
close(cmd, "to")

while ( (cmd |& getline line) > 0 ) {
print "got:", line
}
close(cmd)

}' file
got: a
got: b
got: c
got: d

For more on file - Print a text block from an awk script to a file [banner like], see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/24596514/
