
Bash script for adding double quotes in a comma-separated .CSV file

Reposted. Author: 行者123. Updated: 2023-12-02 16:42:49

I need to add double quotes around the fields in a CSV file. My sample data looks like this:

378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29 06:10:01",200890899011202395,0899,0279395,356058102052972,89148000005117597971,67756296

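I tried some awk and sed code that I found online, and the result is shown below. The errors: **the first digit of the first number is trimmed; for example, "378478" shows up as only "78478".**

**It also adds double quotes around fields that already have double quotes!** Nothing seems to work perfectly. Please guide me!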

"78478","COMPLETED","Tracfone","","",""2020/03/29 09:39:22"","","2787","","356074101197544","89148000005748235454","75176540"
"78328","COMPLETED",""Total Wireless"",""Unlimited Talk"," Text"," & Data (First 25GB High Speed"," then unlimited 2GB)"","50",""2020/03/29 06:10:01"","200890899011202395","0899","0279395","356058102052972","89148000005117597971","67756296"
"78329","COMPLETED",""Cricket Wireless"",""Unlimited Talk"," Text"," & 4G LTE Data w/ 15GB Hotspot"","60",""2020/03/29""

This is the code I used:

awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' file # or
sed -E 's/([^,]*) , (.*)/"\1" , "\2"/' file

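My full code is below. My intention is to first convert every .xlsx file to .csv, then add the double quotes to that same CSV and save it back to the same file. I know the $file.csv part is wrong, so I need some help with it.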

find "$Src_Dir" -type f -iname "*.xlsx" -print>path/temp

cat path/temp | while IFS="" read -r -d $'\0' file;
do
echo $file
ssconvert "${file}" --export-type=Gnumeric_stf:stf_csv
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' $file > $file.csv
done

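Best answer

If you want to handle anything other than the simplest CSV files, you should stay away from sed and awk. There are better tools available.

For example, if you run sudo apt install csvtool (or the equivalent on your favourite distribution), you can use its call feature to process each line of the input file. See the following script for an example: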

#!/bin/bash

function quotify {
    # Start empty line, process every field.

    line=""
    while [[ $# -ne 0 ]] ; do
        # Append comma for all but first field, then quoted field.

        [[ -n "${line}" ]] && line="${line},"
        line="${line}\"$1\""

        shift
    done

    # Output the fully quoted line.

    echo "${line}"
}

# Needed to call functions. Also, ensure link: /bin/sh -> /bin/bash.
export -f quotify

# Pretty-print input and output.

echo "Input file:"
sed 's/^/ /' inputFile.csv

echo "Output file:"
csvtool call quotify inputFile.csv | sed 's/^/ /'

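Note the quotify function, which is called once for each line of the CSV file, with its arguments set to each field of that line (without quotes, whether or not the original field had them).

It basically builds a string from all the fields in the line, each wrapped in quotes, and writes it to standard output, as the script output below shows: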

Input file:
378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29"
Output file:
"378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
"378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29"

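One thing the function above does not do is re-escape a quote that sits inside a field. Below is a minimal sketch of a variant that doubles such quotes. The name quotifyEscaped is mine, and it assumes each argument arrives as a raw, already-unquoted field value (which is how the call feature is described above); check what csvtool hands over on your system before relying on it.

```bash
#!/bin/bash

# Variant of quotify that also doubles quotes embedded in a field.
# Assumption: every argument is a raw field value with no CSV quoting.
function quotifyEscaped {
    local line="" field
    local q='"'

    while [[ $# -ne 0 ]]; do
        # Turn each embedded " into "" so the output stays valid CSV.
        field="${1//${q}/${q}${q}}"

        # Append comma for all but first field, then quoted field.
        [[ -n "${line}" ]] && line="${line},"
        line="${line}${q}${field}${q}"

        shift
    done

    echo "${line}"
}

quotifyEscaped 50 'My name is "Pax"' 42
# prints: "50","My name is ""Pax""","42"
```

If you use it with csvtool call, it needs the same export -f treatment as quotify.
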
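Although a separate tool is probably the easiest approach, if you absolutely cannot install other packages you will have to write some code with what you already have. The following bash script is a good starting point, because it uses no other tools to achieve its goal.

At present it is tied to a fairly specific set of rules, as follows:

  • Whitespace is significant. Anything between the commas is treated as part of the field. This matters especially when detecting a quoted field, which must have the quote as its first character; nothing like abc, "d,e,f",ghi, because the "d,e,f" will not be handled correctly.
  • Quoted fields may contain commas, and a "" sequence within them represents a literal ".
  • Feeding it a malformed CSV file is probably not a good idea :-)

With that in mind, here we go. I'll give a short description of each part, but hopefully the comments in the code are enough to work out what is going on.

First, a function that finds the position of one string within another, which is useful for working out field boundaries: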

function findPos {
    haystack="$1"
    needle="$2"

    # Remove everything past the needle.

    prefix="${haystack%%${needle}*}"

    # If nothing was removed, it wasn't found, so supply massive number.
    # Otherwise, it was found at the length of the string with removed stuff.

    position=999999
    [[ ${#prefix} -ne ${#haystack} ]] && position=${#prefix}
    echo ${position}
}
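
To make the return values concrete, here is a small standalone sketch of the same technique with a few sample calls. The only differences from the function above are cosmetic and are my own choices: the variables are local, and the needle is quoted inside the expansion so that it is always matched literally.

```bash
#!/bin/bash

# Same idea as findPos: strip the needle and everything after it, then
# compare lengths to see whether anything was actually stripped.
function findPos {
    local haystack="$1" needle="$2"
    local prefix="${haystack%%"${needle}"*}"  # quoted needle = literal match
    local position=999999                     # sentinel meaning "not found"

    [[ ${#prefix} -ne ${#haystack} ]] && position=${#prefix}
    echo "${position}"
}

findPos 'abc,def,ghi' ','      # prints 3 (offset of the first comma)
findPos '"d,e,f",ghi' '",'     # prints 6 (closing quote followed by comma)
findPos 'no commas here' ','   # prints 999999 (needle not present)
```

Positions are zero-based, which is what the substring expansions in the later functions expect.
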

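We can then use that in a function that works out the length of the next field. For an unquoted field it simply looks for the next comma. A quoted field gets special handling: the field is built up from segments, since it has to cope with both escaped quotes and commas inside the quotes: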

function getNextFieldLen {
    line="$1"

    # Empty line means all work done.

    [[ -z "${line}" ]] && echo -1 && return

    # Handle unquoted first, this is easy.

    [[ "${line:0:1}" != '"' ]] && { echo $(findPos "${line}" ","); return; }

    # Now handle quoted. Loop over all segments where a segment is defined as
    # the text up to the next <"">, assuming it's before the next <",>.

    field=""
    nextQuoteComma=$(findPos "${line}" '",')
    nextDoubleQuote=$(findPos "${line}" '""')
    while [[ ${nextDoubleQuote} -lt ${nextQuoteComma} ]]; do
        # Append segment to the field and go back for next segment.

        field="${field}${line:0:${nextDoubleQuote}}\"\""
        line="${line:${nextDoubleQuote}}"
        line="${line:2}"

        nextQuoteComma=$(findPos "${line}" '",')
        nextDoubleQuote=$(findPos "${line}" '""')
    done

    # Add final segment (up to the comma) and output entire field.

    field="${field}${line:0:${nextQuoteComma}}\""
    echo "${#field}"
}
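
If you want something to cross-check those lengths against, the length of the leading field can also be computed with a bash regular expression. To be clear, this is a different technique from the segment walk above, not a restatement of it, and the name leadingFieldLen is made up for this sketch. It follows the same rule that a quoted field must start with the quote character.

```bash
#!/bin/bash

# Hypothetical cross-check helper: length of the first field of a line.
function leadingFieldLen {
    local line="$1"
    local quoted='^"([^"]|"")*"'   # opening quote, body with "" pairs, closing quote
    local plain='^[^,]*'           # everything up to the next comma

    if [[ "${line}" =~ ${quoted} ]]; then
        echo "${#BASH_REMATCH[0]}"
    else
        [[ "${line}" =~ ${plain} ]]
        echo "${#BASH_REMATCH[0]}"
    fi
}

leadingFieldLen 'abc,def,'                  # prints 3
leadingFieldLen '"then, strangely,",he,'    # prints 18
leadingFieldLen '"""Love"" is grand",19,'   # prints 19
leadingFieldLen '"",x,'                     # prints 2
```

The regular expressions are kept in variables and expanded unquoted on purpose; quoting the right-hand side of =~ would make bash treat it as a plain string.
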

Then there's a top-level function that quotifies whatever comes in on standard input:

function quotifyStdIn {
    # Process file line by line. IFS= stops read from trimming leading and
    # trailing whitespace, which is significant under the rules above.

    while IFS= read -r line; do
        # Start with empty output line and non-comma separator.

        outLine="" ; sep=""

        # Place terminator to make processing easier, start field loop.

        line="${line},"
        fieldLen=$(getNextFieldLen "${line}")
        while [[ ${fieldLen} -ge 0 ]]; do
            # Get field and quotify if needed, adjust line (remove field and comma).

            field="${line:0:${fieldLen}}"
            [[ "${field:0:1}" = '"' ]] || field="\"${field}\""

            line="${line:$((fieldLen+1))}"
            #line="${line:${fieldLen}}"
            #line="${line:1}"

            # Append to output line and prepare for next field.

            outLine="${outLine}${sep}${field}"; sep=","

            fieldLen=$(getNextFieldLen "${line}")
        done

        # Output built line.

        echo "${outLine}"
    done
}
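
A detail worth knowing about read loops in general, given that whitespace is meant to be significant here: with the default IFS, read strips leading and trailing blanks from the line, and a final line with no trailing newline is skipped by a plain while read loop. The sketch below shows both effects; showLines is a throwaway name used only for this demonstration.

```bash
#!/bin/bash

# Default IFS: the padding around the line is lost.
printf '  a , b  \n' | { read -r line; echo "[${line}]"; }
# prints: [a , b]

# IFS= keeps the padding; the extra test also picks up a last line that
# has no trailing newline.
function showLines {
    local line
    while IFS= read -r line || [[ -n "${line}" ]]; do
        echo "[${line}]"
    done
}

printf '  a , b  \nlast' | showLines
# prints: [  a , b  ]
#         [last]
```
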

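And, in case you want to read directly from a file (supplying an empty file name or "-" falls back to standard input, so you could just use the file version everywhere, since everything is function-based):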

function quotifyFile {
    file="$1"

    # Empty file or "-" means standard input, otherwise take input from real file.

    [[ ${#file} -eq 0 ]] && { quotifyStdIn; return; }
    [[ "${file}" = "-" ]] && { quotifyStdIn; return; }

    quotifyStdIn < "${file}"
}
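
Since the question was about saving the result back into the same file, here is a minimal sketch of one way to do that for every CSV file under a directory. The function name rewriteCsvTree and its two-argument interface are assumptions of mine, not something from the code above: it takes a directory and the name of any command or function that accepts a file name and writes the converted CSV to standard output.

```bash
#!/bin/bash

# Hypothetical helper, not part of the functions above.
#   $1 = directory to search
#   $2 = command or function that takes a file name and writes the
#        converted CSV to standard output (quotifyFile would fit)
function rewriteCsvTree {
    local dir="$1" filter="$2" file tmp

    # find -print0 paired with read -d '' keeps odd file names intact.
    while IFS= read -r -d '' file; do
        tmp="$(mktemp)" || return 1

        # Never redirect straight onto the input file: the shell would
        # truncate it before the filter gets a chance to read it.
        if "${filter}" "${file}" > "${tmp}"; then
            cat "${tmp}" > "${file}"
        fi
        rm -f "${tmp}"
    done < <(find "${dir}" -type f -iname '*.csv' -print0)
}
```

With the functions above loaded, a call would look something like rewriteCsvTree "$Src_Dir" quotifyFile. Copying the temporary file back with cat rather than mv is a deliberate choice so that the original file keeps its permissions.
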

Finally, because every program that isn't "Hello, world" deserves some form of test harness, here is one you can use to test the various functions:

(
    echo 'paxdiablo,was here'
    echo 'and,"then, strangely,",he,was,not'
    echo '50,"My name is ""Pax"", and yours is ""Bob""",42'
    echo '17,"""Love"" is grand",19'
) > harness.csv

echo "Before:"
sed "s/^/ /" harness.csv
echo "After:"
quotifyFile harness.csv | sed "s/^/ /"

rm -rf harness.csv

And, since a test harness is of little use unless you actually run the tests, here are the results of a first run:

Before:
paxdiablo,was here
and,"then, strangely,",he,was,not
50,"My name is ""Pax"", and yours is ""Bob""",42
17,"""Love"" is grand",19
After:
"paxdiablo","was here"
"and","then, strangely,","he","was","not"
"50","My name is ""Pax"", and yours is ""Bob""","42"
"17","""Love"" is grand","19"

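Hopefully that's enough to get you going if you are in a situation where you can't install packages. Of course, if bash itself is one of the packages you can't install, you have a problem I can't help you with :-)

Regarding a Bash script for adding double quotes in a comma-separated .CSV file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61220101/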
