
ksh - Unix scripting - Need suggestions to improve performance (shell script)

Reposted. Author: 行者123. Updated: 2023-12-01 04:50:30

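I have an input CSV file. For each row I need to take the values in columns 2 and 3, convert both from Pacific Time (PT) to Central Time (CT), and write the converted values back into the file.

Note: all input date values are in the Pacific time zone, and I am converting them to the Central time zone.

Each line has 5 columns, and the file is comma-separated: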

CHID-123456323,2017-01-09 17:17:58-08:00,2017-01-09 17:39:25-08:00,hello,123456733
CHID-123456733,2017-01-09 17:16:58-08:00,2017-01-09 18:04:09-08:00,hello,123456734
CHID-123433589,2017-01-09 17:16:55-08:00,2017-01-09 17:40:29-08:00,hello,123456735
CHID-123000789,2017-01-09 17:16:52-08:00,2017-01-09 17:46:41-08:00,hello,123456736

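I wrote the script below, and it produces exactly the result I expect. But it takes longer as the number of input records grows; for example, 20,000 records take 1 hour 15 minutes.

Can anyone look at this script and suggest how to improve its performance?

Script: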
cnt=1    # line number of the current record; sed edits only this line
while read i
do
    # Column 2: extract, convert to epoch seconds, format in Central Time
    var1=`echo $i | awk -F',' '{ print $2 }'`

    var1_EPOCH=`date --date="${var1}" +%s`
    var1_CTZ=`TZ=":America/Chicago" date +"%Y-%m-%d %T" -d@$var1_EPOCH`
    sed -i "${cnt}s/${var1}/${var1_CTZ}/" filename

    # Column 3: same steps
    var2=`echo $i | awk -F',' '{ print $3 }'`
    var2_EPOCH=`date --date="${var2}" +%s`
    var2_CTZ=`TZ=":America/Chicago" date +"%Y-%m-%d %T" -d@$var2_EPOCH`
    sed -i "${cnt}s/${var2}/${var2_CTZ}/" filename

    cnt=$(($cnt+1))
done < filename

Here is the expected output file.

Final output file:
CHID-123456323,2017-01-09 19:17:58,2017-01-09 19:39:25,hello,123456733
CHID-123456733,2017-01-09 19:16:58,2017-01-09 20:04:09,hello,123456734
CHID-123433589,2017-01-09 19:16:55,2017-01-09 19:40:29,hello,123456735
CHID-123000789,2017-01-09 19:16:52,2017-01-09 19:46:41,hello,123456736
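
A single value from the expected output can be spot-checked with one `date` call. This is a minimal sketch, assuming GNU `date` (the script above already depends on its `--date` option) and an installed time zone database; `to_tz` is a hypothetical helper name, not something from the original post. Because the input string carries its own `-08:00` offset, setting `TZ` for the call is enough to get Central Time, without going through epoch seconds first.

```sh
#!/bin/sh
# Hypothetical helper: print timestamp $1 in time zone $2
# (default America/Chicago). Assumes GNU date.
to_tz() {
    TZ=${2:-America/Chicago} date --date="$1" '+%Y-%m-%d %T'
}

# First timestamp of the sample input; with the time zone database
# installed this matches the first converted value shown above.
to_tz '2017-01-09 17:17:58-08:00'
```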

Best answer

Ksh has enough built-in features for this.

Sample input file:

[STEP 100] $ echo $BASH_VERSION
4.4.5(2)-release
[STEP 101] $ cat file
CHID-123456323,2017-01-09 17:17:58-08:00,2017-01-09 17:39:25-08:00,hello,123456733
CHID-123456733,2017-01-09 17:16:58-08:00,2017-01-09 18:04:09-08:00,hello,123456734
CHID-123433589,2017-01-09 17:16:55-08:00,2017-01-09 17:40:29-08:00,hello,123456735
CHID-123000789,2017-01-09 17:16:52-08:00,2017-01-09 17:46:41-08:00,hello,123456736

The script:
[STEP 102] $ cat time.ksh
tz=America/Chicago
# Split each line into its five comma-separated fields.
pattern='(.+),(.+),(.+),(.+),(.+)'
while read -r line; do
    if [[ $line =~ $pattern ]]; then
        c1=${.sh.match[1]}
        c2=${.sh.match[2]}
        c3=${.sh.match[3]}
        c4=${.sh.match[4]}
        c5=${.sh.match[5]}

        # ksh93's printf %(fmt)T takes a date string and formats it in $TZ.
        TZ=$tz printf '%(%Y-%m-%d %T)T' "$c2" | read c2
        TZ=$tz printf '%(%Y-%m-%d %T)T' "$c3" | read c3

        print -r -- "$c1,$c2,$c3,$c4,$c5"
    else
        # Pass through lines that do not match the pattern.
        print -r -- "$line"
    fi
done

Sample output:
[STEP 103] $ ksh time.ksh < file
CHID-123456323,2017-01-09 19:17:58,2017-01-09 19:39:25,hello,123456733
CHID-123456733,2017-01-09 19:16:58,2017-01-09 20:04:09,hello,123456734
CHID-123433589,2017-01-09 19:16:55,2017-01-09 19:40:29,hello,123456735
CHID-123000789,2017-01-09 19:16:52,2017-01-09 19:46:41,hello,123456736

Create a 20,000-line file:
[STEP 104] $ rm -f bigfile
[STEP 105] $ fourlines=$(<file)
[STEP 106] $ for ((i=0; i<5000; ++i)); do printf '%s\n' "$fourlines" >> bigfile; done
[STEP 107] $ wc -l bigfile
20000 bigfile

Let's time it:
[STEP 108] $ time ksh time.ksh < bigfile > newfile

real 1m36.849s
user 0m27.376s
sys 0m46.741s
[STEP 109] $ tail -n 4 newfile
CHID-123456323,2017-01-09 19:17:58,2017-01-09 19:39:25,hello,123456733
CHID-123456733,2017-01-09 19:16:58,2017-01-09 20:04:09,hello,123456734
CHID-123433589,2017-01-09 19:16:55,2017-01-09 19:40:29,hello,123456735
CHID-123000789,2017-01-09 19:16:52,2017-01-09 19:46:41,hello,123456736
[STEP 110] $ ksh --version
version sh (AT&T Research) 93u+ 2012-08-01
[STEP 111] $
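
As a further sketch beyond the ksh script above, the per-line work can be avoided by converting a whole column in a single `date` process. This assumes GNU `date`, whose `-f -` option reads one date per line from standard input (the question's script already relies on GNU `date`), and it assumes every line has exactly five well-formed fields. `convert_tz` and its arguments are hypothetical names introduced here, and this variant has not been timed against the figures above.

```sh
#!/bin/sh
# Hypothetical helper: print CSV file $1 with columns 2 and 3 converted
# to time zone $2 (default America/Chicago). Assumes GNU date and that
# every line has five comma-separated fields.
convert_tz() {
    src=$1
    zone=${2:-America/Chicago}
    tmpdir=$(mktemp -d) || return 1

    # Split the file into columns; columns 4 and 5 stay together.
    cut -d, -f1  "$src" > "$tmpdir/c1"
    cut -d, -f4- "$src" > "$tmpdir/c45"

    # One date process per column instead of one per value.
    cut -d, -f2 "$src" | TZ=$zone date -f - '+%Y-%m-%d %T' > "$tmpdir/c2"
    cut -d, -f3 "$src" | TZ=$zone date -f - '+%Y-%m-%d %T' > "$tmpdir/c3"

    # Reassemble the columns line by line.
    paste -d, "$tmpdir/c1" "$tmpdir/c2" "$tmpdir/c3" "$tmpdir/c45"
    rm -rf "$tmpdir"
}
```

It would be called as `convert_tz bigfile > newfile`, writing to a new file rather than editing the input in place.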

Regarding "ksh - Unix scripting - Need suggestions to improve performance (shell script)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41632313/
