linux - How to run multiple instances in parallel in a shell script to improve time efficiency

Reposted. Author: 行者123. Updated: 2023-12-03 10:01:23

This question already has answers here:

Parallel processing or threading in Shell scripting (3 answers)

executing shell command in background from script [duplicate] (4 answers)

Closed 2 years ago.




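I am working with a shell script that reads an input file of 16,000 lines. The script takes more than 8 hours to run. I need to reduce that, so I split the input into 8 files and use a for loop to iterate over the 8 files, with an inner loop that reads the records from each file. But it is not working.
How can I run 8 instances in parallel in the background?
I need help running this more efficiently, for example by using functions or forking processes.

Here is the code: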

for file in "$MY_WORK/CCN_split_files"/*
do
echo "$file"
echo "begin read loop"
### removing the header record from the file ###
if [ "$file" == "$MY_WORK/CCN_split_files/ccn.email.list.file00" ]
then
mv $MY_WORK/CCN_split_files/ccn.email.list.file00 $MY_WORK/raw_file
sed -e '/ Regular /d; / Duplicate /d' $MY_WORK/raw_file > $MY_WORK/CCN_split_files/ccn.email.list.file00
fi
### end of removing header record ###

while read -r record
do
reccount=$(( reccount + 1 ))

### parse input record

contact_email=`echo "$record" | cut -f5 -d ''`
echo "contact email is $contact_email"
credit_card_id=`echo "$record" | cut -f6 -d ''`
echo "credit card id is $credit_card_id"
ref_nr=`echo "$record" | cut -f7 -d ''`
echo "reference nr is $ref_nr"
cny_cd=`echo "$record" | cut -f8 -d ''`
echo "country code is $cny_cd"
lang=`echo "$record" | cut -f9 -d ''`
echo "language is $lang"
pmt_ir=`echo "$record" | cut -f13 -d ''`
echo "payment ir is $pmt_ir"

### set paypal or credit card

if [ "$pmt_ir" = "3" ]
then
pmt_typ="PP"
echo "payment type is $pmt_typ"
else
pmt_typ="CC"
echo "payment type is $pmt_typ"
fi

### retrieve doc from application

echo "retrieve from CMOD for $ref_nr"
GetExit01Cntr=0
GetExit01='F'
until [[ $GetExit01 = 'T' ]]
do
GetExit01Cntr=`expr $GetExit01Cntr + 1`

/opt/ondemand/bin/arsdoc get -ac -d $MY_WORK -h $host -u $user -p $pwd -v -i "WHERE ReferenceNumber='$ref_nr' AND CreditCardId='$credit_card_id'" -f "$folder" -L1 -o "$notify_afp" -v 2> $MY_WORK/$arsdoc_out
if grep "Retrieving 1 document(s)." $MY_WORK/$arsdoc_out > /dev/null
then
GetExit01='T'
echo "CCN AFP retrieval successful"
else
echo "CCN AFP retrieval failed - Performing retry (${GetExit01Cntr})"
sleep 30
GetExit01='F'
if [[ $GetExit01Cntr -ge 3 ]]
then
echo "Max Retry Failure: (GetExit01) - Failed to successfully perform arsdoc get"
echo "CCN AFP retrieval failed"
echo "CCN AFP retrieval failed" >> $MY_WORK/$logfile
exit 12
fi
fi
done

### convert to PDF

echo "afp2pdf conversion begins"

/a585/app/AFP2PDF_PLUS/afp2pdf.sh -i /a585/app/AFP2PDF_PLUS/a2pxopts2.cfg -n /a585/app/AFP2PDF_PLUS/font -o $MY_WORK/$notify_pdf $MY_WORK/$notify_afp > $MY_WORK/$afp2pdf_out 2>&1

ReturnCode=`echo $?`
if [ "$ReturnCode" != "0" ]
then
echo "afp2pdf failed"
echo "afp2pdf failed" >> $MY_WORK/$logfile
exit 12
fi

### assign message text, subject, and reply address variables

echo "assign message text, subject, reply"
if [ $cny_cd = "US" ] && [ $lang = "EN" ] && [ $pmt_typ = "CC" ]
then
email_text=$MSG_PATH/ccnotifyusen.new
email_reply="abx@xx.com"
email_subject=" Credit Card Billing Adjustment. Ref# $ref_nr"

elif [ $cny_cd = "CA" ] && [ $lang = "EN" ] && [ $pmt_typ = "CC" ]
then
email_text=$MSG_PATH/ccnotifycaen.new
email_reply="abx@xx.com"
email_subject="Credit Card Billing Adjustment. Ref# $ref_nr"

elif [ $cny_cd = "CA" ] && [ $lang = "FR" ] && [ $pmt_typ = "CC" ]
then
email_text=$MSG_PATH/ccnotifycafr.new
email_reply="abx@xx.com"
email_subject=" Rajustement des frais. Ref. $ref_nr"

elif [ $cny_cd = "US" ] && [ $lang = "EN" ] && [ $pmt_typ = "PP" ]
then
email_text=$MSG_PATH/ppnotifyusen.new
email_reply="abx@xx.com"
email_subject=" Billing Adjustment. Ref# $ref_nr"

elif [ $cny_cd = "CA" ] && [ $lang = "EN" ] && [ $pmt_typ = "PP" ]
then
email_text=$MSG_PATH/ppnotifycaen.new
email_reply="abx@xx.com"
email_subject=" Billing Adjustment. Ref# $ref_nr"

elif [ $cny_cd = "CA" ] && [ $lang = "FR" ] && [ $pmt_typ = "PP" ]
then
email_text=$MSG_PATH/ppnotifycafr.new
email_reply="ssunkara@ups.com"
email_subject_text=`cat $MSG_PATH/ppsubjectcafr`
email_subject="$email_subject_text $ref_nr"

else
echo "invalid country, language, payment type combination: $cny_cd, $lang, $pmt_typ"
echo "invalid country, language, payment type combination: $cny_cd, $lang, $pmt_typ" >> $MY_WORK/$logfile
exit 12
fi

### overlay reply address in .muttrc initialization file

cd /a585/app/script/
echo "email via NSGalinaMail"

/usr/bin/java -jar NSGalinaMail.jar "$email_text" "$email_subject" "$contact_email" "abc@xx.com" $lang $cny_cd $MY_WORK/$notify_pdf
if [ $? -eq 0 ]; then
emailCountSuccess[$reccount-1]="Success: Email to $contact_email for $ref_nr"
else
emailCountFailure[$reccount-1]="Failure: Email to $contact_email for $ref_nr"
fi

done < $file
done

Best answer

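If you want to get a lot of work done in parallel, consider using GNU Parallel. There is a great PDF explaining how to use it. Specifically, I am using "Section 9 - Pipe mode" to answer your question.

I am not going to rewrite all your code for you, just show you some ideas.

Let's generate a sample file with 16,000 lines to match yours: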

seq 16000 > YourFile

Now let's create a dummy script called YourScript that processes your data, like this:
#!/bin/bash
lines=$(wc -l < /dev/stdin)
echo "Called to process $lines lines"
sleep 2

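As you can see, it just counts the lines it receives on its stdin, tells you how many there were, and then sleeps for 2 seconds so you can see what is happening. Make it executable: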
chmod +x YourScript

Now you can use GNU Parallel. First, let GNU Parallel split your file into chunks of 4,000 lines and pass one chunk to each of 4 jobs:
parallel --pipe -N4000 ./YourScript  < YourFile

Called to process 4000 lines
Called to process 4000 lines
Called to process 4000 lines
Called to process 4000 lines

If you have 4 or more CPU cores, that takes 2 seconds, because by default GNU Parallel starts one job per CPU core.

Now try passing 2,000 lines to each job and running 4 jobs at a time:
parallel --pipe -j 4 -N2000 ./YourScript  < YourFile

Called to process 2000 lines
Called to process 2000 lines
Called to process 2000 lines
Called to process 2000 lines
Called to process 2000 lines
Called to process 2000 lines
Called to process 2000 lines
Called to process 2000 lines

That runs the first 4 batches of 2,000 lines in 2 seconds, and then the next 4 batches of 2,000 lines in another 2 seconds.
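
If GNU Parallel is not installed, a similar effect can be approximated with plain bash background jobs, which is closer to what the question asks for. The following is only a rough sketch and not part of the original answer: `process_chunk` and `run_in_background` are hypothetical names, `process_chunk` merely stands in for YourScript, and the chunks are produced with the standard `split -l` command.

```shell
#!/bin/bash
# Hypothetical worker: stands in for YourScript and just counts the lines in a chunk file.
process_chunk() {
    local lines
    lines=$(wc -l < "$1")
    echo "Called to process $((lines)) lines"
}

# Split a file into chunks and run process_chunk on each chunk as a background job.
run_in_background() {
    local input=$1 chunk_size=$2 workdir chunk
    workdir=$(mktemp -d) || return 1
    # split writes chunk files named chunk.aa, chunk.ab, ...
    split -l "$chunk_size" "$input" "$workdir/chunk."
    for chunk in "$workdir"/chunk.*; do
        process_chunk "$chunk" &    # & sends the job to the background
    done
    wait                            # block until every background job has finished
    rm -rf "$workdir"
}

seq 16000 > YourFile
run_in_background YourFile 4000
```

Unlike `parallel -j 4`, this sketch starts every chunk at once, so the chunk size is the only thing limiting how many jobs run at the same time.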

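Hopefully you can now see how to parallelise your script. Remember to read from stdin, not from a file! If you would rather have your script run with a filename as its argument, where that file holds one chunk of the 16,000-line file as split by GNU Parallel, you can use:
parallel --pipe -N 2000 --cat ./YourScript {} < YourFile

GNU Parallel will then write a temporary file containing 2,000 lines, call your script with the name of that file in place of {}, and delete the temporary file afterwards.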
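
A script can be written so that it works in both modes. The sketch below is an illustration under assumptions, not the original answer's code: `count_lines` is a hypothetical function that reads the file named in its first argument when one is given and falls back to stdin otherwise.

```shell
#!/bin/bash
# Hypothetical variant of YourScript: use the file named in $1 if present, else stdin.
count_lines() {
    local input=${1:-/dev/stdin}
    local lines
    lines=$(wc -l < "$input")
    echo "Called to process $((lines)) lines"
}

# Small demonstration on a temporary three-line file.
demo_file=$(mktemp)
printf 'a\nb\nc\n' > "$demo_file"
count_lines "$demo_file"      # chunk passed as a filename, as with --cat
count_lines < "$demo_file"    # chunk passed on stdin, as with plain --pipe
rm -f "$demo_file"
```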

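Useful switches for GNU Parallel are:
  • parallel --dry-run ... tells you what it would do without actually doing anything
  • parallel --bar ... gives you a progress bar
  • parallel --eta ... gives you an ETA

Note also that GNU Parallel can distribute work across other machines on your network, and that it has failure and retry handling, output tagging and so on.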

Also, you run cut 6 times for every line of your 16,000-line file, which means you fork nearly 100,000 processes! You can use IFS and read instead of those 6 processes:
IFS='|' read -r f1 f2 f3 <<< "a|b|c"
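
Applied to the question's loop, that idea might look like the sketch below. It assumes the fields are separated by `|`, which is a guess because the real delimiter is not visible in the question, and it takes the field positions (5 to 9 and 13) from the question's cut calls. `parse_records` is a hypothetical name, and `_` is used as a throwaway variable for the unused fields.

```shell
#!/bin/bash
# Assumption: fields are separated by '|'; substitute the real delimiter of your data.
parse_records() {
    local contact_email credit_card_id ref_nr cny_cd lang pmt_ir
    # Fields 5-9 and 13 are kept; every other position is read into the throwaway variable _.
    while IFS='|' read -r _ _ _ _ contact_email credit_card_id ref_nr cny_cd lang _ _ _ pmt_ir _; do
        echo "email=$contact_email card=$credit_card_id ref=$ref_nr country=$cny_cd lang=$lang pmt=$pmt_ir"
    done
}

# One made-up record with 14 fields, fed on stdin.
printf '%s\n' 'f1|f2|f3|f4|a@example.com|1234|REF1|US|EN|f10|f11|f12|3|f14' | parse_records
```

Because read is a shell builtin, each record is split without forking any extra process.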

Regarding "linux - How to run multiple instances in parallel in a shell script to improve time efficiency", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59945327/
