
bash - Check if remote files exist in bash


I am using this script to download files:

parallel --progress -j16 -a ./temp/img-url.txt 'wget -nc -q -P ./images/ {}; wget -nc -q -P ./images/ {.}_{001..005}.jpg'

Is it possible not to download the files, but only to check them on the remote side, and create a dummy file instead of downloading if the file exists?

Something like this:

if wget --spider $url 2>/dev/null; then
  #touch img.file
fi

should work, but I don't know how to combine this code with GNU Parallel.
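One way to combine such a check with GNU Parallel is to wrap it in an exported shell function. A minimal sketch, assuming the `./images/` directory and the URL list path from the question:

```shell
#!/bin/bash
# Sketch: check existence with wget's spider mode and create a dummy
# file instead of downloading. --spider makes wget stop after checking
# that the resource exists.
check_url() {
  url="$1"
  if wget --spider -q "$url" 2>/dev/null; then
    # Dummy file named after the URL's basename
    # (strip everything up to the last '/').
    touch "./images/${url##*/}"
  fi
}
export -f check_url

# Then drive it with GNU Parallel, e.g.:
# parallel --progress -j16 -a ./temp/img-url.txt check_url {}
```

The `export -f` is what makes the function visible to the shells that GNU Parallel spawns for each job.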

Edit:

Based on Ole's answer I wrote this code:

#!/bin/bash
do_url() {
  url="$1"
  wget -q -nc --method HEAD "$url" && touch ./images/${url##*/}
  # get filename from $url
  url2=${url##*/}
  wget -q -nc --method HEAD ${url%.jpg}_{001..005}.jpg && touch ./images/${url2%.jpg}_{001..005}.jpg
}
export -f do_url

parallel --progress -a urls.txt do_url {}

It works, but it fails for some files. I cannot find any consistency in why it works for some files and fails for others. Maybe it has something to do with the last file name. The second wget accesses the correct URL, but the touch command after it does not create the required files at all. The first wget always (correctly) downloads the main image, the one without _001.jpg, _002.jpg.
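A likely cause, sketched here with stand-in commands instead of network calls: when wget is given several URLs, it exits non-zero if any single request fails, so the `&&` suppresses the touch for the whole batch; and the brace expansion in touch is all-or-nothing anyway:

```shell
# Simulated without network: stand-ins for five HEAD requests where one
# variant (say _002.jpg) does not exist on the server.
head_all_variants() {
  # wget with multiple URLs exits non-zero if ANY of them fails
  true && true && false && true && true
}

if head_all_variants; then
  # brace expansion creates ALL five dummies at once, or none
  touch img_{001..005}.jpg
else
  echo "one variant missing: no dummy files created at all"
fi
```

This matches the observed symptom: URLs for which all five variants exist get their dummy files, while a single missing variant prevents all of them.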

示例 urls.txt:

http://host.com/092401.jpg (works fine; _001.jpg .. _005.jpg are downloaded)
http://host.com/HT11019.jpg (does not work; only the main image is downloaded)

Best Answer

It is hard to understand what you really want to accomplish. Let me try to rephrase your question.

I have urls.txt containing:

http://example.com/dira/foo.jpg
http://example.com/dira/bar.jpg
http://example.com/dirb/foo.jpg
http://example.com/dirb/baz.jpg
http://example.org/dira/foo.jpg

On example.com these URLs exist:

http://example.com/dira/foo.jpg
http://example.com/dira/foo_001.jpg
http://example.com/dira/foo_003.jpg
http://example.com/dira/foo_005.jpg
http://example.com/dira/bar_000.jpg
http://example.com/dira/bar_002.jpg
http://example.com/dira/bar_004.jpg
http://example.com/dira/fubar.jpg
http://example.com/dirb/foo.jpg
http://example.com/dirb/baz.jpg
http://example.com/dirb/baz_001.jpg
http://example.com/dirb/baz_005.jpg

On example.org these URLs exist:

http://example.org/dira/foo_001.jpg

Given urls.txt I want to generate the combinations with _001.jpg .. _005.jpg in addition to the original URL. E.g.:

http://example.com/dira/foo.jpg

becomes:

http://example.com/dira/foo.jpg
http://example.com/dira/foo_001.jpg
http://example.com/dira/foo_002.jpg
http://example.com/dira/foo_003.jpg
http://example.com/dira/foo_004.jpg
http://example.com/dira/foo_005.jpg

Then I want to test if these URLs exist without downloading the file. As there are many URLs I want to do this in parallel.

If the URL exists I want an empty file created.

(Version 1): I want the empty file created in a similar directory structure in the dir images. This is needed because some of the images have the same name but are in different dirs.

So the files created should be:

images/http:/example.com/dira/foo.jpg
images/http:/example.com/dira/foo_001.jpg
images/http:/example.com/dira/foo_003.jpg
images/http:/example.com/dira/foo_005.jpg
images/http:/example.com/dira/bar_000.jpg
images/http:/example.com/dira/bar_002.jpg
images/http:/example.com/dira/bar_004.jpg
images/http:/example.com/dirb/foo.jpg
images/http:/example.com/dirb/baz.jpg
images/http:/example.com/dirb/baz_001.jpg
images/http:/example.com/dirb/baz_005.jpg
images/http:/example.org/dira/foo_001.jpg

(Version 2): I want the empty file created in the dir images. This can be done because all the images have unique names.

So the files created should be:

images/foo.jpg
images/foo_001.jpg
images/foo_003.jpg
images/foo_005.jpg
images/bar_000.jpg
images/bar_002.jpg
images/bar_004.jpg
images/baz.jpg
images/baz_001.jpg
images/baz_005.jpg

(Version 3): I want the empty file created in the dir images, named after the entry in urls.txt. This can be done because only one of _001.jpg .. _005.jpg exists.

images/foo.jpg
images/bar.jpg
images/baz.jpg

#!/bin/bash

do_url() {
  url="$1"

  # Version 1:
  # If you want to keep the folder structure from the server (similar to wget -m):
  wget -q --method HEAD "$url" && mkdir -p images/"$2" && touch images/"$url"

  # Version 2:
  # If all the images have unique names and you want all images in a single dir:
  wget -q --method HEAD "$url" && touch images/"$3"

  # Version 3:
  # If all the images have unique names when _###.jpg is removed and you want all images in a single dir:
  wget -q --method HEAD "$url" && touch images/"$4"
}
export -f do_url

parallel do_url {1.}{2} {1//} {1/.}{2} {1/} :::: urls.txt ::: .jpg _{001..005}.jpg
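For reference, the replacement strings used above can be illustrated with plain bash parameter expansions on one sample line of urls.txt (GNU Parallel computes the equivalent values per job; the sample URL is taken from the question above):

```shell
url="http://example.com/dira/foo.jpg"   # one line from urls.txt  ({1})
ext="_001.jpg"                          # one value from the ::: list ({2})

echo "${url%.*}${ext}"    # {1.}{2}  -> http://example.com/dira/foo_001.jpg
echo "${url%/*}"          # {1//}    -> http://example.com/dira
base="${url##*/}"
echo "${base%.*}${ext}"   # {1/.}{2} -> foo_001.jpg
echo "$base"              # {1/}     -> foo.jpg
```

So `{1.}{2}` is the URL to test, `{1//}` is the directory to create for Version 1, `{1/.}{2}` is the file name for Version 2, and `{1/}` is the file name for Version 3.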

GNU Parallel takes a few milliseconds per job. When your jobs are this short, that overhead will affect the total run time. If none of your CPU cores is running at 100%, you can run more jobs in parallel:

parallel -j0 do_url {1.}{2} {1//} {1/.}{2} {1/} :::: urls.txt ::: .jpg _{001..005}.jpg

You can also "unroll" the loop. This saves 5 job overheads per URL:

do_url() {
  url="$1"
  # Version 2:
  # If all the images have unique names and you want all images in a single dir:
  wget -q --method HEAD "$url".jpg && touch images/"$url".jpg
  wget -q --method HEAD "$url"_001.jpg && touch images/"$url"_001.jpg
  wget -q --method HEAD "$url"_002.jpg && touch images/"$url"_002.jpg
  wget -q --method HEAD "$url"_003.jpg && touch images/"$url"_003.jpg
  wget -q --method HEAD "$url"_004.jpg && touch images/"$url"_004.jpg
  wget -q --method HEAD "$url"_005.jpg && touch images/"$url"_005.jpg
}
export -f do_url

parallel -j0 do_url {.} :::: urls.txt

Finally, you can run more than 250 jobs: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Running-more-than-250-jobs-workaround

On "bash - Check if remote files exist in bash", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48608377/
