bash - Find and replace URLs in Postfix files - Linux/Ubuntu

Reposted. Author: 行者123. Last updated: 2023-12-04 18:36:45

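I want to monitor a specific folder.
Every new file in this folder should be scanned for URLs.
If a URL's domain is not on a defined whitelist, that URL should be edited.

Example: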

blabla http://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html

Whitelist:
http://www.white.com

Result:
blabla httx://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html

What I have tried so far is iwatch with this XML:
<?xml version="1.0" ?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >
<config>
<guard email="root@localhost" name="IWatch"/>
<watchlist>
<title>URL_Filter</title>
<contactpoint email="admin@test.com" name="Administrator"/>
<path type="single" syslog="on" alert="off" events="create" exec="sed -i 's/http/httx/g' %f">/var/test</path>
</watchlist>
</config>

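So with iwatch I can watch the folder "/var/test" for new files.
With the sed command I can replace every "http" with "httx".
But I don't know how to include a whitelist so that some URLs are not replaced...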

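- - Edit - -
Additional information:
I want to edit all incoming Postfix mails so that they contain no clickable links, except for a few domains that are on the whitelist. The reason for this is to prevent phishing mails. A sample message: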
Return-Path: <example@gmail.com>
X-Original-To: example@test.de
Delivered-To: example@test.de
Received: from mail-lf0-x236.google.com (mail-lf0-x236.google.com [IPv6:2a00:1450:4010:c07::236])
by xxxxxxx.hosteurope.de (Postfix) with ESMTPS id D255223CB59
for <example@test.de>; Mon, 11 Apr 2016 14:44:10 +0200 (CEST)
Received: by mail-lf0-x236.google.com with SMTP id c126so154788483lfb.2
for <example@test.de>; Mon, 11 Apr 2016 05:39:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=20120113;
h=mime-version:date:message-id:subject:from:to;
bh=WwH+NIkCWDEoIkwbeCI4pf0jP0ya/ctbQ81pUsA4G7s=;
b=ZS3Uo/cpVGNw3k38Js2+/DxVda0y2136oy4D4hsR0G25x2UjhyVU/yUcPl6qEdxt8i
CQXZHQbaf8pzCdDaSq4VL9RC/sIgZy3PQzj6Cyrp3WTi6SMmQ65NwNBWLVGnpPcuzNW1
IGC5N3rjj96ndYUAxia/tTcBX7ajS3Tw9Mc8yIaO13hSXMUCrTDIFZNzHR1ib7tLDpmX
6EVyFhquhIfJVOhcuPgWUUxHly/FmZ++ucoHR0Yozj+dc1GJ6/ZYzUAPdGICelDY7ieG
nvA7KH6+v6/zoWlbfkO9BmGzAPs6M4LGHilOjpMf/09Z2oMiV/WRDxe0WrCebQptpm2c
xHPg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20130820;
h=x-gm-message-state:mime-version:date:message-id:subject:from:to;
bh=WwH+NIkCWDEoIkwbeCI4pf0jP0ya/ctbQ81pUsA4G7s=;
b=hAOSzKjertcsQIT/PHoZKsiKxLba8gaKOCmyNg7nmiPJjCWqobNvM5nf3sZP1Xhysi
gGdvk9mmMugII8dsjc7mRhDkbCT1QKVz/0UBQ+CaP6sK7kGdWfdarphGgzUGA6Il5JZi
lP4DpEQHUpG1wJ1r+dN2f+UT8tyfIwapXwo3g7FnkPLxmCq9CeqJeRlagL6vAacon8z7
CjdTHB7fzEtYToSp+cDi3+yK4zS9p4rwF4H4Ds3bJqwM/PrcFJW0YYncDHdra5TwYf6U
K6VRX19iUhQT4kTVFCtoNW9SU8Ri+Rc5VfvVTKRh4KwZ2uW5x8y07ucB0vZcAQdEnms4
AWnQ==
X-Gm-Message-State: AD7BkJJEDmk9P+Kzcn1MT4lQxpU1aYU6x8uABSpohCbT7EeOFAXjT1y6n3sFcRj7tcfWc6eBAOL6bJ78jvVOlQ==
MIME-Version: 1.0
X-Received: by 10.112.63.196 with SMTP id i4mr8426739lbs.93.1460378359811;
Mon, 11 Apr 2016 05:39:19 -0700 (PDT)
Received: by 10.114.66.51 with HTTP; Mon, 11 Apr 2016 05:39:19 -0700 (PDT)
Date: Mon, 11 Apr 2016 14:39:19 +0200
Message-ID: <CADF5gVU+C4BZCSFSiWeiBipBnDu5jTU+FVmLJbSQSbtMM9JZcQ@mail.gmail.com>
Subject: test
From: Example <example@gmail.com>
To: example@test.de
Content-Type: multipart/alternative; boundary=001a1133d4405fd878053034d55a
X-Scanned-By: MIMEDefang 2.71 on 5.38.258.144

--001a1133d4405fd878053034d55a
Content-Type: text/plain; charset=UTF-8

http://www.example.com
http://www.white.com

--001a1133d4405fd878053034d55a
Content-Type: text/html; charset=UTF-8

<div dir="ltr"><div><a href="http://www.example.com">http://www.example.com</a><br></div><a href="http://www.white.com">http://www.white.com</a><br></div>

--001a1133d4405fd878053034d55a--

Best answer

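You can do this with Perl. I suggest installing the Regexp::Common package from CPAN and using Regexp::Common::URI to find the URIs, then maintaining a whitelist of host names and checking against it. It is a bit long for a one-liner, though.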

use strict;
use warnings;
use Regexp::Common qw /URI/;

# URL prefixes that are allowed to stay clickable
my %whitelist = (
    'http://www.white.com'   => 1,
    'http://www.example.org' => 1,
);

while (my $line = <>) {
    MATCH: foreach my $match ($line =~ /($RE{URI}{HTTP})/g) {
        # check the whitelist: case-insensitive prefix match against the keys only
        next MATCH if grep { $match =~ /^\Q$_\E/i } keys %whitelist;

        # no whitelist entry, replace
        my $match_updated = $match;
        $match_updated =~ s/^http/httx/;
        # \Q...\E makes '.', '?' and similar characters in the URL match literally
        $line =~ s/\Q$match\E/$match_updated/;
    }
    print $line;
}

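Save it under a meaningful name, perhaps remove_phishing_links.pl, in a directory that iwatch can access. I am using ~ here, but I don't know whether that works. Now you can call it from your iwatch file with something like this: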
<path 
type="single"
syslog="on"
alert="off"
events="create"
exec="perl -i ~/remove_phishing_links.pl %f">/var/test</path>

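Like the sed command, it edits the file given in %f in place. It reads the file line by line, looks for http URIs, checks whether each one starts with any of the whitelist entries, and if not, replaces http with httx.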
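For comparison, the same idea can be approximated with sed alone. This is only a rough sketch, not the answer's method: it first rewrites every http:// or https:// to httx, then restores the URLs whose host is on the whitelist. The function name defang_urls and the way hosts are passed as arguments are assumptions made for illustration. Unlike the Perl script, matching is case-sensitive, and a whitelisted host must be followed by /, ?, #, whitespace, a quote, an angle bracket, or the end of the line, so that a lookalike such as www.white.com.evil.example is not restored.

```shell
# Hypothetical helper (name and interface are assumptions, not part of iwatch or sed).
# Reads text on stdin, writes the edited text to stdout.
# Arguments: whitelisted host names, e.g. www.white.com
defang_urls() {
    # Step 1: rewrite every http:// and https:// to httx:// and httxs://.
    script='s@http(s?)://@httx\1://@g'
    for host in "$@"; do
        # Escape dots so they match literally inside the regex.
        host_re=$(printf '%s' "$host" | sed 's/\./\\./g')
        # Step 2: restore URLs whose host is exactly the whitelisted host.
        script="$script
s@httx(s?://$host_re)([/?#[:space:]\"<>]|\$)@http\\1\\2@g"
    done
    sed -E "$script"
}
```

Called as `defang_urls www.white.com < message`, this should leave http://www.white.com/red.html untouched and turn http://www.black.com/green/yellow.html into httx://www.black.com/green/yellow.html. To edit a file in place, as iwatch needs, the output would have to go to a temporary file that then replaces the original.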

Note that this will not work for base64-encoded MIME emails, or if a URI contains a line break.
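Because base64-encoded parts would pass through unchanged, it may be worth at least detecting them so that such messages can be handled separately. The following is a minimal sketch under that assumption. The function name has_base64_part is made up for illustration, and the check only looks for a Content-Transfer-Encoding header line, so it does not decode or rewrite anything.

```shell
# Hypothetical helper: exits with status 0 if the message on stdin declares
# a base64-encoded part, and with a non-zero status otherwise.
has_base64_part() {
    # -i: header names and values are case-insensitive
    # [[:space:]]*$ also tolerates a trailing carriage return (CRLF line endings)
    grep -Eiq '^Content-Transfer-Encoding:[[:space:]]*base64[[:space:]]*$'
}
```

A wrapper script could run `has_base64_part < "$file"` first and log or quarantine the message instead of editing it when the check succeeds.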

If you don't want to install Regexp::Common, you can also borrow the regular expression for URIs from the URI module documentation on CPAN and change it so that it only finds https?.
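To get a feel for what an https?-only pattern picks up before wiring it into the script, the URLs in a message can be listed with grep. The pattern below is a deliberately loose simplification and not the regular expression from the URI documentation: it treats everything up to the next whitespace, double quote, or angle bracket as part of the URL, and it only matches a lowercase scheme. The function name list_urls is an assumption, and the -o option is available in GNU and BSD grep but is not required by POSIX.

```shell
# Hypothetical helper: print every http/https URL found on stdin, one per line.
list_urls() {
    # -o prints only the matching part of each line, -E enables extended regex syntax
    grep -oE 'https?://[^[:space:]"<>]+'
}
```

Run against the sample mail from the question, `list_urls < message` would print each URL from both the text/plain and the text/html part, so URLs that appear in an href and again as link text show up twice.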

Regarding "bash - Find and replace URLs in Postfix files - Linux/Ubuntu", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36543377/
