
regex - Meaning of the grep and sed regular expressions - extracting URLs from a web page

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:40:26

grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"' | sed -e 's/^.*"\([^"]\+\)".*$/\1/g'

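After searching online for an answer to my homework question, I ended up with the command above. However, I don't fully understand the two regular expressions used by sed and grep. Could someone explain them to me? Thanks in advance.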

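Best answer

The grep command finds any line that contains a match for the pattern below (with -o it prints only the matched part, and -i makes the match case-insensitive):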

'<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"'

This breaks down as:

<a        the literal characters '<a'
[^>]      any single character that is not '>'
\+        the previous item, one or more times; together with [^>] this skips the
          whitespace and any other attributes between '<a' and 'href' without leaving
          the tag (without \+, exactly one character would be allowed there, which
          still matches the common '<a href' form)
href      followed by the string 'href'
[ ]*      followed by zero or more spaces (you don't really need the [], just ' *' would be enough)
=         followed by the equals sign
[ \t]*    meant as zero or more spaces or tabs ("white space"); note that inside a
          bracket expression grep reads \t as a literal backslash and 't', so
          [[:blank:]]* is the way to match real tabs
"         followed by an opening quote (but only a double quote)
\(        open bracket (grouping)
ht        characters 'ht'
\|        or
f         character 'f'
\)        close group (of the either-or)
tp        characters 'tp'
s\?       optionally followed by 's'
          Note: the last few lines combined mean 'http or https or ftp or ftps'
:         the character ':'
[^"]\+    one or more characters that are not a double quote;
          this is "everything until the next quote"
"         the closing double quote
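To see the breakdown in action, the following is a minimal sketch that runs the grep stage on a small, made-up HTML snippet. The sample input and the example.com / example.org URLs are placeholders and not part of the original question, and the pattern relies on GNU grep, since \+, \? and \| are GNU extensions to basic regular expressions.

```shell
# Hypothetical sample input; the URLs are placeholders for illustration.
html='<p><a class="nav" href="https://example.com/a">A</a>
<a href = "ftp://example.org/file.txt">B</a>
<a href="/relative/path">C</a></p>'

# Same pattern as in the question: -i ignores case, -o prints only the match.
matches=$(printf '%s\n' "$html" |
  grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"')

printf '%s\n' "$matches"
# With GNU grep this prints the two absolute links and skips the relative one:
# <a class="nav" href="https://example.com/a"
# <a href = "ftp://example.org/file.txt"
```

The third anchor is not reported because its href value does not start with http, https, ftp or ftps followed by a colon.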

Does that get you started? You can work through the next step (the sed expression) in the same way...
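As a starting point for that next step, here is a hedged sketch that feeds two sample lines, shaped like the output of the grep stage, into the sed command from the question. The input lines are invented for illustration, and the \+ in the pattern assumes GNU sed.

```shell
# s/^.*"\([^"]\+\)".*$/\1/g read piece by piece:
#   ^.*"          from the start of the line, as much as possible up to a double quote
#   \([^"]\+\)    capture one or more non-quote characters (the URL) as group 1
#   ".*$          the closing quote and whatever follows, to the end of the line
#   \1            replace the whole line with the captured group
urls=$(printf '%s\n' \
  '<a class="nav" href="https://example.com/a"' \
  '<a href = "ftp://example.org/file.txt"' |
  sed -e 's/^.*"\([^"]\+\)".*$/\1/g')

printf '%s\n' "$urls"
# https://example.com/a
# ftp://example.org/file.txt
```

Because ^.* is greedy, the capture ends up on the last quoted string in the line, which is the href value here since the grep match stops at its closing quote.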

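Note, just to add to the confusion: backslashes are used to change the meaning of some special characters such as ( ) + ? and |. To keep everyone on their toes, whether these characters have their special meaning with or without the backslash is not fixed by regular expression syntax as such, but by the command (and its options) you are using. For example, sed changes how it interprets them depending on whether you pass the -E flag.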
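The difference can be sketched by writing the same match twice, once in basic syntax (backslashed operators) and once in extended syntax with -E (bare operators). The sample line and its URL are placeholders, and the basic-syntax forms assume GNU grep and GNU sed.

```shell
# Hypothetical sample line; the URL is a placeholder.
line='<a href="http://example.com/x">x</a>'

# Basic regular expressions: grouping, alternation, ? and + need backslashes (GNU).
bre=$(printf '%s\n' "$line" | grep -o '\(ht\|f\)tps\?:[^"]\+')

# Extended regular expressions (-E): the same operators are written bare.
ere=$(printf '%s\n' "$line" | grep -E -o '(ht|f)tps?:[^"]+')

# The sed stage from the question, rewritten for sed -E.
sed_ere=$(printf '%s\n' '<a href="http://example.com/x"' |
  sed -E 's/^.*"([^"]+)".*$/\1/')

printf '%s\n' "$bre" "$ere" "$sed_ere"
# All three lines print: http://example.com/x
```

Mixing the two styles is a common source of silent mismatches: in basic syntax a bare + or | is an ordinary character, while under -E a backslashed \+ or \| is the literal one.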

Regarding regex - meaning of the grep and sed regular expressions - extracting URLs from a web page, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22848049/
