
python - I don't understand why this URL-matching regex isn't working

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:22:05

I'm trying to use the following regular expression to match the Linux kernel amd64 deb hrefs in a raw HTML string:

r'(?<=href=")linux-.*?_amd64\.deb(?=")'

The kind of link I'm trying to match looks like this:

<a href="linux-headers-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_amd64.deb">

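However, I only want to extract what sits between the double quotes of the href attribute. The regex above matches the first href correctly, but after that it matches a large chunk of content, markup included. Removing _amd64 from the regex makes it match only the URLs, but then of course it no longer filters out the i386 debs: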

r'(?<=href=")linux-.*?\.deb(?=")'

This is the raw HTML I'm applying the regex to:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /~kernel-ppa/mainline/v3.16-rc1-utopic</title>
</head>
<body>
<h1>Index of /~kernel-ppa/mainline/v3.16-rc1-utopic</h1>
<table><tr><th><img src="/icons/blank.gif" alt="[ICO]"></th><th><a href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th><th><a href="?C=S;O=A">Size</a></th><th><a href="?C=D;O=A">Description</a></th></tr><tr><th colspan="5"><hr></th></tr>
<tr><td valign="top"><img src="/icons/back.gif" alt="[DIR]"></td><td><a href="/~kernel-ppa/mainline/">Parent Directory</a></td><td>&nbsp;</td><td align="right"> - </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/text.gif" alt="[TXT]"></td><td><a href="0001-base-packaging.patch">0001-base-packaging.patch</a></td><td align="right">16-Jun-2014 04:35 </td><td align="right"> 14M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/text.gif" alt="[TXT]"></td><td><a href="0002-debian-changelog.patch">0002-debian-changelog.patch</a></td><td align="right">16-Jun-2014 04:35 </td><td align="right">333K</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/text.gif" alt="[TXT]"></td><td><a href="0003-configs-based-on-Ubuntu-3.15.0-7.12.patch">0003-configs-based-on-Ubuntu-3.15.0-7.12.patch</a></td><td align="right">16-Jun-2014 04:35 </td><td align="right"> 51K</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="BUILD.LOG">BUILD.LOG</a></td><td align="right">16-Jun-2014 05:25 </td><td align="right">7.1M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="BUILD.LOG.amd64">BUILD.LOG.amd64</a></td><td align="right">16-Jun-2014 05:25 </td><td align="right">2.3M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="BUILD.LOG.armhf">BUILD.LOG.armhf</a></td><td align="right">16-Jun-2014 05:25 </td><td align="right">597K</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="BUILD.LOG.binary-headers">BUILD.LOG.binary-headers</a></td><td align="right">16-Jun-2014 05:25 </td><td align="right"> 22K</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="BUILD.LOG.i386">BUILD.LOG.i386</a></td><td align="right">16-Jun-2014 05:25 </td><td align="right">2.3M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="BUILT">BUILT</a></td><td align="right">16-Jun-2014 05:25 </td><td align="right">108 </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="CHANGES">CHANGES</a></td><td align="right">16-Jun-2014 04:35 </td><td align="right">744K</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="CHECKSUMS">CHECKSUMS</a></td><td align="right">09-Jun-2015 11:36 </td><td align="right">3.1K</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="CHECKSUMS.gpg">CHECKSUMS.gpg</a></td><td align="right">09-Jun-2015 11:36 </td><td align="right">490 </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="COMMIT">COMMIT</a></td><td align="right">29-May-2015 11:09 </td><td align="right"> 51 </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/hand.right.gif" alt="[ ]"></td><td><a href="README">README</a></td><td align="right">12-Jun-2015 13:45 </td><td align="right">622 </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="SOURCES">SOURCES</a></td><td align="right">12-Jun-2015 13:45 </td><td align="right">237 </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-headers-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_amd64.deb">linux-headers-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_amd64.deb</a></td><td align="right">16-Jun-2014 04:55 </td><td align="right">1.1M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-headers-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_i386.deb">linux-headers-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_i386.deb</a></td><td align="right">16-Jun-2014 05:15 </td><td align="right">1.0M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-headers-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_amd64.deb">linux-headers-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_amd64.deb</a></td><td align="right">16-Jun-2014 04:56 </td><td align="right">1.1M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-headers-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_i386.deb">linux-headers-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_i386.deb</a></td><td align="right">16-Jun-2014 05:17 </td><td align="right">1.0M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-headers-3.16.0-031600rc1_3.16.0-031600rc1.201406160035_all.deb">linux-headers-3.16.0-031600rc1_3.16.0-031600rc1.201406160035_all.deb</a></td><td align="right">16-Jun-2014 04:36 </td><td align="right"> 12M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-image-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_amd64.deb">linux-image-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_amd64.deb</a></td><td align="right">16-Jun-2014 04:55 </td><td align="right"> 51M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-image-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_i386.deb">linux-image-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_i386.deb</a></td><td align="right">16-Jun-2014 05:15 </td><td align="right"> 51M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-image-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_amd64.deb">linux-image-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_amd64.deb</a></td><td align="right">16-Jun-2014 04:56 </td><td align="right"> 51M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="linux-image-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_i386.deb">linux-image-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_i386.deb</a></td><td align="right">16-Jun-2014 05:17 </td><td align="right"> 51M</td><td>&nbsp;</td></tr>
<tr><th colspan="5"><hr></th></tr>
</table>
<address>Apache/2.2.22 (Ubuntu) Server at kernel.ubuntu.com Port 80</address>
</body>
</html>

I'm using re.findall(pattern, rawHTMLString) for this. What is wrong with the regex?

Best answer

Try this:

(?<=href=")linux-[^"]*?_amd64\.deb(?=")

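Your .*? matches too much. It is lazy, but . also matches the double quote, so a match that starts at an i386 href can run past the closing quote and through the markup until it reaches the next _amd64.deb followed by a quote. Excluding the quote with [^"] keeps the match inside the quoted href value.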
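
To see the difference, here is a minimal sketch that runs both patterns over two shortened rows of the listing. The sample string and the variable names are made up for illustration, and the rows are joined on one line on purpose: the overshoot only happens when . can reach the next href, which is the case when the rows are not separated by newlines (or when re.DOTALL is set).

```python
import re

# Two shortened rows from the listing, joined without a newline (assumption:
# "." must be able to reach the next href for the overshoot to occur).
raw_html = (
    '<td><a href="linux-headers-3.16.0-031600rc1-generic_3.16.0-031600rc1.201406160035_i386.deb">i386</a></td>'
    '<td><a href="linux-headers-3.16.0-031600rc1-lowlatency_3.16.0-031600rc1.201406160035_amd64.deb">amd64</a></td>'
)

# Pattern from the question: "." may cross the closing quote of the href.
lazy_dot = r'(?<=href=")linux-.*?_amd64\.deb(?=")'
# Pattern from the answer: [^"] cannot leave the quoted attribute value.
no_quotes = r'(?<=href=")linux-[^"]*?_amd64\.deb(?=")'

overshoot = re.findall(lazy_dot, raw_html)
fixed = re.findall(no_quotes, raw_html)

print(overshoot)  # one match that starts at the i386 href and includes markup
print(fixed)      # only the amd64 file name
```

With the first pattern the single match begins at the i386 file name and swallows the tags up to the amd64 file name. With the second pattern the i386 href cannot match at all, so only the amd64 href is returned.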

For "python - I don't understand why this URL-matching regex isn't working", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33048997/
