C# RegEx - find HTML tags (div and anchor)

Reposted. Author: 太空宇宙. Updated: 2023-11-04 13:32:49

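I have to retrieve several div sections (with the specific class name "row") together with their contents, and also find all anchor tags (their link URLs) with the class "underline red bold". In short, I want to get the sections: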

<div class = "row ">
... (divs, tags ...)
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">

and a collection of URLs:

string[] urls = {"/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p"};
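The href values in this markup are HTML-encoded: each &amp; in the source text stands for a literal & in the real URL. If the collection should hold usable URLs rather than the raw attribute text, one option is System.Net.WebUtility.HtmlDecode from the .NET base class library. This is a minimal sketch; the class and method names (HrefDecoding, DecodeAll) are hypothetical and not part of the question.

```csharp
using System;
using System.Linq;
using System.Net;

public static class HrefDecoding
{
    // Hypothetical helper: converts raw attribute text such as "a=1&amp;b=2"
    // into the URL it represents, "a=1&b=2".
    public static string[] DecodeAll(string[] rawHrefs) =>
        rawHrefs.Select(h => WebUtility.HtmlDecode(h)).ToArray();

    public static void Main()
    {
        // The raw value taken from the question's urls array.
        string[] raw =
        {
            "/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p"
        };

        foreach (string url in DecodeAll(raw))
        {
            Console.WriteLine(url);
        }
    }
}
```

Whether to decode depends on what the URLs are used for. Keep the encoded form if the strings go back into HTML, and decode them if they will be requested over HTTP.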

The whole page looks like this:

<html>

... lots of other stuff

<div class="row ">

<div class="photo">
<a rel="nofollow" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
<img alt="alt msg" src="/b/s/b9/03/b9038292d147a582add07ee1f0607827.jpg">
</a>
</div>

<div class="desc">
<div class="l1">
<div class="icons">
</div>

<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>
<div class="fleft">
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
Culture And Gender <br>Intimate Relation</a>
</div>

<div class="fleft">

</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="l2">

<div>
</div>
<div>
<div class="but">
</div>
</div>
</div>
<div class="l3">
Long description
<a class="underlinepix_red no_wrap" rel="nofollow" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
more<img alt="" src="/b/img/arr_red_sm.gif">
</a>
</div>
</div>
</div>

<div class="omit"></div>

<div class="row ">

<div class="photo">
<a rel="nofollow" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534899,p">
<img alt="alt msg" src="/b/s/b9/03/b9038292d147a582add07ee1f06078222.jpg">
</a>
</div>

<div class="desc">
<div class="l1">
<div class="icons">
</div>

<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>
<div class="fleft">
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod5653489225,p">
Culture And Gender <br>Intimate Relation</a>
</div>

<div class="fleft">

</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="l2">

<div>
</div>
<div>
<div class="but">
</div>
</div>
</div>
<div class="l3">
Long description
<a class="underlinepix_red no_wrap" rel="nofollow" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
more<img alt="" src="/b/img/arr_red_sm.gif">
</a>
</div>
</div>
</div>

Can someone help me create a suitable regex?

Best answer

Regular expressions are not well suited to this.
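To see why, consider the most obvious pattern: match from an opening "row" div to the next </div>. The sketch below uses only System.Text.RegularExpressions and a trimmed-down copy of one row from the question. The class name NaiveRegexDemo is hypothetical, and the pattern is deliberately naive, written to show the failure rather than as a recommendation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveRegexDemo
{
    // Naive pattern: lazily match from the opening row div up to the first
    // "</div>" after it. Singleline makes "." match newlines as well.
    private static readonly Regex NaiveRow = new Regex(
        "<div class=\"row \">.*?</div>", RegexOptions.Singleline);

    // Returns the first match, or null if the pattern does not match at all.
    public static string FirstRowMatch(string html)
    {
        Match m = NaiveRow.Match(html);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Simplified version of one "row" block from the question.
        string html =
            "<div class=\"row \">" +
            "<div class=\"photo\"><a href=\"/p1\">img</a></div>" +
            "<div class=\"desc\"><a class=\"underline red bold\" href=\"/p1\">Title</a></div>" +
            "</div>";

        // The lazy ".*?" stops at the first "</div>", which closes the nested
        // photo div, so the row's real content is cut off.
        Console.WriteLine(FirstRowMatch(html));
    }
}
```

Switching to a greedy ".*" does not help. It runs to the last </div> in the input, so on the full page it would swallow every row, plus the "omit" div between them, as a single match.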

Because HTML is nested, a regular expression that does what you are asking would be very (very) long and complicated. Use an HTML parser instead.
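The answer does not name a specific parser. The sketch below assumes the third-party HtmlAgilityPack NuGet package, a common choice for .NET, which parses HTML into a tree and can be queried with XPath 1.0. The class and method names (RowScraper, GetRowSections, GetTitleUrls, HasClass) are hypothetical. The XPath test for a class token uses the usual concat/normalize-space idiom, so the trailing space in class="row " does not matter and "underlinepix_red" is not mistaken for "underline".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class RowScraper
{
    // XPath predicate: true when the class attribute contains the given
    // whitespace-separated token.
    private static string HasClass(string token) =>
        $"contains(concat(' ', normalize-space(@class), ' '), ' {token} ')";

    // Outer HTML (including all nested elements) of every div with class "row".
    public static List<string> GetRowSections(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes($"//div[{HasClass("row")}]");
        // SelectNodes returns null, not an empty collection, when nothing matches.
        return rows == null ? new List<string>() : rows.Select(n => n.OuterHtml).ToList();
    }

    // href of every anchor whose classes include "underline", "red" and "bold".
    public static string[] GetTitleUrls(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        string xpath = $"//a[{HasClass("underline")} and {HasClass("red")} and {HasClass("bold")}]";
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(xpath);
        if (links == null) return Array.Empty<string>();
        return links.Select(a => a.GetAttributeValue("href", "")).ToArray();
    }

    public static void Main()
    {
        // Shortened stand-in for the page in the question.
        string html =
            "<html><body>" +
            "<div class=\"row \"><div class=\"photo\"><a rel=\"nofollow\" href=\"/p1\">img</a></div>" +
            "<div class=\"fleft\"><a class=\"underline red bold\" href=\"/p1\">Title<br>One</a></div></div>" +
            "<div class=\"omit\"></div>" +
            "<div class=\"row \"><div class=\"fleft\"><a class=\"underline red bold\" href=\"/p2\">Title two</a></div></div>" +
            "</body></html>";

        Console.WriteLine(GetRowSections(html).Count);
        foreach (string url in GetTitleUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Because the parser builds a tree, nested divs inside each row are closed correctly and never end a match early. For a large page, loading the document once and running both queries against the same HtmlDocument would avoid parsing twice.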

A similar question about "C# RegEx - find HTML tags (div and anchor)" is on Stack Overflow: https://stackoverflow.com/questions/2585357/