Python/BeautifulSoup : How to look directly beneath a code comment?

Reposted by: 太空宇宙 · Updated: 2023-11-03 19:31:37

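I'm using BeautifulSoup to parse some web pages, and I'm trying to work within the library (rather than solving everything with brute-force regular expressions).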

The pages I'm looking at are structured like this:

<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>
<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>

I want to parse each section separately. Is there a way to tell BeautifulSoup to split out the region between two identical comments?

Thanks!

Best answer

Comments are nodes, just like everything else:

from bs4 import BeautifulSoup, Comment, NavigableString

soup = BeautifulSoup("""<!--comment--><div>a</div><div>b</div><div>c</div>
                        <!--comment--><div>a</div><div>b</div><div>c</div>""",
                     "html.parser")

comments = soup.find_all(string=lambda elm: isinstance(elm, Comment))
for comment in comments:
    next_sib = comment.next_sibling
    # Walk forward until the next comment or the end of the parent
    while next_sib is not None and not isinstance(next_sib, Comment):
        # Skip whitespace-only text nodes (e.g. newlines between tags)
        if not (isinstance(next_sib, NavigableString) and not next_sib.strip()):
            # Append next_sib to a list, dictionary, etc. and
            # do what you want with it
            print(next_sib)
        next_sib = next_sib.next_sibling
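The loop above only prints the siblings. To keep each section separately, which is what the question asks for, the same sibling walk can collect the tags into one list per comment. This is a minimal sketch, assuming bs4 (BeautifulSoup 4) with the built-in `html.parser`; `split_on_comments` is a hypothetical helper name, and only tags (not bare text) are kept:

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def split_on_comments(markup):
    """Hypothetical helper: return one list of tags per comment, covering the
    siblings after that comment up to the next comment or the end of the parent."""
    soup = BeautifulSoup(markup, "html.parser")
    sections = []
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        section = []
        node = comment.next_sibling
        while node is not None and not isinstance(node, Comment):
            # Keep tags; skip text nodes such as the newlines between the divs
            if not isinstance(node, NavigableString):
                section.append(node)
            node = node.next_sibling
        sections.append(section)
    return sections


html = """<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>
<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>"""

for i, section in enumerate(split_on_comments(html)):
    print(i, [div.get_text() for div in section])
# prints:
# 0 ['a', 'b', 'c']
# 1 ['a', 'b', 'c']
```

Each section is a list of `Tag` objects, so you can keep calling `find`, `get_text`, etc. on its members.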

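Edit:

This doesn't check whether the comments are identical (i.e. have the same text), but you can handle that by comparing each comment's text with the text of the previous comment.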
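One way to act on that suggestion is to treat only comments with a particular text as section boundaries, so that unrelated comments inside a section don't cut it short. This is a sketch, assuming bs4 and `html.parser`; `split_on_marker` and the marker strings `"section"` and `"note"` are hypothetical, chosen for illustration:

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def split_on_marker(markup, marker):
    """Hypothetical helper: split sibling content into sections that start at
    comments whose stripped text equals `marker`; other comments are ignored."""
    soup = BeautifulSoup(markup, "html.parser")

    def is_marker(node):
        return isinstance(node, Comment) and node.strip() == marker

    sections = []
    for comment in soup.find_all(string=is_marker):
        section = []
        node = comment.next_sibling
        while node is not None and not is_marker(node):
            # Collect tags only; text nodes and non-marker comments are skipped
            if not isinstance(node, NavigableString):
                section.append(node)
            node = node.next_sibling
        sections.append(section)
    return sections


html = """<!--section-->
<div>a</div>
<!--note-->
<div>b</div>
<!--section-->
<div>c</div>"""

for section in split_on_marker(html, "section"):
    print([div.get_text() for div in section])
# prints ['a', 'b'] and then ['c']
```

Here the `<!--note-->` comment does not end the first section because its text differs from the marker.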

Regarding "Python/BeautifulSoup : How to look directly beneath a code comment?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5527935/