
python - BeautifulSoup: How to find a string if the tag or element is unknown?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:35:33

Just as the title says: is there a way to search the entire DOM for a specific piece of text, for example the word CAPTCHA?

Best answer

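You can use find() and specify the text argument (in Beautiful Soup 4.4.0 and later the same argument is also available under the name string). The documentation describes it as follows: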

With text you can search for strings instead of tags. As with name and the keyword arguments, you can pass in a string, a regular expression, a list, a function, or the value True.

>>> from bs4 import BeautifulSoup
>>> data = """
... <div>test1</div>
... <div class="myclass1">test2</div>
... <div class="myclass2">CAPTCHA</div>
... <div class="myclass3">test3</div>"""
>>> soup = BeautifulSoup(data, "html.parser")
>>> soup.find(text='CAPTCHA').parent
<div class="myclass2">CAPTCHA</div>
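
When the word may appear in more than one place, find_all() can collect every exact match rather than only the first. The following is a minimal sketch, not part of the original answer: the sample markup is made up for illustration, and it uses the newer string argument name and an explicit "html.parser" parser.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration only.
html = """
<div>test1</div>
<p class="a">CAPTCHA</p>
<span>CAPTCHA</span>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every text node whose content is exactly "CAPTCHA".
matches = soup.find_all(string="CAPTCHA")

# Each match is a NavigableString; .parent is the tag that encloses it,
# so the tag name does not need to be known in advance.
parent_names = [m.parent.name for m in matches]
print(parent_names)  # ['p', 'span']
```

Because the match is exact, text nodes that merely contain the word are not returned by this call.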

If CAPTCHA is only part of the text, you can pass a lambda function to text and check whether CAPTCHA occurs inside the text node:

>>> data = """
... <div>test1</div>
... <div class="myclass1">test2</div>
... <div class="myclass2">Here CAPTCHA is a part of a sentence</div>
... <div class="myclass3">test3</div>"""
>>> soup = BeautifulSoup(data, "html.parser")
>>> soup.find(text=lambda x: 'CAPTCHA' in x).parent
<div class="myclass2">Here CAPTCHA is a part of a sentence</div>
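
The same idea works with a named function in place of the lambda, which leaves room for extra logic such as ignoring case. This is a sketch under assumptions: mentions_captcha is a hypothetical helper name, the markup is invented, and the string argument name is the newer spelling of text.

```python
from bs4 import BeautifulSoup

def mentions_captcha(text):
    # Hypothetical helper: receives one text node at a time and
    # returns True when it contains "captcha" in any letter case.
    return "captcha" in text.lower()

# Invented markup: the word sits inside a nested <b> tag.
html = "<div><p>Please solve the <b>Captcha</b> below</p><p>other</p></div>"
soup = BeautifulSoup(html, "html.parser")

node = soup.find(string=mentions_captcha)
print(node.parent.name)  # b
```

Note that the function is applied to individual text nodes, so the parent reported here is the innermost tag around the word (the b element), not the surrounding paragraph.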

Alternatively, passing a compiled regular expression to text achieves the same result:

>>> import re
>>> soup.find(text=re.compile('CAPTCHA')).parent
<div class="myclass2">Here CAPTCHA is a part of a sentence</div>
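
A regular expression also makes it possible to be stricter than a substring test. The sketch below wraps the search in a small helper that returns the enclosing tag of every matching text node; tags_containing, the sample markup, and the id values are all hypothetical and only serve the illustration.

```python
import re
from bs4 import BeautifulSoup

def tags_containing(soup, pattern):
    """Return the parent tag of every text node that matches pattern."""
    regex = re.compile(pattern)
    return [node.parent for node in soup.find_all(string=regex)]

# Invented markup for illustration only.
html = """
<div id="one">Solve the CAPTCHA now</div>
<div id="two">reCAPTCHA widget</div>
<span id="three">CAPTCHA</span>
"""
soup = BeautifulSoup(html, "html.parser")

# \b requires a word boundary on both sides, so "reCAPTCHA" is skipped.
found = tags_containing(soup, r"\bCAPTCHA\b")
ids = [tag["id"] for tag in found]
print(ids)  # ['one', 'three']
```

Whether a word-boundary match or a plain substring match is appropriate depends on the pages being scanned; the pattern here is only one possible choice.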

Regarding "python - BeautifulSoup: How to find a string if the tag or element is unknown?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23486424/
