
python BeautifulSoup: finding certain things in a table

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:16:56

Folks, I've successfully got BeautifulSoup scraping a page with the following:

html =  response.read()
soup = BeautifulSoup(html)
links = soup.findAll('a')

The page contains multiple occurrences of anchors like these:

<A href="javascript:Set_Variables('foo1','bar1''')"onmouseover="javascript: return window.status=''">
<A href="javascript:Set_Variables('foo2','bar2''')"onmouseover="javascript: return window.status=''">

How can I iterate over them and get the foo/bar values?

Thanks

Best answer

You can use a regular expression to extract the variables from the href attribute:

import re
from bs4 import BeautifulSoup

data = """
<div>
<table>
<A href="javascript:Set_Variables('foo1','bar1''')" onmouseover="javascript: return window.status=''">
<A href="javascript:Set_Variables('foo2','bar2''')" onmouseover="javascript: return window.status=''">
</table>
</div>
"""

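# Name the parser explicitly so results do not depend on what is installed.
soup = BeautifulSoup(data, "html.parser")

# Capture the two quoted arguments passed to Set_Variables.
pattern = re.compile(r"javascript:Set_Variables\('(\w+)','(\w+)'")
for a in soup.find_all('a', href=True):  # skip anchors without an href
    match = pattern.search(a['href'])
    if match:
        print(match.groups())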

This prints:

('foo1', 'bar1')
('foo2', 'bar2')
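
The regex step can also be pulled out into a small helper that works on plain href strings, which makes it easy to check without fetching a page. This is only a sketch: the names `SET_VARIABLES`, `parse_set_variables`, and `hrefs` are made up for illustration, and the sample strings are copied from the question.

```python
import re

# Same pattern as in the answer: two quoted \w+ arguments to Set_Variables.
SET_VARIABLES = re.compile(r"javascript:Set_Variables\('(\w+)','(\w+)'")


def parse_set_variables(href):
    """Return the two captured arguments as a tuple, or None if there is no match."""
    match = SET_VARIABLES.search(href or "")  # tolerate a missing href (None)
    return match.groups() if match else None


# Hypothetical inputs: two hrefs from the question, an ordinary link, and None.
hrefs = [
    "javascript:Set_Variables('foo1','bar1''')",
    "javascript:Set_Variables('foo2','bar2''')",
    "index.html",
    None,
]

# Keep only the hrefs that actually matched.
pairs = [p for p in map(parse_set_variables, hrefs) if p is not None]
print(pairs)  # [('foo1', 'bar1'), ('foo2', 'bar2')]
```

With the `links` list from the question, the helper could be called as `parse_set_variables(a.get('href'))` for each `a`, since `get` returns None when an anchor has no href.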

Regarding "python BeautifulSoup: finding certain things in a table", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24874830/
