python - BeautifulSoup question: How to get the exact link by matching the exact tag content?

Reposted. Author: 搜寻专家. Updated: 2023-10-31 08:50:41

I want to get the link that follows "S-1", not the one that follows "S-1/A". I tried ".find_all(lambda tag: tag.name == 'td' and tag.get()==['S-1'])" and ".select('td.s-1')", but neither returned the link. I would appreciate any help.

The relevant page source is as follows:

<tr>
<td>ADVANCE FINANCIAL BANCORP</td>
<td>S-1/A</td>
<td>10/31/1996</td>
<td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
</td>
</tr>

<tr>
<td>ADVANCE FINANCIAL BANCORP</td>
<td>S-1</td>
<td>9/27/1996</td>
<td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
</td>
</tr>

Here is the link to the full page source:

https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials

Best answer

Try this:

from bs4 import BeautifulSoup
import requests

def getlink(url):
    response = requests.get(url)
    mainpage = BeautifulSoup(response.text, 'html5lib')
    # Collect every table with class "marginB10px" and use the second one
    table = mainpage.find_all('table', attrs={"class": "marginB10px"})
    links = table[1].find_all('a')
    # Return the href of the second link in that table
    return links[1].get('href')

link = getlink('https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials')
# The href is site-relative, so prepend the site root
mainlink = 'https://www.nasdaq.com'
link = mainlink + link
print(link)

Output:

https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318
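
The answer above picks the link by position (second table, second link), so it depends on the row order of the page. As a complement, here is a minimal sketch that matches on the cell text instead, which is what the question asks for. It runs against the two rows quoted in the question rather than the live page, and the names filing_links, HTML and BASE_URL are made up for this illustration. For reference, the attempts in the question fail for different reasons: tag.get() looks up an attribute and requires the attribute name as an argument, and 'td.s-1' is a CSS class selector, so it looks for <td class="s-1"> rather than a cell whose text is S-1.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Sample rows copied from the question's page source (not fetched live).
HTML = """
<table>
<tr>
<td>ADVANCE FINANCIAL BANCORP</td>
<td>S-1/A</td>
<td>10/31/1996</td>
<td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
</td>
</tr>
<tr>
<td>ADVANCE FINANCIAL BANCORP</td>
<td>S-1</td>
<td>9/27/1996</td>
<td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
</td>
</tr>
</table>
"""

# Site root used to resolve the relative hrefs (taken from the answer above).
BASE_URL = "https://www.nasdaq.com"


def filing_links(html, form_type, base_url=BASE_URL):
    """Return absolute links from rows that have a cell equal to form_type."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for row in soup.find_all("tr"):
        # Compare the stripped text of each cell for exact equality,
        # so a cell containing "S-1/A" does not match "S-1".
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if form_type in cells:
            anchor = row.find("a", href=True)
            if anchor is not None:
                links.append(urljoin(base_url, anchor["href"]))
    return links


print(filing_links(HTML, "S-1"))
# ['https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318']
```

To use this on the real page, pass response.text from requests.get(url) as the html argument. Whether the live page still has this table layout is not something the sketch can guarantee.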

Regarding "python - BeautifulSoup question: How to get the exact link by matching the exact tag content?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50703916/
