
python - How do I prepend the main domain to relative links scraped from HTML so that they can be followed?

Reposted. Author: 行者123. Updated: 2023-12-01 02:13:14

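Here is my code. It returns a list of specific news links from an HTML page, but each link contains only the resource path and parameters. I want to include the main domain so that the links are usable.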

import requests
from bs4 import BeautifulSoup


def get_cric_info_articles():

    cricinfo_article_link = "http://www.espncricinfo.com/ci/content/story/news.html"

    r = requests.get(cricinfo_article_link)
    cricinfo_article_html = r.text

    soup = BeautifulSoup(cricinfo_article_html, "html.parser")
    # print(soup.prettify())

    cric_info_items = soup.find_all("h2",
                                    {"class": "story-title"})
    cricinfo_article_dict = {}

    for div in cric_info_items:
        cricinfo_article_dict[div.find('a').string] = div.find('a')['href']

    return cricinfo_article_dict


print(get_cric_info_articles())

What I get: {'Bell-Drummond leads MCC in opener': '/ci/content/story/1135157.html', 'Scotland pick Brad Wheal and Chris Sole for World Cup Qualifier': '/scotland/content/story/1135152.html', 'Newlands working towards water independence': '/southafrica/content/story/1135120.html'}

I am trying to join '/ci/content/story/1135157.html' onto http://www.espncricinfo.com/
so that the final link is http://www.espncricinfo.com/ci/content/story/1135157.html. How can I do that? Sorry for the long post.

The change I made

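from urllib.parse import urljoin  # required for urljoin below

for div in cric_info_items:
    a = div.find('a')['href']           # relative path, e.g. /ci/content/story/1135157.html
    b = 'http://www.espncricinfo.com/'  # site root
    c = urljoin(b, a)                   # absolute URL
    cricinfo_article_dict[div.find('a').string] = c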
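
The same join can be tried without any network access by applying it to a dict shaped like the output shown earlier. This is only a minimal sketch: the helper name `absolutize_links`, the constant `BASE_URL`, and the placeholder titles in `sample` are made up for illustration and are not part of the original code.

```python
from urllib.parse import urljoin

# Site root taken from the question; the constant name is hypothetical.
BASE_URL = "http://www.espncricinfo.com/"


def absolutize_links(links, base=BASE_URL):
    """Return a new dict in which every href is joined onto base."""
    return {title: urljoin(base, href) for title, href in links.items()}


# Placeholder titles; the paths are the ones shown in the question's output.
sample = {
    "story A": "/ci/content/story/1135157.html",
    "story B": "/scotland/content/story/1135152.html",
}

absolute = absolutize_links(sample)
for title, url in absolute.items():
    print(title, url)
```

Building a new dict instead of mutating the scraped one keeps the original relative paths available, which may be handy when debugging.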

Best answer

You can use the urllib.parse module for this:

from urllib.parse import urljoin
urljoin('http://www.espncricinfo.com/', '/ci/content/story/1135157.html')

Hope that helps.
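
As a small illustration of why `urljoin` is usually safer than plain string concatenation, the sketch below resolves three kinds of href against the news page URL from the question. The variable names and the `https://example.com/a.html` link are assumptions added for the example.

```python
from urllib.parse import urljoin

# The page the links were scraped from (URL taken from the question).
page = "http://www.espncricinfo.com/ci/content/story/news.html"

# A leading slash resolves against the site root.
root_relative = urljoin(page, "/scotland/content/story/1135152.html")

# No leading slash resolves against the page's own directory.
doc_relative = urljoin(page, "1135157.html")

# An href that is already absolute is returned unchanged.
already_absolute = urljoin(page, "https://example.com/a.html")

print(root_relative)
print(doc_relative)
print(already_absolute)
```

Passing the full page URL as the base, rather than only the domain, means hrefs without a leading slash are resolved correctly as well.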

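This post, "python - How do I prepend the main domain to relative links scraped from HTML so that they can be followed?", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/48584755/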
