
python-3.x - Python web scraping misses one element in a list of searched objects

Reposted. Author: 行者123. Updated: 2023-12-04 15:27:47

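I am trying to scrape some data using the beautifulsoup and requests libraries in Python 3.7. Each item on this web page (each article tag) has a YouTube link. After finding all instances of article, I can extract the headlines successfully. The code also finds the youtube-player class instance inside every article except the one at index 7, where the output is None.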

from bs4 import BeautifulSoup
import requests
url = 'https://coreyms.com/page/12'
soup = BeautifulSoup(requests.get(url).text, "html.parser")
articles = soup.find_all('article')

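for article in articles:
    # Headline text lives in the link inside the article's <h2>
    headline = article.h2.a.text
    print(headline)
    # First <iframe class="youtube-player"> inside this article, or None
    link = article.find('iframe', {'class': 'youtube-player'})
    print(link)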

However, if I search the parsed source (the beautifulsoup output) for youtube-player directly, I get all the instances correctly.

links = soup.find_all('iframe', {'class': 'youtube-player'})
for link in links:
    print(link)

How can I improve my code so that the article loop finds every youtube-player instance?

Best answer

You can use the built-in zip() function to pair each title with its YouTube link.

For example:

import requests
from bs4 import BeautifulSoup

url = 'https://coreyms.com/page/12'
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Pair the n-th title on the page with the n-th player on the page
for title, player in zip(soup.select('.entry-title'),
                         soup.select('iframe.youtube-player')):
    print('{:<75}{}'.format(title.text, player['src']))

Prints:

Git: Difference between “add -A”, “add -u”, “add .”, and “add *”           https://www.youtube.com/embed/tcd4txbTtAY?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Programming Terms: Combinations and Permutations                           https://www.youtube.com/embed/QI9EczPQzPQ?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Chrome Quick Tip: Quickly Bookmark Open Tabs for Later Viewing             https://www.youtube.com/embed/tsiSg_beudo?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Python: Comprehensions – How they work and why you should be using them    https://www.youtube.com/embed/3dt4OGnU5sM?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Python: Generators – How to use them and the benefits you receive          https://www.youtube.com/embed/bD05uGo_sVI?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Quickest and Easiest Way to Run a Local Web-Server                         https://www.youtube.com/embed/lE6Y6M9xPLw?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Git for Beginners: Command-Line Fundamentals                               https://www.youtube.com/embed/HVsySz-h9r4?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Time-Saving Keyboard Shortcuts for the Mac Terminal                        https://www.youtube.com/embed/TXzrk3b9sKM?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Overview of Online Learning Resources in 2015                              https://www.youtube.com/embed/QGy6M8HZSC4?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
Python: Else Clauses on Loops                                              https://www.youtube.com/embed/Dh-0lAyc3Bc?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent
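
One caveat with the zip() approach: zip() pairs items purely by position and stops at the shorter input, so it only lines up correctly when every article has exactly one player. The sketch below uses made-up lists (the names titles and players and their values are placeholders, not data from the page) to show what could happen if one player were missing.

```python
from itertools import zip_longest

# Hypothetical data: three titles, but the player for "Post B" was not found.
titles = ["Post A", "Post B", "Post C"]
players = ["video-for-A", "video-for-C"]

# zip() stops at the shorter list, so "Post C" is dropped and
# "Post B" is silently paired with the wrong video.
paired = list(zip(titles, players))
print(paired)

# zip_longest() keeps every title, but pairing is still positional,
# so the mismatch for "Post B" remains.
padded = list(zip_longest(titles, players, fillvalue=None))
print(padded)
```

Looking up the player inside each article avoids this kind of shift, because a missing player then only affects its own row.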

Edit: It seems that with html.parser, BeautifulSoup fails to recognize the YouTube link in one place. Use lxml or html5lib instead:

import requests
from bs4 import BeautifulSoup

url = 'https://coreyms.com/page/12'
soup = BeautifulSoup(requests.get(url).text, "lxml")

for article in soup.select('article'):
    title = article.select_one('.entry-title')
    # Fall back to an empty src when the article has no player
    player = article.select_one('iframe.youtube-player') or {'src': ''}
    print('{:<75}{}'.format(title.text, player['src']))
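
Since lxml and html5lib are separate installs, it may help to fall back gracefully when one of them is missing. The following is only a sketch under assumptions: make_soup, titles_and_players, and SAMPLE are hypothetical names, and SAMPLE is small well-formed markup rather than the real page, so it does not reproduce the html.parser problem itself.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, parsers=("lxml", "html5lib", "html.parser")):
    """Parse html with the first parser in `parsers` that is installed."""
    for parser in parsers:
        try:
            return BeautifulSoup(html, parser)
        except FeatureNotFound:
            continue  # this parser is not installed; try the next one
    raise RuntimeError("none of the requested parsers is available")


def titles_and_players(soup):
    """Return (title, src) per article; src is None when no player is found."""
    rows = []
    for article in soup.select("article"):
        title = article.select_one(".entry-title")
        player = article.select_one("iframe.youtube-player")
        rows.append((
            title.get_text(strip=True) if title else None,
            player.get("src") if player else None,
        ))
    return rows


# Hypothetical sample markup: the second article has no player.
SAMPLE = """
<article><h2 class="entry-title"><a href="#">First</a></h2>
<iframe class="youtube-player" src="https://www.youtube.com/embed/abc"></iframe></article>
<article><h2 class="entry-title"><a href="#">Second</a></h2></article>
"""

rows = titles_and_players(make_soup(SAMPLE))
for title, src in rows:
    print('{:<75}{}'.format(title, src or ''))
```

Returning None for a missing player, instead of a placeholder dict, lets the caller decide how to report articles without a video.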

Regarding python-3.x - Python web scraping misses one element in a list of searched objects, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61914180/
