
python - Getting a column from a Wikipedia table using BeautifulSoup

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:10:58

import requests
from bs4 import BeautifulSoup

source_code = requests.get('http://en.wikipedia.org/wiki/Taylor_Swift_discography')
soup = BeautifulSoup(source_code.text, 'html.parser')
tables = soup.find_all("table")

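I am trying to get a list of song names from the "List of singles" table in Taylor Swift's discography.

The table has no unique class or id. The only unique thing I can think of is the caption tag around the "List of singles..." text: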

List of singles as main artist, with selected chart positions, sales figures and certifications

I tried:

table = soup.find_all("caption")

But it returns nothing. I assume caption is not a recognized tag in bs4?

Best answer

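Actually, this has nothing to do with findAll() versus find_all(). findAll() was the name used in BeautifulSoup 3 and was kept in BeautifulSoup 4 for compatibility reasons. Quoting the bs4 source code: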

def find_all(self, name=None, attrs={}, recursive=True, text=None,
             limit=None, **kwargs):
    generator = self.descendants
    if not recursive:
        generator = self.children
    return self._find_all(name, attrs, text, limit, generator, **kwargs)

findAll = find_all  # BS3
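
To make the point concrete, here is a minimal offline sketch showing that both spellings return the same result and that caption is matched like any other tag name. SAMPLE_HTML is a hypothetical miniature table, not the real Wikipedia markup, and the warning filter is only an assumption that some newer bs4 releases flag findAll as deprecated.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical miniature of a Wikipedia-style table, not the real page markup.
SAMPLE_HTML = """
<table class="wikitable">
  <caption>List of singles as main artist</caption>
  <tr><th scope="row">"Tim McGraw"</th><td>2006</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# caption is matched by tag name like any other element.
captions = soup.find_all("caption")

# Assumption: newer bs4 releases may warn that findAll is deprecated,
# so any such warning is silenced for this comparison.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    legacy_captions = soup.findAll("caption")

caption_texts = [c.get_text(strip=True) for c in captions]
same_result = captions == legacy_captions

# From a caption, the enclosing table is one find_parent() call away.
table = captions[0].find_parent("table")
row_titles = [th.get_text(strip=True) for th in table.find_all("th", scope="row")]

print(caption_texts, same_result, row_titles)
```

If find_all("caption") comes back empty on a live page, the HTML that was actually downloaded is a more likely culprit than the method name.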

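Also, there is a better way to get the list of singles: rely on the span element with id="Singles", which marks the start of the Singles section. Then use find_next_sibling() to get the first table after the span tag's parent. Finally, get all th elements with scope="row":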

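from bs4 import BeautifulSoup
import requests


source_code = requests.get('http://en.wikipedia.org/wiki/Taylor_Swift_discography')
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(source_code.content, 'html.parser')

# The span with id="Singles" sits inside the section heading; the table follows that heading
table = soup.find('span', id='Singles').parent.find_next_sibling('table')
# Row header cells (th scope="row") hold the song titles
for single in table.find_all('th', scope='row'):
    print(single.text)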

Prints:

"Tim McGraw"
"Teardrops on My Guitar"
"Our Song"
"Picture to Burn"
"Should've Said No"
"Change"
"Love Story"
"White Horse"
"You Belong with Me"
"Fifteen"
"Fearless"
"Today Was a Fairytale"
"Mine"
"Back to December"
"Mean"
"The Story of Us"
"Sparks Fly"
"Ours"
"Safe & Sound"
(featuring The Civil Wars)
"Long Live"
(featuring Paula Fernandes)
"Eyes Open"
"We Are Never Ever Getting Back Together"
"Ronan"
"Begin Again"
"I Knew You Were Trouble"
"22"
"Highway Don't Care"
(with Tim McGraw)
"Red"
"Everything Has Changed"
(featuring Ed Sheeran)
"Sweeter Than Fiction"
"The Last Time"
(featuring Gary Lightbody)
"Shake It Off"
"Blank Space"

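The same traversal can be tried without a network request. The sketch below runs it against a small inline document; SAMPLE_HTML and the helper name single_titles are hypothetical and only mimic the layout described in the answer (a heading that wraps span id="Singles", followed later by the table). The None checks are an added precaution, since live page markup can change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup modelled on the layout the answer describes.
SAMPLE_HTML = """
<h2><span id="Albums">Albums</span></h2>
<table>
  <tr><th scope="col">Title</th></tr>
  <tr><th scope="row">Fearless</th></tr>
</table>
<h2><span id="Singles">Singles</span></h2>
<p>Intro text between the heading and the table.</p>
<table>
  <caption>List of singles as main artist</caption>
  <tr><th scope="col">Title</th><th scope="col">Year</th></tr>
  <tr><th scope="row">"Tim McGraw"</th><td>2006</td></tr>
  <tr><th scope="row">"Safe &amp; Sound"<br>(featuring The Civil Wars)</th><td>2011</td></tr>
</table>
"""


def single_titles(html):
    """Return the row-header texts of the first table after the Singles heading."""
    soup = BeautifulSoup(html, "html.parser")
    heading_span = soup.find("span", id="Singles")
    if heading_span is None:
        return []  # the section marker is missing
    # The span's parent is the heading; the table is a later sibling of it.
    table = heading_span.parent.find_next_sibling("table")
    if table is None:
        return []  # no table follows the heading
    # get_text(" ", strip=True) joins the title and the "(featuring ...)" line.
    return [th.get_text(" ", strip=True) for th in table.find_all("th", scope="row")]


titles = single_titles(SAMPLE_HTML)
print(titles)
```

Because find_next_sibling('table') skips non-matching siblings, the intro paragraph between the heading and the table does not get in the way, and the earlier Albums table is never considered.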
Regarding "python - Getting a column from a Wikipedia table using BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26789042/
