
python - Basic BeautifulSoup Wikipedia scraping

Reposted. Author: 行者123. Updated: 2023-11-28 22:32:16

I am trying to pull a very basic, short unordered list (<ul>) off Wikipedia. My end goal is to put it into a DataFrame. My question is: where do I go from here?

In [28]: from bs4 import BeautifulSoup
    ...: import requests
    ...: from pandas import Series, DataFrame

In [29]: url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"

In [31]: result = requests.get(url)

In [32]: c = result.content

In [33]: soup = BeautifulSoup(c, "html.parser")

I can't seem to find an answer to this on StackOverflow, so I would appreciate any advice.
Here is the specific list I am looking for:

Active teams
Baltimore Anthem (2015–present)
Boston Iron (2014–present)
DC Brawlers (2014–present)
Los Angeles Reign (2014–present)
Miami Surge (2014–present)
New York Rhinos (2014–present)
Phoenix Rise (2014–present)
San Francisco Fire (2014–present)

Best answer

First, you need to find the right part of the page. You can do this by looking up the heading with id="Active_teams_at_league_closing" and then finding the next <ul> element after it.

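from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"
r = requests.get(url)
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.content, "html.parser")

# Locate the section heading by its id, then take the first <ul> after it
heading = soup.find(id='Active_teams_at_league_closing')
teams = heading.find_next('ul')

# Iterate over the <li> items only: iterating the <ul> directly also yields
# whitespace text nodes, and .string is None for an <li> with mixed children
for team in teams.find_all('li'):
    print(team.get_text())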
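
The question's stated end goal is a DataFrame, which the loop above stops short of. One possible way to get there is sketched below. It is only an illustration under a few assumptions: `SAMPLE_HTML` is a hypothetical stand-in for the relevant part of the page (so the example runs without a network request), `teams_to_dataframe` is a made-up helper name, and the split into `team` and `years` columns assumes every item looks like "Name (years)".

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the relevant part of the Wikipedia page.
# The real markup may differ; only the id and the <ul>/<li> shape matter here.
SAMPLE_HTML = """
<h3><span id="Active_teams_at_league_closing">Active teams</span></h3>
<ul>
<li><a href="/wiki/Baltimore_Anthem">Baltimore Anthem</a> (2015–present)</li>
<li><a href="/wiki/Boston_Iron">Boston Iron</a> (2014–present)</li>
<li><a href="/wiki/DC_Brawlers">DC Brawlers</a> (2014–present)</li>
</ul>
"""

# Assumed item format: "Team Name (years)"
ITEM_PATTERN = re.compile(r"^(?P<team>.*?)\s*\((?P<years>[^)]*)\)$")


def teams_to_dataframe(html, heading_id="Active_teams_at_league_closing"):
    """Build a DataFrame from the first <ul> that follows the given heading id."""
    soup = BeautifulSoup(html, "html.parser")

    heading = soup.find(id=heading_id)
    if heading is None:
        raise ValueError(f"no element with id={heading_id!r} found")

    team_list = heading.find_next("ul")
    if team_list is None:
        raise ValueError("no <ul> found after the heading")

    rows = []
    for item in team_list.find_all("li"):
        text = item.get_text().strip()
        match = ITEM_PATTERN.match(text)
        if match:
            rows.append({"team": match.group("team"), "years": match.group("years")})
        else:
            # Keep items that do not fit the assumed format instead of dropping them
            rows.append({"team": text, "years": None})

    return pd.DataFrame(rows, columns=["team", "years"])


df = teams_to_dataframe(SAMPLE_HTML)
print(df)
```

To run this against the live page, pass `requests.get(url).content` in place of `SAMPLE_HTML`. The explicit `None` checks are there because the heading id on a wiki page can change over time, in which case `soup.find` returns `None` and the original `heading.find_next` call would fail with an `AttributeError`.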

Regarding python - Basic BeautifulSoup Wikipedia scraping, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41152492/
