
python - Getting HTML table data into Python with bs4

Reposted. Author: 行者123. Updated: 2023-12-01 00:07:17

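I'm trying to pull data from a Twitch sub-count site to look up stats for various Twitch channels. I want to be able to enter a username and get back the channel's rank and current sub count.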

from urllib.request import urlopen, Request 
from bs4 import BeautifulSoup as soup

url = "https://twitchanalysis.top/topsubs"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

client = Request(url=url, headers=headers)
page_html = urlopen(client).read()
page_soup = soup(page_html, "html.parser")
db = {}
table = page_soup.find("table", id="topsubs_table")

for cell in page_soup.find_all('td')[3]:
    cell = page_soup.find_all('td')
    db[cell[2].text] = [cell[0].text, cell[3].text]
print(db)

However, when I run this code it only returns the #1 channel. It should return every channel on the first page. I don't know what to do; please help.
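A minimal pure-Python model of why the loop above yields a single entry: page_soup.find_all('td') is one flat list of every cell on the page, and the loop body re-reads that same list on every pass, so indexes 0, 2 and 3 always land on the first row and the same key is overwritten. No network or bs4 is involved here; the four-column layout (rank, an unused column, username, subs) is an assumption inferred from the indexes the code uses, and the "-" placeholders are made up.

```python
# Flat list standing in for page_soup.find_all('td'): every cell on the page,
# row after row. Assumed layout per row: rank, (unused column), username, subs.
flat_cells = [
    "1", "-", "gamesdonequick", "35885",
    "2", "-", "xqcow", "32624",
    "3", "-", "drdisrespect", "29027",
]
ROW_WIDTH = 4  # assumed number of cells per row


def buggy(cells):
    db = {}
    for _ in range(3):  # however many passes the loop makes...
        cell = cells  # ...it re-reads the same flat list every time
        db[cell[2]] = [cell[0], cell[3]]  # always the first row's cells
    return db


def per_row(cells, width=ROW_WIDTH):
    db = {}
    # Slice the flat list into rows, then index within each row.
    for start in range(0, len(cells), width):
        row = cells[start:start + width]
        db[row[2]] = [row[0], row[3]]
    return db


print(buggy(flat_cells))    # {'gamesdonequick': ['1', '35885']}
print(per_row(flat_cells))  # all three channels
```

Slicing by a fixed width only works if every row has exactly the same number of cells; interleaved rows such as the advertisement rows on the real page could break that, which is why iterating over the actual tr elements is the sturdier approach.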

Best answer

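To get all the records, you have to iterate over the rows of the table and read the cells within each row, instead of indexing into the flat list of every td cell on the page.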

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup as soup
url = "https://twitchanalysis.top/topsubs"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
client = Request(url=url, headers=headers)
page_html = urlopen(client).read()
page_soup = soup(page_html, "html.parser")
db = {}
table = page_soup.find("table", id="topsubs_table")
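for row in table.find_all('tr')[1:]:  # skip the first row (the table header)
    cell = row.find_all('td')
    # skip the advertisement rows mixed in with the channel rows
    if 'This is a spot for an advertisement' in cell[0].text:
        continue
    # username -> [rank, sub count]
    db[cell[2].text] = [cell[0].text, cell[3].text]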
print(db)
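The same row-by-row idea can be sketched without bs4 using the standard library's html.parser.HTMLParser. This is a swapped-in technique, not the answer's method. The sample HTML is made up and assumes only what the answer's code implies: a table with id="topsubs_table", a header row, ad rows whose first cell contains "This is a spot for an advertisement", and rank/username/subs in cells 0, 2 and 3. The real page's markup may differ, and nested tables are not handled.

```python
from html.parser import HTMLParser

AD_TEXT = "This is a spot for an advertisement"


class TableRows(HTMLParser):
    """Collects the text of each <td>, grouped by <tr>, inside one table id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.rows = []    # list of rows; each row is a list of cell strings
        self.cell = None  # text parts of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag == "td" and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None
        elif tag == "table" and self.in_table:
            self.in_table = False

    def handle_data(self, data):
        # Text inside nested tags such as <a> is collected too.
        if self.cell is not None:
            self.cell.append(data)


def parse_topsubs(html):
    parser = TableRows("topsubs_table")
    parser.feed(html)
    db = {}
    for cells in parser.rows:
        if len(cells) < 4:  # header row (<th> only) or short/ad row
            continue
        if AD_TEXT in cells[0]:  # same ad filter as the answer
            continue
        db[cells[2]] = [cells[0], cells[3]]  # username -> [rank, subs]
    return db


sample = """
<table id="topsubs_table">
  <tr><th>Rank</th><th></th><th>Channel</th><th>Subs</th></tr>
  <tr><td>1</td><td></td><td><a href="#">gamesdonequick</a></td><td>35885</td></tr>
  <tr><td colspan="4">This is a spot for an advertisement</td></tr>
  <tr><td>2</td><td></td><td><a href="#">xqcow</a></td><td>32624</td></tr>
</table>
"""
print(parse_topsubs(sample))
```

Instead of slicing off the first row as the answer does, this sketch skips any row with fewer than four td cells, which drops a th-only header without assuming its position.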

Output:

{'montanablack88': ['9', '23192'], 'nickmercs': ['4', '25314'], 'pokimane': ['36', '11173'], 'gladd': ['6', '24422'], 'criticalrole': ['16', '17529'], 'maximilian_dood': ['31', '12111'], 'ratirl': ['20', '15106'], 'cohhcarnage': ['12', '20752'], 'jasonr': ['40', '10313'], 'forsen': ['43', '9890'], 'teepee': ['23', '14090'], 'jerma985': ['41', '10180'], 'BobbyPoffGaming': ['19', '16281'], 'castro_1021': ['5', '24670'], 'drlupo': ['7', '23886'], 'alanzoka': ['30', '12521'], 'trainwreckstv': ['29', '13087'], 'noway4u_sir': ['21', '14611'], 'dakotaz': ['39', '10551'], 'ludwig': ['38', '10805'], 'rallied': ['42', '9893'], 'cdnthe3rd': ['48', '9103'], 'therealknossi': ['13', '20563'], 'lord_kebun': ['33', '11311'], 'xqcow': ['2', '32624'], 'littlesiha': ['44', '9693'], 'zerator': ['25', '13498'], 'chocotaco': ['27', '13373'], 'paymoneywubby': ['32', '11972'], 'timthetatman': ['14', '19526'], 'tfue': ['18', '16604'], 'auronplay': ['47', '9590'], 'sacriel': ['28', '13123'], 'lirik': ['17', '17497'], 'pestily': ['15', '18013'], 'rubius': ['35', '11223'], 'FORMAL': ['22', '14200'], 'drdisrespect': ['3', '29027'], 'admiralbahroo': ['10', '22752'], 'papaplatte': ['24', '13519'], 'nick28t': ['49', '9101'], 'joshog': ['34', '11284'], 'shlorox': ['37', '10980'], 'loltyler1': ['45', '9680'], 'gronkh': ['26', '13391'], 'gamesdonequick': ['1', '35885'], 'summit1g': ['8', '23787'], 'MOONMOON': ['11', '22220'], 'zanoxvii': ['46', '9677']}
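The question's end goal was to enter a username and get back the channel's rank and current sub count. Given a db shaped like the output above (username mapped to [rank, subs], both strings), a lookup might be sketched as follows. The helper name channel_stats is hypothetical, and matching names case-insensitively is an assumption (the keys above mix cases, e.g. MOONMOON and BobbyPoffGaming).

```python
from typing import Optional, Tuple


def channel_stats(db: dict, username: str) -> Optional[Tuple[int, int]]:
    """Return (rank, sub_count) for username, or None if it is not in db.

    Assumption: names are compared case-insensitively.
    """
    wanted = username.strip().lower()
    for name, (rank, subs) in db.items():
        if name.lower() == wanted:
            return int(rank), int(subs)
    return None


# A few entries copied from the output above.
db = {
    "gamesdonequick": ["1", "35885"],
    "xqcow": ["2", "32624"],
    "MOONMOON": ["11", "22220"],
}

print(channel_stats(db, "moonmoon"))  # (11, 22220)
print(channel_stats(db, "nobody"))    # None

# Print in rank order, since the dict is not ordered by rank.
for name, (rank, subs) in sorted(db.items(), key=lambda kv: int(kv[1][0])):
    print(rank, name, subs)
```

Converting rank to int before sorting matters: sorted as strings, "11" would come before "2".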

Regarding "python - Getting HTML table data into Python with bs4", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59883857/
