python - How to scrape only certain tags within the same class?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:51:53

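I am writing a program that scrapes the names and abilities of all the characters from this website. The tags (li) containing the information I need are mixed in with other li tags that I do not need.

I have tried selecting different classes, but that did not work.

Here is my code: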

import bs4, requests, lxml, re, time, os
from bs4 import BeautifulSoup as soup

def webscrape():
    res = requests.get('https://www.usgamer.net/articles/15-11-2017-skyrim-guide-for-xbox-one-and-ps4-which-races-and-character-builds-are-the-best')
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    races_list = soup.find_all("li < strong")
    races_list_text = [f.text.strip() for f in races_list]
    print(races_list_text)
    time.sleep(1)

webscrape()

I expect it to print out all of the races and their corresponding information.

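Best answer

You can use the following. Note that find_all expects a tag name rather than a CSS selector, so "li < strong" matches nothing; select accepts CSS selectors: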

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.usgamer.net/articles/15-11-2017-skyrim-guide-for-xbox-one-and-ps4-which-races-and-character-builds-are-the-best')
soup = bs(r.content, 'lxml')

#one list of tuples
race_info = [ (item.text, item.next_sibling) for item in soup.select('h2 ~ ul strong')]
# separate lists
races, abilities = zip(*[ (item.text, item.next_sibling) for item in soup.select('h2 ~ ul strong')])

A dictionary might be a better fit, in which case you can do this:

race_info = [ (item.text, item.next_sibling) for item in soup.select('h2 ~ ul strong')]
race_info = dict(race_info)
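
One caveat with the snippets above: item.next_sibling is a text node rather than a plain string, and it is None when nothing follows the strong tag. The following is a minimal sketch of how the values might be cleaned before building the dictionary. The HTML here is a hypothetical stand-in for the real page, and the ": " separator between name and ability is an assumption about how the page formats its list.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup standing in for the real article (an assumption).
html = """
<div>
  <h2>Races</h2>
  <ul>
    <li><strong>Nord</strong>: Resist Frost</li>
    <li><strong>Orc</strong></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

race_info = {}
for item in soup.select("h2 ~ ul strong"):
    sibling = item.next_sibling
    # next_sibling may be None (nothing after <strong>) or another tag,
    # so only keep it when it is a text node.
    text = str(sibling) if isinstance(sibling, NavigableString) else ""
    # Drop the assumed ": " separator and surrounding whitespace.
    race_info[item.get_text(strip=True)] = text.lstrip(" :-").strip()

print(race_info)
```

Entries with no trailing text end up with an empty string instead of None, which keeps the dictionary values uniformly typed.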

~ (general sibling combinator):

The ~ combinator selects siblings. This means that the second element follows the first (though not necessarily immediately), and both share the same parent.
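
To see the combinator in isolation, here is a small sketch run against inline HTML rather than the live page. The markup is made up for illustration; only the selector 'h2 ~ ul strong' comes from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): one list before the h2, one after it.
html = """
<div>
  <ul><li><strong>Ignored</strong> appears before the h2</li></ul>
  <h2>Races</h2>
  <p>Intro text.</p>
  <ul>
    <li><strong>Nord</strong>: Resist Frost</li>
    <li><strong>Khajiit</strong>: Night Eye</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# 'h2 ~ ul' matches each <ul> that comes after an <h2> under the same parent,
# even with other elements (the <p>) in between; ' strong' then matches any
# <strong> nested inside those lists.
names = [tag.get_text() for tag in soup.select("h2 ~ ul strong")]
print(names)  # ['Nord', 'Khajiit']
```

The first list is skipped because it precedes the h2, which is how the selector filters out li tags that share the same markup but sit elsewhere on the page.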

Regarding "python - How to scrape only certain tags within the same class?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56266292/
