python - How do I use 'contents' to scrape the values I want?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:45:11

I'm following this link to scrape data from a website.

I want to scrape each artist's name, URL, years, and nationality, but when I try the following code:

import requests
import csv
from bs4 import BeautifulSoup
import bs4


f = csv.writer(open('z_artist_names_assignment.csv', 'w'))
f.writerow(['N'])

pages = []

for i in range(1, 2):
    url = 'https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ' + str(i) + '.htm'
    pages.append(url)


for item in pages:
    page = requests.get(item, timeout=10)
    soup = BeautifulSoup(page.text, 'html.parser')

    last_links = soup.find(class_='AlphaNav')
    last_links.decompose()

    artist_name_list = soup.find(class_='BodyText')
    artist_name_list_items = artist_name_list.find_all('a')

    nationality_list = soup.find(class_='BodyText')
    nationality_list_items = nationality_list.find_all('td')

    for artist_name in artist_name_list_items:
        names = artist_name.contents[0]
        links = 'https://web.archive.org' + artist_name.get('href')

    for nationality in nationality_list_items:
        nationality = nationality.contents[0]
        print(nationality)

print(nationality) outputs not only the nationality text but also the artist names wrapped in their <a> tags, for example:

<a href="/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11630">Zabaglia, Niccola</a>
Italian, 1664 - 1750
<a href="/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=34202">Zaccone, Fabian</a>
American, 1910 - 1992
<a href="/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3475">Zadkine, Ossip</a>
French, 1890 - 1967

I only want "Italian, 1664 - 1750", or just "Italian", or just "1664 - 1750". How can I use .contents to get these values?
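Note that .contents only returns a cell's child nodes; it does not split the text inside them. Once you have a string such as "Italian, 1664 - 1750", ordinary string methods can separate the parts. A minimal sketch with a hypothetical helper, split_nationality, assuming the nationality always comes before the first comma (cells that don't follow that pattern would need extra handling):

```python
def split_nationality(text):
    """Split a cell like 'Italian, 1664 - 1750' at its first comma.

    Assumes the nationality precedes the first comma (hypothetical helper).
    Returns (nationality, dates); dates is '' when the text has no comma.
    """
    nationality, _sep, dates = text.partition(",")
    return nationality.strip(), dates.strip()


for cell in ["Italian, 1664 - 1750", "American, 1910 - 1992", "Mexican, born 1908"]:
    print(split_nationality(cell))
```

str.partition splits only at the first separator, so a date part that happens to contain another comma stays in one piece.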

Here is the HTML:

<tr valign="top"><td><a href="/web/20121007172955/http://www.nga.gov/cgi-bin/tsearch?artistid=3452">Zalce, Alfredo</a></td><td>Mexican, born 1908</td></tr>
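The output above comes from find_all('td') returning both cells of every row. A minimal sketch, run offline on the row above (assuming bs4 is installed, with the same html.parser backend as the question), shows that contents[0] of the name cell is the <a> Tag itself, while contents[0] of the nationality cell is a plain string. Filtering on NavigableString is just one possible way to skip the name cells:

```python
from bs4 import BeautifulSoup, NavigableString

# One row copied from the question's HTML.
ROW_HTML = (
    '<tr valign="top"><td><a href="/web/20121007172955/http://www.nga.gov/'
    'cgi-bin/tsearch?artistid=3452">Zalce, Alfredo</a></td>'
    '<td>Mexican, born 1908</td></tr>'
)

soup = BeautifulSoup(ROW_HTML, "html.parser")
cells = soup.find_all("td")

# Name cell: its only child is the <a> tag, so contents[0] is a Tag,
# and printing it prints the whole link markup.
print(cells[0].contents[0])

# Nationality cell: its only child is a text node, so contents[0] is a string.
print(cells[1].contents[0])

# Keep only cells whose first child is bare text, which skips the name cells.
nationality_texts = [
    str(td.contents[0])
    for td in cells
    if td.contents and isinstance(td.contents[0], NavigableString)
]
print(nationality_texts)  # → ['Mexican, born 1908']
```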


Best answer

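I think it is better to find all the 'tr' elements that contain the artist information, rather than the 'td' elements. Each row holds both the name cell and the nationality cell, so you can pick each cell by its position in the row.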

Below is an example. Hope it helps!

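entries = soup.find_all("tr", {"valign": "top"})

# In each row, contents[0] is the name <td> and contents[1] is the nationality <td>
links = ['https://web.archive.org{}'.format(entry.contents[0].a['href']) for entry in entries]
names = [entry.contents[0].text for entry in entries]
# .text returns the cell's string instead of the <td> Tag itself
nationalities = [entry.contents[1].text for entry in entries]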
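A self-contained sketch of the same row-based approach, run offline on a hardcoded two-row sample built from the question's HTML (an assumption: on the live page this would be page.text from requests.get). It deliberately swaps the fixed .contents indices for find_all('td') and get_text(). If the markup has whitespace between tags, .contents also includes those whitespace strings and the indices shift. The row filter, the split-at-first-comma rule, and the CSV field names are illustrative assumptions:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample input in the same shape as the question's rows.
HTML = """
<table>
<tr valign="top"><td><a href="/web/20121007172955/http://www.nga.gov/cgi-bin/tsearch?artistid=3452">Zalce, Alfredo</a></td><td>Mexican, born 1908</td></tr>
<tr valign="top"><td><a href="/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11630">Zabaglia, Niccola</a></td><td>Italian, 1664 - 1750</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

rows = []
for entry in soup.find_all("tr", {"valign": "top"}):
    # find_all("td") returns only <td> tags, never whitespace text nodes.
    cells = entry.find_all("td")
    if len(cells) < 2 or cells[0].a is None:
        continue  # skip rows that don't look like artist rows
    link = cells[0].a
    # Assumes the nationality comes before the first comma.
    nationality, _sep, dates = cells[1].get_text(strip=True).partition(",")
    rows.append({
        "name": link.get_text(strip=True),
        "url": "https://web.archive.org" + link["href"],
        "nationality": nationality.strip(),
        "dates": dates.strip(),
    })

# Write to an in-memory buffer; for a real file use open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "url", "nationality", "dates"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Because names such as "Zalce, Alfredo" contain a comma, the csv module quotes them automatically, so the name and the nationality land in separate columns.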

For this question (python - How do I use 'contents' to scrape the values I want?), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56654139/
