
python - Isolating 'td a' tags by class with Beautiful Soup

Reposted · Author: 行者123 · Updated: 2023-11-30 22:41:11

I want to write the URL links from this page into a file, but every row in the table has two 'td a' tags. I only want the one with class="pagelink" href="/search" etc.

I tried the following code, hoping to pick up only the elements with "class":"pagelink", but it raised an error:

AttributeError: 'Doctype' object has no attribute 'find_all'

Can anyone help?

import requests
from bs4 import BeautifulSoup as soup
import csv

writer.writerow(['URL', 'Reference', 'Description', 'Address'])

url = "https://www.saa.gov.uk/search/?SEARCHED=1&ST=&SEARCH_TERM=city+of+edinburgh%2C+EDINBURGH&ASSESSOR_ID=&SEARCH_TABLE=valuation_roll_cpsplit&PAGE=0&DISPLAY_COUNT=1000&TYPE_FLAG=CP&ORDER_BY=PROPERTY_ADDRESS&H_ORDER_BY=SET+DESC&ORIGINAL_SEARCH_TERM=city+of+edinburgh&DRILL_SEARCH_TERM=BOSWALL+PARKWAY%2C+EDINBURGH&DD_TOWN=EDINBURGH&DD_STREET=BOSWALL+PARKWAY#results"

response = session.get(url) #not used until after the iteration begins
html = soup(response.text, 'lxml')

for link in html:
    prop_link = link.find_all("td a", {"class":"pagelink"})
    writer.writerow([prop_link])

Best Answer

Iterating over your html variable yields a Doctype object first, and Doctype has no find_all. You need to call find_all or select on the soup object itself to locate the nodes you want.
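A minimal reproduction (using a hypothetical inline HTML snippet, not the live page) shows why the loop crashes: iterating the soup walks its top-level children, and on a full page the first child is the Doctype node, a string-like object without find_all:

```python
from bs4 import BeautifulSoup
from bs4.element import Doctype

# Hypothetical minimal page with a doctype declaration
page = BeautifulSoup(
    "<!DOCTYPE html><html><body>"
    "<td><a class='pagelink' href='/search'>x</a></td>"
    "</body></html>",
    "html.parser",
)

# Iterating the soup yields its top-level nodes, not the <a> tags
first = next(iter(page))
print(type(first).__name__)            # Doctype
print(isinstance(first, Doctype))      # True
# Calling first.find_all(...) on this node raises the AttributeError
# seen in the question.
```
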

Example:

import requests
from bs4 import BeautifulSoup as soup
import csv

outputfilename = 'Ed_Streets2.csv'

#inputfilename = 'Edinburgh.txt'

baseurl = 'https://www.saa.gov.uk'

# text mode with newline='' is the csv module's recommended setting
outputfile = open(outputfilename, 'w', newline='')
writer = csv.writer(outputfile)
writer.writerow(['URL', 'Reference', 'Description', 'Address'])

session = requests.session()

url = "https://www.saa.gov.uk/search/?SEARCHED=1&ST=&SEARCH_TERM=city+of+edinburgh%2C+EDINBURGH&ASSESSOR_ID=&SEARCH_TABLE=valuation_roll_cpsplit&PAGE=0&DISPLAY_COUNT=100&TYPE_FLAG=CP&ORDER_BY=PROPERTY_ADDRESS&H_ORDER_BY=SET+DESC&ORIGINAL_SEARCH_TERM=city+of+edinburgh&DRILL_SEARCH_TERM=BOSWALL+PARKWAY%2C+EDINBURGH&DD_TOWN=EDINBURGH&DD_STREET=BOSWALL+PARKWAY#results"

response = session.get(url)
html = soup(response.text, 'lxml')

# find_all on the soup object, not on nodes from iterating the soup
prop_link = html.find_all("a", class_="pagelink button small")

for link in prop_link:
    prop_url = baseurl + link["href"]
    print(prop_url)
    writer.writerow([prop_url, "", "", ""])
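As an alternative to matching the full class string, a CSS selector via select can express the original "an 'a' with class pagelink inside a 'td'" intent directly. This sketch uses a made-up HTML fragment in place of the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical table row resembling the search results page
html_doc = """
<table><tr>
  <td><a class="pagelink button small" href="/search/start">Reference</a></td>
  <td><a class="maplink" href="/map">Map</a></td>
</tr></table>
"""

page = BeautifulSoup(html_doc, "html.parser")

# 'td a.pagelink' = an <a> carrying the class "pagelink" nested in a <td>;
# the other classes on the tag do not prevent the match
links = page.select("td a.pagelink")
hrefs = [a["href"] for a in links]
print(hrefs)  # ['/search/start']
```

Unlike find_all("td a", ...), which treats "td a" as a single (nonexistent) tag name, select understands descendant selectors.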

Regarding python - Isolating 'td a' tags by class with Beautiful Soup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42675154/
