
Python loop for web scraping


I am trying to extract data from a website.

The code I wrote is:

import csv
import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure+medical+consulting-other-in-wa/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")
soup = BeautifulSoup(page.content, 'html.parser')

Address_1 = soup.find('p', attrs={'class': 'details-panel__address'})
Address = Address_1.text.strip()

The result I get is:

 'GF 255 Adelaide TerracePerth, WA 6000'

That is only the address of a single listing.

When I use soup.find_all, the results I get look like this:

[<p class="details-panel__address" data-reactid="90"><span class="details-panel__address-text text-truncate" data-reactid="91">GF 255 Adelaide Terrace</span><span class="details-panel__address-text text-truncate" data-reactid="92">Perth, WA 6000</span></p>,
 <p class="details-panel__address" data-reactid="122"><span class="details-panel__address-text text-truncate" data-reactid="123">369-371 Oxford Street</span><span class="details-panel__address-text text-truncate" data-reactid="124">Mount Hawthorn, WA 6016</span></p>,
 <p class="details-panel__address" data-reactid="148"><span class="details-panel__address-text text-truncate" data-reactid="149">2 Lloyd Street</span><span class="details-panel__address-text text-truncate" data-reactid="150">Midland, WA 6056</span></p>,
 <p class="details-panel__address" data-reactid="172"><span class="details-panel__address-text text-truncate" data-reactid="173">Bluenote Building, 16/162 Colin Street</span><span class="details-panel__address-text text-truncate" data-reactid="174">West Perth, WA 6005</span></p>,
 <p class="details-panel__address" data-reactid="196"><span class="details-panel__address-text text-truncate" data-reactid="197">Bluenote Building, 10/162 Colin Street</span><span class="details-panel__address-text text-truncate" data-reactid="198">West Perth, WA 6005</span></p>]

Please advise how I should extract the address, property type, sold date, sold price, area, agency name, agent name and phone number for all the listings on this page. Also, I do not know how to use a loop to open each listing on the page and pull the information from it.

Best Answer

soup.find_all returns a list of elements. To get the text, you have to iterate over that list and pull it out of each element with the text attribute.

import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure+medical+consulting-other-in-wa/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")
soup = BeautifulSoup(page.content, 'html.parser')

# find_all returns every matching <p>, so collect the text of each one
Address_1 = soup.find_all('p', attrs={'class': 'details-panel__address'})
address_list = [address.text.strip() for address in Address_1]
print(address_list)

# each listing card links through to its own detail page
links = soup.find_all('a', attrs={'class': 'details-panel'})
hrefs = [link['href'] for link in links]
print(hrefs)
# Now iterate through the list of urls and extract the required data
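
A minimal sketch of that last step is below, continuing from the hrefs list collected above. It assumes the hrefs may be relative paths (hence urljoin), and the CSS class names used for the detail-page fields (details-panel__price, agent__name, and so on) as well as the first_text helper are illustrative placeholders rather than selectors confirmed from the real site; inspect the detail pages and substitute the actual class names.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE_URL = "http://www.realcommercial.com.au"

def first_text(soup, css_class, tag='p'):
    # return the stripped text of the first matching tag, or None if it is missing
    element = soup.find(tag, attrs={'class': css_class})
    return element.text.strip() if element else None

records = []
for href in hrefs:  # hrefs collected in the snippet above
    detail_page = requests.get(urljoin(BASE_URL, href))
    detail_soup = BeautifulSoup(detail_page.content, 'html.parser')
    records.append({
        # placeholder class names -- check the real markup and adjust
        'address': first_text(detail_soup, 'details-panel__address'),
        'property_type': first_text(detail_soup, 'details-panel__property-type'),
        'sold_date': first_text(detail_soup, 'details-panel__date'),
        'sold_price': first_text(detail_soup, 'details-panel__price'),
        'area': first_text(detail_soup, 'details-panel__area'),
        'agency': first_text(detail_soup, 'agency__name'),
        'agent': first_text(detail_soup, 'agent__name'),
        'phone': first_text(detail_soup, 'agent__phone'),
    })

print(records)

Each dictionary in records could then be written out with csv.DictWriter, which would make use of the csv import already present in the question's code.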

On the topic of Python loops for web scraping, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44108944/
