
python - Unable to retrieve the element at the desired XPath with BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-11-28 16:24:48

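I have just started with web scraping, and I am using BeautifulSoup (Python) for it. I want to get some property data from a sample web page for testing. The code starts as follows: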

import requests
from bs4 import BeautifulSoup as Soup

page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text, "html.parser")

# Now I would like to get the sale price of the apartment.
# The element in the HTML DOM is as follows:
# <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span>
# The XPath of the element: //*[@id="yui_3_18_1_1_1464168312477_3548"]

# So I wrote the following code:
value = soup.select('span#yui_3_18_1_1_1464168312477_3548')
print(value)

I get no results (an empty list). What am I doing wrong?

Best answer

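The source you see in the browser's developer console is not the same as the source you get back from requests. The id in span id="yui_3_18_1_1_1464170172533_3087" is generated dynamically in the browser, so you need to use something like selenium.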

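Unfortunately, the id is also unique on every visit, so we cannot rely on it. What stays consistent is the parent div, so we can use a CSS selector on its classes, main-row home-summary-row, to get the first span inside that parent: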

In [4]: from selenium import webdriver
In [5]: dr = webdriver.PhantomJS()

In [6]: dr.get("http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/")
In [7]: span = dr.find_element_by_css_selector('div.main-row.home-summary-row span')
In [8]: print(span.text)
$12,895,000

I used PhantomJS for headless browsing; you can use Firefox or Chrome instead if you prefer. Setup details for each driver are in the Selenium documentation.
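The session above uses the Selenium 3 API. PhantomJS support and the `find_element_by_*` helpers have since been removed in Selenium 4.x, so on a current install a roughly equivalent approach looks like the sketch below. It assumes Selenium 4 with a local Chrome install running headless; the function name `fetch_price_with_selenium` is hypothetical, and the selector is the one from the answer, which may no longer match Zillow's current markup.

```python
def fetch_price_with_selenium(url, selector="div.main-row.home-summary-row span", timeout=10):
    """Render `url` in headless Chrome and return the text of the first
    element matching `selector`.

    Sketch only: assumes Selenium 4 and a local Chrome install; the default
    selector comes from the answer and may be outdated.
    """
    # Imported inside the function so this module loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted the element,
        # instead of reading the DOM immediately after load.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return element.text
    finally:
        driver.quit()  # always shut down the browser process
```

Call it with the `page` URL from the question, e.g. `print(fetch_price_with_selenium(page))`. The explicit wait is used because dynamically inserted elements may not exist yet at the moment `get()` returns.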

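Actually, looking at the source again, we can do the same thing with bs4. The id is the only dynamically generated part, so if we ignore the id we can still get the price: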

In [26]: soup.select_one("div.main-row.home-summary-row span").text
Out[26]: u'$12,895,000'
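To see why selecting by the parent's classes is safer than selecting by the copied id, here is a small offline sketch. The two HTML snapshots are hypothetical, simplified from the span shown in the question and the parent classes named in the answer; they differ only in the generated YUI id, as two separate visits would. The helper names `price_by_id` and `price_by_structure` are illustrative, and `html.parser` is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified snapshots of the same listing from two visits:
# only the generated YUI id differs between them.
SNAPSHOT_1 = """
<div class="main-row home-summary-row">
  <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span>
</div>
"""
SNAPSHOT_2 = SNAPSHOT_1.replace("1464168312477_3548", "1464170172533_3087")


def price_by_id(html, element_id):
    """Select by an id copied from the browser (fragile: changes per visit)."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.select_one("span#" + element_id)
    return span.get_text(strip=True) if span else None


def price_by_structure(html):
    """Select the first span inside the stable parent div (robust)."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.select_one("div.main-row.home-summary-row span")
    return span.get_text(strip=True) if span else None


copied_id = "yui_3_18_1_1_1464168312477_3548"
print(price_by_id(SNAPSHOT_1, copied_id))  # $12,895,000
print(price_by_id(SNAPSHOT_2, copied_id))  # None
print(price_by_structure(SNAPSHOT_2))      # $12,895,000
```

The id-based selector only matches the page load the id was copied from, while the class-based selector matches both snapshots.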

A better approach is to use the meta tags, which contain a lot of information:

import requests
from bs4 import BeautifulSoup as Soup

page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text,"lxml")
metas = soup.select("meta")

Now, if we look at what metas returns:

from pprint import pprint as pp

pp(metas)

[<meta content="on" http-equiv="x-dns-prefetch-control"/>,
<meta charset="unicode-escape"/>,
<meta content="View 31 photos of this $12,895,000, 7 bed, 10.0 bath, 10500 sqft single family home located at 1630 Amalfi Dr, Pacific Palisades, CA 90272 built in 2015. MLS # 16-103696." name="description"/>,
<meta content="Zillow, Inc." name="author"/>,
<meta content="Copyright (c) 2006-2014 Zillow, Inc." name="Copyright"/>,
<meta content="none" name="msapplication-config"/>,
<meta content="ALL" name="ROBOTS"/>,
<meta content="NOYDIR" name="ROBOTS"/>,
<meta content="NOODP" name="ROBOTS"/>,
<meta content="yes" name="apple-mobile-web-app-capable"/>,
<meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>,
<meta content="telephone=no" name="format-detection"/>,
<meta content="#3366b8" name="msapplication-TileColor"/>,
<meta content="http://www.zillowstatic.com/static/images/logos/zillow-logo-win8-tile.png" name="msapplication-TileImage"/>,
<meta content="/8Me6HBNZX/rt2n5/y1Lo3ZIrkcvkTBimqviTDiurR4=" name="verify-v1"/>,
<meta content="7cb4abe457d82ae8" name="y_key"/>,
<meta content="width=device-width, height=device-height, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, user-scalable=no" name="viewport"/>,
<meta content="Zillow Real Estate, Rentals, and Mortgage" itemprop="name"/>,
<meta content="The most trafficked website about home sales and rentals, with real estate values for almost every U.S. home. 1,000,000 listings that you won't find on MLS." itemprop="description"/>,
<meta content="http://www.zillowstatic.com/static/images/social/share_thumbnail.png" itemprop="image"/>,
<meta content="691f1bfccade71b5-c065751219a379dd-g64cedb67f5ea020a-a" name="google-translate-customization"/>,
<meta content="202692,878610170,662000799,100001769907023,10716009,769244502,10716649,503322863" property="fb:admins"/>,
<meta content="172285552816089" property="fb:app_id"/>,
<meta content="zillow_fb:home" property="og:type"/>,
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>,
<meta content="7" property="zillow_fb:beds"/>,
<meta content="10" property="zillow_fb:baths"/>,
<meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' property="zillow_fb:description"/>,
<meta content="http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/" property="og:url"/>,
<meta content="Pacific Palisades Home For Sale" property="og:title"/>,
<meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" property="og:image"/>,
<meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' property="og:description"/>,
<meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video"/>,
<meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video:secure_url"/>,
<meta content="640" property="og:video:width"/>,
<meta content="video/mp4" property="og:video:type"/>,
<meta content="360" property="og:video:height"/>,
<meta content="238648973530.apps.googleusercontent.com" name="google-signin-clientid"/>,
<meta content="https://www.googleapis.com/auth/plus.login https://www.googleapis.com/auth/plus.profile.emails.read" name="google-signin-scope"/>,
<meta content="http://zillow.com" name="google-signin-cookiepolicy"/>,
<meta content="summary_large_image" name="twitter:card"/>,
<meta content="@Zillow" name="twitter:site"/>,
<meta content="@Zillow" name="twitter:creator"/>,
<meta content="1630 Amalfi Dr" name="twitter:title"/>,
<meta content="Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp;amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp;amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp;amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp;amp; master suite add warmth to the contemporary feel, &amp;amp; detailed wood paneling &amp;amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp;amp; private patio. Lower level feats. Old Hollywood style theater w/130&amp;quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp;amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp;amp; saltwater pool/spa complete this elegant estate." name="twitter:description"/>,
<meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" name="twitter:image"/>,
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" itemprop="name"/>,
<meta content="USD" itemprop="priceCurrency"/>,
<meta content="$12,895,000" itemprop="price"/>,
<meta content="34.060605" itemprop="latitude"/>,
<meta content="-118.501625" itemprop="longitude"/>]

We can use the attributes to extract the price and other information:

In [22]: soup = Soup(response.text,"lxml")

In [23]: soup.select_one("meta[itemprop=price]")["content"]
Out[23]: '$12,895,000'

In [24]: soup.select_one('meta[name="twitter:description"]')["content"]
Out[24]: 'Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130&quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.'
In [27]: soup.select_one("meta[itemprop=latitude]")["content"]
Out[27]: '34.060605'
In [28]: soup.select_one("meta[itemprop=longitude]")["content"]
Out[28]: '-118.501625'
In [29]: soup.select_one('meta[property="og:zillow_fb:address"]')["content"]
Out[29]: '1630 Amalfi Dr, Pacific Palisades, CA 90272'
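Instead of writing one selector per field, all the meta tags can be collected into a dictionary in a single pass. This is a minimal sketch, not the answer's own code: the name `meta_map` is hypothetical, each tag is keyed by its `itemprop`, `property` or `name` attribute (checked in that order), and repeated keys such as the three `ROBOTS` tags keep only the last value. The sample input is a subset of the tags printed above. The last lines show that attribute values containing `:` are quoted in the selector; recent BeautifulSoup versions (4.7 and later, which use the soupsieve selector engine) may reject unquoted values like `og:zillow_fb:address`.

```python
from bs4 import BeautifulSoup


def meta_map(html):
    """Collect <meta> tags into a dict keyed by itemprop, property or name.

    Tags without a key or without a content attribute are skipped;
    later tags with the same key overwrite earlier ones.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for meta in soup.find_all("meta"):
        key = meta.get("itemprop") or meta.get("property") or meta.get("name")
        if key and meta.has_attr("content"):
            result[key] = meta["content"]
    return result


# A few of the tags listed above, used as offline sample input.
SAMPLE = """
<head>
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>
<meta content="USD" itemprop="priceCurrency"/>
<meta content="$12,895,000" itemprop="price"/>
<meta content="34.060605" itemprop="latitude"/>
<meta content="-118.501625" itemprop="longitude"/>
</head>
"""

info = meta_map(SAMPLE)
print(info["price"], info["priceCurrency"])  # $12,895,000 USD
print(info["og:zillow_fb:address"])

# Quote attribute values that contain ":" when selecting with CSS.
soup = BeautifulSoup(SAMPLE, "html.parser")
print(soup.select_one('meta[property="og:zillow_fb:address"]')["content"])
```

With the live page, `meta_map(response.text)` would return every keyed meta tag at once.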

A similar question, python - Unable to retrieve the element at the desired XPath with BeautifulSoup, can be found on Stack Overflow: https://stackoverflow.com/questions/37433366/

Copyright 2021 - 2024 cfsdn All Rights Reserved. Sichuan ICP filing No. 2022000587