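I have just started with web scraping and am using BeautifulSoup (Python) for the job. I would like to fetch some property data from a sample web page for testing. My code starts as follows: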
import requests
from bs4 import BeautifulSoup as Soup
page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text)
# Goal: get the sale price of the property.
# The element in the HTML DOM looks like this:
# <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span>
# Its XPath is //*[@id="yui_3_18_1_1_1464168312477_3548"]
# So I select it by id:
value = soup.select('span#yui_3_18_1_1_1464168312477_3548')
print(value)
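I do not get any result. What am I doing wrong?
Best answer
The source you inspect in the browser console is not the same as the source returned by requests: the span id="yui_3_18_1_1_1464170172533_3087" is generated dynamically, so you need something like Selenium.
Unfortunately the id is also different on every visit, so we cannot rely on it. What stays consistent is the parent div, so we can use its classes, main-row home-summary-row, and take the first span inside that parent with a CSS selector: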
In [4]: from selenium import webdriver
In [5]: dr = webdriver.PhantomJS()
In [6]: dr.get("http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/")
In [7]: span = dr.find_element_by_css_selector('div.main-row.home-summary-row span')
In [8]: print(span.text)
$12,895,000
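I used PhantomJS for headless browsing; you can use Firefox or Chrome instead if you prefer, and the details are in the linked documentation.
Actually, looking at the source again, we can do the same thing with bs4 alone. The id is the only part that is generated dynamically, so if we ignore the id we can still get the price: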
In [26]: soup.select_one("div.main-row.home-summary-row span").text
Out[26]: u'$12,895,000'
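The difference between the two selectors can be reproduced offline. The sketch below is only an illustration: the `HTML` string is a hypothetical, trimmed-down stand-in for the real page, built from the span quoted in the question and the parent classes named in the answer, and the `html.parser` backend is an assumption chosen so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modelled on the span quoted in the question.
HTML = """
<div class="main-row home-summary-row">
  <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# An id copied from a different visit does not match, so select() returns [].
stale = soup.select("span#yui_3_18_1_1_1464170172533_3087")

# The class names on the parent div are stable, so select through them instead.
price_span = soup.select_one("div.main-row.home-summary-row span")
price = price_span.get_text(strip=True) if price_span is not None else None

print(stale)
print(price)
```

Checking for `None` before reading the text keeps the script from raising when the markup changes and the selector stops matching.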
A better approach is to use the meta tags, which carry a lot of information:
import requests
from bs4 import BeautifulSoup as Soup
page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text,"lxml")
metas = soup.select("meta")
Now, if we look at what metas returns:
from pprint import pprint as pp
pp(metas)
[<meta content="on" http-equiv="x-dns-prefetch-control"/>,
<meta charset="unicode-escape"/>,
<meta content="View 31 photos of this $12,895,000, 7 bed, 10.0 bath, 10500 sqft single family home located at 1630 Amalfi Dr, Pacific Palisades, CA 90272 built in 2015. MLS # 16-103696." name="description"/>,
<meta content="Zillow, Inc." name="author"/>,
<meta content="Copyright (c) 2006-2014 Zillow, Inc." name="Copyright"/>,
<meta content="none" name="msapplication-config"/>,
<meta content="ALL" name="ROBOTS"/>,
<meta content="NOYDIR" name="ROBOTS"/>,
<meta content="NOODP" name="ROBOTS"/>,
<meta content="yes" name="apple-mobile-web-app-capable"/>,
<meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>,
<meta content="telephone=no" name="format-detection"/>,
<meta content="#3366b8" name="msapplication-TileColor"/>,
<meta content="http://www.zillowstatic.com/static/images/logos/zillow-logo-win8-tile.png" name="msapplication-TileImage"/>,
<meta content="/8Me6HBNZX/rt2n5/y1Lo3ZIrkcvkTBimqviTDiurR4=" name="verify-v1"/>,
<meta content="7cb4abe457d82ae8" name="y_key"/>,
<meta content="width=device-width, height=device-height, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, user-scalable=no" name="viewport"/>,
<meta content="Zillow Real Estate, Rentals, and Mortgage" itemprop="name"/>,
<meta content="The most trafficked website about home sales and rentals, with real estate values for almost every U.S. home. 1,000,000 listings that you won't find on MLS." itemprop="description"/>,
<meta content="http://www.zillowstatic.com/static/images/social/share_thumbnail.png" itemprop="image"/>,
<meta content="691f1bfccade71b5-c065751219a379dd-g64cedb67f5ea020a-a" name="google-translate-customization"/>,
<meta content="202692,878610170,662000799,100001769907023,10716009,769244502,10716649,503322863" property="fb:admins"/>,
<meta content="172285552816089" property="fb:app_id"/>,
<meta content="zillow_fb:home" property="og:type"/>,
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>,
<meta content="7" property="zillow_fb:beds"/>,
<meta content="10" property="zillow_fb:baths"/>,
<meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, & floating staircase create a grand entrance w/ glass wine cellar, formal living & dining rooms. Floor plan flows openly between gourmet kitchen, family room, & patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, & master suite add warmth to the contemporary feel, & detailed wood paneling & coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views & private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, & elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, & saltwater pool/spa complete this elegant estate.' property="zillow_fb:description"/>,
<meta content="http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/" property="og:url"/>,
<meta content="Pacific Palisades Home For Sale" property="og:title"/>,
<meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" property="og:image"/>,
<meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, & floating staircase create a grand entrance w/ glass wine cellar, formal living & dining rooms. Floor plan flows openly between gourmet kitchen, family room, & patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, & master suite add warmth to the contemporary feel, & detailed wood paneling & coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views & private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, & elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, & saltwater pool/spa complete this elegant estate.' property="og:description"/>,
<meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video"/>,
<meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video:secure_url"/>,
<meta content="640" property="og:video:width"/>,
<meta content="video/mp4" property="og:video:type"/>,
<meta content="360" property="og:video:height"/>,
<meta content="238648973530.apps.googleusercontent.com" name="google-signin-clientid"/>,
<meta content="https://www.googleapis.com/auth/plus.login https://www.googleapis.com/auth/plus.profile.emails.read" name="google-signin-scope"/>,
<meta content="http://zillow.com" name="google-signin-cookiepolicy"/>,
<meta content="summary_large_image" name="twitter:card"/>,
<meta content="@Zillow" name="twitter:site"/>,
<meta content="@Zillow" name="twitter:creator"/>,
<meta content="1630 Amalfi Dr" name="twitter:title"/>,
<meta content="Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130&quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate." name="twitter:description"/>,
<meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" name="twitter:image"/>,
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" itemprop="name"/>,
<meta content="USD" itemprop="priceCurrency"/>,
<meta content="$12,895,000" itemprop="price"/>,
<meta content="34.060605" itemprop="latitude"/>,
<meta content="-118.501625" itemprop="longitude"/>]
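A list this long is easier to work with once it is folded into a dictionary. The following is a minimal sketch, not part of the original answer: `index_metas` is a hypothetical helper, and the `HTML` string is a short excerpt of the listing above so that the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt of the listing above; the real page has many more tags.
HTML = """
<head>
<meta content="USD" itemprop="priceCurrency"/>
<meta content="$12,895,000" itemprop="price"/>
<meta content="34.060605" itemprop="latitude"/>
<meta content="7" property="zillow_fb:beds"/>
<meta content="1630 Amalfi Dr" name="twitter:title"/>
<meta charset="utf-8"/>
</head>
"""


def index_metas(soup):
    """Map each meta tag's name/property/itemprop value to its content."""
    index = {}
    for tag in soup.find_all("meta"):
        key = tag.get("name") or tag.get("property") or tag.get("itemprop")
        if key and tag.has_attr("content"):
            # setdefault keeps the first value when a key repeats (e.g. ROBOTS).
            index.setdefault(key, tag["content"])
    return index


meta_index = index_metas(BeautifulSoup(HTML, "html.parser"))
print(meta_index["price"])
```

Because repeated keys keep only their first value, a key such as `name` (which appears twice under `itemprop` in the real listing) would need a list-valued index if every occurrence matters.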
We can extract the price and other information using the attributes:
In [22]: soup = Soup(response.text,"lxml")
In [23]: soup.select_one("meta[itemprop=price]")["content"]
Out[23]: '$12,895,000'
In [24]: soup.select_one('meta[name="twitter:description"]')["content"]
Out[24]: 'Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, & floating staircase create a grand entrance w/ glass wine cellar, formal living & dining rooms. Floor plan flows openly between gourmet kitchen, family room, & patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, & master suite add warmth to the contemporary feel, & detailed wood paneling & coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views & private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, & elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, & saltwater pool/spa complete this elegant estate.'
In [27]: soup.select_one("meta[itemprop=latitude]")["content"]
Out[27]: '34.060605'
In [28]: soup.select_one("meta[itemprop=longitude]")["content"]
Out[28]: '-118.501625'
In [29]: soup.select_one('meta[property="og:zillow_fb:address"]')["content"]
Out[29]: '1630 Amalfi Dr, Pacific Palisades, CA 90272'
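The lookups above index straight into the result of `select_one`, which raises a `TypeError` if a tag is missing, and they return strings. A small sketch of a more defensive version follows; `meta_content` and `parse_price` are hypothetical helpers, the `HTML` sample is an excerpt of the tags shown earlier, the `yearBuilt` selector is a made-up example of a tag that is absent, and the price parser assumes whole-dollar amounts.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt of the meta tags shown earlier.
HTML = """
<meta content="$12,895,000" itemprop="price"/>
<meta content="34.060605" itemprop="latitude"/>
<meta content="-118.501625" itemprop="longitude"/>
"""


def meta_content(soup, selector):
    """Return the content attribute of the first match, or None if absent."""
    tag = soup.select_one(selector)
    return tag.get("content") if tag is not None else None


def parse_price(text):
    """Turn a string such as '$12,895,000' into an int; None stays None.

    Assumes whole-dollar amounts: every non-digit character is dropped.
    """
    if text is None:
        return None
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


soup = BeautifulSoup(HTML, "html.parser")
price = parse_price(meta_content(soup, 'meta[itemprop="price"]'))
latitude = float(meta_content(soup, 'meta[itemprop="latitude"]'))
missing = meta_content(soup, 'meta[itemprop="yearBuilt"]')  # not in the sample

print(price, latitude, missing)
```

Quoting the attribute value in each selector also keeps values that contain a colon, such as `twitter:description`, valid for the selector parser.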
This question, "python - Unable to retrieve the element for the desired XPath using BeautifulSoup", comes from Stack Overflow: https://stackoverflow.com/questions/37433366/