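I'm having some trouble with my code. I'm trying to use Selenium, Beautiful Soup, and Python to scrape the contents of an overlay (lightbox). I'm not sure how the overlay is created, but I think it is AJAX.
When I run the following Python 2.7 code, the Firefox browser opens, navigates to the page, clicks the correct link, and shows the overlay to the user. I can inspect its tags and markup in Firefox, but I can't figure out how to get Python to access the overlay.
I'm a newbie, so any help would be appreciated.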
#Import the beautiful soup library
from bs4 import BeautifulSoup
# import urllib2 library to actually go get the webpage for Beautiful Soup
import urllib2
#Import Selenium and the code needed to wait for the page to load
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
URLToParse ='http://courses.it-tallaght.ie/'
#Open the webpage using Soup to get the list of departments so we can iterate on them
soup = BeautifulSoup(urllib2.urlopen(URLToParse))
#Open the webpage using selenium
driver = webdriver.Firefox()
driver.get(URLToParse)
subset = driver.find_element_by_id('homeProgrammes')
#Just get the part of the document that contains the list of department
Depts = soup.find(id="homeProgrammes")
# For all the links in the div with id homeProgrammes
for links in Depts.findAll('a'):
    # Using selenium find the link to the depts list of courses that matches the link string from beautiful soup and click it
    FollowLink = subset.find_element_by_link_text(links.string)
    FollowLink.click()
    # Try waiting 10 seconds for the element with ID 'ProgrammeListForDepartment' becomes available and print the contents using prettify
    try:
        element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'ProgrammeListForDepartment')))
        Overlay = BeautifulSoup(driver.find_element_by_id('ProgrammeListForDepartment'))
        print(Overlay.prettify())
    except NoSuchElementException as e:
        print(NoSuchElementException.msg())
Best answer
You don't need BeautifulSoup at all. Selenium itself is very capable at locating elements.
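As a side note on why the code in the question fails: it passes a Selenium WebElement to BeautifulSoup, while an HTML parser expects a string of markup. If you do want to parse the overlay outside Selenium, one option is to read the element's markup as text first (in Selenium, `get_attribute("innerHTML")` returns it) and parse that string. The sketch below is written for Python 3 and uses only the standard library's `html.parser` in place of BeautifulSoup. The `sample_html` snippet is made up for illustration; its `ol > li > a` structure is assumed from the CSS selector used in the answer's code, not taken from the real page. `CourseLinkParser` and `course_titles` are hypothetical names.

```python
from html.parser import HTMLParser


class CourseLinkParser(HTMLParser):
    """Collect the text of every <a> element that sits inside an <li>."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._li_depth = 0      # how many <li> elements are currently open
        self._in_link = False   # True while inside a qualifying <a>
        self._buffer = []       # text fragments of the current link

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._li_depth += 1
        elif tag == "a" and self._li_depth > 0:
            self._in_link = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._li_depth > 0:
            self._li_depth -= 1
        elif tag == "a" and self._in_link:
            self.titles.append("".join(self._buffer).strip())
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)


def course_titles(overlay_html):
    """Return the link texts found in list items of an HTML string."""
    parser = CourseLinkParser()
    parser.feed(overlay_html)
    parser.close()
    return parser.titles


# Hypothetical markup standing in for the overlay's innerHTML.
sample_html = """
<div id="ProgrammeListForDepartment">
  <a href="#">close</a>
  <ol>
    <li><a href="#">Pre-Start Maths</a></li>
    <li><a href="#">Access English</a></li>
  </ol>
</div>
"""

print(course_titles(sample_html))
```

The "close" link is skipped because it is not inside a list item. The same string could be handed to BeautifulSoup instead; the point is only that the parser receives markup text, not a WebElement.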
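Here is working code that iterates over all the departments, clicks each one, extracts the list of courses, and closes the overlay window. The results are collected into a dictionary: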
from pprint import pprint
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url ='http://courses.it-tallaght.ie/'
driver = webdriver.Firefox()
driver.get(url)
courses = {}
for department_link in driver.find_elements_by_css_selector("div#homeProgrammes a[onclick]"):
    department = department_link.text

    # open department
    department_link.click()

    # grab a list of courses
    overlay = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'ProgrammeListForDepartment')))
    courses[department] = [course_link.text for course_link in overlay.find_elements_by_css_selector("ol > li > a")]

    # close department
    overlay.find_element_by_link_text("close").click()

pprint(courses)
driver.close()
It prints:
{u'Accounting & Prof. Studies': [u'Accounting Technician (ATI)',
u'APICS Certificate in Production and Inventory Management (CPIM)',
u'APICS Certified Supply Chain professional (CSCP)',
u'Bachelor of Business (Honours) in Accounting & Finance',
u'Bachelor of Business (Honours) in Accounting & Finance',
u'Bachelor of Business in Accounting & Finance',
u'Bachelor of Business in Accounting & Finance',
u'Foundation Certificate in Personnel Practice (CIPD)',
u'Foundation Diploma in Human Resource Practice (CIPD)',
u'Higher Certificate in Business in Accounting',
u'Higher Certificate in Business in Real Estate (Valuation, Sale and Management)'],
u'Computing': [u'Bachelor of Science (Honours) in Computing',
u'Bachelor of Science (Honours) in Computing',
u'Bachelor of Science (Honours) in IT Management',
u'Bachelor of Science (Honours) IT Management',
u'Bachelor of Science in Computing',
u'Bachelor of Science in Computing',
u'Bachelor of Science in IT Management',
u'Certificate in Cloud Computing Applications Development',
u'Certificate in Cloud Computing Infrastructure Management',
u'Certificate in Fundamentals of Software Development (Minor Award)',
u'Certificate in Network Design and Implementation',
u'Higher Certificate in Science in Information Technology',
u'Higher Certificate in Science in IT Management',
u'Higher Diploma in Science in Computing',
u'M. Sc. in Distributed and Mobile Computing',
u'M.Sc. in Information Technology Management',
u'PhD in Information Technology',
u'Postgraduate Diploma in Distributed and Mobile Computing',
u'Postgraduate Diploma in Information Technology Management',
u'Postgraduate Diploma in Science in Info Technology Management Information Technology Management'],
u'Electronic Engineering': [u'Bachelor Degree in Engineering (Honours) in Electronic Engineering',
u'Bachelor of Engineering (Honours) in Electronic Engineering',
u'Bachelor of Engineering in Electronic Engineering',
u'Bachelor of Engineering In Electronic Engineering',
u'Cisco CCNA Routing & Switching',
u'Higher Certificate in Engineering in Electronic Engineering',
u'Masters of Engineering in Electronic Engineering in Electronic System Design',
u'Single Subject Certificate Structured Analogue Design'],
u'External Services': [u'Access English',
u'Pre-Start Academic English',
u'Pre-Start Maths'],
u'Humanities': [u'Bachelor of Arts (Honours) in Creative Digital Media',
u'Bachelor of Arts (Honours) in European Studies',
u'Bachelor of Arts (Honours) International Hospitality & Tourism Management',
u'Bachelor of Arts (Honours) Social Care Practice',
u'Bachelor of Arts (Ordinary) International Hospitality and Tourism Management',
u'Bachelor of Arts in Culinary Arts',
u'Bachelor of Arts in International Hospitality and Tourism Management',
u'English as a Foreign Language',
u'Higher Cert in Arts in International Hospitality & Tourism Operati in Int Hosp & Tourism Operations',
u'Higher Certificate in Arts in Culinary Arts'],
u'Management': [u'Bachelor of Business (Honours) in Management',
u'Bachelor of Business (Honours) in Management',
u'Bachelor of Business in Management',
u'Bachelor of Science (Honours) in the Management of Innovation and Technology',
u'Bachelor of Science in the Management of Innovation and Technology',
u'Higher Certificate in Business in Business Administration',
u'International Digital Management & Sales',
u'TA_BMNGT_D - Bachelor of Business in Management'],
u'Marketing': [u'Bachelor of Arts (Honours) in Advertising & Marketing Communications',
u'Bachelor of Arts in Advertising and Marketing Communications',
u'Bachelor of Business (Honours) in Marketing',
u'Bachelor of Business (Honours) in Marketing Management',
u'Bachelor of Business in Marketing',
u'Bachelor of Business in Marketing',
u'BSc in Data Analytics with Digital Marketing',
u'Higher Certificate in Business in Marketing',
u'Higher Diploma in Business in Marketing'],
u'Mechanical Engineering': [u'B.Eng(Hons) in Mechanical Engineering',
u'Bachelor of Engineering (Honours) in Mechanical Engineering',
u'Bachelor of Engineering in Energy and Environmental Engineering',
u'Bachelor of Engineering in Mechanical Engineering',
u'Bachelor of Science (Honours) in Energy Systems Engineering',
u'Bachelor of Science (Hons) in Energy Systems Engineering',
u'Certificate in Project Management (IPMA)',
u'EIQA Diploma in Quality Management Quality Management',
u'Higher Certificate in Engineering in Mechanical Engineering',
u'Master of Engineering in Mechanical Engineering'],
u'Science': [u'Bachelor of Science (Honours) in Bioanalytical Science',
u'Bachelor of Science (Honours) in Bioanalytical Science',
u'Bachelor of Science (Honours) in Pharmaceutical Science',
u'Bachelor of Science (Hons) in Sports Science and Health',
u'Bachelor of Science (Hons) in Sports Science and Health (1 Year Add-On)',
u'Bachelor of Science Hons in DNA and Forensic Analysis',
u'Bachelor of Science in Bio Analysis (1 year add-on Bachelor Degree)',
u'Bachelor of Science in Bioanalysis or Chemical Analysis',
u'Bachelor of Science in Chemical Analysis',
u'Bachelor of Science in DNA and Forensic Analysis',
u'Bachelor of Science in Pharmaceutical Science',
u'Bachelor of Science in Pharmaceutical Technology',
u'Bachelor of Science in Sports Science and Health',
u'Bachelor of Science in Sterile Services Management',
u'Certificate in Bioprocessing and Cleanroom Management - Minor Award',
u'Certificate in Food Science and Technology Minor Award',
u'Certificate in GMP & Regulatory Affairs (MIN) in GMP & Technology',
u'Certificate in GMP and Medical Device Manufacture (Minor Award)',
u'Copy of TA_SSPPM_B - Certificate in Pharmaceutical and Medical Device Manufacturing (Special Purpose Award)v2',
u'Higher Certificate in Science Contamination Control and Asepsis for the Healthcare Sector',
u'Higher Certificate in Science in Bio & Pharmaceutical Analysis',
u'Higher Certificate in Science in GMP and Technology',
u'Higher Certificate in Science in Process Technologies',
u'Higher Diploma in Food Science and Technology',
u'Higher Diploma in Science in Pharmaceutical Manufacturing',
u'Masters in Pharmaceutical Manufacturing & Process Technology',
u'PhD in Science in Biology',
u'PhD in Science in Chemistry']}
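Several titles in the output above appear more than once within a department (for example, "Bachelor of Business (Honours) in Management" is listed twice under Management). Why the site repeats them is not stated here. If the repeats are unwanted, a minimal Python 3 sketch for dropping them while keeping the original order could look like this. `dedupe_courses` is a hypothetical helper, and the dictionary is a short excerpt of the output rather than the full result.

```python
def dedupe_courses(courses):
    """Return a copy of the mapping with repeated titles removed, order kept."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return {
        department: list(dict.fromkeys(titles))
        for department, titles in courses.items()
    }


# Short excerpt of the scraped result shown above.
courses = {
    "External Services": [
        "Access English",
        "Pre-Start Academic English",
        "Pre-Start Maths",
    ],
    "Management": [
        "Bachelor of Business (Honours) in Management",
        "Bachelor of Business (Honours) in Management",
        "Bachelor of Business in Management",
    ],
}

unique = dedupe_courses(courses)
print(unique["Management"])
```

This treats two entries as duplicates only when the text matches exactly, so near-duplicates such as "(Honours)" versus "(Hons)" are left alone.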
Regarding "python - Scraping a lightbox overlay with Selenium and Beautiful Soup in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33897934/