
Python: How do I parse the HTML of a web page that requires a login?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 14:22:01

I am trying to parse the HTML of a web page that requires a login. I can fetch the HTML of an ordinary web page with this script:

from urllib.request import urlopen
from bs4 import BeautifulSoup

webpage = urlopen('https://www.example.com')
soup = BeautifulSoup(webpage, 'html.parser')
print(soup)
# This prints the source of example.com

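Getting the source of a page that I am logged in to has turned out to be harder, though. I tried replacing ('https://www.example.com') with ('https://user:pass@example.com'), but I got an invalid URL error.

Does anyone know how I could do this? Thanks in advance.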

Best answer

Selenium WebDriver ( http://seleniumhq.org/projects/webdriver/ ) may suit your needs. You can log in to the page and then print its HTML. Here is an example:

from selenium import webdriver
from selenium.webdriver.common.by import By

# start the browser
driver = webdriver.Firefox()  # start a driver, in this case Firefox
driver.get("http://example.com")  # go to the URL

# locate the login form (replace ... with each field's name attribute)
username_field = driver.find_element(By.NAME, ...)  # get the username field
password_field = driver.find_element(By.NAME, ...)  # get the password field

# log in
username_field.send_keys("username")  # enter your username
password_field.send_keys("password")  # enter your password
password_field.submit()  # submit the form that contains the field

# print the HTML
html = driver.page_source
print(html)
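
As a side note on the user:pass URL tried in the question: urlopen does not take credentials from the URL itself. If the site happens to use HTTP Basic authentication rather than a login form (an assumption, and not the case for most sites), the credentials can be moved into an Authorization header instead. The following is only a minimal sketch of that idea; the helper name request_with_basic_auth and the /private path are made up for illustration.

```python
import base64
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def request_with_basic_auth(url):
    """Move user:pass out of the URL and into an Authorization header.

    Hypothetical helper; it only helps when the server uses HTTP Basic auth.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return Request(url)  # no credentials embedded, nothing to do

    # Rebuild the network location without the "user:pass@" prefix.
    host = parts.hostname
    if parts.port is not None:
        host = "{}:{}".format(host, parts.port)
    clean_url = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )

    # Basic auth sends base64("user:password") in the Authorization header.
    credentials = "{}:{}".format(parts.username, parts.password or "")
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

    request = Request(clean_url)
    request.add_header("Authorization", "Basic " + token)
    return request


request = request_with_basic_auth("https://user:pass@example.com/private")
print(request.full_url)
print(request.get_header("Authorization"))
```

The returned object can be passed to urlopen(request) and the response handed to BeautifulSoup as in the question. For a site that logs users in through an HTML form, this header is ignored and a browser-driven approach such as the Selenium example above is the more likely fit.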

A similar question to "Python: How do I parse the HTML of a web page that requires a login?" can be found on Stack Overflow: https://stackoverflow.com/questions/9387500/
