
python - Cannot access the complete web page using Mechanize

Reposted. Author: 行者123. Updated: 2023-12-04 16:21:18

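I am trying to save the usautoforce home page using mechanize. @Ertugrul, following your answer I now get the complete page. But when I try to access the username and password fields, it raises an error. I have already set all read-only controls to false. When I open the saved page in an editor, there is no HTML that refers to the username or password.
Here is my Mechanize code: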

import re
import time

import mechanize

br = mechanize.Browser()

br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)
#br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Upgrade-Insecure-Requests', '1'),
    ('Connection', 'keep-alive'),
]

br.open("http://www.usautoforce.com/Pages/home.aspx")
br.set_handle_robots(False)
# br.response is a method; call it to get the response object.
print(br.response())
time.sleep(9)

latest_index = 0
html_replaced = ""
html = br.response().read()


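for m in re.finditer('(href|src)(=")(/[^"]+")', html):
    html_replaced += html[latest_index:m.start()] + m.groups()[0] + m.groups()[1] + 'http://www.usautoforce.com' + m.groups()[2]
    latest_index = m.end()
# Append the text after the last match; otherwise the end of the page is lost.
html_replaced += html[latest_index:]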


f=open("us.html","w")
f.write(html_replaced)
f.close()

print([form for form in br.forms()][0])

br.set_handle_robots(False)
# br.response is a method; call it to get the response object.
print(br.response())
time.sleep(9)
html = br.response().read()

br.select_form(nr=0)
time.sleep(2)

#for control in br.form.controls:
# print control
# print "type=%s, name=%s value=%s" % (control.type, control.name, br[control.name])

br.form.set_all_readonly(False)
br.form["nexpartuname"] = "abc"

br.form["pwd"] = "xyz"
br.submit()

Here is the error:
  File "haha.py", line 60, in <module>
    br.form["nexpartuname"] = "clack"
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 2775, in __setitem__
    control = self.find_control(name)
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 3096, in find_control
    return self._find_control(name, type, kind, id, label, predicate, nr)
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 3180, in _find_control
    raise ControlNotFoundError("no control matching "+description)
mechanize._form.ControlNotFoundError: no control matching name 'nexpartuname'

Best answer

Mechanize does not execute JavaScript. The site you are trying to access also displays "Please enable scripts...".

Since there is no way to enable JavaScript in Mechanize, I would personally suggest using PhantomJS.
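
To check whether a field such as nexpartuname is present in the HTML the server actually sends (as opposed to being added later by a script), one option is to list the input names in the saved page. The following is a minimal sketch using only the Python 3 standard library; the `sample` string and the `InputNameCollector` and `input_names` names are hypothetical stand-ins, not taken from the real site or from Mechanize.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> tag in a document."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs for the tag.
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def input_names(html_text):
    """Return the names of all <input> elements found in html_text."""
    parser = InputNameCollector()
    parser.feed(html_text)
    return parser.names


# Hypothetical stand-in for the saved page (for example the contents of us.html).
sample = (
    '<form><input type="hidden" name="__VIEWSTATE" value="x">'
    '<noscript>Please enable scripts</noscript></form>'
)
names = input_names(sample)
print(names)
print("nexpartuname" in names)
```

If the name is missing from this list for the real page, Mechanize has no control to bind to, which is consistent with the ControlNotFoundError in the question.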

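But the real problem here is not JavaScript, it is the URLs. Because the URLs on that site are relative, the HTML does not behave as expected when you download it and open it locally.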

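You have to convert all relative URLs to absolute URLs. Run this code before writing the HTML to the file, and write the html_replaced string to the file instead of the html string.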

latest_index = 0
html_replaced = ""

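for m in re.finditer('(href|src)(=")(/[^"]+")', html):
    html_replaced += html[latest_index:m.start()] + m.groups()[0] + m.groups()[1] + 'http://www.usautoforce.com' + m.groups()[2]
    latest_index = m.end()
# Append the text after the last match; otherwise the end of the page is lost.
html_replaced += html[latest_index:]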
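
The loop above only rewrites URLs that start with a slash, so it also prefixes protocol-relative URLs such as //host/path and leaves path-relative ones such as img/logo.png untouched. A possible alternative, sketched here for Python 3 with the standard library's urljoin, resolves every href and src value against the page URL. The `absolutize` function, the `ATTR_RE` pattern and the cdn.example.com host are illustrative assumptions, not part of the original answer.

```python
import re
from urllib.parse import urljoin

# Page URL taken from the question; relative links are resolved against it.
BASE_URL = "http://www.usautoforce.com/Pages/home.aspx"

# Matches href="..." or src="..." with double-quoted values only (an assumption
# carried over from the original regex).
ATTR_RE = re.compile(r'(href|src)(\s*=\s*")([^"]*)(")')


def absolutize(html_text, base_url=BASE_URL):
    """Rewrite href/src attribute values as absolute URLs."""

    def repl(match):
        attr, eq, url, quote = match.groups()
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        return attr + eq + urljoin(base_url, url) + quote

    return ATTR_RE.sub(repl, html_text)


print(absolutize('<a href="/about">About</a>'))
print(absolutize('<script src="//cdn.example.com/x.js"></script>'))
print(absolutize('<img src="img/logo.png">'))
```

Under Python 3, br.response().read() returns bytes, so the page would need to be decoded to str before being passed to a function like this.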

Regarding "python - Cannot access the complete web page using Mechanize", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41142225/
