I'm using Python 2.7 and Mechanize 2.5. I'm trying to use the select_form() method, but I get the following error:
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 499, in select_form
global_form = self._factory.global_form
File "C:\Python27\lib\site-packages\mechanize\_html.py", line 544, in __getattr__
self.forms()
File "C:\Python27\lib\site-packages\mechanize\_html.py", line 557, in forms
self._forms_factory.forms())
File "C:\Python27\lib\site-packages\mechanize\_html.py", line 237, in forms
_urlunparse=_rfc3986.urlunsplit,
File "C:\Python27\lib\site-packages\mechanize\_form.py", line 845, in ParseResponseEx
_urlunparse=_urlunparse,
File "C:\Python27\lib\site-packages\mechanize\_form.py", line 982, in _ParseFileEx
fp.feed(data)
File "C:\Python27\lib\site-packages\mechanize\_form.py", line 759, in feed
_sgmllib_copy.SGMLParser.feed(self, data)
File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 110, in feed
self.goahead(0)
File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 144, in goahead
k = self.parse_starttag(i)
File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 302, in parse_starttag
self.finish_starttag(tag, attrs)
File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 347, in finish_starttag
self.handle_starttag(tag, method, attrs)
File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 387, in handle_starttag
method(attrs)
File "C:\Python27\lib\site-packages\mechanize\_form.py", line 736, in do_option
_AbstractFormParser._start_option(self, attrs)
File "C:\Python27\lib\site-packages\mechanize\_form.py", line 481, in _start_option
raise ParseError("OPTION outside of SELECT")
ParseError: OPTION outside of SELECT
Here is my code:
import cookielib
import mechanize

cj = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open("website_url_which_i_will_not_share")
br.select_form(nr=0)
Here is the form section from the HTML of the page I am opening:
<html lang="en-us" xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml">
<head> I omitted this section </head>
<body class="login">
<div id="container">
<div id="header" style="background-color: #13397A;">
<div id="content" class="colM">
<div id="content-main">
<form id="login-form" method="post" action="/admin/">
<div style="display:none">
<input type="hidden" value="8a689f2e3d215a3465f1bb66e037d1a5" name="csrfmiddlewaretoken">
</div>
<div class="form-row">
<label class="required" for="id_username">Username:</label>
<input id="id_username" type="text" maxlength="30" name="username">
</div>
<div class="form-row">
<label class="required" for="id_password">Password:</label>
<input id="id_password" type="password" name="password">
<input type="hidden" value="1" name="this_is_the_login_form">
<input type="hidden" value="/admin/" name="next">
</div>
<div class="submit-row">
<label> </label>
<input type="submit" value="Log in">
</div>
</form>
<script type="text/javascript">
</div>
<br class="clear">
</div>
<div id="footer"></div>
</div>
<script type="text/javascript">
</body>
</html>
I have searched Stack Overflow and Google for this, but I could not find a similar question or even a description of this error.
I would appreciate it if someone could tell me what this error means and help me figure out what is going wrong.
Thanks.
Edit: I have done a lot of form-submission work, and every other site works fine; only this one fails. It is a database API that I am trying to scrape data from.
Best Answer
I ran into the same problem (unfortunately not solved yet), and I found this interesting piece of code that might help.
From http://comments.gmane.org/gmane.comp.python.wwwsearch.general/1991 (see the archive.org version):
import mechanize
from BeautifulSoup import BeautifulSoup

class SanitizeHandler(mechanize.BaseHandler):
    def http_response(self, request, response):
        if not hasattr(response, "seek"):
            response = mechanize.response_seek_wrapper(response)
        # if it is HTML, run it through a robust parser like BeautifulSoup
        if response.info().dict.has_key('content-type') and ('html' in response.info().dict['content-type']):
            soup = BeautifulSoup(response.get_data())
            response.set_data(soup.prettify())
        return response

br = mechanize.Browser()
br.add_handler(SanitizeHandler())
# Now you get good HTML
This should override the http_response method and "sanitize" your HTML before mechanize's form parser sees it.
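For completeness, here is a minimal sketch of how the handler could be wired into the original script. It assumes the SanitizeHandler above and that the login form is the first form on the page (matching the asker's select_form(nr=0) call); the field names username and password come from the posted HTML, and the credential values are placeholders.

import cookielib
import mechanize
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, as in the snippet above

class SanitizeHandler(mechanize.BaseHandler):
    """Re-serialize every HTML response with BeautifulSoup before mechanize parses it."""
    def http_response(self, request, response):
        if not hasattr(response, "seek"):
            response = mechanize.response_seek_wrapper(response)
        headers = response.info().dict
        if 'html' in headers.get('content-type', ''):
            soup = BeautifulSoup(response.get_data())
            response.set_data(soup.prettify())
        return response

cj = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.set_handle_robots(False)
br.add_handler(SanitizeHandler())   # cleaned-up markup reaches the form parser

br.open("website_url_which_i_will_not_share")
br.select_form(nr=0)                # the login form is the first form on the page

# Field names come from the posted HTML; the values are placeholders.
br["username"] = "your_username"
br["password"] = "your_password"
response = br.submit()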
Regarding Python Mechanize select_form() - ParseError: OPTION outside of SELECT, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11659268/