python - BadStatusLine exception raised when returning a reply from the server in Python 3


Reposted by: 太空狗 | Updated: 2023-10-29 22:30:38


I'm trying to port to Python 3 a script that submits XML feeds, found here:

https://developers.google.com/search-appliance/documentation/files/pushfeed_client.py.txt

After running 2to3.py and making a few small tweaks to clear up the remaining syntax errors, the script fails with the following:

(py33dev) d:\dev\workspace>python pushfeed_client.py --datasource="TEST1" --feedtype="full" --url="http://gsa:19900/xmlfeed" --xmlfilename="test.xml"
Traceback (most recent call last):
  File "pushfeed_client.py", line 108, in <module>
    main(sys.argv)
  File "pushfeed_client.py", line 56, in main
    result = urllib.request.urlopen(request_url)
  File "C:\Python33\Lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\Lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "C:\Python33\Lib\urllib\request.py", line 487, in _open
    '_open', req)
  File "C:\Python33\Lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\Lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python33\Lib\urllib\request.py", line 1253, in do_open
    r = h.getresponse()
  File "C:\Python33\Lib\http\client.py", line 1147, in getresponse
    response.begin()
  File "C:\Python33\Lib\http\client.py", line 358, in begin
    version, status, reason = self._read_status()
  File "C:\Python33\Lib\http\client.py", line 340, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <!DOCTYPE html>
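The last frames of the traceback show http.client giving up in `_read_status`: it expects the first line of the reply to be an HTTP status line such as `HTTP/1.1 400 Bad Request`, and here it got `<!DOCTYPE html>` instead. A minimal sketch of that check, assuming a hypothetical `FakeSocket` helper that feeds canned bytes to `http.client.HTTPResponse` instead of a real connection (this relies on `HTTPResponse` only calling `makefile()` on its socket, which is a CPython implementation detail rather than documented API):

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket: HTTPResponse only calls makefile() on it."""

    def __init__(self, data: bytes) -> None:
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def read_status(raw: bytes) -> str:
    """Parse raw bytes as an HTTP response and report what http.client made of them."""
    response = http.client.HTTPResponse(FakeSocket(raw))
    try:
        response.begin()  # reads the status line and headers
    except http.client.BadStatusLine as exc:
        # exc.line holds the offending first line, decoded as ISO-8859-1
        return "BadStatusLine: " + exc.line.strip()
    return f"{response.status} {response.reason}"


print(read_status(b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n"))  # → 400 Bad Request
print(read_status(b"<!DOCTYPE html>\r\n<html lang=en>\r\n"))  # → BadStatusLine: <!DOCTYPE html>
```

A well-formed 400 is reported as an ordinary status; only a reply whose first line does not start with `HTTP/` produces the exception seen above.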

Why does the server's response raise an exception? Here is the GSA's full response, captured when I sniffed the session:

<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 400 (Bad Request)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}
  </style>
  <a href=//www.google.com/><img src=//www.google.com/images/errors/logo_sm.gif alt=Google></a>
  <p><b>400.</b> <ins>That’s an error.</ins>
  <p>Your client has issued a malformed or illegal request.  <ins>That’s all we know.</ins>

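It really does return an HTTP 400. I can reliably trigger the problem whenever the XML payload contains a non-ASCII UTF-8 character; when the payload is plain ASCII, it works perfectly. Here is the most basic version of the code that reliably reproduces the problem: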

import http.client

http.client.HTTPConnection.debuglevel = 1  # print the raw request/response exchange
with open("GSA_full_Feed.xml", encoding='utf-8') as xdata:
    payload = xdata.read()  # a str: len() counts characters, not bytes
content_length = len(payload)
feed_path = "xmlfeed"
content_type = "multipart/form-data; boundary=----------boundary_of_feed_data$"
headers = {"Content-type": content_type, "Content-length": content_length}
conn = http.client.HTTPConnection("gsa", 19900)
# the body actually sent is the UTF-8 encoding of payload
conn.request("POST", feed_path, body=payload.encode("utf-8"), headers=headers)
res = conn.getresponse()
print(res.read())
conn.close()

Here is a sample XML payload that triggers the exception:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "gsafeed.dtd">
<gsafeed>
  <header>
    <datasource>TEST1</datasource>
    <feedtype>full</feedtype>
  </header>
  <group>
    <record action="add" mimetype="text/html" url="https://myschweetassurl.com">
      <metadata>
        <meta content="shit happens, then you die" name="description"/>
      </metadata>
      <content>wacky Umläut test of non utf-8 characters</content>
    </record>
  </group>
</gsafeed>

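The only difference I could find between the Python 2 and Python 3 versions is the Content-Length header of each request. The Python 3 value is always smaller than the Python 2 one: 870 vs. 873.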

Best answer

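After a lot of Wiresharking, I worked out the cause of the problem and its solution: the way the Content-Length header is set. In my Python 3 port of the script, I had carried over the original way of setting the content length, which was: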

headers['Content-length']=str(len(body))

This is wrong! The correct way is:

headers['Content-length']=str(len(bytes(body, 'utf-8')))

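because the payload has to be a bytes object, and once the string is encoded to bytes its length can differ from that of the str. The request is then built from the same encoded bytes: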

return urllib.request.Request(theurl, bytes(body, 'utf-8'), headers)
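Put together, the fix can be wrapped in a small helper that encodes the body once and measures those same bytes. This is a sketch, not the original script's code: `build_feed_request` is a hypothetical name, the URL is the one from the question's command line, and the content type is a placeholder (the real script sends `multipart/form-data`). Nothing is sent over the network; the sketch only shows that the header now carries the byte count.

```python
import urllib.request


def build_feed_request(theurl: str, body: str, content_type: str) -> urllib.request.Request:
    """Hypothetical helper: encode the feed once and derive Content-length from those bytes."""
    data = bytes(body, "utf-8")  # same as body.encode("utf-8")
    headers = {
        "Content-type": content_type,
        "Content-length": str(len(data)),  # byte count, not character count
    }
    return urllib.request.Request(theurl, data, headers)


req = build_feed_request("http://gsa:19900/xmlfeed", "wacky Umläut test", "text/xml; charset=utf-8")
print(req.get_header("Content-length"))  # → 18
print(len("wacky Umläut test"))          # → 17, the value the old code would have sent
```

Measuring `data` rather than `body` keeps the header and the transmitted bytes in sync no matter which characters the feed contains.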

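When using anything derived from http.client.HTTPConnection, you can safely leave out setting the Content-Length header by hand. It has an internal method that checks for a Content-Length header and, if it is missing, sets it from the length of the request body; pass the body as already-encoded bytes so that this length is a byte count.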
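To see http.client fill in the header itself, a sketch can post to a throwaway local server instead of the GSA. Everything here is hypothetical test scaffolding: `EchoLengthHandler` replies with the Content-Length it received and the number of body bytes it read, and `post_without_length` sends the body as UTF-8 bytes without any Content-Length header (modern Python 3 assumed, for the f-string).

```python
import http.client
import http.server
import threading


class EchoLengthHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the feed endpoint: replies with the
    Content-Length it was told and the number of body bytes it actually read."""

    def do_POST(self):
        declared = int(self.headers["Content-Length"])
        body = self.rfile.read(declared)
        reply = f"{declared} {len(body)}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, fmt, *args):
        pass  # keep the demo output quiet


def post_without_length(payload: str) -> str:
    """POST payload to a local server without setting Content-Length ourselves."""
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoLengthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        # No Content-Length header here: http.client adds one computed from the bytes body.
        conn.request("POST", "/xmlfeed", body=payload.encode("utf-8"),
                     headers={"Content-Type": "text/xml; charset=utf-8"})
        result = conn.getresponse().read().decode("ascii")
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(post_without_length("wacky Umläut test"))  # → 18 18
```

Both numbers match because the header was derived from the encoded bytes that were actually written to the socket.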

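The problem comes down to porting between Python 2 and 3 and the subtle difference in how each handles strings and encodes them. The plain ASCII version works while the UTF-8 version does not because every ASCII character encodes to exactly one byte, so the character count happens to equal the byte count; a non-ASCII character such as ä takes two bytes in UTF-8, so a length taken from the str undercounts the body.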
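The ASCII-versus-UTF-8 behaviour is easy to check directly. The sketch below uses a hypothetical `length_gap` helper to measure how many bytes the UTF-8 encoding has beyond what `len()` on the str reports; the second string is the `<content>` line from the sample feed, which on its own accounts for one missing byte (the author's full feed, with its 870 vs. 873 gap, is not shown).

```python
def length_gap(text: str) -> int:
    """How many more bytes the UTF-8 encoding has than len(text) counts (hypothetical helper)."""
    return len(text.encode("utf-8")) - len(text)


print(length_gap("plain ascii feed"))                           # → 0
print(length_gap("wacky Umläut test of non utf-8 characters"))  # → 1

# Per character: ASCII is 1 byte, "ä" is 2 and "€" is 3 in UTF-8.
for ch in "aä€":
    print(ch, len(ch.encode("utf-8")))
```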

For "python - BadStatusLine exception raised when returning a reply from the server in Python 3", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21297408/


