python - UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 30339: ordinal not in range(128)

Reposted. Author: 行者123. Updated: 2023-12-03 04:50:40

Here is the code that has the problem:

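# Initiate the session handler
session = requests.Session()

# Programmatically get the SAML assertion: open the initial IdP URL,
# follow all of the HTTP 302 redirects, and get the resulting login page
formresponse = session.get(idpentryurl, verify=sslverification)

# Capture idpauthformsubmiturl, which is the final URL after all the 302s
idpauthformsubmiturl = formresponse.url

# Parse the response and extract all the necessary values
# in order to build a dictionary of all of the form values the IdP expects
formsoup = BeautifulSoup(formresponse.text.decode('utf8'))
payload = {}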

Debug output:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): myapps.microsoft.com
DEBUG:urllib3.connectionpool:https://myapps.microsoft.com:443 "GET /signin/AWS%20CMD%20(Audit)/18216d2a-eef8-4fde-962c-50cf615f3f5b HTTP/1.1" 302 244
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): account.activedirectory.windowsazure.com
DEBUG:urllib3.connectionpool:https://account.activedirectory.windowsazure.com:443 "GET /applications/signin/AWS%20CMD%20(Audit)/18216d2a-eef8-4fde-962c-50cf615f3f5b HTTP/1.1" 302 94
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): login.microsoftonline.com
DEBUG:urllib3.connectionpool:https://login.microsoftonline.com:443 "GET /common/oauth2/authorize?client_id=0000000c-0000-0000-c000-000000000000&redirect_uri=https%3A%2F%2Faccount.activedirectory.windowsazure.com%2F&response_mode=form_post&response_type=code%20id_token&scope=openid%20profile&state=OpenIdConnect.AuthenticationProperties%3DmIDzRLZskQlxxtgB9rjxiHrNVmQJpcUVaK8wuZ3A2PMIyBE8fzxkXDcroNhC4wyof9OK9OlhqH0J_stoYSEIhKiEzx4O3XDW4rS4xyFTitGmztuV3ozOJhX5uafmQm_XmKnXEjEt9CNwFbp2Kju3rRGLAXRViD3byQ7XpwdXkeXoDFLwmy5OIXQgzvPjSsc7Jx7xEXMHckDwElhBOBFXmJVYCkHYx6cB-3yjwGJHX6RQ2lfx6CUg7x2PqPkbo4WsUxbZDAJZsMqYXyVRZGSDqAgU3gSezlHNgZGh-nblkxj7Dw6rdMVKmpNWZWkjp3zI3OjWa91FTrVc0mC9gIQC-BC4zaF-FrwQ4rHPbQlisQoS6-S1qM8ca_cEi6CfFaHh2lrtB-xdNEVum97Mzmlg9g&nonce=1507770263.sCv6L2a21eQuLNKaXL3zog&nux=1 HTTP/1.1" 200 15838
Traceback (most recent call last):
  File "formauth.py", line 62, in <module>
    formsoup = BeautifulSoup(formresponse.text.decode('utf8'))
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 30342: ordinal not in range(128)

The following trick did not help: replacing the non-UTF-8 characters in the response body with spaces.

formresponse.encoding = formresponse.apparent_encoding
formsoupba = bytearray(formresponse.text, 'utf8')
for i, val in enumerate(formsoupba):
    if val > 128:
        formsoupba[i] = 32
formsoup = BeautifulSoup(formsoupba.decode('utf8'), "html.parser")

It produces the following error:

    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 30334: invalid start byte
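
A side note on why this second attempt reports byte 0x80: in UTF-8 the character \u2019 is stored as the three bytes 0xE2 0x80 0x99, and the test `val > 128` skips the middle one because 0x80 is exactly 128. The sketch below reproduces that with a short sample string standing in for formresponse.text (the sample and the variable names are assumptions for illustration, not part of the original script).

```python
# -*- coding: utf-8 -*-
# Hypothetical sample standing in for formresponse.text.
sample = u"it\u2019s"

# U+2019 is three bytes in UTF-8: 0xE2 0x80 0x99.
buf = bytearray(sample.encode("utf8"))
for i, val in enumerate(buf):
    if val > 128:  # same test as in the question: 0x80 == 128 is skipped
        buf[i] = 32

# Collect any non-ASCII bytes that survived the loop.
leftover = [hex(b) for b in buf if b >= 128]

# A lone 0x80 is a continuation byte, so decoding as UTF-8 fails.
try:
    bytes(buf).decode("utf8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Comparing against 127 instead blanks all three bytes.
fixed = bytearray(sample.encode("utf8"))
for i, val in enumerate(fixed):
    if val > 127:
        fixed[i] = 32
fixed_text = bytes(fixed).decode("utf8")
```

With the threshold at 127 the loop does what was intended, although each non-ASCII character becomes one space per byte rather than a single space.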

Any help would be appreciated.

Best answer

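You are calling .decode('utf8') on text that contains the unicode character \u2019 (a right single quotation mark). Because formresponse.text is already a unicode object, Python 2 first has to turn it into bytes before it can decode anything, and it does that implicitly with the ascii codec. That implicit encode is what fails, rather than a bs4 parser: the traceback ends inside the decode call itself.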
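
The failing step can be reproduced by hand. This is a minimal sketch, assuming a short sample string in place of formresponse.text (the names `text`, `error_message` and `round_trip` are illustrative only). It writes out explicitly the ascii encode that Python 2 performs behind the scenes when .decode() is called on a unicode object.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample standing in for formresponse.text (already unicode).
text = u"it\u2019s"

# Python 2 runs the equivalent of this encode implicitly when .decode()
# is called on a unicode object; here it is written out explicitly.
try:
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# .decode() is meant for bytes: encoding first and then decoding round-trips.
round_trip = text.encode("utf8").decode("utf8")
```

If this reading is right, the simplest change to the question's code should be to pass formresponse.text to BeautifulSoup directly, without the .decode('utf8') call.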

Either way, if you are willing to lose the odd characters, here is the shotgun approach:

clean_text = formresponse.text.encode('utf8').decode('ascii', 'ignore')
formsoup = BeautifulSoup(clean_text, "html.parser")

This simply ignores any errors, which means the offending characters are lost. See the documentation for error handlers other than ignore: https://docs.python.org/2/library/codecs.html
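
To compare the alternatives, here is a small sketch of four standard error handlers applied to the same sample string when encoding to ASCII. The sample string and variable names are assumptions for illustration. The handler names themselves come from the codecs documentation.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample containing the character from the error message.
text = u"it\u2019s"

# 'ignore' drops the character entirely.
ignored = text.encode("ascii", "ignore").decode("ascii")

# 'replace' substitutes a question mark.
replaced = text.encode("ascii", "replace").decode("ascii")

# 'xmlcharrefreplace' writes a numeric character reference,
# which an HTML parser can turn back into the original character.
xml_refs = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# 'backslashreplace' writes a Python-style escape sequence.
escaped = text.encode("ascii", "backslashreplace").decode("ascii")
```

For HTML that is about to be parsed, 'xmlcharrefreplace' may be the least lossy choice, since the character can be recovered from the reference.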

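A more thorough approach is to find the page's actual encoding. https://login.microsoftonline.com:443 declares <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">, but the bytes it serves may not match that declaration, and I think that might be throwing BeautifulSoup off. Try giving bs4 some different encodings, for example cp1252 or latin-1.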
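
One way to try several encodings is to decode the raw bytes yourself before parsing. This is only a sketch: `decode_with_fallback` is a hypothetical helper, not part of requests or bs4, and the two byte strings are made-up samples of the same word with \u2019 stored in two different encodings.

```python
# -*- coding: utf-8 -*-
def decode_with_fallback(raw_bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_name) for the first encoding that decodes cleanly."""
    for name in encodings:
        try:
            return raw_bytes.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


# Hypothetical inputs: the same word with U+2019 stored two different ways.
as_utf8 = b"it\xe2\x80\x99s"
as_cp1252 = b"it\x92s"

utf8_result = decode_with_fallback(as_utf8)
cp1252_result = decode_with_fallback(as_cp1252)
```

latin-1 accepts every byte value, so it only makes sense as the last candidate. In the question's script the raw bytes would be formresponse.content, and the decoded text could then be passed to BeautifulSoup.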

Regarding python - UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 30339: ordinal not in range(128), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46699920/
