
Python urllib.parse.urljoin on paths that start with a number and a colon

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:17:10

Can someone explain what is going on here?

>>> import urllib.parse
>>> base = 'http://example.com'
>>> urllib.parse.urljoin(base, 'abc:123')
'http://example.com/abc:123'
>>> urllib.parse.urljoin(base, '123:abc')
'123:abc'
>>> urllib.parse.urljoin(base + '/', './123:abc')
'http://example.com/123:abc'

The Python 3.7 documentation says:

Changed in version 3.5: Behaviour updated to match the semantics defined in RFC 3986.

Which part of that RFC mandates this insane behaviour, and should it be considered a bug?

Best answer

Which part of the RFC mandates this insanity?

This behaviour is correct and consistent with other implementations, as RFC 3986 (Section 4.2) points out:

A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.

This has already been discussed in another post:

Colons are allowed in the URI path. But you need to be careful when writing relative URI paths with a colon since it is not allowed when used like this:

<a href="tag:sample">

In this case tag would be interpreted as the URI’s scheme. Instead you need to write it like this:

<a href="./tag:sample">
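
The same dot-segment rule can be applied mechanically before calling urljoin. What follows is only a minimal sketch, not something the standard library provides: the helper name as_relative_path is hypothetical, and it assumes the caller already knows the string is meant to be a relative path rather than an absolute URI.

```python
import re
from urllib.parse import urljoin


def as_relative_path(ref):
    """Hypothetical helper: make a relative path safe to resolve.

    If the first path segment contains a colon, prefix "./" so the
    segment cannot be mistaken for a scheme name (RFC 3986, Section 4.2).
    """
    # The first segment ends at the first "/", "?" or "#".
    first_segment = re.split(r"[/?#]", ref, maxsplit=1)[0]
    if ":" in first_segment:
        return "./" + ref
    return ref


base = "http://example.com/"
print(urljoin(base, as_relative_path("123:abc")))       # http://example.com/123:abc
print(urljoin(base, as_relative_path("docs/123:abc")))  # http://example.com/docs/123:abc
```

A colon in a later segment (as in "docs/123:abc") is left alone, because only the first segment of a relative-path reference can be confused with a scheme.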

Using urljoin

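The function urljoin simply treats both arguments as URLs, without any presumption. It requires that their schemes be identical, or that the second argument be a relative URI path. Otherwise it just returns the second argument (although, IMHO, it should raise an error). You can understand the logic better by looking at the source of urljoin.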

def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter."""
    ...
    bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
            urlparse(base, '', allow_fragments)
    scheme, netloc, path, params, query, fragment = \
            urlparse(url, bscheme, allow_fragments)

    if scheme != bscheme or scheme not in uses_relative:
        return _coerce_result(url)
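
The early return in that excerpt is what hands the second argument back untouched. To illustrate the alternative suggested above (raising an error instead), here is a minimal sketch. strict_urljoin is a hypothetical wrapper, not a urllib function, and what counts as a scheme is simply whatever urlsplit reports on the installed Python.

```python
from urllib.parse import urljoin, urlsplit


def strict_urljoin(base, url):
    """Hypothetical wrapper: refuse to join when the schemes differ."""
    base_scheme = urlsplit(base).scheme
    url_scheme = urlsplit(url).scheme
    # Plain urljoin would silently return `url` here; raise instead.
    if url_scheme and url_scheme != base_scheme:
        raise ValueError(
            f"{url!r} has scheme {url_scheme!r}, expected {base_scheme!r} "
            "or a relative reference"
        )
    return urljoin(base, url)


print(strict_urljoin("http://example.com/", "page.html"))
try:
    strict_urljoin("http://example.com/", "mailto:someone@example.com")
except ValueError as exc:
    print("rejected:", exc)
```

Whether failing loudly is preferable depends on the caller. Returning an absolute URL unchanged is what reference resolution normally does, so this wrapper only makes sense where every input is expected to be relative to the base.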

The parsing routine urlparse returns the following results:

>>> from urllib.parse import urlparse
>>> urlparse('123:abc')
ParseResult(scheme='123', netloc='', path='abc', params='', query='', fragment='')
>>> urlparse('abc:123')
ParseResult(scheme='', netloc='', path='abc:123', params='', query='', fragment='')
>>> urlparse('abc:a123')
ParseResult(scheme='abc', netloc='', path='a123', params='', query='', fragment='')
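
To check how a particular interpreter splits such strings, a small loop over urlparse is enough. This is only a sketch: the helper name split_report is made up for illustration, and the results for inputs such as '123:abc' or 'abc:123' may differ from the session above on other Python versions, since the scheme detection in urllib.parse has not stayed the same across releases.

```python
from urllib.parse import urlparse


def split_report(refs):
    """Hypothetical helper: map each reference to its (scheme, path)."""
    report = {}
    for ref in refs:
        parsed = urlparse(ref)
        report[ref] = (parsed.scheme, parsed.path)
    return report


candidates = ["123:abc", "abc:123", "abc:a123", "./123:abc"]
for ref, (scheme, path) in split_report(candidates).items():
    print(f"{ref!r}: scheme={scheme!r}, path={path!r}")
```

An empty scheme in the report means urljoin will treat the string as relative to the base. A non-empty scheme that differs from the base's means the string is returned as-is.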

Regarding Python urllib.parse.urljoin on paths that start with a number and a colon, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55202875/
