
Unicode URL decoding

Reposted. Author: 行者123. Updated: 2023-12-02 10:25:16

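The usual way to URL-encode a Unicode character is to split it into two %HH codes. (\u4161 => %41%61)

But how is Unicode distinguished when decoding? How do you know whether %41%61 is \u4161 or \x41\x61 ("Aa")?

Are 8-bit characters that require encoding preceded by %00?

Or is the point that Unicode characters are supposed to be lost/split?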

Best answer

According to Wikipedia:

Current standard

The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values. This requirement was introduced in January 2005 with the publication of RFC 3986. URI schemes introduced before this date are not affected.

Not addressed by the current specification is what to do with encoded character data. For example, in computers, character data manifests in encoded form, at some level, and thus could be treated as either binary data or as character data when being mapped to URI characters. Presumably, it is up to the URI scheme specifications to account for this possibility and require one or the other, but in practice, few, if any, actually do.

Non-standard implementations

There exists a non-standard encoding for Unicode characters: %uxxxx, where xxxx is a Unicode value represented as four hexadecimal digits. This behavior is not specified by any RFC and has been rejected by the W3C. The third edition of ECMA-262 still includes an escape(string) function that uses this syntax, but also an encodeURI(uri) function that converts to UTF-8 and percent-encodes each octet.
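As an illustration only (not part of the quoted article), the non-standard %uxxxx form could be decoded roughly as follows. This is a minimal Python sketch: the function name `unescape_legacy` is hypothetical, and it assumes that a plain %HH sequence stands for the single code point HH, in the manner of the legacy JavaScript `unescape`, rather than for a UTF-8 byte.

```python
import re

# Matches either the non-standard %uXXXX form or a plain %HH form.
_LEGACY_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def unescape_legacy(text: str) -> str:
    """Decode %uXXXX and %HH sequences in a single pass.

    Assumption: each %HH is one code point (Latin-1 style), not a UTF-8
    byte. Surrogate pairs written as two %uXXXX units are not recombined.
    """
    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _LEGACY_ESCAPE.sub(replace, text)


print(unescape_legacy("%u4161") == "\u4161")  # the %u form names the code point
print(unescape_legacy("%41%61"))              # plain %HH pairs remain two characters
```

Under this scheme the two cases in the question are distinguishable only because the `u` marker is present; without it, %41%61 is simply two separate characters.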

So, it looks like it's entirely up to the person writing the unencode method... Aren't standards fun?
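For a decoder that follows the UTF-8 convention from RFC 3986 quoted above, the ambiguity in the question does not arise, because U+4161 is never written as %41%61 in the first place. A short Python sketch using the standard library's `urllib.parse` (which defaults to UTF-8) shows the difference; the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

char = "\u4161"

# quote() converts the character to its UTF-8 bytes (E4 85 A1)
# and percent-encodes each byte.
encoded = quote(char)
print(encoded)

# Decoding reverses the process: percent-decoded bytes are read as UTF-8.
round_trip = unquote(encoded)
print(round_trip == char)

# %41%61 is two single-byte UTF-8 sequences, so it can only mean "Aa".
ascii_pair = unquote("%41%61")
print(ascii_pair)
```

UTF-8 is self-delimiting: the lead byte of each sequence says how many bytes belong to the character, so no %00 prefix or other marker is needed to separate 8-bit characters from wider ones.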

Regarding Unicode URL decoding, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/155892/
