python - How do I re-encode and then decode a string in Swift, the way this short Python script does?

Reposted. Author: 行者123. Updated: 2023-11-28 05:52:30

XKCD's API has a strange encoding problem.

Minor encoding issue with xkcd alt texts in chat

The solution (in Python) is to encode the string as latin1 and then decode it as utf8, but how do I do this in Swift?

Test string:

"Be careful\u00e2\u0080\u0094it's breeding season"

Expected output:

Be careful—it's breeding season

Python (from the link above):

import json
a = '''"Be careful\u00e2\u0080\u0094it's breeding season"'''
print(json.loads(a).encode('latin1').decode('utf8'))

How is this done in Swift? I tried:

let strdata = "Be careful\\u00e2\\u0080\\u0094it's breeding season".data(using: .isoLatin1)!
let str = String(data: strdata, encoding: .utf8)

That doesn't work!

Best answer

You have to decode the JSON data first, then extract the string, and finally "fix" the string. Here is a self-contained example with the JSON from https://xkcd.com/1814/info.0.json:

import Foundation

let data = """
{"month": "3", "num": 1814, "link": "", "year": "2017", "news": "",
"safe_title": "Color Pattern", "transcript": "",
"alt": "\\u00e2\\u0099\\u00ab When the spacing is tight / And the difference is slight / That's a moir\\u00c3\\u00a9 \\u00e2\\u0099\\u00ab",
"img": "https://imgs.xkcd.com/comics/color_pattern.png",
"title": "Color Pattern", "day": "22"}
""".data(using: .utf8)!

// Alternatively:
// let url = URL(string: "https://xkcd.com/1814/info.0.json")!
// let data = try! Data(contentsOf: url)

do {
    if let dict = (try JSONSerialization.jsonObject(with: data, options: [])) as? [String: Any],
        var alt = dict["alt"] as? String {

        // Now try to fix the "alt" string
        if let isoData = alt.data(using: .isoLatin1),
            let altFixed = String(data: isoData, encoding: .utf8) {
            alt = altFixed
        }

        print(alt)
        // ♫ When the spacing is tight / And the difference is slight / That's a moiré ♫
    }
} catch {
    print(error)
}
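
Why the Latin-1 round trip repairs the text can be seen at the byte level. The following is a minimal sketch, not part of the original answer. It assumes that Foundation's .isoLatin1 maps every byte 0x00...0xFF one-to-one to the code point with the same value, which is the behaviour the code above relies on. The variable names are only for illustration:

```swift
import Foundation

// The em dash (U+2014) is stored in UTF-8 as the three bytes E2 80 94.
let emDash = "\u{2014}"
let utf8Bytes = Array(emDash.utf8)
print(utf8Bytes.map { String($0, radix: 16) })
// ["e2", "80", "94"]

// Reading those bytes as Latin-1 turns every byte into its own character:
// U+00E2, U+0080, U+0094. This is the mojibake found in the xkcd JSON.
let mojibake = String(data: Data(utf8Bytes), encoding: .isoLatin1)!
print(mojibake.unicodeScalars.map { String($0.value, radix: 16) })
// ["e2", "80", "94"]

// Encoding back to Latin-1 restores the original bytes,
// and decoding those bytes as UTF-8 yields the intended character.
let repaired = String(data: mojibake.data(using: .isoLatin1)!, encoding: .utf8)!
print(repaired == emDash)
// true
```

So the "fix" does not guess anything: it simply undoes a UTF-8 byte sequence that was misread one byte per character.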

If you only have a string of the form

Be careful\u00e2\u0080\u0094it's breeding season

then you can still use JSONSerialization to decode the \uNNNN escape sequences, and then continue as above.

A simple example (error checking omitted for brevity):

let strbad = "Be careful\\u00e2\\u0080\\u0094it's breeding season"
let decoded = try! JSONSerialization.jsonObject(with: Data("\"\(strbad)\"".utf8), options: .allowFragments) as! String
let strgood = String(data: decoded.data(using: .isoLatin1)!, encoding: .utf8)!
print(strgood)
// Be careful—it's breeding season
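
For input that is not under your control, the forced unwraps above would crash on malformed text. A possible error-checked variant is sketched below. The helper name unescapeAndRepair is hypothetical and not part of the original answer, and the sketch assumes the input contains no unescaped double quotes, since it is wrapped in quotes to form a JSON string fragment:

```swift
import Foundation

// Hypothetical helper: decodes \uNNNN escapes via JSONSerialization, then
// applies the Latin-1 -> UTF-8 repair. Returns nil instead of crashing.
func unescapeAndRepair(_ escaped: String) -> String? {
    // Wrap the text in quotes so it parses as a JSON string fragment.
    let json = Data("\"\(escaped)\"".utf8)
    guard let object = try? JSONSerialization.jsonObject(with: json, options: .allowFragments),
          let decoded = object as? String,
          let latin1 = decoded.data(using: .isoLatin1) else {
        return nil // Not valid JSON, or contains characters outside Latin-1
    }
    // Returns nil if the recovered bytes are not valid UTF-8.
    return String(data: latin1, encoding: .utf8)
}

let fixed = unescapeAndRepair("Be careful\\u00e2\\u0080\\u0094it's breeding season")
print(fixed == "Be careful\u{2014}it's breeding season")
// true
```

Returning an optional lets the caller decide whether to fall back to the unrepaired string, as the first example does for the "alt" field.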

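Addendum: Here is a more robust version of "fixing" the wrongly encoded JSON. It searches the source string for occurrences of \uNNNN, converts each one to a single byte, and then interprets the resulting byte sequence as UTF-8. It returns nil if an escape is malformed or its value does not fit in one byte. The advantage over the previous approach is that other non-ASCII characters in the source string are left unchanged: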

import Foundation

extension String {

    func decodeBrokenJSON() -> String? {

        var bytes = Data()
        var position = startIndex

        while let range = range(of: "\\u", range: position..<endIndex) {
            // Copy the text before the escape sequence unchanged, as UTF-8
            bytes.append(contentsOf: self[position ..< range.lowerBound].utf8)
            position = range.upperBound
            // Parse the four hex digits; the value must fit in a single byte
            let hexCode = self[position...].prefix(4)
            guard hexCode.count == 4, let byte = UInt8(hexCode, radix: 16) else {
                return nil // Invalid hex code
            }
            bytes.append(byte)
            position = index(position, offsetBy: hexCode.count)
        }
        // Copy the remaining text after the last escape sequence
        bytes.append(contentsOf: self[position ..< endIndex].utf8)
        return String(data: bytes, encoding: .utf8)
    }
}

Examples:

print("Be careful\\u00e2\\u0080\\u0094it's breeding season".decodeBrokenJSON()!)
// Be careful—it's breeding season

print("\\u00c4\\u00b0zmir éûò€🇹🇷".decodeBrokenJSON()!)
// İzmir éûò€🇹🇷
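
For comparison, here is a small sketch of why the simpler Latin-1 round trip cannot handle such mixed input. It is not part of the original answer, and it assumes that data(using: .isoLatin1) performs a strict, non-lossy conversion (the default) and that String(data:encoding:) rejects invalid UTF-8. The sample strings are made up for illustration:

```swift
import Foundation

// "\u{C4}\u{B0}" is the mojibake for the dotted capital I;
// the euro sign after it is genuine text.
let mixed = "\u{C4}\u{B0}zmir \u{20AC}"

// The euro sign (U+20AC) has no Latin-1 representation, so the conversion fails.
let latin1 = mixed.data(using: .isoLatin1)
print(latin1 == nil)
// true

// U+00E9 (e with acute) does exist in Latin-1, but the lone byte E9 is not
// valid UTF-8, so the second half of the round trip fails instead.
let accented = "caf\u{E9}"
let roundTrip = accented.data(using: .isoLatin1).flatMap { String(data: $0, encoding: .utf8) }
print(roundTrip == nil)
// true
```

Both failures come from treating the whole string as mojibake, whereas decodeBrokenJSON() only reinterprets the escaped bytes.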

Regarding "python - How do I re-encode and then decode a string in Swift, the way this short Python script does?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52387450/
