
go - String to UCS-2

Reposted. Author: IT王子. Updated: 2023-10-29 01:33:33

I want to port my Python program to Go. It converts a unicode string to a UCS-2 hex string.

In Python, this is simple:

u"Bien joué".encode('utf-16-be').encode('hex')
-> 004200690065006e0020006a006f007500e9

I am a Go beginner, and the simplest way I have found is:

package main

import (
	"fmt"
	"strings"
)

func main() {
	str := "Bien joué"
	fmt.Printf("str: %s\n", str)

	ucs2HexArray := []rune(str)
	s := fmt.Sprintf("%U", ucs2HexArray)
	a := strings.Replace(s, "U+", "", -1)
	b := strings.Replace(a, "[", "", -1)
	c := strings.Replace(b, "]", "", -1)
	d := strings.Replace(c, " ", "", -1)
	fmt.Printf("->: %s", d)
}

str: Bien joué
->: 004200690065006E0020006A006F007500E9
Program exited.

I really think this is clearly inefficient. How can I improve it?

Thanks

Best answer

Put the conversion in a function, so that you can easily improve the conversion algorithm later. For example,

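package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// hexUTF16FromString returns the UTF-16 code units of s as a lowercase
// hex string, four digits per code unit.
func hexUTF16FromString(s string) string {
	// Formatting a []uint16 with %04x yields "[0042 0069 ...]".
	hex := fmt.Sprintf("%04x", utf16.Encode([]rune(s)))
	// Drop the surrounding brackets, then the spaces between code units.
	return strings.Replace(hex[1:len(hex)-1], " ", "", -1)
}

func main() {
	str := "Bien joué"
	fmt.Println(str)
	hex := hexUTF16FromString(str)
	fmt.Println(hex)
}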

Output:

Bien joué
004200690065006e0020006a006f007500e9
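
Once the conversion lives in its own function, its body can be swapped out without touching the callers. One possible variant, offered as a minimal sketch rather than a definitive implementation, writes each UTF-16 code unit as two big-endian bytes and hex-encodes the result with the standard library, which avoids formatting and then stripping brackets and spaces. The function name hexUTF16BE is an assumption made for this illustration and does not appear in the original answer.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// hexUTF16BE (hypothetical name) encodes s as UTF-16 big-endian bytes
// and returns them as a lowercase hex string.
func hexUTF16BE(s string) string {
	units := utf16.Encode([]rune(s)) // UTF-16 code units, surrogate pairs included
	buf := make([]byte, 2*len(units))
	for i, u := range units {
		// Two bytes per code unit, most significant byte first.
		binary.BigEndian.PutUint16(buf[2*i:], u)
	}
	return hex.EncodeToString(buf)
}

func main() {
	fmt.Println(hexUTF16BE("Bien joué")) // 004200690065006e0020006a006f007500e9
}
```

Whether this is measurably faster depends on the input, so a benchmark would be needed before relying on it.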

Note:

You say "convert a unicode string to a UCS-2 string", but your Python example uses UTF-16:

u"Bien joué".encode('utf-16-be').encode('hex')

The Unicode Consortium

UTF-16 FAQ

Q: What is the difference between UCS-2 and UTF-16?

A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not describe a data format distinct from UTF-16, because both use exactly the same 16-bit code unit representations. However, UCS-2 does not interpret surrogate code points, and thus cannot be used to conformantly represent supplementary characters.

Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.

Regarding go - String to UCS-2, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30556584/
