
go - Reading a text file encoded in UCS-2 little endian with Go

Reposted. Author: 行者123. Updated: 2023-12-01 20:27:21

I have a Go program that reads a text file, similar to the following code:

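package main

import (
	"bufio"
	"log"
	"os"
)

func main() {
	file, err := os.Open("test.txt")
	if err != nil {
		log.Fatalf("failed opening file: %s", err)
	}
	defer file.Close()

	// Read the file line by line into txtlines.
	scanner := bufio.NewScanner(file)
	scanner.Split(bufio.ScanLines)
	var txtlines []string

	for scanner.Scan() {
		txtlines = append(txtlines, scanner.Text())
	}
	// Scan returns false on both EOF and error, so check which one it was.
	if err := scanner.Err(); err != nil {
		log.Fatalf("failed reading file: %s", err)
	}
}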

Playground: https://play.golang.org/p/cnDOEFaT0lr

This code works for all text files except those encoded in UCS-2 little endian. How can I convert such a file to UTF-8 so that I can read it?

Best answer

I have a Go program to read a text file. How can I convert the [UCS-2 little endian] file to UTF-8 format to read it?



Unicode

FAQ: UTF-8, UTF-16, UTF-32 & BOM

Q: What is the difference between UCS-2 and UTF-16?

A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not describe a data format distinct from UTF-16, because both use exactly the same 16-bit code unit representations. However, UCS-2 does not interpret surrogate code points, and thus cannot be used to conformantly represent supplementary characters.

Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.
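
The distinction drawn in the FAQ can be illustrated with the standard library's unicode/utf16 package. The sketch below is not part of the original answer, and codeUnits is a hypothetical helper name. It shows that a character in the Basic Multilingual Plane occupies one 16-bit code unit, while a supplementary character needs a surrogate pair, which a strict UCS-2 reader would not interpret as a single character.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// codeUnits is a hypothetical helper that returns the UTF-16 code units
// used to encode a single Unicode code point.
func codeUnits(r rune) []uint16 {
	return utf16.Encode([]rune{r})
}

func main() {
	// U+4E2D lies in the Basic Multilingual Plane: one code unit,
	// identical under UCS-2 and UTF-16.
	fmt.Printf("%04X\n", codeUnits(0x4E2D)) // prints [4E2D]

	// U+1F600 is a supplementary character: UTF-16 encodes it as a
	// surrogate pair of two code units.
	fmt.Printf("%04X\n", codeUnits(0x1F600)) // prints [D83D DE00]
}
```

Since every valid UCS-2 text uses the same code units in UTF-16, a UTF-16 decoder is a reasonable choice for reading files labeled UCS-2.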



UCS-2 is a proper subset of UTF-16.

For example:
package main

import (
	"bufio"
	"fmt"
	"os"

	"golang.org/x/text/encoding/unicode"
)

func main() {
	// "Language Learning and Teaching" written in 16 or more languages: UCS-2
	// http://www.humancomp.org/unichtm/unilang.htm
	f, err := os.Open("unilang.htm")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	dec := unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewDecoder()
	scn := bufio.NewScanner(dec.Reader(f))
	for scn.Scan() {
		fmt.Println(scn.Text())
	}
	if err := scn.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

Playground: https://play.golang.org/p/3VombFxUNb1
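
If adding the golang.org/x/text dependency is undesirable, the same conversion can be sketched with the standard library alone. This is a minimal sketch rather than the answer's method: decodeUTF16LE is a hypothetical helper, it assumes the whole input fits in memory (for instance bytes returned by os.ReadFile), and it assumes little-endian input. Unlike the decoder above, which is configured with unicode.IgnoreBOM, this sketch drops a leading byte order mark so that it does not end up in the first line; within x/text, unicode.UseBOM is the documented policy for inputs that may start with a BOM.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE is a hypothetical helper (not from the original answer)
// that converts UTF-16 little-endian bytes to a UTF-8 Go string.
func decodeUTF16LE(b []byte) string {
	// Skip a leading little-endian byte order mark (FF FE) if present.
	if len(b) >= 2 && b[0] == 0xFF && b[1] == 0xFE {
		b = b[2:]
	}
	// Combine each pair of bytes into one 16-bit code unit.
	// A trailing odd byte, if any, is dropped.
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	// utf16.Decode combines surrogate pairs into runes; converting the
	// rune slice to a string encodes it as UTF-8.
	return string(utf16.Decode(units))
}

func main() {
	// A BOM followed by "Go" and a newline, encoded as UTF-16LE.
	data := []byte{0xFF, 0xFE, 'G', 0x00, 'o', 0x00, '\n', 0x00}
	fmt.Printf("%q\n", decodeUTF16LE(data)) // prints "Go\n"
}
```

The resulting string can then be split into lines with strings.Split or scanned with bufio.NewScanner(strings.NewReader(s)). For large files, the streaming decoder shown in the answer avoids holding the whole file in memory.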

A similar question about reading a text file encoded in UCS-2 little endian with Go can be found on Stack Overflow: https://stackoverflow.com/questions/59529543/
