emacs - How can I fix a file with mixed encodings?

Reposted. Author: 行者123. Updated: 2023-12-01 02:26:26

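Given a "broken" file with mixed encodings (e.g. utf-8 and latin-1), how do I configure Emacs to "project" all of its symbols onto a single encoding (e.g. utf-8) when saving the file?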

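I wrote the following function to automate some of the cleaning, but I expect that the information needed to map the symbol "é" in one encoding to "é" in utf-8 can be found somewhere, which would let me improve this function (or perhaps someone has already written such a function).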

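(defun jyby/cleanToUTF ()
  "Delete known stray characters from the current buffer."
  (interactive)
  (save-excursion
    ;; Scan the whole buffer for each offending character and delete it.
    ;; `search-forward' plus `replace-match' is the form recommended for
    ;; Lisp code; `replace-regexp' is meant for interactive use.
    (dolist (char '("अ" "आ" "ॆ"))
      (goto-char (point-min))
      (while (search-forward char nil t)
        (replace-match "" t t)))))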

(global-unset-key [f11])
(global-set-key [f11] 'jyby/cleanToUTF)

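I have many files "corrupted" by mixed encodings (from copy-pasting out of a browser with a misconfigured font setup), which produce the error below. I sometimes clean them by hand, searching for each problematic symbol and replacing it with "" or the appropriate character, or, more quickly, by specifying "utf-8-unix" as the encoding (which brings up the same message the next time I edit and save the file). This has become a real problem because, in any such corrupted file, every accented character is replaced by a sequence that doubles in size at each save, eventually doubling the size of the file. I am using GNU Emacs 24.2.1.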
These default coding systems were tried to encode text
in the buffer `test_accents.org':
(utf-8-unix (30 . 4194182) (33 . 4194182) (34 . 4194182) (37
. 4194182) (40 . 4194181) (41 . 4194182) (42 . 4194182) (45
. 4194182) (48 . 4194182) (49 . 4194182) (52 . 4194182))
However, each of them encountered characters it couldn't encode:
utf-8-unix cannot encode these: ...

Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.

Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).

raw-text emacs-mule no-conversion

Best answer

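I have run into this problem many times in emacs. When I have a messed-up file, e.g. one opened in raw-text-unix mode, and save it as utf-8, emacs prompts even about text that is already clean utf-8. I have not found a way to make it prompt only about the non-utf-8 parts.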

I just found a reasonable semi-automated approach using recode:

f=mixed-file
# Feed the file on stdin so that recode acts as a filter and leaves $f untouched.
recode -f ..utf-8 < "$f" > /tmp/recode.out
diff "$f" /tmp/recode.out | cat -vt

# manually fix lines of text that can't be converted to utf-8 in $f,
# and re-run recode and diff until the output diff is empty.
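To decide which lines need the manual fix, it can help to list the line numbers that are not valid UTF-8. The following is only a sketch: the function name list_non_utf8_lines is hypothetical, and it assumes an iconv utility is installed and that the file contains no NUL bytes.

```shell
# Hypothetical helper: print the line number of every line in file $1
# that is not valid UTF-8. Assumes iconv is available.
list_non_utf8_lines() (
    # Work on raw bytes so the shell never tries to interpret the text.
    LC_ALL=C; export LC_ALL
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # iconv exits non-zero when its input is not valid UTF-8.
        if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$n"
        fi
    done < "$1"
)

# Usage: list_non_utf8_lines mixed-file
```

The function body runs in a subshell, so the locale change does not leak into the calling shell.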

A useful tool is http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=342+200+224&mode=obytes

Then I simply reopen the file in emacs, and it is recognized as clean unicode.
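If the stray bytes are known to be latin-1, as in the question's example, the manual step might be automated line by line: keep lines that are already valid UTF-8 and convert the rest from ISO-8859-1. This is a sketch under that assumption, not part of the original workflow. The name fix_mixed_encoding is hypothetical, and a line that mixes both encodings would have its already-correct UTF-8 characters double-encoded, so such lines still need hand editing.

```shell
# Hypothetical helper: write a UTF-8 version of file $1 to stdout.
# Assumes each line is entirely UTF-8 or entirely latin-1, and that
# an iconv utility is installed.
fix_mixed_encoding() (
    # Work on raw bytes so the shell never tries to interpret the text.
    LC_ALL=C; export LC_ALL
    while IFS= read -r line || [ -n "$line" ]; do
        if printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            # Already valid UTF-8: pass the line through unchanged.
            printf '%s\n' "$line"
        else
            # Not valid UTF-8: assume latin-1 and convert it.
            printf '%s\n' "$line" | iconv -f ISO-8859-1 -t UTF-8
        fi
    done < "$1"
)

# Usage: fix_mixed_encoding mixed-file > /tmp/fixed.out
```

Writing to a separate output file keeps the original intact, so the result can be checked with diff before replacing anything.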

Regarding "emacs - How can I fix a file with mixed encodings?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15977942/
