utf-8 - Aspell still decodes the dictionary file as latin1 even though the environment and the aspell configuration both specify UTF-8


Reposted by: 行者123, updated: 2023-12-04 06:48:56

Update
Apparently the solution is to set the encoding on the command line with another configuration parameter: --encoding=UTF-8.

The original problem, for example:

zby@tvm1:/home/xpapers$ aspell --lang=en create master ./dictionary.local < w 
Warning: The word "PÃ©rez" is invalid. The character '©' (U+A9) may not appear in the middle of a word. Skipping word.

The file w contains just one word:

zby@tvm1:/home/xpapers$ cat w
Pérez

That is, the second letter is an accented e (é). Hex dump:

zby@tvm1:/home/xpapers$ hexdump w
0000000 c350 72a9 7a65 000a                    
0000007

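hexdump prints little-endian 16-bit words here, so you need to swap the bytes within each pair, but the content appears to be correct UTF-8 (50 is P, then c3 a9 is the accented e), and it displays fine in my console.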
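To sidestep the byte swapping, the file can be dumped one byte at a time. A minimal sketch, assuming a POSIX od is available; the one-word sample is recreated in a temporary file (a hypothetical path from mktemp) rather than read from the real w:

```shell
# Recreate the one-word sample; \303\251 are the octal escapes for bytes c3 a9 (é in UTF-8).
sample=$(mktemp)
printf 'P\303\251rez\n' > "$sample"

# -An drops the offset column, -tx1 prints one hex byte at a time in file order.
# tr and sed only normalise the whitespace that od puts around the bytes.
bytes=$(od -An -tx1 "$sample" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "$bytes"   # prints: 50 c3 a9 72 65 7a 0a

rm -f "$sample"
```

Read this way, the pair c3 a9 sits directly after 50, with no swapping needed.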

In the environment I have:

zby@tvm1:/home/xpapers$ set | grep LANG
LANG=en_US.UTF-8

The aspell configuration (as dumped by aspell dump config) is attached below; I think the only relevant part is:

# encoding (string)
#   encoding to expect data to be in
# default: !encoding = UTF-8

So everything appears to be set up for UTF-8, yet aspell still seems to be trying to use Latin-1.
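The garbled word in the warning is what UTF-8 bytes look like when they are read as Latin-1, and this can be reproduced without Aspell. The sketch below assumes iconv is installed and deliberately mislabels the UTF-8 input as ISO-8859-1:

```shell
# Treat the UTF-8 bytes of "Pérez" as ISO-8859-1 and re-encode the result as UTF-8.
# Byte c3 becomes Ã and byte a9 becomes ©, matching the word quoted in Aspell's warning.
misread=$(printf 'P\303\251rez\n' | iconv -f ISO-8859-1 -t UTF-8)
echo "$misread"
```

If the output matches the word in the warning, the input file is fine and the wrong charset is being applied when it is read.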

This is on Ubuntu Karmic Koala:

zby@tvm1:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=9.10
DISTRIB_CODENAME=karmic
DISTRIB_DESCRIPTION="Ubuntu 9.10"

Aspell is:

zby@tvm1:~$ aspell -v
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6)

==============================================

zby@tvm1:/home/xpapers$ aspell dump config
# conf (string)
#   main configuration file
# default: aspell.conf

# conf-dir (string)
#   location of main configuration file
# default: /etc

# data-dir (string)
#   location of language data files
# default: <prefix:lib/aspell> = /usr/lib/aspell

# dict-alias (list)
#   create dictionary aliases

# dict-dir (string)
#   location of the main word list
# default: <data-dir> = /usr/lib/aspell

# encoding (string)
#   encoding to expect data to be in
# default: !encoding = UTF-8

# filter (list)
#   add or removes a filter

# filter-path (list)
#   path(s) aspell looks for filters

# mode (string)
#   filter mode
# default: url

# extra-dicts (list)
#   extra dictionaries to use

# home-dir (string)
#   location for personal files
# default: <$HOME|./> = /home/zby

# ignore (integer)
#   ignore words <= n chars
# default: 1

# ignore-case (boolean)
#   ignore case when checking words
# default: false

# ignore-repl (boolean)
#   ignore commands to store replacement pairs
# default: false

# keyboard (string)
#   keyboard definition to use for typo analysis
# default: standard

# lang (string)
#   language code
# default: <language-tag> = en_US

# local-data-dir (string)
#   location of local language data files
# default: <actual-dict-dir> = /usr/lib/aspell/

# master (string)
#   base name of the main dictionary to use
# default: <lang> = en_US

# normalize (boolean)
#   enable Unicode normalization
# default: true

# norm-required (boolean)
#   Unicode normalization required for current lang
# default: false

# norm-form (string)
#   Unicode normalization form: none, nfd, nfc, comp
# default: nfc

# norm-strict (boolean)
#   avoid lossy conversions when normalization
# default: false

# per-conf (string)
#   personal configuration file
# default: .aspell.conf

# personal (string)
#   personal dictionary file name
# default: .aspell.<lang>.pws = .aspell.en_US.pws

# prefix (string)
#   prefix directory
# default: /usr

# repl (string)
#   replacements list file name
# default: .aspell.<lang>.prepl = .aspell.en_US.prepl

# run-together (boolean)
#   consider run-together words legal
# default: false

# run-together-limit (integer)
#   maximum number that can be strung together
# default: 2

# run-together-min (integer)
#   minimal length of interior words
# default: 3

# save-repl (boolean)
#   save replacement pairs on save all
# default: true

# set-prefix (boolean)
#   set the prefix based on executable location
# default: true

# size (string)
#   size of the word list
# default: +60

# sug-mode (string)
#   suggestion mode
# default: normal

# sug-edit-dist (integer)
#   edit distance to use, override sug-mode default
# default: 1

# sug-typo-analysis (boolean)
#   use typo analysis, override sug-mode default
# default: true

# sug-repl-table (boolean)
#   use replacement tables, override sug-mode default
# default: true

# sug-split-char (list)
#   characters to insert when a word is split

# use-other-dicts (boolean)
#   use personal, replacement & session dictionaries
# default: true

# variety (list)
#   extra information for the word list

# warn (boolean)
#   enable warnings
# default: true

# affix-compress (boolean)
#   use affix compression when creating dictionaries
# default: false

# clean-affixes (boolean)
#   remove invalid affix flags
# default: true

# clean-words (boolean)
#   attempts to clean words so that they are valid
# default: false

# invisible-soundslike (boolean)
#   compute soundslike on demand rather than storing
# default: false

# partially-expand (boolean)
#   partially expand affixes for better suggestions
# default: false

# skip-invalid-words (boolean)
#   skip invalid words
# default: true

# validate-affixes (boolean)
#   check if affix flags are valid
# default: true

# validate-words (boolean)
#   check if words are valid
# default: true

# backup (boolean)
#   create a backup file by appending ".bak"
# default: true

# byte-offsets (boolean)
#   use byte offsets instead of character offsets
# default: false

# guess (boolean)
#   create missing root/affix combinations
# default: false

# keymapping (string)
#   keymapping for check mode: "aspell" or "ispell"
# default: aspell

# reverse (boolean)
#   reverse the order of the suggest list
# default: false

# suggest (boolean)
#   suggest possible replacements
# default: true

# time (boolean)
#   time load time and suggest time in pipe mode
# default: false


#######################################################################
#
# Filter: email
#   filter for skipping quoted text in email messages
#
# configured as follows:

# f-email-quote (list)
#   email quote characters

# f-email-margin (integer)
#   num chars that can appear before the quote char
# default: 10


#######################################################################
#
# Filter: html
#   filter for dealing with HTML documents
#
# configured as follows:

# f-html-check (list)
#   HTML attributes to always check

# f-html-skip (list)
#   HTML tags to always skip the contents of


#######################################################################
#
# Filter: tex
#   filter for dealing with TeX/LaTeX documents
#
# configured as follows:

# f-tex-check-comments (boolean)
#   check TeX comments
# default: false

# f-tex-command (list)
#   TeX commands


#######################################################################
#
# Filter: sgml
#   filter for dealing with generic SGML/XML documents
#
# configured as follows:

# f-sgml-check (list)
#   SGML attributes to always check

# f-sgml-skip (list)
#   SGML tags to always skip the contents of


#######################################################################
#
# Filter: texinfo
#   filter for dealing with Texinfo documents
#
# configured as follows:

# f-texinfo-ignore (list)
#   Texinfo commands to ignore the parameters of

# f-texinfo-ignore-env (list)
#   Texinfo environments to ignore


#######################################################################
#
# Filter: context
#   experimental filter for hiding delimited contexts
#
# configured as follows:

# f-context-delimiters (list)
#   context delimiters (separated by spaces)

# f-context-visible-first (boolean)
#   swaps visible and invisible text
# default: false

Best answer

When you create a dictionary with --lang=en, Aspell looks up the en language data file. On my Ubuntu system it looks like this:

name en
charset iso8859-1
special ' -*-
soundslike en
affix en

So Aspell uses that charset. To override it, use the --encoding=utf-8 option.
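To check which charset a language data file declares, the charset line can be pulled out directly. A sketch, using a temporary copy of the lines quoted above in place of the real file; the installed location (probably en.dat under the data-dir shown in the config dump) is an assumption and is not used here:

```shell
# Stand-in for the language data file, with the same content as quoted in the answer.
datfile=$(mktemp)
cat > "$datfile" <<'EOF'
name en
charset iso8859-1
special ' -*-
soundslike en
affix en
EOF

# Print the second field of the line whose first field is "charset".
charset=$(awk '$1 == "charset" { print $2 }' "$datfile")
echo "$charset"   # prints: iso8859-1

rm -f "$datfile"
```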

Then set the encoding option for the input (and for the suggested words).
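Combining the answer with the command from the question, the rebuild would look roughly like this. It is only a sketch: it reuses the file names w and dictionary.local from the question, and it skips the build when aspell or the word list is missing.

```shell
# Flags taken from the question (--lang=en) and from the answer (--encoding=utf-8).
aspell_opts="--lang=en --encoding=utf-8"

if command -v aspell >/dev/null 2>&1 && [ -f w ]; then
    # The word list is read from stdin, as in the original command.
    # $aspell_opts is left unquoted on purpose so that it splits into two options.
    aspell $aspell_opts create master ./dictionary.local < w
else
    echo "aspell or ./w not found; skipping dictionary build" >&2
fi
```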

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/3396637/
