I'm having trouble importing the following file into R with the correct encoding: http://www.kuleuven.be/bio/ento/temp/test.xlsx. In particular,
library("xlsx")
read.xlsx("test.xlsx",1,header=F,colClasses=c("character"),encoding="UTF-8")
gives me
X1
1 a-cadinol
2 a-calacorene
3 a-caryophyllene alcohol
4 a-curcumene
5 a-elemol
6 a-muurolene
7 a-terpineol acetate
8 ß-4-dimethyl-3-cyclohexane-1-ethanol acetate
9 ß-bisabolene
10 ß-bisabolol
11 ß-bourbonene
12 ß-caryophyllene alcohol
13 ß-cyclocitral
14 ß-farnesol
15 ß-selinene
16 ß-sesquiphellandrene
17 <U+03B3>-cadinene
18 <U+03B3>-Carboethoxy-<U+03B3>-butyrolactone
19 <U+03B3>-ethyl-<U+03B3>-butyrolactone
20 <U+03B3>-eudesmol
21 <U+03B3>-muurolene
22 <U+03B3>-nonalactone
23 <U+03B3>-octalactone
24 <U+03B3>-selinene
25 <U+03B3>-undecalactone
26 d-cadinene
27 d-cadinol
28 d-muurolene
29 d-undecalactone
But the a-, <U+03B3>- and d- prefixes should be the Greek characters alpha-, gamma- and delta-.
Any thoughts on how to import the file with the correct encoding?
I'm working on Windows, and iconvlist() gives me
[1] "437" "850" "852" "855" "857"
[6] "860" "861" "862" "863" "865"
[11] "866" "869" "ANSI_X3.4-1968" "ANSI_X3.4-1986" "ASCII"
[16] "ASMO-708" "BIG-5" "BIG-FIVE" "big5" "BIG5"
[21] "big5-hkscs" "BIG5-HKSCS" "big5hkscs" "BIG5HKSCS" "CP-GR"
[26] "CP-IS" "cp1025" "CP1125" "CP1133" "CP1200"
[31] "CP12000" "CP12001" "CP1201" "CP1250" "CP1251"
[36] "CP1252" "CP1253" "CP1254" "CP1255" "CP1256"
[41] "CP1257" "CP1258" "CP1361" "CP154" "CP367"
[46] "CP437" "CP50221" "CP51932" "CP65001" "CP737"
[51] "CP775" "CP819" "CP850" "CP852" "CP853"
[56] "CP855" "CP857" "CP858" "CP860" "CP861"
[61] "CP862" "CP863" "CP864" "CP865" "cp866"
[66] "CP866" "CP869" "CP874" "cp875" "CP932"
[71] "CP936" "CP949" "CP950" "CSASCII" "CSIBM855"
[76] "CSIBM857" "CSIBM860" "CSIBM861" "CSIBM863" "CSIBM864"
[81] "CSIBM865" "CSIBM866" "CSIBM869" "csISO2022JP" "CSISOLATIN1"
[86] "CSPC775BALTIC" "CSPC850MULTILINGUAL" "CSPC862LATINHEBREW" "CSPC8CODEPAGE437" "CSPCP852"
[91] "CSPTCP154" "CSWINDOWS31J" "CYRILLIC-ASIAN" "DOS-720" "DOS-862"
[96] "EUC-CN" "euc-jp" "euc-kr" "EUC-KR" "EUCCN"
[101] "eucjp" "euckr" "GB18030" "gb2312" "GBK"
[106] "hz-gb-2312" "IBM-CP1133" "IBM-Thai" "IBM00858" "IBM00924"
[111] "IBM01047" "IBM01140" "IBM01141" "IBM01142" "IBM01143"
[116] "IBM01144" "IBM01145" "IBM01146" "IBM01147" "IBM01148"
[121] "IBM01149" "IBM037" "IBM1026" "IBM273" "IBM277"
[126] "IBM278" "IBM280" "IBM284" "IBM285" "IBM290"
[131] "IBM297" "IBM367" "IBM420" "IBM423" "IBM424"
[136] "IBM437" "IBM437" "IBM500" "ibm737" "ibm775"
[141] "IBM775" "IBM819" "ibm850" "IBM850" "ibm852"
[146] "IBM852" "IBM855" "IBM855" "ibm857" "IBM857"
[151] "IBM860" "IBM860" "ibm861" "IBM861" "IBM862"
[156] "IBM863" "IBM863" "IBM864" "IBM864" "IBM865"
[161] "IBM865" "IBM866" "ibm869" "IBM869" "IBM870"
[166] "IBM871" "IBM880" "IBM905" "iso-2022-jp" "iso-2022-jp"
[171] "ISO-2022-JP" "ISO-2022-JP-MS" "iso-2022-kr" "ISO-8859-1" "iso-8859-13"
[176] "iso-8859-15" "iso-8859-2" "iso-8859-3" "iso-8859-4" "iso-8859-5"
[181] "iso-8859-6" "iso-8859-7" "iso-8859-8" "iso-8859-8-i" "iso-8859-9"
[186] "ISO-IR-100" "ISO-IR-6" "ISO_646.IRV:1991" "ISO_8859-1" "ISO_8859-1:1987"
[191] "ISO2022-JP" "ISO2022-JP-MS" "iso2022-kr" "ISO646-US" "iso8859-1"
[196] "ISO8859-1" "iso8859-13" "iso8859-15" "iso8859-2" "iso8859-3"
[201] "iso8859-4" "iso8859-5" "iso8859-6" "iso8859-7" "iso8859-8"
[206] "iso8859-8-i" "iso8859-9" "Johab" "JOHAB" "koi8-r"
[211] "koi8-u" "ks_c_5601-1987" "L1" "latin-9" "LATIN1"
[216] "latin2" "latin3" "latin4" "latin5" "latin7"
[221] "latin9" "mac" "mac-centraleurope" "mac-is" "macarabic"
[226] "maccentraleurope" "maccroatian" "maccyrillic" "macgreek" "machebrew"
[231] "maciceland" "macintosh" "macis" "macroman" "macromania"
[236] "macthai" "macturkish" "macukraine" "macukrainian" "MS-ANSI"
[241] "MS-ARAB" "MS-CYRL" "MS-EE" "MS-GREEK" "MS-HEBR"
[246] "MS-TURK" "MS50221" "MS51932" "MS932" "MS936"
[251] "PT154" "PTCP154" "SHIFFT_JIS" "SHIFFT_JIS-MS" "shift-jis"
[256] "shift_jis" "SJIS" "SJIS-MS" "SJIS-OPEN" "SJIS-WIN"
[261] "UCS-2" "UCS-2BE" "UCS-2LE" "UCS-4" "UCS-4BE"
[266] "UCS-4BE" "UCS-4LE" "UCS-4LE" "UCS2" "UCS2BE"
[271] "UCS2LE" "UCS4" "UCS4BE" "UCS4LE" "UHC"
[276] "unicodeFFFE" "US" "US-ASCII" "UTF-16" "UTF-16BE"
[281] "UTF-16LE" "UTF-32" "UTF-32BE" "UTF-32LE" "UTF-8"
[286] "UTF16" "UTF16BE" "UTF16LE" "UTF32" "UTF32BE"
[291] "UTF32LE" "UTF8" "WINBALTRIM" "windows-1250" "windows-1251"
[296] "windows-1252" "windows-1253" "windows-1254" "windows-1255" "windows-1256"
[301] "windows-1257" "windows-1258" "WINDOWS-31J" "WINDOWS-50221" "WINDOWS-51932"
[306] "windows-874" "WINDOWS-932" "WINDOWS-936" "x-Chinese_CNS" "x-cp20001"
[311] "x-cp20003" "x-cp20004" "x-cp20005" "x-cp20261" "x-cp20269"
[316] "x-cp20936" "x-cp20949" "x-cp50227" "x-EBCDIC-KoreanExtended" "x-Europa"
[321] "x-IA5" "x-IA5-German" "x-IA5-Norwegian" "x-IA5-Swedish" "x-iscii-as"
[326] "x-iscii-be" "x-iscii-de" "x-iscii-gu" "x-iscii-ka" "x-iscii-ma"
[331] "x-iscii-or" "x-iscii-pa" "x-iscii-ta" "x-iscii-te" "x-mac-arabic"
[336] "x-mac-ce" "x-mac-chinesesimp" "x-mac-chinesetrad" "x-mac-croatian" "x-mac-cyrillic"
[341] "x-mac-greek" "x-mac-hebrew" "x-mac-icelandic" "x-mac-japanese" "x-mac-korean"
[346] "x-mac-romanian" "x-mac-thai" "x-mac-turkish" "x-mac-ukrainian" "x_Chinese-Eten"
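I tried a lot of these, but without success... Unfortunately, I also don't know which encoding Excel uses to save my file...
Also, is there a simple function in R that would let me convert all Greek letters alpha, beta, gamma and delta (in their original encoded form) to "alpha", "beta", "gamma" and "delta" (i.e. written out in full)? Or to do the reverse, i.e. convert "alpha", "beta", "gamma", etc. written out in full into the single Greek characters?
EDIT: regarding the last question, I tried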
# Replace spelled-out Greek letter names with the Greek characters.
togreek = function(compname) {
  n = as.character(compname, encoding = "UTF-8")
  n = gsub("alpha", "\u03B1", n)
  n = gsub("beta", "\u03B2", n)
  n = gsub("gamma", "\u03B3", n)
  n = gsub("delta", "\u03B4", n)
  n = gsub("epsilon", "\u03B5", n)
  n
}
# Replace Greek characters with their spelled-out names.
tolatin = function(compname) {
  n = as.character(compname, encoding = "UTF-8")
  n = gsub("\u03B1", "alpha", n)
  n = gsub("\u03B2", "beta", n)
  n = gsub("\u03B3", "gamma", n)
  n = gsub("\u03B4", "delta", n)
  n = gsub("\u03B5", "epsilon", n)
  n
}
The tolatin function seems to work:
library("xlsx")
test=read.xlsx("test.xlsx",1,header=F,colClasses=c("character"),encoding="UTF-8")
tolatin(test$X1)
[1] "alpha-cadinol" "alpha-calacorene" "alpha-caryophyllene alcohol"
[4] "alpha-curcumene" "alpha-elemol" "alpha-muurolene"
[7] "alpha-terpineol acetate" "beta-4-dimethyl-3-cyclohexane-1-ethanol acetate" "beta-bisabolene"
[10] "beta-bisabolol" "beta-bourbonene" "beta-caryophyllene alcohol"
[13] "beta-cyclocitral" "beta-farnesol" "beta-selinene"
[16] "beta-sesquiphellandrene" "gamma-cadinene" "gamma-Carboethoxy-gamma-butyrolactone"
[19] "gamma-ethyl-gamma-butyrolactone" "gamma-eudesmol" "gamma-muurolene"
[22] "gamma-nonalactone" "gamma-octalactone" "gamma-selinene"
[25] "gamma-undecalactone" "delta-cadinene" "delta-cadinol"
[28] "delta-muurolene" "delta-undecalactone"
But if I then convert back to Greek characters, I run into the same problem again:
togreek(tolatin(test$X1))
[1] "α-cadinol" "α-calacorene" "α-caryophyllene alcohol"
[4] "α-curcumene" "α-elemol" "α-muurolene"
[7] "α-terpineol acetate" "ß-4-dimethyl-3-cyclohexane-1-ethanol acetate" "ß-bisabolene"
[10] "ß-bisabolol" "ß-bourbonene" "ß-caryophyllene alcohol"
[13] "ß-cyclocitral" "ß-farnesol" "ß-selinene"
[16] "ß-sesquiphellandrene" "<U+03B3>-cadinene" "<U+03B3>-Carboethoxy-<U+03B3>-butyrolactone"
[19] "<U+03B3>-ethyl-<U+03B3>-butyrolactone" "<U+03B3>-eudesmol" "<U+03B3>-muurolene"
[22] "<U+03B3>-nonalactone" "<U+03B3>-octalactone" "<U+03B3>-selinene"
[25] "<U+03B3>-undecalactone" "d-cadinene" "d-cadinol"
[28] "d-muurolene" "d-undecalactone"
Any thoughts on what I'm doing wrong?
Best answer
Try this: Sys.setlocale(category = "LC_ALL", locale = "Greek")
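The output in the question suggests that the imported strings may already be correct and that only the printing is affected. tolatin() finds the Greek characters, and <U+03B3> is the form R uses on Windows to display a character that the current locale's encoding cannot represent. The sketch below is one way to check this without relying on print(), followed by a lookup-table variant of the two conversion functions. The sample string and the names greek, to_latin and to_greek are illustrative assumptions, not part of the original post, and enc2utf8() is used here as one possible way to normalise the input.

```r
# Hypothetical stand-in for one imported cell: gamma (U+03B3) plus a name.
x <- "\u03B3-cadinene"

# Look at the Unicode code point of the first character rather than at the
# printed form, which depends on the locale.
first_cp <- utf8ToInt(substr(x, 1, 1))
sprintf("U+%04X", first_cp)

# Lookup table: spelled-out name -> Greek character (illustrative subset).
greek <- c(alpha = "\u03B1", beta = "\u03B2", gamma = "\u03B3",
           delta = "\u03B4", epsilon = "\u03B5")

# Greek characters -> spelled-out names, using literal (fixed) matching.
to_latin <- function(x) {
  x <- enc2utf8(as.character(x))
  for (nm in names(greek)) x <- gsub(greek[[nm]], nm, x, fixed = TRUE)
  x
}

# Spelled-out names -> Greek characters.
to_greek <- function(x) {
  x <- enc2utf8(as.character(x))
  for (nm in names(greek)) x <- gsub(nm, greek[[nm]], x, fixed = TRUE)
  x
}

# Compare code points so the check does not depend on how the console prints.
round_trip <- to_greek(to_latin(x))
identical(utf8ToInt(round_trip), utf8ToInt(x))
```

If the code points match, the data itself is intact and changing the locale, as suggested above, only affects how the console displays it. The locale name "Greek" is a Windows locale name, so other platforms would need a different value. Note also that the plain substring replacement would rewrite any word that merely contains "beta" or "delta", so a stricter pattern may be needed for other data.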
Regarding r - importing an Excel file containing Greek characters into R with the correct encoding, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20916070/