
c# - How to print characters from a double-byte character set

Reposted. Author: 太空宇宙. Updated: 2023-11-03 10:24:51

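Here is how to output every character, printable or not, from a single-byte character set. The output file will contain Japanese characters such as チホヤツセ.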

Encoding enc = Encoding.GetEncoding("shift_jis");
byte[] m_bytes = new byte[1];

using (StreamWriter sw = new StreamWriter(@"C:\shift_jis.txt"))
{
    for (int i = 0; i < 256; i++)
    {
        m_bytes[0] = (byte)i;
        string output = enc.GetString(m_bytes);
        sw.WriteLine(output);
    }
}

This is my attempt to do the same with a double-byte character set.

Encoding enc = Encoding.GetEncoding("iso-2022-jp");
byte[] m_bytes = new byte[2];

using (StreamWriter sw = new StreamWriter(@"C:\iso-2022-jp.txt"))
{
    for (int i = 0; i < 256; i++)
    {
        m_bytes[0] = (byte)i;

        for (int j = 0; j < 256; j++)
        {
            m_bytes[1] = (byte)j;
            string output = enc.GetString(m_bytes);
            sw.WriteLine(output);
        }
    }
}

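The problem is that the output file still contains only the first 255 characters. Each byte is evaluated on its own and the character for that single byte is returned, so the output string always contains two characters instead of one. Since characters in this set are represented by two bytes, they have to be specified with two bytes, right?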

So how do I iterate over and print all the characters in a double-byte character set?

Best answer

If it is acceptable to list them in Unicode order, you can do this:

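// Clone the encoding so that its fallback can be changed
// (the instance returned by GetEncoding is read-only).
Encoding enc = (Encoding)Encoding.GetEncoding("iso-2022-jp").Clone();
// Characters that cannot be encoded are replaced by nothing, i.e. zero bytes.
enc.EncoderFallback = new EncoderReplacementFallback("");
char[] chars = new char[1];
byte[] bytes = new byte[16];

using (StreamWriter sw = new StreamWriter(@"C:\temp\iso-2022-jp.txt"))
{
    for (int i = 0; i <= char.MaxValue; i++)
    {
        chars[0] = (char)i;
        int count = enc.GetBytes(chars, 0, 1, bytes, 0);

        // A non-zero byte count means the character exists in the encoding.
        if (count != 0)
        {
            sw.WriteLine(chars[0]);
        }
    }
}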
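The test used in the loop above (an empty replacement fallback, so an unencodable character yields zero bytes) can be isolated into a small helper. This is only a sketch: `EncodableCheck`, `CreateStrict` and `CanEncode` are hypothetical names that do not appear in the answer, and the `Encoding.RegisterProvider` call assumes .NET Core 3.0 or later / .NET 5+, where code-page encodings are opt-in (on .NET Framework that line is not needed and can be removed).

```csharp
using System;
using System.Text;

public static class EncodableCheck
{
    // Hypothetical helper: returns a writable clone of the named encoding
    // whose encoder emits nothing for characters it cannot represent.
    public static Encoding CreateStrict(string name)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy
        // code pages must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var enc = (Encoding)Encoding.GetEncoding(name).Clone();
        enc.EncoderFallback = new EncoderReplacementFallback("");
        return enc;
    }

    // Hypothetical helper: true if encoding the single char produces any bytes.
    public static bool CanEncode(Encoding enc, char c)
    {
        return enc.GetBytes(new[] { c }).Length != 0;
    }

    public static void Main()
    {
        Encoding enc = CreateStrict("iso-2022-jp");

        Console.WriteLine(CanEncode(enc, 'A'));      // LATIN CAPITAL LETTER A
        Console.WriteLine(CanEncode(enc, '\u3042')); // HIRAGANA LETTER A
        Console.WriteLine(CanEncode(enc, '\u0E01')); // THAI CHARACTER KO KAI
    }
}
```

The same helper should work for other legacy encodings such as "shift_jis" by changing the name passed to `CreateStrict`.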

If you want them sorted by byte sequence instead, you can do this:

Encoding enc = (Encoding)Encoding.GetEncoding("iso-2022-jp").Clone();
enc.EncoderFallback = new EncoderReplacementFallback("");
char[] chars = new char[1];
byte[] bytes = new byte[16];

// Each entry pairs the encoded byte sequence with the character it encodes.
var lst = new List<Tuple<byte[], char>>();

for (int i = 0; i <= char.MaxValue; i++)
{
    chars[0] = (char)i;
    int count = enc.GetBytes(chars, 0, 1, bytes, 0);

    if (count != 0)
    {
        var bytes2 = new byte[count];
        Array.Copy(bytes, bytes2, count);
        lst.Add(Tuple.Create(bytes2, chars[0]));
    }
}

// Lexicographic comparison of the byte sequences; on a common prefix
// the shorter sequence sorts first.
lst.Sort((x, y) =>
{
    int min = Math.Min(x.Item1.Length, y.Item1.Length);

    for (int i = 0; i < min; i++)
    {
        int cmp = x.Item1[i].CompareTo(y.Item1[i]);

        if (cmp != 0)
        {
            return cmp;
        }
    }

    return x.Item1.Length.CompareTo(y.Item1.Length);
});

using (StreamWriter sw = new StreamWriter(@"C:\temp\iso-2022-jp.txt"))
{
    foreach (var tuple in lst)
    {
        sw.WriteLine(tuple.Item2);

        // This will print the full byte sequence necessary to
        // generate the char. Note that iso-2022-jp uses escape
        // sequences to "activate" subtables and to deactivate them.
        // (Select needs "using System.Linq;".)
        //sw.WriteLine("{0}: {1}", tuple.Item2, string.Join(",", tuple.Item1.Select(x => x.ToString("x2"))));
    }
}
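The escape sequences mentioned in the comment above are also why the two-byte loop in the question cannot work for iso-2022-jp: without an escape sequence the decoder stays in ASCII mode and reads each byte as a separate character. In ISO-2022-JP, ESC $ B (1b 24 42) switches to JIS X 0208 and ESC ( B (1b 28 42) switches back to ASCII. The following is a minimal sketch of that behavior; `EscapeSequenceDemo` and `ToHex` are illustrative names, and the `RegisterProvider` call assumes .NET Core 3.0 or later / .NET 5+.

```csharp
using System;
using System.Linq;
using System.Text;

public static class EscapeSequenceDemo
{
    // Illustrative helper: formats bytes as space-separated lowercase hex.
    public static string ToHex(byte[] bytes)
    {
        return string.Join(" ", bytes.Select(b => b.ToString("x2")));
    }

    public static void Main()
    {
        // Assumption: .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding enc = Encoding.GetEncoding("iso-2022-jp");

        // Encode HIRAGANA LETTER A and show the complete byte sequence,
        // including whatever escape sequences the encoder emits around it.
        byte[] encoded = enc.GetBytes("\u3042");
        Console.WriteLine(ToHex(encoded));

        // Two raw bytes with no escape sequence are decoded in ASCII mode,
        // so they come back as two separate characters.
        byte[] raw = { 0x24, 0x22 };
        Console.WriteLine(enc.GetString(raw).Length);

        // Decoding the full sequence produced by the encoder
        // should give back the original one-character string.
        Console.WriteLine(enc.GetString(encoded) == "\u3042");
    }
}
```

This is the reason the answer enumerates characters and encodes them, rather than enumerating byte pairs and decoding them.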

Or with a different sort order (length first):

lst.Sort((x, y) =>
{
    int cmp2 = x.Item1.Length.CompareTo(y.Item1.Length);

    if (cmp2 != 0)
    {
        return cmp2;
    }

    int min = Math.Min(x.Item1.Length, y.Item1.Length);

    for (int i = 0; i < min; i++)
    {
        int cmp = x.Item1[i].CompareTo(y.Item1[i]);

        if (cmp != 0)
        {
            return cmp;
        }
    }

    return 0;
});

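Note that in all the examples I only generate characters from the Basic Multilingual Plane (BMP). I don't think characters outside the BMP are included in legacy code-page encodings like this one (Unicode encodings and GB18030 aside)... If necessary, I can modify the code to support them.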

Out of curiosity, here is a version of the first code sample that also handles non-BMP characters (of which there are none in iso-2022-jp):

Encoding enc = (Encoding)Encoding.GetEncoding("iso-2022-jp").Clone();
enc.EncoderFallback = new EncoderReplacementFallback("");
byte[] bytes = new byte[16];

using (StreamWriter sw = new StreamWriter(@"C:\temp\iso-2022-jp.txt"))
{
    int max = -1;

    for (int i = 0; i <= 0x10FFFF; i++)
    {
        // Skip the surrogate range: these are not valid code points on their own.
        if (i >= 0xD800 && i <= 0xDFFF)
        {
            continue;
        }

        // One char for BMP code points, a surrogate pair for the rest.
        string chars = char.ConvertFromUtf32(i);

        int count = enc.GetBytes(chars, 0, chars.Length, bytes, 0);

        if (count != 0)
        {
            sw.WriteLine(chars);
            max = i;
        }
    }

    Console.WriteLine("maximum codepoint: {0}", max);
}
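The loop above skips 0xD800 to 0xDFFF because those values are UTF-16 surrogates rather than characters, and `char.ConvertFromUtf32` rejects them. A short sketch of how code points map to UTF-16 code units in .NET (`SurrogateDemo` and `Utf16Length` are illustrative names only):

```csharp
using System;

public static class SurrogateDemo
{
    // Illustrative helper: number of UTF-16 code units (chars)
    // that .NET needs to represent the given code point.
    public static int Utf16Length(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint).Length;
    }

    public static void Main()
    {
        Console.WriteLine(Utf16Length(0x3042));  // BMP code point: prints 1
        Console.WriteLine(Utf16Length(0x1F600)); // supplementary code point: prints 2

        try
        {
            // A lone surrogate is not a valid Unicode scalar value.
            char.ConvertFromUtf32(0xD800);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("0xD800 is rejected");
        }
    }
}
```

This is why the non-BMP version passes a string to `GetBytes` instead of a one-element char array: a supplementary code point does not fit in a single `char`.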

Regarding "c# - How to print characters from a double-byte character set", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31968851/
