algorithm - What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?

Reposted by: 塔克拉玛干. Updated: 2023-11-03 02:13:38

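The standard string representation of a GUID takes about 36 characters. That is fine, but it is also wasteful. I would like to know how to encode it in the shortest possible way using all the ASCII characters in the range 33-127. The naive implementation produces 22 characters, simply because 128 bits divided by 6 bits per character rounds up to 22.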
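To make the arithmetic concrete: the shortest fixed-length encoding for an alphabet of a given size is the smallest n for which size^n is at least 2^128. The following is a minimal C# sketch of that calculation. The class and method names are illustrative and do not come from the question. Note that the printable, non-space ASCII characters are codes 33-126 (94 characters), since code 127 is the DEL control character.

```csharp
using System;
using System.Numerics;

public static class MinimumLength
{
    // Smallest n such that alphabetSize^n >= 2^bits, i.e. the number of
    // characters needed to represent every value of the given bit width.
    public static int For(int alphabetSize, int bits)
    {
        if (alphabetSize < 2) throw new ArgumentOutOfRangeException(nameof(alphabetSize));
        if (bits < 1) throw new ArgumentOutOfRangeException(nameof(bits));

        BigInteger target = BigInteger.Pow(2, bits);
        BigInteger capacity = BigInteger.One;
        int length = 0;

        // Keep adding characters until the alphabet can cover every value.
        while (capacity < target)
        {
            capacity *= alphabetSize;
            length++;
        }

        return length;
    }
}
```

Under these assumptions, `MinimumLength.For(16, 128)` gives 32, `For(64, 128)` gives 22, and both `For(85, 128)` and `For(94, 128)` give 20. So even the full set of 94 printable ASCII characters cannot get a fixed-length encoding below 20 characters.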

Huffman encoding is my second-best idea; the only question is how to choose the codes...

Of course, the encoding must be lossless.

Best answer

This is an old question, but I had to solve it to keep a system I was working on backward compatible.

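The exact requirement was for a client-generated identifier that would be written to the database and stored in a 20-character unique column. It was never shown to the user and was not indexed in any way.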

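Since I couldn't eliminate the requirement, I really wanted to use a Guid (which is statistically unique). If I could encode it losslessly into 20 characters, that would be a good solution given the constraints.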

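Ascii-85 lets you encode 4 bytes of binary data into 5 bytes of ASCII data. So with this encoding scheme a 16-byte Guid just fits into 20 ASCII characters. A Guid can have 2^128, roughly 3.4028236692093846346337460743177e+38, discrete values, whereas 20 characters of Ascii-85 can have 85^20, roughly 3.8759531084514355873123178482056e+38, discrete values.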
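The 4-bytes-to-5-characters step works because 85^5 = 4,437,053,125 is just larger than 2^32 = 4,294,967,296. As an illustration, here is a minimal sketch of encoding one 4-byte group with the standard Ascii85 alphabet ('!' through 'u'), most significant digit first. This is the textbook scheme rather than the variant in this answer, which uses a custom alphabet and a different digit order. The class name is made up for the example, and the 'z' shortcut that standard Ascii85 uses for an all-zero group is deliberately left out.

```csharp
using System;

public static class Ascii85Group
{
    // Encodes exactly four bytes as five characters using the standard
    // Ascii85 alphabet ('!' = 0 ... 'u' = 84), most significant digit first.
    public static string Encode(byte[] fourBytes)
    {
        if (fourBytes == null || fourBytes.Length != 4)
        {
            throw new ArgumentException("Exactly four bytes are required.", nameof(fourBytes));
        }

        // Treat the four bytes as one big-endian 32-bit number.
        uint value = ((uint)fourBytes[0] << 24) | ((uint)fourBytes[1] << 16)
                   | ((uint)fourBytes[2] << 8) | fourBytes[3];

        // Peel off base-85 digits from the least significant end,
        // filling the output from right to left.
        var chars = new char[5];
        for (int i = 4; i >= 0; i--)
        {
            chars[i] = (char)('!' + (int)(value % 85));
            value /= 85;
        }

        return new string(chars);
    }
}
```

For example, the four bytes of the text "Man " (0x4D, 0x61, 0x6E, 0x20) encode to `9jqo^`. Four such groups cover the 16 bytes of a Guid, which is where the 20 characters come from.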

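When writing to the database I had some concerns about truncation, so no whitespace characters are included in the encoding. I also ran into problems with collation, which I addressed by excluding lowercase characters from the encoding. In addition, the value would only ever be passed through a parameterized command, so any special SQL characters would be escaped automatically.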
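Constraints like these are easy to check mechanically before committing to a character table. The sketch below is one hypothetical way to do it; `AlphabetCheck` and `FindProblems` are names invented for this example and are not part of the answer's code. It reports duplicates, whitespace, lowercase letters and characters outside 7-bit ASCII.

```csharp
using System;
using System.Collections.Generic;

public static class AlphabetCheck
{
    // Returns a description of every problem found; an empty list means
    // the alphabet passed all checks.
    public static List<string> FindProblems(string alphabet, int expectedSize)
    {
        var problems = new List<string>();

        if (alphabet.Length != expectedSize)
        {
            problems.Add($"Expected {expectedSize} characters but found {alphabet.Length}.");
        }

        var seen = new HashSet<char>();
        foreach (char c in alphabet)
        {
            // HashSet.Add returns false when the character was already present.
            if (!seen.Add(c)) problems.Add($"Duplicate character U+{(int)c:X4}.");
            if (char.IsWhiteSpace(c)) problems.Add($"Whitespace character U+{(int)c:X4}.");
            if (char.IsLower(c)) problems.Add($"Lower-case letter U+{(int)c:X4}.");
            if (c > 127) problems.Add($"Non-ASCII character U+{(int)c:X4}.");
        }

        return problems;
    }
}
```

A check like this is stricter than the answer's own rules. Unicode classifies 'ß' as a lowercase letter, and characters such as 'ß' and 'Ø' lie outside 7-bit ASCII, so both would be flagged. Whether that matters depends on the column's collation and code page, so treat those two checks as warnings to investigate rather than as errors.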

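I've included the C# code that performs the Ascii-85 encoding and decoding in case it helps anyone. Obviously, depending on your usage, you may need to choose a different character set, since my constraints made me pick some unusual characters like 'ß' and 'Ø', but that's the easy part: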

using System;
using System.Collections.Generic;
using System.Text;

/// <summary>
/// This code implements an encoding scheme that uses 85 printable, non-whitespace 
/// characters to encode the same volume of information as contained in a Guid.
/// 
/// Ascii-85 can represent 4 binary bytes as 5 Ascii bytes. So a 16 byte Guid can be 
/// represented in 20 characters. A Guid can have 2^128, roughly 
/// 3.4028236692093846346337460743177e+38, discrete values whereas 20 characters of 
/// Ascii-85 can have 85^20, roughly 3.8759531084514355873123178482056e+38, 
/// discrete values.
/// 
/// Lower-case ASCII letters are not included in this encoding to avoid collation 
/// issues. 
/// This is a departure from standard Ascii-85 which does include lower case 
/// characters. Some of the 85 characters used here (for example 'ß' and 'Ø') 
/// lie outside the 7-bit ASCII range.
/// In addition, no whitespace characters are included as these may be truncated in 
/// the database depending on the storage mechanism - ie VARCHAR vs CHAR.
/// </summary>
internal static class Ascii85
{
    /// <summary>
    /// 85 printable characters with no lower-case ASCII letters, so database 
    /// collation can't bite us. No ' ' character either so database can't 
    /// truncate it!
    /// Unfortunately, these limitations mean resorting to some strange 
    /// characters like 'Ø' but we won't ever have to type these, so it's ok.
    /// </summary>
    private static readonly char[] kEncodeMap = new[]
    { 
        '0','1','2','3','4','5','6','7','8','9',  // 10
        'A','B','C','D','E','F','G','H','I','J',  // 20
        'K','L','M','N','O','P','Q','R','S','T',  // 30
        'U','V','W','X','Y','Z','|','}','~','{',  // 40
        '!','"','#','$','%','&','\'','(',')','`', // 50
        '*','+',',','-','.','/','[','\\',']','^', // 60
        ':',';','<','=','>','?','@','_','¼','½',  // 70
        '¾','ß','Ç','Ð','€','«','»','¿','•','Ø',  // 80
        '£','†','‡','§','¥'                       // 85
    };

    /// <summary>
    /// A reverse mapping of the <see cref="kEncodeMap"/> array for decoding 
    /// purposes.
    /// </summary>
    private static readonly IDictionary<char, byte> kDecodeMap;

    /// <summary>
    /// Initialises the <see cref="kDecodeMap"/>.
    /// </summary>
    static Ascii85()
    {
        kDecodeMap = new Dictionary<char, byte>();

        for (byte i = 0; i < kEncodeMap.Length; i++)
        {
            kDecodeMap.Add(kEncodeMap[i], i);
        }
    }

    /// <summary>
    /// Decodes an Ascii-85 encoded Guid.
    /// </summary>
    /// <param name="ascii85Encoding">The Guid encoded using Ascii-85.</param>
    /// <returns>A Guid decoded from the parameter.</returns>
    public static Guid Decode(string ascii85Encoding)
    { 
        // Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii.
        // Since a Guid is 16 bytes long, the Ascii-85 encoding should be 20
        // characters long.
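        if (ascii85Encoding == null)
        {
            throw new ArgumentNullException(nameof(ascii85Encoding));
        }

        if (ascii85Encoding.Length != 20)
        {
            throw new ArgumentException(
                "An encoded Guid should be 20 characters long.", 
                nameof(ascii85Encoding));
        }

        // Only upper-case letters are in the map. Normalise with the invariant
        // culture so the result does not depend on the current locale.
        ascii85Encoding = ascii85Encoding.ToUpperInvariant();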

        // Split the string in half and decode each substring separately.
        var higher = ascii85Encoding.Substring(0, 10).AsciiDecode();
        var lower = ascii85Encoding.Substring(10, 10).AsciiDecode();

        // Convert the decoded substrings into an array of 16-bytes.
        var byteArray = new[]
        {
            (byte)((higher & 0xFF00000000000000) >> 56),        
            (byte)((higher & 0x00FF000000000000) >> 48),        
            (byte)((higher & 0x0000FF0000000000) >> 40),        
            (byte)((higher & 0x000000FF00000000) >> 32),        
            (byte)((higher & 0x00000000FF000000) >> 24),        
            (byte)((higher & 0x0000000000FF0000) >> 16),        
            (byte)((higher & 0x000000000000FF00) >> 8),         
            (byte)((higher & 0x00000000000000FF)),  
            (byte)((lower  & 0xFF00000000000000) >> 56),        
            (byte)((lower  & 0x00FF000000000000) >> 48),        
            (byte)((lower  & 0x0000FF0000000000) >> 40),        
            (byte)((lower  & 0x000000FF00000000) >> 32),        
            (byte)((lower  & 0x00000000FF000000) >> 24),        
            (byte)((lower  & 0x0000000000FF0000) >> 16),        
            (byte)((lower  & 0x000000000000FF00) >> 8),         
            (byte)((lower  & 0x00000000000000FF)),  
        };

        return new Guid(byteArray);
    }

    /// <summary>
    /// Encodes binary data into a plaintext Ascii-85 format string.
    /// </summary>
    /// <param name="guid">The Guid to encode.</param>
    /// <returns>Ascii-85 encoded string</returns>
    public static string Encode(Guid guid)
    {
        // Convert the 128-bit Guid into two 64-bit parts.
        var byteArray = guid.ToByteArray();
        var higher = 
            ((UInt64)byteArray[0] << 56) | ((UInt64)byteArray[1] << 48) | 
            ((UInt64)byteArray[2] << 40) | ((UInt64)byteArray[3] << 32) |
            ((UInt64)byteArray[4] << 24) | ((UInt64)byteArray[5] << 16) | 
            ((UInt64)byteArray[6] << 8)  | byteArray[7];

        var lower = 
            ((UInt64)byteArray[ 8] << 56) | ((UInt64)byteArray[ 9] << 48) | 
            ((UInt64)byteArray[10] << 40) | ((UInt64)byteArray[11] << 32) |
            ((UInt64)byteArray[12] << 24) | ((UInt64)byteArray[13] << 16) | 
            ((UInt64)byteArray[14] << 8)  | byteArray[15];

        var encodedStringBuilder = new StringBuilder();

        // Encode each part into an ascii-85 encoded string.
        encodedStringBuilder.AsciiEncode(higher);
        encodedStringBuilder.AsciiEncode(lower);

        return encodedStringBuilder.ToString();
    }

    /// <summary>
    /// Encodes the given integer using Ascii-85.
    /// </summary>
    /// <param name="encodedStringBuilder">The <see cref="StringBuilder"/> to 
    /// append the results to.</param>
    /// <param name="part">The integer to encode.</param>
    private static void AsciiEncode(
        this StringBuilder encodedStringBuilder, UInt64 part)
    {
        // Nb, the most significant digits in our encoded character will 
        // be the right-most characters.
        var charCount = (UInt32)kEncodeMap.Length;

        // Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii.
        // Since a UInt64 is 8 bytes long, the Ascii-85 encoding should be 
        // 10 characters long.
        for (var i = 0; i < 10; i++)
        {
            // Get the remainder when dividing by the base.
            var remainder = part % charCount;

            // Divide by the base.
            part /= charCount;

            // Add the appropriate character for the current value (0-84).
            encodedStringBuilder.Append(kEncodeMap[remainder]);
        }
    }

    /// <summary>
    /// Decodes the given string from Ascii-85 to an integer.
    /// </summary>
    /// <param name="ascii85EncodedString">Decodes a 10 character Ascii-85 
    /// encoded string.</param>
    /// <returns>The integer representation of the parameter.</returns>
    private static UInt64 AsciiDecode(this string ascii85EncodedString)
    {
        if (ascii85EncodedString.Length != 10)
        {
            throw new ArgumentException(
                "An Ascii-85 encoded Uint64 should be 10 characters long.", 
                "ascii85EncodedString");
        }

        // Nb, the most significant digits in our encoded character 
        // will be the right-most characters.
        var charCount = (UInt32)kEncodeMap.Length;
        UInt64 result = 0;

        // Starting with the right-most (most-significant) character, 
        // iterate through the encoded string and decode.
        for (var i = ascii85EncodedString.Length - 1; i >= 0; i--)
        {
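            // Reject characters outside the 85-character alphabet.
            if (!kDecodeMap.TryGetValue(ascii85EncodedString[i], out var digit))
            {
                throw new FormatException("Invalid Ascii-85 character.");
            }

            // Multiply by the base and add the digit. 85^10 exceeds 2^64, so
            // malformed input could overflow; checked makes that an error.
            result = checked(result * charCount + digit);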
        }

        return result;
    }
}
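For comparison, the same idea can be expressed by treating the whole Guid as a single 128-bit number and converting it to base 85 with `System.Numerics.BigInteger`. The sketch below is an alternative under stated assumptions, not the implementation above. It uses the standard Ascii85 alphabet ('!' through 'u'), which does contain lowercase letters and SQL-significant characters, so it does not meet the collation constraint described earlier, and its output is not compatible with the `Ascii85` class. The name `GuidBase85Sketch` is invented for the example.

```csharp
using System;
using System.Numerics;

public static class GuidBase85Sketch
{
    private const int Base = 85;
    private const int Length = 20;
    private const char First = '!'; // digit 0; digit 84 is 'u'

    public static string Encode(Guid guid)
    {
        // Append a zero byte so BigInteger (little-endian, two's complement)
        // always reads the 16 Guid bytes as a non-negative number.
        byte[] unsigned = new byte[17];
        Array.Copy(guid.ToByteArray(), unsigned, 16);
        var value = new BigInteger(unsigned);

        // Fill the output right to left, least significant digit first.
        var chars = new char[Length];
        for (int i = Length - 1; i >= 0; i--)
        {
            value = BigInteger.DivRem(value, Base, out BigInteger digit);
            chars[i] = (char)(First + (int)digit);
        }
        return new string(chars);
    }

    public static Guid Decode(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (text.Length != Length)
        {
            throw new ArgumentException("Expected 20 characters.", nameof(text));
        }

        BigInteger value = BigInteger.Zero;
        foreach (char c in text)
        {
            int digit = c - First;
            if (digit < 0 || digit >= Base)
            {
                throw new FormatException("Character outside the base-85 alphabet.");
            }
            value = value * Base + digit;
        }

        // 85^20 is larger than 2^128, so some strings do not map to a Guid.
        if (value >= BigInteger.Pow(2, 128))
        {
            throw new FormatException("Value does not fit in 128 bits.");
        }

        // ToByteArray returns between 1 and 17 little-endian bytes; copy at
        // most 16 and leave the remainder of the buffer as zero.
        byte[] raw = value.ToByteArray();
        byte[] bytes = new byte[16];
        Array.Copy(raw, bytes, Math.Min(raw.Length, 16));
        return new Guid(bytes);
    }
}
```

With this sketch, `GuidBase85Sketch.Encode(Guid.Empty)` is twenty '!' characters, and `Decode(Encode(g))` returns `g` for any Guid. To satisfy the constraints in this answer you would replace the `First + digit` arithmetic with lookups into your own 85-character table, as the `Ascii85` class does.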

Also, here are the unit tests. They aren't as thorough as I'd like, and I don't like the non-determinism of using Guid.NewGuid(), but they should get you started:

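// Assumed using directives: MSTest for the test attributes and
// JetBrains.Annotations for [UsedImplicitly].
using System;
using JetBrains.Annotations;
using Microsoft.VisualStudio.TestTools.UnitTesting;

/// <summary>
/// Tests to verify that the Ascii-85 encoding is functioning as expected.
/// </summary>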
[TestClass]
[UsedImplicitly]
public class Ascii85Tests
{
    [TestMethod]
    [Description("Ensure that the Ascii-85 encoding is correct.")]
    [UsedImplicitly]
    public void CanEncodeAndDecodeAGuidUsingAscii85()
    {
        var guidStrings = new[]
        {
            "00000000-0000-0000-0000-000000000000",
            "00000000-0000-0000-0000-0000000000FF",
            "00000000-0000-0000-0000-00000000FF00",
            "00000000-0000-0000-0000-000000FF0000",
            "00000000-0000-0000-0000-0000FF000000",
            "00000000-0000-0000-0000-00FF00000000",
            "00000000-0000-0000-0000-FF0000000000",
            "00000000-0000-0000-00FF-000000000000",
            "00000000-0000-0000-FF00-000000000000",
            "00000000-0000-00FF-0000-000000000000",
            "00000000-0000-FF00-0000-000000000000",
            "00000000-00FF-0000-0000-000000000000",
            "00000000-FF00-0000-0000-000000000000",
            "000000FF-0000-0000-0000-000000000000",
            "0000FF00-0000-0000-0000-000000000000",
            "00FF0000-0000-0000-0000-000000000000",
            "FF000000-0000-0000-0000-000000000000",
            "FF000000-0000-0000-0000-00000000FFFF",
            "00000000-0000-0000-0000-0000FFFF0000",
            "00000000-0000-0000-0000-FFFF00000000",
            "00000000-0000-0000-FFFF-000000000000",
            "00000000-0000-FFFF-0000-000000000000",
            "00000000-FFFF-0000-0000-000000000000",
            "0000FFFF-0000-0000-0000-000000000000",
            "FFFF0000-0000-0000-0000-000000000000",
            "00000000-0000-0000-0000-0000FFFFFFFF",
            "00000000-0000-0000-FFFF-FFFF00000000",
            "00000000-FFFF-FFFF-0000-000000000000",
            "FFFFFFFF-0000-0000-0000-000000000000",
            "00000000-0000-0000-FFFF-FFFFFFFFFFFF",
            "FFFFFFFF-FFFF-FFFF-0000-000000000000",
            "FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF",
            "1000000F-100F-100F-100F-10000000000F"
        };

        foreach (var guidString in guidStrings)
        {
            var guid = new Guid(guidString);
            var encoded = Ascii85.Encode(guid);

            Assert.AreEqual(
                20, 
                encoded.Length, 
                "A guid encoding should be exactly 20 characters long.");

            var decoded = Ascii85.Decode(encoded);

            Assert.AreEqual(
                guid, 
                decoded, 
                "The guids are different after being encoded and decoded.");
        }
    }

    [TestMethod]
    [Description(
        "The Ascii-85 encoding is not susceptible to changes in character case.")]
    [UsedImplicitly]
    public void Ascii85IsCaseInsensitive()
    {
        const int kCount = 50;

        for (var i = 0; i < kCount; i++)
        {
            var guid = Guid.NewGuid();

            // The encoding should be all upper case. A reliance 
            // on mixed case will make the generated string 
            // vulnerable to sql collation.
            var encoded = Ascii85.Encode(guid);

            Assert.AreEqual(
                encoded, 
                encoded.ToUpper(), 
                "The Ascii-85 encoding should produce only uppercase characters.");
        }
    }
}
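One possible way to remove the non-determinism of `Guid.NewGuid()` in tests like these is to build Guids from a seeded `System.Random`, so that every run on a given runtime exercises the same values. The sketch below is a suggestion rather than part of the original tests; `DeterministicGuids` and `Sequence` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class DeterministicGuids
{
    // Yields 'count' pseudo-random Guids that depend only on the seed.
    public static IEnumerable<Guid> Sequence(int seed, int count)
    {
        var random = new Random(seed);
        var buffer = new byte[16];

        for (var i = 0; i < count; i++)
        {
            // The Guid constructor copies the bytes, so the buffer can be reused.
            random.NextBytes(buffer);
            yield return new Guid(buffer);
        }
    }
}
```

A test could then loop over `DeterministicGuids.Sequence(12345, 50)` instead of calling `Guid.NewGuid()`, and a failing value can be reproduced by rerunning with the same seed. These Guids do not carry valid version or variant bits, which is fine for exercising an encoder that treats the Guid as 16 opaque bytes.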

I hope this saves somebody some trouble.

Also, if you find any bugs, let me know ;-)

Regarding "algorithm - What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2827627/
