gpt4 book ai didi

algorithm - 将任意 GUID 编码为可读的 ASCII (33-127) 的最有效方法是什么?

转载 作者:塔克拉玛干 更新时间:2023-11-03 02:13:38 25 4
gpt4 key购买 nike

GUID 的标准字符串表示大约需要 36 个字符。这很好,但也很浪费。我想知道如何使用 33-127 范围内的所有 ASCII 字符以最短的方式对其进行编码。简单的实现产生 22 个字符,仅仅是因为 128 位/6 位 产生 22。

哈夫曼编码是我第二好的,唯一的问题是如何选择编码....

当然,编码必须是无损的。

最佳答案

这是一个老问题,但我必须解决它才能使我正在研究的系统向后兼容。

确切的要求是将客户端生成的标识符写入数据库并存储在一个 20 个字符的唯一列中。它从未向用户显示,也未以任何方式编入索引。

由于无法消除要求,我真的很想使用 Guid(即 statistically unique),如果我可以将其无损编码为 20 个字符,那么考虑到约束条件,这将是一个很好的解决方案。

Ascii-85 允许您将 4 个字节的二进制数据编码为 5 个字节的 Ascii 数据。因此,使用此编码方案,一个 16 字节的 guid 将刚好适合 20 个 Ascii 字符。 Guid 可以有 3.1962657931507848761677563491821e+38 个离散值,而 Ascii-85 的 20 个字符可以有 3.8759531084514355873123178482056e+38 个离散值。

当写入数据库时​​,我对截断有一些担忧,因此编码中不包含空白字符。我也遇到了 collation 的问题,我通过从编码中排除小写字符来解决这个问题。此外,它只会通过 paramaterized command 传递。 , 所以任何特殊的 SQL 字符都会被自动转义。

我已经包含了执行 Ascii-85 编码和解码的 C# 代码,以防它对任何人有帮助。显然,根据您的使用情况,您可能需要选择不同的字符集,因为我的限制使我选择了一些不常见的字符,如“ß”和“Ø”——但这是简单的部分:

/// <summary>
/// This code implements an encoding scheme that uses 85 printable ascii characters
/// to encode the same volume of information as contained in a Guid.
///
/// Ascii-85 can represent 4 binary bytes as 5 Ascii bytes. So a 16 byte Guid can be
/// represented in 20 Ascii bytes. A Guid can have
/// 3.1962657931507848761677563491821e+38 discrete values whereas 20 characters of
/// Ascii-85 can have 3.8759531084514355873123178482056e+38 discrete values.
///
/// Lower-case characters are not included in this encoding to avoid collation
/// issues.
/// This is a departure from standard Ascii-85 which does include lower case
/// characters.
/// In addition, no whitespace characters are included as these may be truncated in
/// the database depending on the storage mechanism - ie VARCHAR vs CHAR.
/// </summary>
internal static class Ascii85
{
/// <summary>
/// 85 printable ascii characters with no lower case ones, so database
/// collation can't bite us. No ' ' character either so database can't
/// truncate it!
/// Unfortunately, these limitation mean resorting to some strange
/// characters like 'Æ' but we won't ever have to type these, so it's ok.
/// </summary>
private static readonly char[] kEncodeMap = new[]
{
'0','1','2','3','4','5','6','7','8','9', // 10
'A','B','C','D','E','F','G','H','I','J', // 20
'K','L','M','N','O','P','Q','R','S','T', // 30
'U','V','W','X','Y','Z','|','}','~','{', // 40
'!','"','#','$','%','&','\'','(',')','`', // 50
'*','+',',','-','.','/','[','\\',']','^', // 60
':',';','<','=','>','?','@','_','¼','½', // 70
'¾','ß','Ç','Ð','€','«','»','¿','•','Ø', // 80
'£','†','‡','§','¥' // 85
};

/// <summary>
/// A reverse mapping of the <see cref="kEncodeMap"/> array for decoding
/// purposes.
/// </summary>
private static readonly IDictionary<char, byte> kDecodeMap;

/// <summary>
/// Initialises the <see cref="kDecodeMap"/>.
/// </summary>
static Ascii85()
{
kDecodeMap = new Dictionary<char, byte>();

for (byte i = 0; i < kEncodeMap.Length; i++)
{
kDecodeMap.Add(kEncodeMap[i], i);
}
}

/// <summary>
/// Decodes an Ascii-85 encoded Guid.
/// </summary>
/// <param name="ascii85Encoding">The Guid encoded using Ascii-85.</param>
/// <returns>A Guid decoded from the parameter.</returns>
public static Guid Decode(string ascii85Encoding)
{
// Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii.
// Since a Guid is 16 bytes long, the Ascii-85 encoding should be 20
// characters long.
if(ascii85Encoding.Length != 20)
{
throw new ArgumentException(
"An encoded Guid should be 20 characters long.",
"ascii85Encoding");
}

// We only support upper case characters.
ascii85Encoding = ascii85Encoding.ToUpper();

// Split the string in half and decode each substring separately.
var higher = ascii85Encoding.Substring(0, 10).AsciiDecode();
var lower = ascii85Encoding.Substring(10, 10).AsciiDecode();

// Convert the decoded substrings into an array of 16-bytes.
var byteArray = new[]
{
(byte)((higher & 0xFF00000000000000) >> 56),
(byte)((higher & 0x00FF000000000000) >> 48),
(byte)((higher & 0x0000FF0000000000) >> 40),
(byte)((higher & 0x000000FF00000000) >> 32),
(byte)((higher & 0x00000000FF000000) >> 24),
(byte)((higher & 0x0000000000FF0000) >> 16),
(byte)((higher & 0x000000000000FF00) >> 8),
(byte)((higher & 0x00000000000000FF)),
(byte)((lower & 0xFF00000000000000) >> 56),
(byte)((lower & 0x00FF000000000000) >> 48),
(byte)((lower & 0x0000FF0000000000) >> 40),
(byte)((lower & 0x000000FF00000000) >> 32),
(byte)((lower & 0x00000000FF000000) >> 24),
(byte)((lower & 0x0000000000FF0000) >> 16),
(byte)((lower & 0x000000000000FF00) >> 8),
(byte)((lower & 0x00000000000000FF)),
};

return new Guid(byteArray);
}

/// <summary>
/// Encodes binary data into a plaintext Ascii-85 format string.
/// </summary>
/// <param name="guid">The Guid to encode.</param>
/// <returns>Ascii-85 encoded string</returns>
public static string Encode(Guid guid)
{
// Convert the 128-bit Guid into two 64-bit parts.
var byteArray = guid.ToByteArray();
var higher =
((UInt64)byteArray[0] << 56) | ((UInt64)byteArray[1] << 48) |
((UInt64)byteArray[2] << 40) | ((UInt64)byteArray[3] << 32) |
((UInt64)byteArray[4] << 24) | ((UInt64)byteArray[5] << 16) |
((UInt64)byteArray[6] << 8) | byteArray[7];

var lower =
((UInt64)byteArray[ 8] << 56) | ((UInt64)byteArray[ 9] << 48) |
((UInt64)byteArray[10] << 40) | ((UInt64)byteArray[11] << 32) |
((UInt64)byteArray[12] << 24) | ((UInt64)byteArray[13] << 16) |
((UInt64)byteArray[14] << 8) | byteArray[15];

var encodedStringBuilder = new StringBuilder();

// Encode each part into an ascii-85 encoded string.
encodedStringBuilder.AsciiEncode(higher);
encodedStringBuilder.AsciiEncode(lower);

return encodedStringBuilder.ToString();
}

/// <summary>
/// Encodes the given integer using Ascii-85.
/// </summary>
/// <param name="encodedStringBuilder">The <see cref="StringBuilder"/> to
/// append the results to.</param>
/// <param name="part">The integer to encode.</param>
private static void AsciiEncode(
this StringBuilder encodedStringBuilder, UInt64 part)
{
// Nb, the most significant digits in our encoded character will
// be the right-most characters.
var charCount = (UInt32)kEncodeMap.Length;

// Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii.
// Since a UInt64 is 8 bytes long, the Ascii-85 encoding should be
// 10 characters long.
for (var i = 0; i < 10; i++)
{
// Get the remainder when dividing by the base.
var remainder = part % charCount;

// Divide by the base.
part /= charCount;

// Add the appropriate character for the current value (0-84).
encodedStringBuilder.Append(kEncodeMap[remainder]);
}
}

/// <summary>
/// Decodes the given string from Ascii-85 to an integer.
/// </summary>
/// <param name="ascii85EncodedString">Decodes a 10 character Ascii-85
/// encoded string.</param>
/// <returns>The integer representation of the parameter.</returns>
private static UInt64 AsciiDecode(this string ascii85EncodedString)
{
if (ascii85EncodedString.Length != 10)
{
throw new ArgumentException(
"An Ascii-85 encoded Uint64 should be 10 characters long.",
"ascii85EncodedString");
}

// Nb, the most significant digits in our encoded character
// will be the right-most characters.
var charCount = (UInt32)kEncodeMap.Length;
UInt64 result = 0;

// Starting with the right-most (most-significant) character,
// iterate through the encoded string and decode.
for (var i = ascii85EncodedString.Length - 1; i >= 0; i--)
{
// Multiply the current decoded value by the base.
result *= charCount;

// Add the integer value for that encoded character.
result += kDecodeMap[ascii85EncodedString[i]];
}

return result;
}
}

此外,这是单元测试。它们没有我想要的那么彻底,而且我不喜欢使用 Guid.NewGuid() 的不确定性,但它们应该可以帮助您入门:

/// <summary>
/// Tests to verify that the Ascii-85 encoding is functioning as expected.
/// </summary>
[TestClass]
[UsedImplicitly]
public class Ascii85Tests
{
[TestMethod]
[Description("Ensure that the Ascii-85 encoding is correct.")]
[UsedImplicitly]
public void CanEncodeAndDecodeAGuidUsingAscii85()
{
var guidStrings = new[]
{
"00000000-0000-0000-0000-000000000000",
"00000000-0000-0000-0000-0000000000FF",
"00000000-0000-0000-0000-00000000FF00",
"00000000-0000-0000-0000-000000FF0000",
"00000000-0000-0000-0000-0000FF000000",
"00000000-0000-0000-0000-00FF00000000",
"00000000-0000-0000-0000-FF0000000000",
"00000000-0000-0000-00FF-000000000000",
"00000000-0000-0000-FF00-000000000000",
"00000000-0000-00FF-0000-000000000000",
"00000000-0000-FF00-0000-000000000000",
"00000000-00FF-0000-0000-000000000000",
"00000000-FF00-0000-0000-000000000000",
"000000FF-0000-0000-0000-000000000000",
"0000FF00-0000-0000-0000-000000000000",
"00FF0000-0000-0000-0000-000000000000",
"FF000000-0000-0000-0000-000000000000",
"FF000000-0000-0000-0000-00000000FFFF",
"00000000-0000-0000-0000-0000FFFF0000",
"00000000-0000-0000-0000-FFFF00000000",
"00000000-0000-0000-FFFF-000000000000",
"00000000-0000-FFFF-0000-000000000000",
"00000000-FFFF-0000-0000-000000000000",
"0000FFFF-0000-0000-0000-000000000000",
"FFFF0000-0000-0000-0000-000000000000",
"00000000-0000-0000-0000-0000FFFFFFFF",
"00000000-0000-0000-FFFF-FFFF00000000",
"00000000-FFFF-FFFF-0000-000000000000",
"FFFFFFFF-0000-0000-0000-000000000000",
"00000000-0000-0000-FFFF-FFFFFFFFFFFF",
"FFFFFFFF-FFFF-FFFF-0000-000000000000",
"FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF",
"1000000F-100F-100F-100F-10000000000F"
};

foreach (var guidString in guidStrings)
{
var guid = new Guid(guidString);
var encoded = Ascii85.Encode(guid);

Assert.AreEqual(
20,
encoded.Length,
"A guid encoding should not exceed 20 characters.");

var decoded = Ascii85.Decode(encoded);

Assert.AreEqual(
guid,
decoded,
"The guids are different after being encoded and decoded.");
}
}

[TestMethod]
[Description(
"The Ascii-85 encoding is not susceptible to changes in character case.")]
[UsedImplicitly]
public void Ascii85IsCaseInsensitive()
{
const int kCount = 50;

for (var i = 0; i < kCount; i++)
{
var guid = Guid.NewGuid();

// The encoding should be all upper case. A reliance
// on mixed case will make the generated string
// vulnerable to sql collation.
var encoded = Ascii85.Encode(guid);

Assert.AreEqual(
encoded,
encoded.ToUpper(),
"The Ascii-85 encoding should produce only uppercase characters.");
}
}
}

我希望这能为某些人省去一些麻烦。

此外,如果您发现任何错误,请告诉我 ;-)

关于algorithm - 将任意 GUID 编码为可读的 ASCII (33-127) 的最有效方法是什么?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/2827627/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com