将数字字符串转换为以二进制表示的字节数组-6ren

将数字字符串转换为以二进制表示的字节数组

转载作者：塔克拉玛干更新时间：2023-11-03 04:27:25

我想将只包含整数的字符串转换为字节数组，但要有效地存储(所以没有“digits[digitIndex] = string[digitIndex - ‘0’;”)。我希望它们像存储任何类型一样存储:每个字节有 256 种不同的可能性，而不仅仅是前面的错误示例中的 10 种。它还需要保存很多数字(我使用 8 位参数作为大小，所以我相信至少 100 个数字)。编辑:出于个人原因，我也不想使用任何库。

以下是它在函数中的外观示例:

int8_t *stringToBigInt(char *input) {
    uint8_t digitsBase10 = strlen(input);
    uint8_t bytes = ???; //However many bytes to store the result (max 255 bytes in this case)
    int8_t *result = malloc(sizeof(void *) + bytes);
    ... //Code for setting result to input
    return result;
}

这是一个可能的输入和输出示例:

编辑:这是一个简短的示例，仅为简单起见适合 32 位；输入可能远不止一个 32 位(也可能是 64 位)整数

输入:“1234567890”

输出:{01001001, 10010110, 00000010, 11010010}

最佳答案

这是从 base-10 到 base-256 的基本转换，因此就算法而言，这就是您应该寻找的。对于一个简单的实现，首先在字符串上实现长除以 2 的幂。然后将每个余数转换为一个字节:这些字节形成您的输出。您需要重复划分输入，并且每个 8 位余数的字符串形成 base-256 字节，从最低有效位开始(一个字节是一个 base-256 位)。重复除法意味着您将前一个除法的商作为被除数提供给下一个除法。

有一些很酷的算法可以将基数为 10 的数字除以 2 的幂，比广义长除法运算更快且更简单。作为提示，让我们举个例子:510。我们将每个数字除以二，并将余数 *5 提供给下一个数字。让我们去掉小于 0.5 的小数部分:510 变为 2*100 + 5*10 + 5。然后 1*100 + 2*10 + 2 dot 5。然后 6*10 + 1。然后 3*10 dot 5, 2* 10 + 5，然后 1*10 + 2 点 5，然后 6，然后 3，然后 2 点 5，然后 1，然后 0 点 5。

对于 255，我们将得到 127.5、63.5、15.5、7.5、3.5、1.5、0.5。

除以二的更高因子是可能的，但需要重复长加法。例如。 33 格 4 = 0*10 + 7rem1 + 0 rem 0.75(哈哈!)。除以 2 效果更好，因为我们使用了 10=2*5 的事实，并且 base-n 表示法可以很容易地除以基数的因子，而无需执行长加法:所有操作都限于两个相邻的数字，所以它是线性的以位数为单位的成本为 N 的时间过程。但是对于base-256的base转换，你需要重复除法，所以成本是~0.5N^2。易于实现但计算成本高。

当然，还有比这更好的算法。但是上面可以简洁地实现——即使是质量相当好的库函数的形式:

首先，让我们定义一个字节数组类型，以及一种将其转储为人类可读的十六进制输出的方法。为方便起见，对象通过指向其数据的指针来引用，实现细节根本不在接口(interface)中的任何位置。构造函数new_Bytes零初始化数组。还有一种方法可以将数组视为位数组，按 Lest-endian 排序(LSB 优先)，并设置(打开)给定位。

// https://github.com/KubaO/stackoverflown/tree/master/questions/decimal-to-binary-54422895
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Bytes Class Interface

typedef uint8_t *Bytes;
typedef const uint8_t *cBytes;

Bytes new_Bytes(size_t size);
size_t Bytes_size(cBytes bytes);
void Bytes_truncate(Bytes bytes, size_t new_size);
void free_Bytes(cBytes bytes);
char *Bytes_to_hex(cBytes bytes);

static inline void Bytes_set_bit(Bytes const bytes, size_t const bit_num) {
   bytes[bit_num / 8] |= 1 << (bit_num % 8);
}

然后，就地执行除以 2，标志提供基本转换所需的附加信息——尤其是余数。从基数 10 到基数 256 的转换使用除法并返回新的 Bytes大批。

// Division and Base Conversion Interface

typedef enum {
   REMAINDER = 1,           /* there is a non-zero remainder */
   ZERO = 2,                /* the quotient is zero or null */
   NULL_DECIMAL = 4,        /* the dividend is null or empty */
   NON_DECIMALS = 8,        /* division was terminated on non-decimal characters */
   LEADING_ZERO_COUNT = 16, /* count of leading zeroes in the quotient */
   LEADING_ZERO_COUNT_MASK = ~(LEADING_ZERO_COUNT - 1),
   CLR_CARRY_MASK = ~REMAINDER,
   CLR_ZERO_MASK = ~ZERO,
} DivFlags;

DivFlags divide_by_2(char *decimal);
Bytes base_10_to_256(const char *decimal);

除法以十进制表示形式进行，从最高有效位到最低有效位。每个数字与前一个数字除法的余数合并，然后除以 2。余数在数字除法之间进行。除最低有效位后，余数在标志中输出。

这些标志大多是不言自明的，但 LEADING_ZERO_COUNT不完全——因此对它的访问是通过访问器函数实现的。 LEADING_ZERO_COUNT是前导零计数的单位。当除法通过十进制表示时，它将计算前导零，将它们乘以该单位，并将其与标志合并。为了提取计数，标志除以单位。

// Division and Base Conversion Implementation

static inline int leading_zero_count(DivFlags const flags) {
   return (flags & LEADING_ZERO_COUNT_MASK) / LEADING_ZERO_COUNT;
}

static inline void saturated_inc_leading_zero_count(DivFlags *flags) {
   if ((*flags & LEADING_ZERO_COUNT_MASK) != LEADING_ZERO_COUNT_MASK)
      *flags += LEADING_ZERO_COUNT;
}

DivFlags divide_by_2(char *decimal) {
   DivFlags flags = ZERO;
   if (!decimal) return flags | NULL_DECIMAL;
   char c;
   while ((c = *decimal)) {
      if (c < '0' || c > '9') return flags | NON_DECIMALS;
      c = c - '0' + ((flags & REMAINDER) ? 10 : 0);
      if (c & 1)
         flags |= REMAINDER;
      else
         flags &= CLR_CARRY_MASK;
      c >>= 1;
      assert(c >= 0 && c <= 9);
      if (c)
         flags &= CLR_ZERO_MASK;
      else if (flags & ZERO)
         saturated_inc_leading_zero_count(&flags);
      *decimal++ = c + '0';
   }
   return flags;
}

然后，base 转换执行重复除以 2，并将余数位移动到字节数组中，如下所示:

首先，基本转换获取十进制表示的副本，并分配适当大小的输出字节数组。

static void base_10_to_256_impl(Bytes const bytes, char *decimal);

Bytes base_10_to_256(const char *const decimal) {
   assert(decimal);
   size_t const dec_len = strlen(decimal);
   char *const dec_buf = malloc(dec_len + 1);
   if (!dec_buf) return NULL;
   memcpy(dec_buf, decimal, dec_len + 1);

   size_t const BASE_RATIO_NUM = 416, /* ceil(log(10)/log(256)*1000) */
       BASE_RATIO_DENOM = 1000;
   assert(dec_len <= (SIZE_MAX / BASE_RATIO_NUM));
   size_t const len = (size_t)(dec_len * BASE_RATIO_NUM / BASE_RATIO_DENOM) + 1;
   Bytes const bytes = new_Bytes(len);  // little-endian
   if (bytes) base_10_to_256_impl(bytes, dec_buf);
   free(dec_buf);
   return bytes;
}

然后，在实现的“肉”中，函数迭代输出位，重复将十进制表示除以 2，并将每个位设置为余数位的值。

static void base_10_to_256_impl(Bytes const bytes, char *decimal) {
   size_t const len = Bytes_size(bytes);
   for (size_t bit_num = 0;; bit_num++) {
      DivFlags const flags = divide_by_2(decimal);
      assert(!(flags & NULL_DECIMAL));
      decimal += leading_zero_count(flags);
      if (flags & ZERO && !(flags & REMAINDER)) {
         size_t const new_len = ((bit_num + 7) / 8);
         Bytes_truncate(bytes, new_len);
         break;
      }
      // here, there are still non-zero bits - in the dec[imal] and/or in the carry
      assert((bit_num / 8) < len);
      if (flags & REMAINDER) Bytes_set_bit(bytes, bit_num);
   }
}

我们现在可以添加一些测试:

// Tests

void check_bytes(const char *const decimal, const char *const bytes_expected,
                 size_t const bytes_len, const char *const hex_expected) {
   cBytes const bytes = base_10_to_256(decimal);
   assert(bytes && Bytes_size(bytes) == bytes_len);
   assert(memcmp(bytes, bytes_expected, bytes_len) == 0);
   char *const hex = Bytes_to_hex(bytes);
   assert(hex && strcmp(hex, hex_expected) == 0);
   printf("%s\n", hex);
   free(hex);
   free_Bytes(bytes);
}

int main() {
   check_bytes("4294967297" /* 2^32+1 */, "\1\0\0\0\1", 5, "01 00000001");
   check_bytes("4294967296" /* 2^32   */, "\0\0\0\0\1", 5, "01 00000000");
   check_bytes("4294967295" /* 2^32-1 */, "\xFF\xFF\xFF\xFF", 4, "FFFFFFFF");
   check_bytes("16777217" /* 2^24+1 */, "\1\0\0\1", 4, "01000001");
   check_bytes("16777216" /* 2^24   */, "\0\0\0\1", 4, "01000000");
   check_bytes("16777215" /* 2^24-1 */, "\xFF\xFF\xFF", 3, "FFFFFF");
   check_bytes("256", "\0\1", 2, "0100");
   check_bytes("255", "\xFF", 1, "FF");
   check_bytes("254", "\xFE", 1, "FE");
   check_bytes("253", "\xFD", 1, "FD");
   check_bytes("3", "\3", 1, "03");
   check_bytes("2", "\2", 1, "02");
   check_bytes("1", "\1", 1, "01");
   check_bytes("0", "\0", 1, "00");
}

Bytes的执行类结束示例:

// Bytes Implementation

struct BytesImpl {
   size_t size;
   uint8_t data[1];
};
static const size_t Bytes_header_size = offsetof(struct BytesImpl, data);
_Static_assert(offsetof(struct BytesImpl, data) == sizeof(size_t),
               "unexpected layout of struct BytesImpl");

Bytes new_Bytes(size_t size) {
   assert(size <= SIZE_MAX - Bytes_header_size);
   if (!size) size++;
   struct BytesImpl *const impl = calloc(Bytes_header_size + size, 1);
   if (!impl) return NULL;
   impl->size = size;
   return &impl->data[0];
}

static const struct BytesImpl *Bytes_get_const_impl_(cBytes const bytes) {
   return (const struct BytesImpl *)(const void *)((const char *)bytes -
                                                   Bytes_header_size);
}

static struct BytesImpl *Bytes_get_impl_(Bytes const bytes) {
   return (struct BytesImpl *)(void *)((char *)bytes - Bytes_header_size);
}

size_t Bytes_size(cBytes const bytes) { return Bytes_get_const_impl_(bytes)->size; }

void Bytes_truncate(Bytes const bytes, size_t new_size) {
   size_t *const size = &Bytes_get_impl_(bytes)->size;
   if (!new_size) {
     new_size++;  // we always leave one byte in the array
     bytes[0] = 0;
   }
   assert(*size);
   if (*size <= new_size) return;
   *size = new_size;
}

void free_Bytes(cBytes const bytes) {
   if (bytes) free((void *)(intptr_t)(const void *)Bytes_get_const_impl_(bytes));
}

char *Bytes_to_hex(cBytes const bytes) {
   size_t n = Bytes_size(bytes);
   size_t spaces = (n - 1) / 4;
   char *const out = malloc(n * 2 + spaces + 1);
   if (out)
      for (char *o = out; n;) {
         uint8_t const c = bytes[n - 1];
         snprintf(o, 3, "%02" PRIX8, c);
         o += 2;
         n--;
         if (n && n % 4 == 0) {
            assert(spaces);
            *o++ = ' ';
            spaces--;
         }
      }
   return out;
}

关于将数字字符串转换为以二进制表示的字节数组，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/54421779/

文章推荐： java - 精确的大 O 执行计数

文章推荐： java - 最高效但线程安全的列表/集合

文章推荐： java - 需要帮助从 web.xml spring boot 转换

javascript - 正则表达式匹配字符/数字/数字/数字
fiddle :http://jsfiddle.net/rtucgv74/ 我正在尝试将第一个字符与 3 位数字匹配。所以下面的代码应该提醒f234。但反而返回 null ？源代码: var reg
asp正则表达式匹配数字$数字$数字$
复制代码代码如下: Dim strOk,strNo strOk = "12312321$12
c#数字/数字/字符串模式的正则表达式
我想找 {a number} / { a number } / {a string}模式。我可以得到number / number工作，但是当我添加 / string它不是。我试图找到的例子: 15
java - 数字.数字.数字的模式表达式？
我，我正在做一个模式正则表达式来检查字符串是否是: 数字.数字.数字，如下所示: 1.1.1 0.20.2 58.55541.5221 在java中我使用这个: private static Patt
python - 检查字符串是否包含python中的数字/数字/数字
我有一个字符串，我需要检查它是否在字符串的末尾包含一个数字/数字，并且需要将该数字/数字递增到字符串末尾 +1 我会得到下面的字符串 string2 = suppose_name_1 string3
java - (数字/数字)的正则表达式
我正在寻找一个正则表达式 (数字/数字)，如(1/2) 数字必须是 1-3 位数字。我使用 Java。我认为我的问题比正则表达式更深。我无法让这个工作 String s ="(1/15)";
typescript [数字，数字]与数字[]
谁能帮我理解为什么我在使用以下代码时会出现类型错误: function sumOfTwoNumbersInArray(a: [number, number]) { return a[0] +
google-apps-script - Apps 脚本错误 : Cannot find method getRange(number, 数字、数字、数字)
我看到有些人过去也遇到过类似的问题，但他们似乎只是不同，所以解决方案也有所不同。所以这里是: 我正在尝试在 Google Apps 脚本中返回工作表的已知尺寸范围，如下所示: var myRange
Python - 如何将此模式(数字/数字)与正则表达式匹配？
我试图了解python中的正则表达式模块。我试图让我的程序从用户输入的一行文本中匹配以下模式: 8-13 之间的数字“/” 0-15 之间的数字例如:8/2、11/13、10/9 等。我想出的模式
java - 如何将扫描仪输入拆分为(数字)(带空格的字符串)(数字)
简单地说，我当前正在开发的程序要求我拆分扫描仪输入(例如:2 个火腿和奶酪 5.5)。它应该读取杂货订单并将其分成三个数组。我应该使用 string.split 并能够将此输入分成三部分，而不管中间字
c++ - (数字)和(-数字)的含义
(number) & (-number) 是什么意思？我已经搜索过了，但无法找到含义我想在 for 循环中使用 i & (-i)，例如: for (i = 0; i 110000 .对于i没有高于
javascript - 数字 = parseInt(数字);需要从 rel 属性中获取非数字
需要将图像ID设置为数字 var number = $(this).attr('rel'); number = parseInt(number); $('#carousel .slid
typescript - Typescript 可以确保数组具有重复的类型模式吗？例如[字符串，数字，字符串，数字，....(永远)]
我有一个函数，我想确保它接受一个字符串，后跟一个数字。并且可选地，更多的字符串数字对。就像一个元组，但“无限”次: const fn = (...args: [string, number] | [s
javascript - html 输入类型更改=数字 "available"值。还将更改另一个输入类型=数字 "Total"
我想复制“可用”输入数字的更改并将其添加或减去到“总计”中如果此人将“可用”更改为“3”，则“总计”将变为“9”。如果用户将“可用”更改为“5”，则“总计”将变为“11”。 $('#id1').b
r - 如何在 R 中的(字符/数字)和(字符/数字)类型之间进行换行
我有一个与 R 中的断线相关的简单问题。我正在尝试粘贴，但在获取(字符/数字)之间的断线时遇到问题。请注意，这些值包含在向量中(V1=81,V2=55,V3=25)我已经尝试过这段代码: cat(p
c++ - 数字 xor K - K = 数字 + K xor K，为什么？
很难说出这里问的是什么。这个问题是含糊的、模糊的、不完整的、过于宽泛的或修辞性的，无法以目前的形式得到合理的回答。如需帮助澄清此问题以便重新打开它，visit the help center 。已关
angular - typescript 错误 "Argument of type ' 数字[ ]' is not assignable to parameter of type ' 数字'”
我在 Typescript 中收到以下错误: Argument of type 'number[]' is not assignable to parameter of type 'number' 我
JavaScript 数字
在本教程中，您将通过示例了解JavaScript 数字。在JavaScript中，数字是基本数据类型。例如， const a = 3; const b = 3.13; 与其他一些编程语言不同
JavaScript 数字
我在 MDN Reintroduction to JavaScript 上阅读JavaScript 数字只是浮点精度类型，JavaScript 中没有整数。然而 JavaScript 有两个函数，pa
Excel编程自动完成部分输入(数字)
我们在 Excel 中管理库存。我知道这有点过时，但我们正在发展商业公司，我们所有的钱都被困在业务上，没有钱投资 IT。所以我想知道我可以用Excel自动完成产品编号的方式进行编程吗？这是一个产品

塔克拉玛干

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

将数字字符串转换为以二进制表示的字节数组