pointers - Using the extra 16 bits in 64-bit pointers

Reposted by: 行者123 | Updated: 2023-12-02 20:13:56

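I read that a 64-bit machine actually uses only 48 bits of address (specifically, I'm using an Intel Core i7).

I expected the extra 16 bits (bits 48-63) to be irrelevant to the address and to be ignored. But when I try to access such an address, I get an EXC_BAD_ACCESS signal.

My code is: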

int *p1 = &val;
int *p2 = (int *)((long)p1 | 1ll<<48);//set bit 48, which should be irrelevant
int v = *p2; //Here I receive a signal EXC_BAD_ACCESS.

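Why does this happen? Is there a way to use these 16 bits?

This could be used to build a more cache-friendly linked list. Instead of using 8 bytes for the next ptr and 8 bytes for the key (due to alignment constraints), the key could be embedded in the pointer.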
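A minimal sketch of the idea in the question, assuming 64-bit pointers and a 48-bit virtual address space (it breaks on a 57-bit system). The names `node`, `node_set`, `node_key` and `node_next` are hypothetical. As the answer below explains, the stored pointer must be made canonical again before it is dereferenced, so `node_next` sign-extends bit 47 instead of just masking.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: the 16-bit key lives in bits 48-63 of the
 * "next" field, the pointer in bits 0-47. Assumes 64-bit pointers and a
 * 48-bit virtual address space. */
_Static_assert(sizeof(uintptr_t) == 8, "needs 64-bit pointers");

#define ADDR_BITS 48
#define ADDR_MASK (((uintptr_t)1 << ADDR_BITS) - 1)   /* bits 0-47 */
#define SIGN_BIT  ((uintptr_t)1 << (ADDR_BITS - 1))   /* bit 47 */

struct node {
    uintptr_t next_and_key;   /* 8 bytes instead of 8 (next) + 8 (padded key) */
};

static void node_set(struct node *n, struct node *next, uint16_t key)
{
    /* Drop bits 48-63 of the pointer and store the key there instead. */
    n->next_and_key = ((uintptr_t)next & ADDR_MASK)
                    | ((uintptr_t)key << ADDR_BITS);
}

static uint16_t node_key(const struct node *n)
{
    return (uint16_t)(n->next_and_key >> ADDR_BITS);
}

static struct node *node_next(const struct node *n)
{
    /* Make the address canonical again: copy bit 47 into bits 48-63. */
    uintptr_t a = n->next_and_key & ADDR_MASK;
    return (struct node *)((a ^ SIGN_BIT) - SIGN_BIT);
}
```

Usage: `node_set(&a, &b, 0xBEEF)` stores both the link and the key in one word; `node_next(&a)` gives back `&b`.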

Best answer

The high bits are reserved in case the address bus is widened in the future, so you can't simply use them like that

The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations (...) The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (2⁶⁴ bytes). This is compared to just 4 GB (2³² bytes) for the x86.

http://en.wikipedia.org/wiki/X86-64#Architectural_features

More importantly, according to the same article [emphasis mine]:

... in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). Further, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form."

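Since the CPU checks the high bits even though they are unused, they are not really "irrelevant". You need to make sure the address is canonical before using the pointer. Some other 64-bit architectures (such as ARM64) have an option to ignore the high bits, so you can store data in pointers much more easily.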
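The canonical-form rule from the quote can be written as a small check. This is a sketch: `is_canonical` is a hypothetical helper, and `va_bits` is the number of implemented virtual-address bits (48 for the implementations described in the quote, 57 with the 5-level paging mentioned further down).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is canonical for a CPU with va_bits significant
 * virtual-address bits (1..64): bits va_bits-1 through 63 must all be
 * equal, i.e. all zeros or all ones. */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    uint64_t top      = addr >> (va_bits - 1);        /* bit va_bits-1 and above */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);  /* same width, all ones */
    return top == 0 || top == all_ones;
}
```

With `va_bits = 48`, the pointer from the question (bit 48 set, bit 47 clear) fails this check, which is why the CPU raises the exception.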

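That said, in x86_64 you are still free to use the high 16 bits if needed (provided the virtual address is not wider than 48 bits, see below), but you have to check and fix the pointer value by sign-extending it before dereferencing.
Note that casting the pointer value to long is not the correct way to do it, because long is not guaranteed to be wide enough to store a pointer. You need to use uintptr_t or intptr_t.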

#include <stdint.h>

int *p1 = &val; // original pointer
uint8_t data = ...;
// Keep only the low 48 (significant) address bits; clear bits 48-63
const uintptr_t MASK = ((uintptr_t)1 << 48) - 1;

// === Store data into the pointer ===
// Note: To be on the safe side and future-proof (because future implementations
//     can increase the number of significant bits in the pointer), we should
//     store values from the most significant bits down to the lower ones
// Cast data to uintptr_t first: shifting a promoted int by 56 is undefined
int *p2 = (int *)(((uintptr_t)p1 & MASK) | ((uintptr_t)data << 56));

// === Get the data stored in the pointer ===
data = (uint8_t)((uintptr_t)p2 >> 56);

// === Dereference the pointer ===
// Sign-extend bit 47 first to make the pointer canonical
// Note: Technically this is implementation defined (converting an out-of-range
//     value to intptr_t, right-shifting a negative value). You may want a more
//     standard-compliant way to sign-extend the value
intptr_t p3 = (intptr_t)((uintptr_t)p2 << 16) >> 16;
val = *(int *)p3;
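The note in the code above says the shift-based sign extension relies on implementation-defined behavior. One well-known alternative uses only unsigned arithmetic, which wraps modulo 2⁶⁴ and is therefore fully defined in C: xor with the sign bit, then subtract it. `canonicalize48` is a hypothetical helper name, and the 48-bit width is an assumption.

```c
#include <stdint.h>

/* Sign-extend bit 47 into bits 48-63 using only unsigned arithmetic.
 * Whatever was stored in the top 16 bits is discarded first. */
static uint64_t canonicalize48(uint64_t v)
{
    const uint64_t low48 = (UINT64_C(1) << 48) - 1;
    const uint64_t sign  = UINT64_C(1) << 47;
    v &= low48;
    /* bit 47 clear: (v + sign) - sign == v
     * bit 47 set:   (v - sign) - sign == v - 2^48, i.e. top 16 bits all ones */
    return (v ^ sign) - sign;
}
```

The dereference step would then read `val = *(int *)(uintptr_t)canonicalize48((uintptr_t)p2);`.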

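WebKit's JavaScriptCore and Mozilla's SpiderMonkey engine, as well as LuaJIT, use this in the nan-boxing technique. If the value is a NaN, the low 48 bits store a pointer to the object and the high 16 bits serve as tag bits; otherwise it is a double value.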
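A simplified NaN-boxing sketch, assuming user-space pointers fit in the low 48 bits. The tag value `NB_PTR_TAG`, the helper names, and the choice to collapse every real NaN into one canonical NaN are assumptions for illustration; JavaScriptCore, SpiderMonkey and LuaJIT each use their own encodings. The chosen tag (sign bit set, exponent all ones, top mantissa bits set) can never be produced by a finite double or an infinity, and real NaNs are canonicalized so they cannot collide with it either.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t nb_value;   /* either a double's bits or a tagged pointer */

#define NB_PTR_TAG   UINT64_C(0xFFFC000000000000)  /* hypothetical pointer tag */
#define NB_TAG_MASK  UINT64_C(0xFFFF000000000000)  /* high 16 bits */
#define NB_PAYLOAD   UINT64_C(0x0000FFFFFFFFFFFF)  /* low 48 bits */
#define NB_CANON_NAN UINT64_C(0x7FF8000000000000)  /* the only NaN we store */

static nb_value box_double(double d)
{
    uint64_t bits;
    if (isnan(d))
        return NB_CANON_NAN;       /* so no real NaN looks like a pointer */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

static nb_value box_ptr(void *p)
{
    return NB_PTR_TAG | ((uint64_t)(uintptr_t)p & NB_PAYLOAD);
}

static bool is_ptr(nb_value v) { return (v & NB_TAG_MASK) == NB_PTR_TAG; }

static double unbox_double(nb_value v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

static void *unbox_ptr(nb_value v)
{
    /* Assumes user-space pointers with bits 47-63 clear, so no sign extension. */
    return (void *)(uintptr_t)(v & NB_PAYLOAD);
}
```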
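Previously, Linux also used the 63rd bit of the GS base address to indicate whether the value was written by the kernel.
In fact, you can usually use the 48th bit (bit 47) as well. Because most modern 64-bit OSes split the address space in half between kernel space and user space, bit 47 of a user-space pointer is always zero, and you have 17 high bits free for use.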

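You can also use the low bits to store data. This is called a tagged pointer. If int is 4-byte aligned, the 2 low bits are always 0 and you can use them just as on 32-bit architectures. For 64-bit values you can use the 3 low bits, because they are already 8-byte aligned. Again, you need to clear these bits before dereferencing.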

#include <stdint.h>

int *p1 = &val; // the pointer we want to store the tag into
int tag = 1;
const uintptr_t MASK = ~(uintptr_t)0x03; // clears the 2 low tag bits

// === Store the tag ===
int *p2 = (int *)(((uintptr_t)p1 & MASK) | tag);

// === Get the tag ===
tag = (uintptr_t)p2 & 0x03;

// === Get the referenced data ===
// Clear the 2 tag bits before using the pointer
uintptr_t p3 = (uintptr_t)p2 & MASK;
val = *(int *)p3;
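The same low-bit tagging can be packaged as a few helpers, with a compile-time check that the pointee type really is aligned enough to spare the tag bits. This is a sketch: `tag_ptr`, `get_tag`, `untag_ptr` and `LOWTAG_BITS` are hypothetical names, and the tagged value is kept as a `uintptr_t` so it is never dereferenced by accident.

```c
#include <stdint.h>

#define LOWTAG_BITS 2
#define LOWTAG_MASK (((uintptr_t)1 << LOWTAG_BITS) - 1)   /* 0b11 */

/* The trick is only valid if every int* has its low LOWTAG_BITS clear. */
_Static_assert(_Alignof(int) >= (1 << LOWTAG_BITS),
               "int is not aligned enough to spare the tag bits");

static uintptr_t tag_ptr(int *p, unsigned tag)
{
    return ((uintptr_t)p & ~LOWTAG_MASK) | (tag & LOWTAG_MASK);
}

static unsigned get_tag(uintptr_t t) { return (unsigned)(t & LOWTAG_MASK); }

/* Clear the tag bits before turning the value back into a pointer. */
static int *untag_ptr(uintptr_t t) { return (int *)(t & ~LOWTAG_MASK); }
```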

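A well-known user is the V8 engine with its SMI (small integer) optimization. The lowest bit in the address serves as a type tag:

If it is 1, the value is a pointer to the actual data (an object, a float, or a larger integer). The next higher bit (w) indicates whether the pointer is weak or strong. Just clear the tag bits and dereference it.

If it is 0, it is a small integer. In 32-bit V8, or in 64-bit V8 with pointer compression, it is a 31-bit int; do a signed right shift by 1 to recover the value. In 64-bit V8 without pointer compression, it is a 32-bit int stored in the upper half.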

   32-bit V8
                           |----- 32 bits -----|
   Pointer:                |_____address_____w1|
   Smi:                    |___int31_value____0|
   
   64-bit V8
               |----- 32 bits -----|----- 32 bits -----|
   Pointer:    |________________address______________w1|
   Smi:        |____int32_value____|0000000000000000000|

https://v8.dev/blog/pointer-compression
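A sketch of the 32-bit (or pointer-compressed) layout from the diagram: a Smi stores a 31-bit integer shifted left by one with tag bit 0, and a heap pointer has bit 0 set and bit 1 as the weak flag. The function names and `tagged32` are hypothetical and are not V8's actual API; the 64-bit uncompressed layout (value in the upper 32 bits) is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tagged32;   /* one 32-bit tagged slot */

#define SMI_MIN (-(INT32_C(1) << 30))        /* 31-bit signed range */
#define SMI_MAX ((INT32_C(1) << 30) - 1)

static bool is_smi(tagged32 t) { return (t & 1u) == 0; }

/* Caller must ensure SMI_MIN <= v <= SMI_MAX. */
static tagged32 smi_from_int(int32_t v) { return (uint32_t)v << 1; }

/* Signed right shift by 1 restores the value. Converting back to int32_t
 * and shifting a negative value are implementation-defined before C23,
 * but arithmetic on mainstream compilers. */
static int32_t smi_to_int(tagged32 t) { return (int32_t)t >> 1; }

static bool is_weak(tagged32 t) { return !is_smi(t) && (t & 2u) != 0; }

/* Clear both tag bits (1 and w) to get the address back. */
static uint32_t heap_addr(tagged32 t) { return t & ~3u; }
```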

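As noted in the comments below, Intel has published PML5, which provides a 57-bit virtual address space. If you're on such a system, you can only use the 7 high bits.
You can still use some workarounds to get more free bits, though. First, you can try to use 32-bit pointers in a 64-bit OS. In Linux, if x32abi is allowed, pointers are only 32 bits long. In Windows, just clear the /LARGEADDRESSAWARE flag and pointers will have only 32 significant bits, so you can use the upper 32 bits for your own purposes. See How to detect X32 on Windows?. Another way is to use some pointer compression tricks: How does the compressed pointer implementation in V8 differ from JVM's compressed Oops?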
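You can get even more bits by asking the OS to allocate memory only in the low region. For example, if you can ensure your application never uses more than 64MB of memory, you only need a 26-bit address. And if all allocations are 32-byte aligned, the low 5 bits are always zero, so only 26 - 5 = 21 bits of the address need to be stored, which means you can store 64 - 21 = 43 bits of information in the pointer!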
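The arithmetic above can be sketched as a pack/unpack pair. This assumes every address is below 64MB and 32-byte aligned; actually obtaining such addresses needs OS-specific allocation (see the links further below), so here addresses are handled as plain integers. `pack`, `unpack_addr` and `unpack_data` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS   26                        /* addresses < 64MB */
#define ALIGN_BITS  5                         /* 32-byte alignment */
#define STORED_BITS (ADDR_BITS - ALIGN_BITS)  /* 21 address bits actually stored */
#define DATA_BITS   (64 - STORED_BITS)        /* 43 bits left for data */

static uint64_t pack(uint64_t addr, uint64_t data)
{
    assert(addr < (UINT64_C(1) << ADDR_BITS));                 /* below 64MB */
    assert((addr & ((UINT64_C(1) << ALIGN_BITS) - 1)) == 0);   /* 32-byte aligned */
    assert(data < (UINT64_C(1) << DATA_BITS));
    /* Drop the always-zero low 5 bits; put the data above the 21 address bits. */
    return (data << STORED_BITS) | (addr >> ALIGN_BITS);
}

static uint64_t unpack_addr(uint64_t packed)
{
    return (packed & ((UINT64_C(1) << STORED_BITS) - 1)) << ALIGN_BITS;
}

static uint64_t unpack_data(uint64_t packed) { return packed >> STORED_BITS; }
```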
I guess ZGC is one example. It uses only 42 bits for addressing, which allows 2⁴² bytes = 4 × 2⁴⁰ bytes = 4 TB

ZGC therefore just reserves 16TB of address space (but not actually uses all of this memory) starting at address 4TB.

A first look into ZGC

It uses the bits in the pointer as follows:

 6                 4 4 4  4 4                                             0
 3                 7 6 5  2 1                                             0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
|                   | |    |
|                   | |    * 41-0 Object Offset (42-bits, 4TB address space)
|                   | |
|                   | * 45-42 Metadata Bits (4-bits)  0001 = Marked0
|                   |                                 0010 = Marked1
|                   |                                 0100 = Remapped
|                   |                                 1000 = Finalizable
|                   |
|                   * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)
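Reading the fields out of such a pointer, following only the bit positions in the diagram, might look like the sketch below. It is not ZGC's source (ZGC lives in HotSpot's C++ code); the helper names and the `zmeta` enum are hypothetical.

```c
#include <stdint.h>

#define Z_OFFSET_BITS 42
#define Z_OFFSET_MASK ((UINT64_C(1) << Z_OFFSET_BITS) - 1)  /* bits 41-0 */
#define Z_META_SHIFT  Z_OFFSET_BITS                         /* bits 45-42 */
#define Z_META_MASK   UINT64_C(0xF)

enum zmeta {                   /* the 4 metadata bits from the diagram */
    Z_MARKED0     = 1 << 0,
    Z_MARKED1     = 1 << 1,
    Z_REMAPPED    = 1 << 2,
    Z_FINALIZABLE = 1 << 3,
};

static uint64_t z_offset(uint64_t p) { return p & Z_OFFSET_MASK; }

static unsigned z_meta(uint64_t p)
{
    return (unsigned)((p >> Z_META_SHIFT) & Z_META_MASK);
}

/* Build a pointer value; bits 46-63 stay zero, as in the diagram. */
static uint64_t z_with_meta(uint64_t offset, unsigned meta)
{
    return (offset & Z_OFFSET_MASK)
         | ((uint64_t)(meta & Z_META_MASK) << Z_META_SHIFT);
}
```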

For more information on how to do this, see:

Allocating Memory Within A 2GB Range

How can I ensure that the virtual memory address allocated by VirtualAlloc is between 2-4GB

Allocate at low memory address

How to malloc in address range > 4 GiB

Custom heap/memory allocation ranges

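Side note: Using a linked list when the keys are tiny compared to the pointers is a huge waste of memory, and it is also slower due to poor cache locality. In fact, you shouldn't use linked lists in most real-life problems.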

Bjarne Stroustrup says we must avoid linked lists

Why you should never, ever, EVER use linked-list in your code again

Number crunching: Why you should never, ever, EVER use linked-list in your code again

Bjarne Stroustrup: Why you should avoid Linked Lists

Are lists evil?—Bjarne Stroustrup

For this question about pointers - Using the extra 16 bits in 64-bit pointers, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16198700/
