
C - Behavior of conversion between two pointers

Reposted. Author: 行者123. Last updated: 2023-12-03 16:17:11

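Update 2020-12-11: Thanks to @"Some programmer dude" for the comment.
My underlying problem is that our team is implementing a dynamically typed storage engine. We allocate multiple char array[PAGE_SIZE] buffers, each 16-byte aligned, to store dynamically typed data (there is no fixed struct). For efficiency reasons, we cannot do byte encoding or allocate extra space in order to use memcpy.
Since the alignment is already settled (i.e. 16), all that remains is to use pointer casts to access objects of the specified type, for example: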

    int main() {
        // simulate our 16-aligned malloc
        _Alignas(16) char buf[4096];

        // store some dynamic data:
        *((unsigned long *) buf) = 0xff07;
        *(((double *) buf) + 2) = 1.618;
    }
But our team disagrees on whether this operation is undefined behavior.

I have read many similar questions, such as
  • Why does -Wcast-align not warn about cast from char* to int* on x86?
  • How to cast char array to int at non-aligned position?
  • C undefined behavior. Strict aliasing rule, or incorrect alignment?
  • SEI CERT C Coding Standard, EXP36-C

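But these differ from my reading of the C standard, and I would like to know whether I have misunderstood it.
The main confusion concerns section 6.3.2.3 #7 of C11: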

    A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned 68) for the referenced type, the behavior is undefined.

    68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.



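Does "resulting pointer" here refer to the pointer object or to the pointer value?
I think the answer is the pointer object, but most answers seem to indicate the pointer value.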

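Interpretation A: pointer object
My reasoning is as follows: a pointer is itself an object. According to 6.2.5 #28, different pointers may have different representation and alignment requirements. Therefore, per 6.3.2.3 #7, as long as two pointer types have the same alignment, they can be converted safely without undefined behavior, though there is no guarantee that the result can be dereferenced.
Expressing this idea in a program: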
    #include <stdio.h>

    int main() {
        char buf[4096];

        char *pc = buf;
        if (_Alignof(char *) == _Alignof(int *)) {
            // cast safely, because they have the same alignment requirement?
            int *pi = (int *) pc;
            // %p expects a void *, so convert before printing
            printf("pi: %p\n", (void *) pi);
        } else {
            printf("char * and int * don't have the same alignment.\n");
        }
    }

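Interpretation B: pointer value
But if the C11 standard is talking about the pointer value for the referenced type rather than the pointer object, then the alignment check in the code above is meaningless.
Expressing this idea in a program: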
    #include <stdio.h>

    int main() {
        char buf[4096];

        char *pc = buf;

        /*
         * undefined behavior, because:
         * align of char is 1
         * align of int is 4
         *
         * and we don't know whether the `value` of pc is 4-aligned.
         */
        int *pi = (int *) pc;
        // %p expects a void *, so convert before printing
        printf("pi: %p\n", (void *) pi);
    }

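Which interpretation is correct?

Best answer

Interpretation B is correct. The standard is talking about a pointer to an object, not the pointer object itself. "Resulting pointer" refers to the result of the cast, and a cast does not produce an lvalue, so it means the pointer value after the cast.
Taking the code in your example, suppose that an int must be aligned on a 4-byte boundary, i.e. its address must be a multiple of 4. If the address of buf is 0x1001, then converting that address to int * is invalid because the pointer value is not correctly aligned. If the address of buf is 0x1000, then converting it to int * is valid.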
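Update:
The code you added addresses the alignment issue, so it is fine in that regard. However, it has a different problem: it violates strict aliasing.
The array you defined contains objects of type char. By casting its address to a different type and then dereferencing the converted pointer, you are accessing objects of one type as objects of another type. The C standard does not allow this.
Although the term "strict aliasing" is not used in the standard, the concept is described in section 6.5, paragraphs 6 and 7: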

    6 The effective type of an object for an access to its stored value is the declared type of the object, if any.87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

    7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)

    • a type compatible with the effective type of the object,
    • a qualified version of a type compatible with the effective type of the object,
    • a type that is the signed or unsigned type corresponding to the effective type of the object,
    • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    • a character type.

    ...

    87 ) Allocated objects have no declared type.

    88 ) The intent of this list is to specify those circumstances in which an object may or may not be aliased.


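In your example, you are writing an unsigned long and a double on top of char objects. Neither of these types satisfies the conditions of paragraph 7.
In addition to that, the pointer arithmetic here is not valid:
     *(((double *) buf) + 2) = 1.618;
because you are treating buf as an array of double when it is not one. At the very least, you would need to perform the necessary arithmetic on buf directly and cast the result at the end.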
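So why is this a problem for a char array and not for a buffer returned by malloc? Because memory returned from malloc has no effective type until you store something in it, which is what paragraph 6 and footnote 87 describe.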
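So from a strict reading of the standard, what you are doing is undefined behavior. Depending on your compiler, however, you may be able to disable strict aliasing so that this works. If you are using gcc, you would pass the -fno-strict-aliasing flag.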

This question and answer on the behavior of conversion between two pointers in C come from Stack Overflow: https://stackoverflow.com/questions/65240303/
