sorting - 重叠寄存器错误-6ren

sorting - 重叠寄存器错误

转载作者：行者123 更新时间：2023-12-02 09:14:01

内联汇编代码在for循环之一内替换了C++语句。

有时，它神奇地起作用并产生正确的结果-用基数排序的数组。另外一次Xcode生成Thread 1: EXC_BAD_ACCESS (code=1, address=0x1eccccccccd)错误，我使用反汇编 View 追溯到incq (%[count], %%rdx, 4)行。

我所了解的

反汇编将incq (%[count], %%rdx, 4)视为incq (%rax,%rdx,4)。这可能意味着同一个寄存器用于不同的操作数(%%rax行上已经使用了movq (%[array], %%rcx, 4), %%rax)，问题出在这里:: [array] "r" (array), [count] "r" (count), "b" (digit), "c" (i)。

我不懂的

如何管理寄存器，以便我有足够的空间来使用(分配给输入操作数，以及稍后在主体代码中使用)，并且它们不会同时重叠。我尝试了几种组合，但没有一种有效。

void countingSort(int array[], int length, int digit) {
    int i, count[10] = { };
    int sorted[length];

    // Store number of occurrences in count[].
    // for (i = 0; i < length; i++)
    //     count[ (array[i] / digit) % 10 ]++;

    for (i = 0; i < length; i++)
        asm volatile (
                      "movq (%[array], %%rcx, 4), %%rax \n\t"

                      "xorq %%rdx, %%rdx \n\t"
                      "divq %%rbx \n\t"
                      "movq $10, %%rbx \n\t"
                      "xorq %%rdx, %%rdx \n\t"
                      "divq %%rbx \n\t"

                      "incq (%[count], %%rdx, 4) \n\t"

                      :: [array] "r" (array), [count] "r" (count), "b" (digit), "c" (i)
                      : "memory"
        );

    // ...
}

完整代码:

#include<iostream>
using namespace std;

void print(int array[], int length) {
    for (int i = 0; i < length; i++)
        cout << array[i] << " ";
}

int findMax(int array[], int length) {
    int max = array[0];
    for (int i = 1; i < length; i++)
        if (array[i] > max)
            max = array[i];
    return max;
}

void countingSort(int array[], int length, int digit) {
    int i = 0, count[10] = { };
    int sorted[length];

    // Store number of occurrences in count[].
    // for (i = 0; i < length; i++)
    //     count[ (array[i] / digit) % 10 ]++;

    for (i = 0; i < length; i++)
        asm volatile (
                      "movq (%[array], %%rcx, 4), %%rax \n\t"

                      "xorq %%rdx, %%rdx \n\t"
                      "divq %%rbx \n\t"
                      "movq $10, %%rbx \n\t"
                      "xorq %%rdx, %%rdx \n\t"
                      "divq %%rbx \n\t"

                      "incq (%[count], %%rdx, 4) \n\t"

                      :: [array] "r" (array), [count] "r" (count), "b" (digit), "c" (i)
                      : "memory"
        );

    // Change count[i] so that count[i] now contains actual
    // position of the digit in sorted[].
    // for (i = 1; i < 10; i++)
    //     count[i] += count[i - 1];

    for (i = 1; i < 10; i++)
        count[i] += count[i - 1];

    // Build the sorted array.
    for (i = length - 1; i >= 0; i--) {
        sorted[count[ (array[i] / digit) % 10 ] - 1] = array[i];
        count[ (array[i] / digit) % 10 ]--;
    }

    // Copy the sorted array to array[].
    for (i = 0; i < length; i++)
        array[i] = sorted[i];
}

void radixSort(int array[], int length) {
    // Maximum number helps later when counting number of digits.
    int max = findMax(array, length);

    // Do Counting sort for every digit.
    for (int digit = 1; max / digit > 0; digit *= 10)
        countingSort(array, length, digit);
}

int main() {
    int array[] = { 2, 4928, 48, 72, 280, 4, 66, 3, 1, 0, 4829 };
    int length = sizeof(array) / sizeof(array[0]);

    radixSort(array, length);
    print(array, length);
    return 0;
}

最佳答案

看来这是给 class 分配的。在现实世界中this would not be done with inline assembly。

问题:

当您将10移到RBX时，通过覆盖关联的寄存器来破坏digit。 C++编译器不知道这一点，因为您没有提到在约束中修改RBX。内联汇编之前和之后，编译器可能会假定RBX是相同的。

由于您使用的是32位整数，因此可以使用long代替四字分割。

DIV是无符号除法，IDIV是有符号除法。您的C++代码对带符号的数字进行操作。但这实际上并不重要，因为此代码将在数组中带有负数时崩溃。如果使用带符号除法，则可以使用CDQ将EAX扩展为EDX。

使用汇编模板通过编译器选择的寄存器将除数(10)传递。

您可以在约束中将i转换为long类型。在64位代码中，long是64位，并且在汇编模板中被引用时，默认情况下将使编译器使用64位寄存器。

整数为4个字节，而不是8个字节。访问与整数数组关联的内存时，要使用读取4个字节的指令。您使用的MOVQ和INCQ指令移动8个字节。您将要考虑MOVL和INCL或其他移动4个字节的等效项。如果要将带符号的4字节值从内存移动到64位寄存器，可以使用MOVSXD。在AT＆T语法中，最好使用MOVSLQ / MOVSL或MOVSXL，因为CLANG和GCC / G++都可以理解。

允许编译器为输入和输出操作数选择寄存器，而不必在汇编模板中对其进行硬编码。在您的代码中，唯一可以被其中一条指令隐式修改的寄存器，而RDX是我们无法做的。将其添加到clobbers列表中，以便编译器知道其内容可能会更改。

修改后的代码如下所示:

    asm (
             "movslq (%[array], %[index], 4), %%rax \n\t"
             "cdq \n\t"               /* Sign extend eax into edx */
             "idivl %[digit] \n\t"    /* array[i]/digit */
             "cdq \n\t"               /* Sign extend eax into edx */
             "idivl %[divisor] \n\t"  /* (array[i] / digit) mod 10 */
             "incl (%[count], %%rdx, 4)"
              : "=m" (*(int (*)[]) count) /* instead of memory clobber */
              : [divisor] "r" (10), [array] "r" (array), [count] "r" (count),
                [digit] "r" (digit), [index] "r" ((long)i),
                "m" (*(const int (*)[]) array) /* instead of memory clobber */
              : "rax", "rdx", "cc"
    );

我还删除了内存破坏器，并告诉它该数组是将被修改的输出内存操作数。这在 GCC inline assembly documentation中讨论。由于除了约束和掩体中指定的以外，程序集模板没有其他副作用，因此我们不需要使用 volatile。

您可以删除第一条MOV指令并使用中间变量。这将允许您通过EAX通过约束将当前值传递给 array[i]。由于EAX现在处于其自身的输入/输出约束中(使用 +)，我们可以将其从Clobbers中删除。代码如下所示:

    int curval;
    asm (
             "cdq \n\t"               /* Sign extend eax into edx */
             "idivl %[digit] \n\t"    /* array[i]/digit */
             "cdq \n\t"               /* Sign extend eax into edx */
             "idivl %[divisor] \n\t"  /* (array[i] / digit) mod 10 */
             "incl (%[count], %%rdx, 4)"
             : "=m" (*(int (*)[]) count), /* instead of memory clobber */
               "+&a" (curval = array[i])  /* Early clobber, we modify it
                                             before all inputs processed */
             : [divisor] "r" (10), [array] "r" (array), [count] "r" (count),
               [digit] "r" (digit), [index] "r" ((long)i)
             : "rdx", "cc"
    );

如果上面的答案过于复杂，并且您想解决代码中的直接问题，那么最小的更改集可能看起来像:

    asm volatile (
                  "movl (%[array], %%rcx, 4), %%eax \n\t"
                  "xorq %%rdx, %%rdx \n\t"
                  "divq %%rbx \n\t"
                  "movq $10, %%rsi \n\t"
                  "xorq %%rdx, %%rdx \n\t"
                  "divq %%rsi \n\t"
                  "incl (%[count], %%rdx, 4) \n\t"
                  :: [array] "r" (array), [count] "r" (count), "b" (digit), "c" ((long)i)
                  : "memory", "rax", "rdx", "rsi"
    );

我们修复了修改包含digit的寄存器的问题，方法是使用另一个寄存器存储10进行除法。如果优化编译器假定寄存器的值未更改，则将仅列出为输入约束的寄存器进行修改可能会导致 undefined 的行为。编译器必须知道已更改的内容。

因为我们的程序集模板现在修改了RAX，RDX和RSI，所以我们必须将它们添加到Clobber列表中。如果它们还没有与之关联的输出约束，则它们必须位于“破坏者”列表中。

我们使用MOVL指令将4字节整数从数组移动到EAX。当指令的目标寄存器是32位寄存器时，CPU会自动将其零扩展到64位寄存器的高32位。

INCQ更改为INCL以更新4字节的存储器地址。

注释:

当针对x86和x96-64平台时，C / C++会有效地忽略"cc"破坏者。如果模板确实破坏了标记，则将"cc"指定为破坏者不是坏主意，就像该代码中的情况一样。如果有什么习惯，那么如果您曾经在处理器列表中明确指定要修改的标志的处理器上工作，则应该养成这种习惯。

如果程序集模板没有输出(或输入/输出)约束，则它是隐式易失的，因此不需要volatile修饰符。

关于sorting - 重叠寄存器错误，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/48854260/

文章推荐： r - 根据现有变量的值创建特定数量的新变量

文章推荐： java - Java 模板引擎中 URL 的最佳实践

文章推荐： java - 如何在 Java 中设置 AccessControlContext？

文章推荐： r - 使用 "cases"在大括号中显示线性方程组(R-markdown)

sorting - 性能 : Sorting Slice vs Sorting Type (of Slice) with Sort implementation
我在玩一些代码挑战时发现自定义排序(排序接口(interface)的实现)比仅针对 slice 的原始结构要快得多。这是为什么？将 slice 转换为类型是否会产生一些魔力(例如转换为指向结构的指针
reactjs - 未找到规则 'simple-import-sort/sort' 的定义 simple-import-sort/sort
我正在使用 simple-import-sort eslint 插件进行 react 。我想我的 .eslintrc.js是对的，但我无法使这个特定的插件工作。我在文件的第一行收到以下错误: 未找到规
sorting - sort.Slice排序不正确
Closed. This question is not reproducible or was caused by typos。它当前不接受答案。想改善这个问题吗？更新问题，以便将其作为on-to
golang 不能在 sort.Sort 的参数中使用类型作为类型 sort.Interface
好的，所以我是 Go 的新手，我正在努力让自己熟悉按函数排序。我可能误解了什么，所以如果我错了请纠正我。我正在尝试创建一个包含字段 key 和 value 的 Nodes 数组。我想创建一个自定义排
sorting - `take n (sort xs)` ("sorted prefix") 问题的内存高效算法
我想从惰性列表中取出 n 个最大的元素。我听说在 Data.List.sort 中实现的合并排序是惰性的，它不会产生不必要的元素。就比较而言，这可能是正确的，但在内存使用方面肯定不是这样。下面的程序
javascript - 两个回调作为 Array.sort() 的参数，而不是 Array.sort().sort()
这个问题已经有答案了: Javascript sort function. Sort by First then by Second (10 个回答) 已关闭 3 年前。我正在尝试返回已排序产品的列
c++ - 成对 vector : first pair values are non-sorted and second pair values are sorted: how to find a sorted value when having the non-sorted one
我有一个 vector 对，如下所示。第一对值未排序，第二对值已排序(从零开始)。我可能想通过实现 std::vector 和 std::pair 来存储数据。当我有第一对值(未排序)时，找到相应的第
nstableview - swift 3 : Sort (formerly sort-in-place) array by sort descriptors
直到现在(Swift 2.2)我一直愉快地使用来自 this answer 的代码- 它迅速，优雅，它像梦一样工作。 extension MutableCollectionType where Ind
sorting - Golang Sort :does not implement sort. 接口(interface)(缺少 Len 方法)
我在我的 Go 应用程序中实现排序界面时遇到问题。这是相关代码: type Group struct { Teams []*Team } type Team struct { Point
sorting - table.sort 使用什么算法？
我很好奇 Lua 的默认算法是什么 table.sort使用，只是因为它比我遇到的其他一些排序算法慢。我也很好奇 Lua 的 table.sort是在引擎中用 C 编写的，或者如果它在 Lua 中的库
sorting - "partially sorted"的数学定义
例如，插入排序被描述为部分排序数组的有效算法。但如何精确定义“部分排序”呢？最佳答案这是一个只有少数元素不合适的数组。如果没有指定百分比或其他阈值，则部分排序和未排序之间没有严格的区别。正式定义
sorting - Thrust::sort 有多快以及最快的基数排序实现是什么
我是 GPU 编程的新手。最近，我正在尝试根据一个教程实现gpu bvh构建算法:http://devblogs.nvidia.com/parallelforall/thinking-parallel
sorting - Gnumeric Sort 函数
有人可以指导我 Gnumeric 排序函数的详细说明(链接)吗？ Gnumeric 手册很简短并且没有示例。我无法通过搜索引擎找到任何合适的信息，甚至 Stackoverflow 上也只有六个不合适的
Python sort and sorted -- 列表的列表如何精确排序？
在 Python 中使用什么精确规则来对列表进行排序，其中元素是列表？这可以表示为“key”或“cmp”吗功能？问题来自于有两件事考虑:长度和它们位置的值。 sorted([ [ 0, 1, 2
go - sort.Sort 不修改数组
下面的代码应该创建一个整数数组 (a) 并对它进行排序，但是 sort.Sort 似乎没有修改变量。 package main import ( "fmt" "sort" ) type
sorting - golang sort.Sort随机输出并且是错误的
我有一个应用于结构的自定义排序函数。完整代码是 here on play.golang.org . type Stmt struct { Name string After []st
对python3 sort sorted 函数的应用详解
python3 sorted取消了对cmp的支持。 python3 帮助文档： ?
algorithm - Shellsort - 如果一个数组是 g-sorted 然后 h-sorted，数组仍然是 g-sorted
以下是来自普林斯顿的 coursera 算法类(class)的练习。如果一个数组既是 3 次排序又是 5 次排序，那么它是否也是 6 次、7 次、8 次、9 次和 10 次排序？我知道任何序列如果先
algorithm - "A g-sorted array remains g-sorted even after h-sorting it"的含义是什么？
当我看到上面的语句时，我正在阅读 shell-sorting。这意味着什么？它对我看待 shell 排序的方式有何不同？ PS:我不是在寻找声明的证据。最佳答案好吧，你可能暗示下一个排序阶段不会“
mysql Sort aborted: Out of sort memory, consider increasing server sort buffer size的解决方法
今天在检查mysql服务器的时候提示Sort aborted: Out of sort memory, consider increasing server sort buffer size，安装字

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

sorting - 重叠寄存器错误