c++ - Propagating the carry between two inline GCC ASM blocks - 6ren


Reposted by: 可可西里 | Updated: 2023-11-01 16:20:28

Dear assembly/C++ developers,

The question is: is propagating the carry (or any flag) between two ASM blocks realistic, or totally insane, even if it works?

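A few years ago I developed an integer library for large arithmetic below 512 bits (sizes fixed at compile time). I did not use GMP then, because at this scale GMP becomes slower due to memory allocation and the model chosen for the binary representation (see bench).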

I must admit that I generated my ASM (string blocks) with BOOST_PP, which is not very elegant (if you are curious, have a look at vli). The library works well.

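But I noticed back then that it was not possible to propagate the carry flag of the status register between two inline ASM blocks. This is logical: any instruction the compiler emits between the two blocks may overwrite the flags (mov is an exception, as far as my assembly knowledge goes, because it does not touch them).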

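Yesterday I had an idea, a somewhat tricky one, for propagating the carry between two ASM blocks using a recursive template. It works, but I think I got lucky.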

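#include <iostream>
#include <array>
#include <cassert>
#include <algorithm>
#include <cstdint>     // uint64_t
#include <functional>  // std::bit_not
#include <limits>      // std::numeric_limits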

//forward declaration
template<std::size_t NumBits>
struct integer;


// helper implemented as a class template: partial specialization is forbidden for function templates
template <std::size_t NumBits, std::size_t W, bool K = W == integer<NumBits>::numwords>
struct helper {
    static inline void add(integer<NumBits> &a, const integer<NumBits> &b){
        helper<NumBits, integer<NumBits>::numwords>::add(a,b);
    }
};

// first addition (call first)
template<std::size_t NumBits, std::size_t W>
struct helper<NumBits, W, 1> {
    static inline void add(integer<NumBits> &a, const integer<NumBits> &b){
        __asm__ (
                              "movq %1, %%rax \n"
                              "addq %%rax, %0 \n"
                              : "+m"(a[0]) // output
                              : "m" (b[0]) // input only
                              : "rax", "cc", "memory");
        helper<NumBits,W-1>::add(a,b);
    }
};

// second and subsequent words: propagate the carry (called next)
template<std::size_t NumBits, std::size_t W>
struct helper<NumBits, W, 0> {
    static inline void add(integer<NumBits> &a, const integer<NumBits> &b){
        __asm__ (
                              "movq %1, %%rax \n"
                              "adcq %%rax, %0 \n"
                              : "+m"(a[integer<NumBits>::numwords-W])
                              : "m" (b[integer<NumBits>::numwords-W])
                              : "rax", "cc", "memory");
        helper<NumBits,W-1>::add(a,b);
    }
};

// does nothing: ends the recursion (called last)
template<std::size_t NumBits>
struct helper<NumBits, 0, 0> {
    static inline void add(integer<NumBits> &a, const integer<NumBits> &b){};
};

// tiny integer class
template<std::size_t NumBits>
struct integer{
    typedef uint64_t      value_type;
    static const std::size_t numbits = NumBits;
    static const std::size_t numwords = (NumBits+std::numeric_limits<value_type>::digits-1)/std::numeric_limits<value_type>::digits;
    using container = std::array<uint64_t, numwords>;

    typedef typename container::iterator             iterator;

    iterator begin() { return data_.begin();}
    iterator end() { return data_.end();}

    explicit integer(value_type num = value_type()){
        assert( -1l >> 1 == -1l );
        std::fill(begin(),end(),value_type());
        data_[0] = num;
    }

    inline value_type& operator[](std::size_t n){ return data_[n];}
    inline const value_type& operator[](std::size_t n) const { return data_[n];}

    integer& operator+=(const integer& a){
        helper<numbits,numwords>::add(*this,a);
        return *this;
    }

    integer& operator~(){
        std::transform(begin(),end(),begin(),std::bit_not<value_type>());
        return *this;
    }

    void print_raw(std::ostream& os) const{
        os << "(" ;
        for(std::size_t i = numwords-1; i > 0; --i)
            os << data_[i]<<" ";
        os << data_[0];
        os << ")";
    }

    void print(std::ostream& os) const{
        assert(false && " TO DO ! \n");
    }

private:
    container data_;
};

template <std::size_t NumBits>
std::ostream& operator<< (std::ostream& os, integer<NumBits> const& i){
    if(os.flags() & std::ios_base::hex)
        i.print_raw(os);
    else
        i.print(os);
    return os;
}

int main(int argc, const char * argv[]) {
    integer<256> a; // 0
    integer<256> b(1);

    ~a; //all the 0 become 1

    std::cout << " a: " << std::hex << a << std::endl;
    std::cout << " ref: (ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff) " <<  std::endl;

    a += b; // should propagate the carry

    std::cout << " a+=b: " << a << std::endl;
    std::cout << " ref: (0 0 0 0) " <<  std::endl; // it works but ...

    return 0;
}
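For comparison, here is a portable reference for the same multi-word addition. It uses no inline assembly and no flags, so its result does not depend on the optimization level. It is a minimal sketch, not part of the original library: `add_portable` and `check_all_ones_plus_one` are hypothetical names, and the carry is recovered with the usual unsigned wrap-around test (`sum < operand`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical portable reference (not part of the original library):
// adds b into a word by word, least-significant word first, and returns
// the carry out of the most-significant word. No inline asm, no flags.
template <std::size_t N>
bool add_portable(std::array<std::uint64_t, N> &a,
                  const std::array<std::uint64_t, N> &b) {
    bool carry = false;
    for (std::size_t i = 0; i < N; ++i) {
        const std::uint64_t s = a[i] + b[i];  // wraps modulo 2^64
        const bool c1 = s < a[i];             // carry out of a[i] + b[i]
        const std::uint64_t t = s + carry;    // add the incoming carry
        const bool c2 = t < s;                // carry out of s + carry
        a[i] = t;
        carry = c1 || c2;                     // at most one of them can be set
    }
    return carry;
}

// Same check as main() in the question: (2^256 - 1) + 1 wraps to 0,
// with a carry out of the top word.
inline void check_all_ones_plus_one() {
    std::array<std::uint64_t, 4> a;
    a.fill(~std::uint64_t{0});
    const std::array<std::uint64_t, 4> b{1, 0, 0, 0};
    const bool carry_out = add_portable(a, b);
    assert(carry_out);
    for (std::uint64_t w : a) assert(w == 0);
}
```

Comparing the inline-asm `integer<256>` against a reference like this at -O0 and at -O2 is a simple way to catch the debug-mode failure described below.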

I get the correct result (it must be compiled in release mode, -O2 or -O3!), and the generated ASM is correct (on my Mac with clang++: Apple LLVM version 9.0.0 (clang-900.0.39.2)):

    movq    -96(%rbp), %rax
    addq    %rax, -64(%rbp)

    ## InlineAsm End
    ## InlineAsm Start
    movq    -88(%rbp), %rax
    adcq    %rax, -56(%rbp)

    ## InlineAsm End
    ## InlineAsm Start
    movq    -80(%rbp), %rax
    adcq    %rax, -48(%rbp)

    ## InlineAsm End
    ## InlineAsm Start
    movq    -72(%rbp), %rax
    adcq    %rax, -40(%rbp)

I am aware that it works only because, during optimization, the compiler removes all the useless instructions between the ASM blocks (in debug mode it fails).

What do you think? Definitely unsafe? Do the compiler experts know how stable this is?

Summary: I did this just for fun :) And yes, GMP is the solution for large arithmetic!

Best answer

The use of __volatile__ is an abuse.

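The purpose of __volatile__ is to force the compiler to emit the assembly code where it is written, rather than relying on data flow analysis to figure that out. If you are doing ordinary manipulation of data in user space, you usually should not be using __volatile__. If you need __volatile__ to make your code work, it almost always means your operands are specified incorrectly.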

And yes, the operands are specified incorrectly. Let's look at the first block.

__asm__ __volatile__ (
                      "movq %1, %%rax \n"
                      "addq %%rax, %0 \n"
                      : "=m"(a[0]) // output
                      : "m" (b[0]) // input only
                      : "rax", "memory");

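There are two errors here.

1. The constraint on the output, "=m"(a[0]), is incorrect. Recall that the destination of addq is both an input and an output, so the correct constraint modifier is +; use "+m"(a[0]). If you tell the compiler that a[0] is output only, it may arrange for a[0] to hold a garbage value before the block (through dead store elimination), which is not what you want.
2. The flags are missing from the assembly specification. If you do not tell the compiler that the flags are modified, it may assume they are preserved across the assembly block, and it will then generate incorrect code elsewhere.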
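Applying both fixes to the first block gives roughly the following sketch. `add_word` and `check_add_word` are hypothetical names, and the code assumes x86-64 with GCC/Clang extended asm in AT&T syntax. The "memory" clobber from the original is dropped here, on the assumption that declaring both memory operands is enough. No __volatile__ is needed: as long as `a` is used afterwards, its output operand keeps the block alive.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: the question's first block with both fixes applied.
// Assumes x86-64 and GCC/Clang extended asm (AT&T syntax).
inline void add_word(std::uint64_t &a, std::uint64_t b) {
    __asm__("movq %1, %%rax\n\t"
            "addq %%rax, %0"
            : "+m"(a)         // fix 1: addq reads AND writes its destination
            : "m"(b)          // input only
            : "rax", "cc");   // fix 2: rax is scratch and the flags change
}

// Usage check: 5 + 7, and the all-ones word wrapping to zero.
inline void check_add_word() {
    std::uint64_t x = 5;
    add_word(x, 7);
    assert(x == 12);
    std::uint64_t m = ~std::uint64_t{0};
    add_word(m, 1);
    assert(m == 0);
}
```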

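Unfortunately, flags can only be used as output or clobber operands of an assembly block, not as inputs. So after all that fuss about specifying the operands correctly so that you don't need __volatile__... it turns out there is no good way to specify your operands anyway!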
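Because the carry flag cannot be an input operand, one workaround (not part of the original answer) is to carry it between blocks in an ordinary register. Each block then turns that register back into CF before its own adcq. The sketch below assumes x86-64 and GCC/Clang AT&T extended asm; `adc64` and `add_words` are hypothetical names. Each block is correct on its own terms, whatever the compiler places between blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a = a + b + carry_in, returns the carry out.
// The incoming carry arrives in a byte register and is converted back
// into CF inside the same block, since CF cannot be an input operand.
inline bool adc64(std::uint64_t &a, std::uint64_t b, bool carry_in) {
    unsigned char c = carry_in;      // 0 or 1
    __asm__("addb $-1, %[c]\n\t"     // c + 0xff sets CF exactly when c == 1
            "adcq %[b], %[a]\n\t"    // a = a + b + CF
            "setc %[c]"              // carry out back into a register
            : [a] "+r"(a), [c] "+q"(c)
            : [b] "r"(b)
            : "cc");                 // the flags are modified
    return c != 0;
}

// Hypothetical multi-word addition: one asm block per word, with the
// carry handed from block to block through an ordinary C++ variable.
template <std::size_t N>
bool add_words(std::uint64_t (&a)[N], const std::uint64_t (&b)[N]) {
    bool carry = false;
    for (std::size_t i = 0; i < N; ++i)
        carry = adc64(a[i], b[i], carry);
    return carry;
}

// Usage check: (2^256 - 1) + 1 wraps to zero with a carry out.
inline void check_add_words() {
    std::uint64_t a[4] = {~0ULL, ~0ULL, ~0ULL, ~0ULL};
    const std::uint64_t b[4] = {1, 0, 0, 0};
    assert(add_words(a, b));
    for (std::uint64_t w : a) assert(w == 0);
}
```

GCC 6 and later can also hand CF back directly through a flag output operand ("=@ccc"), which would replace the setc. The input side still has to go through a register.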

So my advice is to at least fix the operands you can fix, and to specify "cc" as a clobber. But there are a couple of better solutions that don't need __volatile__ at all...

Solution #1: Use GMP.

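The mpn_ functions for addition do not allocate memory. The mpz_ functions are wrappers around the mpn_ functions, with some extra logic and memory allocation.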

Solution #2: Write everything in a single assembly block.

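If you write the whole loop in a single assembly block, you don't have to worry about preserving the flags between blocks. You can use assembler macros to do this. Forgive me, I'm not much of an assembly programmer: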

template <int N>
void add(unsigned long long *dest, unsigned long long *src) {
  __asm__(
      "movq (%1), %%rax"
      "\n\taddq %%rax, (%0)"
      "\n.local add_offset"
      "\n.set add_offset,0"
      "\n.rept %P2" // %P0 means %0 but without the $ in front
      "\n.set add_offset,add_offset+8"
      "\n\tmovq add_offset(%1), %%rax"
      "\n\tadcq %%rax, add_offset(%0)"
      "\n.endr"
      :
      : "r"(dest), "r"(src), "n"(N-1)
      : "cc", "memory", "rax");
}

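This evaluates the loop with the .rept assembler directive. You end up with 1 copy of addq and N-1 copies of adcq. If you look at GCC's assembly output with -S, though, you will see only one of each, because the assembler itself creates the copies and unrolls the loop.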
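To exercise the unrolled carry chain, the sketch below repeats the answer's template so that it compiles on its own and adds a small hypothetical check, `check_add4`. It assumes x86-64 GCC/Clang targeting ELF (for example Linux), where the `.local` directive is accepted. The comment on `.rept` is reworded to name operand 2.

```cpp
#include <cassert>

// The answer's template, repeated so this block compiles on its own.
// Assumes x86-64 GCC/Clang targeting ELF, where .local is valid.
template <int N>
void add(unsigned long long *dest, unsigned long long *src) {
  __asm__(
      "movq (%1), %%rax"
      "\n\taddq %%rax, (%0)"
      "\n.local add_offset"
      "\n.set add_offset,0"
      "\n.rept %P2"  // %P2 prints operand 2 without the leading $
      "\n.set add_offset,add_offset+8"
      "\n\tmovq add_offset(%1), %%rax"
      "\n\tadcq %%rax, add_offset(%0)"
      "\n.endr"
      :
      : "r"(dest), "r"(src), "n"(N-1)
      : "cc", "memory", "rax");
}

// Hypothetical usage check: (2^256 - 1) + 1 wraps to zero in four words,
// which pushes the carry through all three unrolled adcq copies.
inline void check_add4() {
  unsigned long long a[4] = {~0ULL, ~0ULL, ~0ULL, ~0ULL};
  unsigned long long b[4] = {1, 0, 0, 0};
  add<4>(a, b);
  for (unsigned long long w : a) assert(w == 0);
}
```

The four-word expansion does not appear in the -S output. It shows up only in the object file, for example when disassembled with objdump -d.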

See the gist: https://gist.github.com/depp/966fc1f4d535e31d9725cc71d97daf91

A similar question on Stack Overflow about propagating the carry between two inline GCC ASM blocks: https://stackoverflow.com/questions/48810179/
