c - ARM 皮质 M0+ : How to use "Branch if Carry" instructions in C-code?-6ren

c - ARM 皮质 M0+ : How to use "Branch if Carry" instructions in C-code?

转载作者：行者123 更新时间：2023-12-04 12:04:38

25

4

我有一些 C 代码可以逐位处理数据。简化示例:

// input data, assume this is initialized
uint32_t data[len];

for (uint32_t idx=0; idx<len; idx++)
{
    uint32_t tmp = data[idx];
    
    // iterate over all bits
    for (uint8_t pos=0; pos<32; pos++)
    {
        if (tmp & 0b1)
        {
            // some code
        }
    
        tmp = tmp >> 1;
    }
}

在我的应用程序中， len 比较大，所以我想尽可能地优化内部循环。 // some code 部分很小并且已经进行了大量优化。
我正在使用 ARM Cortex M0+ MCU，如果设置了进位位，该 MCU 具有分支指令(请参阅 cortex-m0+ manual ，第 45 页)。方便地移位位将 LSB(或 MSB)放入进位标志，因此理论上它可以在没有比较的情况下进行分支，如下所示:

// input data, assume this is initialized
uint32_t data[len];

for (uint32_t idx=0; idx<len; idx++)
{
    uint32_t tmp = data[idx];

    // iterate over all bits
    for (uint8_t pos=0; pos<32; pos++)
    {
        tmp = tmp >> 1;

        if ( CARRY_SET )
        {
            // some code
        }
    }
}

使用 C 代码和/或内联汇编程序对此进行存档的最佳方法是什么？理想情况下，为了简单和更好的可读性，我想将 // come code 保留在 C 中。

编辑 1:我已经使用 -O1、-O2 和 -03 在 GCC 5.4 GCC 6.3 上测试了此代码。对于每个设置，它都会生成以下汇编代码(请注意我尝试获取的专用 tst 指令):

        if (data & 0b1)             
00000218   movs r3, #1       
0000021A   tst  r3, r6       
0000021C   beq  #4

编辑 2:最小的可重现示例。我正在 Atmel Studio 7 中编写代码(因为它适用于 MCU)并检查内置调试器中的值。如果您使用不同的环境，您可能需要添加一些 IO 代码:

int main(void)
{
    uint32_t tmp = 0x12345678;
    volatile uint8_t bits = 0;  // volatile needed in this example to prevent compiler from optimizing away all code.

    // iterate over all bits
    for (uint8_t pos=0; pos<32; pos++)
    {
        if (tmp & 1)
        {
            bits++;    // the real code isn't popcount.  Some compilers may transform this example loop into a different popcount algorithm if bits wasn't volatile.
        }
        tmp = tmp >> 1;
    }

    // read bits here with debugger
    while(1);
}

最佳答案

我没有找到“简单”的解决方案，所以我不得不在汇编程序中编写我的简短算法。这是演示代码的样子:

// assume these values as initialized
uint32_t data[len];     // input data bit stream
uint32_t out;           // algorithm input + output
uint32_t in;            // algorithm input (value never written in asm)

for (uint32_t idx=0; idx<len; idx++)
{
    uint32_t tmp = data[idx];
    
    // iterate over all bits
    for (uint8_t pos=0; pos<32; pos++)
    {
        // use optimized code only on supported devices
        #if defined(__CORTEX_M) && (__CORTEX_M <= 4)
        asm volatile  // doesn't need to be volatile if you use the result
        (
            "LSR    %[tmp], %[tmp], #1" "\n\t"  // shift data by one. LSB is now in carry
            "BCC    END_%="             "\n\t"  // branch if carry clear (LSB was not set)
            
            /* your code here */        "\n\t"
        
            "END_%=:"                   "\n\t"  // label only, doesn't generate any instructions
        
            : [tmp]"+l"(tmp), [out]"+l"(out)    // out; l = register 0..7 = general purpose registers
            : [in]"l"(in)                       // in;
            : "cc"                              // clobbers: "cc" = CPU status flags have changed
                               // Add any other registers you use as temporaries, or use dummy output operands to let the compiler pick registers.
        );
        #else
        if (tmp & 0b1)
        {
            // some code
        }
        tmp = tmp >> 1;
        #endif
    }
}

对于您的应用程序，在标记的位置添加您的汇编代码，并使用寄存器从 C 函数中输入数据。请记住，在 Thumb 模式下，许多指令只能使用 16 个通用寄存器中的 8 个，因此您不能传递更多的值。
内联汇编很容易以微妙的方式出错，这些方式看似有效，但在内联到不同的周围代码后可能会中断。 (例如，忘记声明一个clobber。) https://gcc.gnu.org/wiki/DontUseInlineAsm除非您需要(包括性能)，但如果是这样，请确保您检查文档( https://stackoverflow.com/tags/inline-assembly/info )。
请注意，技术上正确的移位指令是 LSRS (带有 s 后缀来设置标志)。但是在 GCC 6.3 + GAS 写入 lsrs在 asm 代码中会导致在拇指模式下组装错误，但如果你写 lsr它成功组装成 lsrs操作说明。 (在 Cortex-M 不支持的 ARM 模式下， lsr 和 lsrs 都按预期汇编为单独的指令。)

虽然我不能分享我的应用程序代码，但我可以告诉你这个变化有多少加速:

-O1
-O2
-O3

原来的
812us
780us
780us

带汇编
748us
686us
716us

带 asm + 一些循环展开
732us
606us
648us

因此，使用我的 ASM 代码和 -O2 而不是 -O1，我获得了 15% 的加速，并且通过额外的循环展开，我获得了 25% 的加速。
使用 __attribute__ ((section(".ramfunc"))) 将函数放在 RAM 中产生另外 1% 的改进。 (请务必在您的设备上对此进行测试，某些 MCU 的闪存缓存未命中惩罚很严重。)
有关更多通用优化的信息，请参阅下面的 old_timer 答案。

关于c - ARM 皮质 M0+ : How to use "Branch if Carry" instructions in C-code?，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/68601363/

25

4

0

文章推荐： python - Cython 没有加速

文章推荐： spring-boot - Spring Cloud 配置服务器

文章推荐： r - 在data.table的组的第一行中分配值

文章推荐： c - 什么时候写 "0 - x"而不是 "-x"有用？

branch.io - Android : Branch. io 教程 : Branch. getAutoInstance(this);
我试图让 Branch.io 在 Android 上工作，但我遇到了: myapplication.MainActivity cannot be cast to android.app.Applica
git - 本地和远程分支名称相同仍然得到 "The upstream branch of your current branch does not match the name of your current branch "
当我执行 git branch 时，我知道我在分支 v0.2 上。 git branch v0.1 * v0.2 但是当我执行 git push 时它说“你当前分支的上游分支与你当前分支的名称不
git - 致命的 : The upstream branch of your current branch does not match the name of your current branch
在使用 Git GUI 检查远程分支 releases/rel_5.4.1 之后，当我尝试 push 时看到了这个意外的错误消息: fatal: The upstream branch of your
git - 我如何压制 "fatal: The upstream branch of your current branch does not match the name of your current branch"？
SO 上有一个相关问题处理如何更改 push 命令的参数以避免出现此消息: fatal: The upstream branch of your current branch does not mat
git - arc feature [branch-name] 和 git branch [branch-name] 有什么区别？
arc feature [branch-name] 和 git branch [branch-name] 有什么区别？他们似乎都创建了一个新分支。最佳答案 arc feature [branch-
algorithm - FIFO Branch and Bound、LIFO Branch and Bound 和 LC Branch and Bound 之间有什么区别？
FIFO、LIFO 和LC Branch and Bound 有什么区别？最佳答案 Branch & Bound 通过使用估计边界来限制可能解决方案的数量来发现完整搜索空间内的分支。不同的类型(FI
git - git checkout --track origin/branch 和 git checkout -b branch origin/branch 的区别
有人知道这两个切换和跟踪远程分支的命令之间的区别吗？ git checkout -b branch origin/branch git checkout --track origin/branch 我
branch - git-svn branch - 如何使分支与主干保持同步？
关于 git-svn 工作流程有很多问题，但我一直无法弄清楚这一点: This section of the svn book谈到 SVN 的一个常见做法:创建一个分支，并在主干更新时不断合并主干中的
php - 我怎样才能从 git status "On branch master\n Nothing to commit.."到 "On branch master\n Your branch is up to date\n Nothing to .."？”
我正在构建一个控制 git 存储库的 PHP 应用程序。我有一个执行命令 git status 的同步函数，虽然没有返回“你的分支是最新的”，但可能会提前采取必要的行动，比如远程或本地分支。我还构建
branch.io - 使用 branch.io 的自定义链接
是否可以使用 branch.io 创建自定义链接，例如 https://example.app.link/fzmLEhobLD所以我可以用我的自定义 10 位参数(如 amitpp8888)控制 fz
git : New branch is pointing to the dev branch
我从 github 克隆了一个分支，它的名字是 dev。我已经开始使用它， pull 和推送代码更改并确保我的本地存储库与远程存储库保持同步。我要开始实现一个新功能，因此创建了一个新分支，如下所示:
svn - SVN 分支模型 : branch -> next-branch, 放弃主干有什么缺点吗？
我们有一个发布模型，为简单起见，我们假设每月 1 次。所以，我们通常会去: Jan -> trunk trunk -> Feb trunk trunk et
branch.io - 使用 Branch HTTP API 创建快速链接？
使用 Branch.io HTTP API 创建的链接不会在 Branch 门户中显示为快速链接。快速链接很方便，因为它们在一个 View 中显示“点击”、“打开”等内容用于创建链接的 API:li
TFS 2010 : Branching - Why do changesets say they haven't been merged when they were from before the Branch?
我创建了一个分支，当我第一次从源代码合并到分支时，出现了一大堆旧的变更集，它说没有合并，但它们在分支之前就存在，我确认它们在那里。例子: 假设当 Source 中有 9 个变更集时，我从 Sourc
git - 为什么我没有收到错误 "fatal: The current branch A has no upstream branch."
这是关于我为什么这样做不是收到错误“致命:当前分支 A 没有上游分支”。我删除了远程分公司一个使用命令 git push origin :A . 然后我切换到本地分支 A 使用命令 g
java - colver中的 'true branch executed'和 'branch executed'有什么不同？
我正在使用 clover 插件来检查我的 java 代码测试覆盖率。我为所有行编写了单元测试。当我点击红线时，它显示“true分支执行了2次，分支执行了0次”。这是什么意思？我该如何解决这个问题？
Git 哲学 : how to get "master" branch to "production" branch?
很确定我误解了 git。我的目标我在 github 上有一个带有“master”分支的私有(private)存储库。我还想有一个生产分支，我会将所有更改从 master 推送到该分支。然后我想
git - 比较 git branch 和 rebased branch
我将一个相当老的主题分支重新定位到 master 上。由于在 rebase 期间有很多冲突，我想将旧主题分支与 rebased 分支进行比较，以确保我没有意外删除或搞砸主题上的任何更改。我最接近的是比
git - 致命的 : The current branch master has no upstream branch
我正在尝试将我的一个项目推送到 github，但我一直收到此错误: peeplesoft@jane3:~/846156 (master) $ git push fatal: The current b
混帐: "+refs/heads/:refs/remotes/origin/"
Jenkins Git 插件根据我的引用规范在控制台输出中生成了以下命令下面两个命令有什么区别？他们的输出看起来没什么不同。我在下面给出了他们的输出: 命令 1: git fetch --no-ta

首页

博学

6Ren·AI

商城

c - ARM 皮质 M0+ : How to use "Branch if Carry" instructions in C-code?