Trying to create the smallest 32-bit Bit Reversal in C using inline __asm()
.
尝试使用INLINE__ASM()在C中创建最小的32位反转。
So far, I've managed to get the __asm()
code size to 23
bytes.
到目前为止,我已经设法将__ASM()代码大小设置为23个字节。
I'm curious if there are ways to further decrease the code size by using
我很好奇是否有方法通过使用
- compiler instrinsics
- specialized assembly instructions
- vanilla C code
Example
示例
godbolt
龙珠
#include <stdio.h>
unsigned int CodeSize;
void print_binary(unsigned int number) { if (number >> 1) print_binary(number >> 1); putc((number & 1) ? '1' : '0', stdout); }
void reverseBits(unsigned int OriginalValue)
{
unsigned int ReversedValue = 0;
start_asm:
__asm__ (
"movl %1, %%eax\n" // load the value
"xorl %%ebx, %%ebx\n" // clear EBX (optimized)
"bsrl %%eax, %%ecx\n" // find highest order bit set to 1 in EAX
"incl %%ecx\n" // increment to get the correct number of iterations
"reverse:\n\t" // local label
"shrl $1, %%eax\n" // shift the LSB of EAX to the carry flag CF
"rcll $1, %%ebx\n" // the MSB of EBX goes into CF and CF's previous value goes into the LSB of EBX
"loop reverse\n" // loop back to the local label
"movl %%ebx, %0" // move the result to the output
: "=r" (ReversedValue) // output
: "r" (OriginalValue) // input
: "eax", "ebx", "ecx" // clobbered registers
);
end_asm:
CodeSize = (char *)&&end_asm - (char *)&&start_asm;
printf("\nOriginal: 0x%X ", OriginalValue); print_binary(OriginalValue);
printf("\nReversed: 0x%X ", ReversedValue); print_binary(ReversedValue);
printf("\n");
}
int main()
{
reverseBits(0xfeedface);
reverseBits(0xfeed);
reverseBits(0xfe);
printf("\nCodeSize: %u bytes\n", CodeSize);
return 0;
}
Output
输出
Original: 0xFEEDFACE 11111110111011011111101011001110
Reversed: 0x735FB77F 1110011010111111011011101111111
Original: 0xFEEDFA 111111101110110111111010
Reversed: 0x5FB77F 10111111011011101111111
Original: 0xFEED 1111111011101101
Reversed: 0xB77F 1011011101111111
Original: 0xFE 11111110
Reversed: 0x7F 1111111
CodeSize: 23 bytes
Update
更新
From the helpful comments below, the code size is now 13 bytes
根据下面有用的注释,代码大小现在是13个字节
uint32_t reverseBits(uint32_t OriginalValue) {
uint32_t ReversedValue = 0;
__asm__ (
"start_asm:\n" // start label
"xorl %%ebx, %%ebx\n" // clear EBX
"bsrl %1, %%ecx\n" // find highest order bit set to 1 in EAX (which is %1)
"reverse:\n" // local label for looping
"shrl $1, %1\n" // shift the LSB of EAX (which is %1) to the carry flag CF
"rcll $1, %%ebx\n" // the MSB of EBX goes into CF and CF's previous value goes into the LSB of EBX
"decl %%ecx\n" // manually decrement ECX
"jns reverse\n" // jump to reverse if sign flag is set (i.e., if ecx is negative)
"end_asm:\n" // end label
: "=b" (ReversedValue) // output directly to EBX
: "a" (OriginalValue) // input directly to EAX
: "ecx" // clobbered register
);
return ReversedValue;
}
更多回答
Since you specified input and output in registers it makes little sense to hardcode eax
and ebx
. Just use those registers instead. Also if you are optimizing for size, you can loop a fixed 32 times
由于您在寄存器中指定了输入和输出,因此硬编码eax和ebx没有什么意义。只需使用这些寄存器即可。此外,如果您正在优化大小,您可以循环固定的32次
Though I see that you wish to only reverse as many bits as the number is large. Try this (input in ecx
, output in eax
): xor %eax, %eax; 0: rcl %eax; shr %ecx; jbe 0b
. 8 bytes in total.
虽然我看到您只希望反转数字很大的位数。试试这个(输入为ecx,输出为eax):XOR%eax,%eax;0:rcl%eax;shr%ecx;jbe 0b。总共8个字节。
Added x86-64 since that appears to be the only architecture you are interested in. Please always use an architecture tag for assembly questions. The answer could be very different for other architectures, e.g. ARMv8 simply has an rbit
instruction.
添加了x86-64,因为这似乎是您唯一感兴趣的体系结构。请始终使用架构标签回答组装问题。对于其他体系结构,答案可能会非常不同,例如ARMv8只有一条rbit指令。
@vengy: If you want to minimize machine-code size using inline asm, don't make it a naked
function; let the compiler inline it so you don't need a ret
. A naked
function is defeating the purpose of inline asm()
; it's basically equivalent to writing a separate .s
.
@vengy:如果您想使用内联ASM最小化机器代码大小,请不要使其成为一个裸函数;让编译器内联它,这样您就不需要ret了。裸函数违背了内联ASM()的目的;它基本上等同于编写一个单独的.s。
10 bytes:
10个字节:
__asm__ (
"start_asm:\n"
"xorl %%ebx, %%ebx\n" //Clear destination and CF
"repeat:\n\t"
"rcll $1, %%ebx\n" // Shift CF into destination LSB
"shrl $1, %%eax\n" // Shift LSB from source into CF
"jnz repeat\n" // If source is not zero - repeat
// else (source is zero, CF is always 1)
"rcll $1, %%ebx\n" // Shift the last 1 into destination
"end_asm:\n"
: "=b" (ReversedValue) // output
: "a" (OriginalValue) // input
: // no clobbered registers
);
6 bytes
6个字节
uint32_t reverseBits(uint32_t OriginalValue) {
uint32_t ReversedValue = 0;
__asm__ (
"repeat:\n\t"
"shrl $1, %%eax\n" // Shift LSB from source into CF
"rcll $1, %%ebx\n" // Shift CF into destination LSB
"jnz repeat\n" // If source EAX is not zero - repeat
: "=b" (ReversedValue) // output: EBX
: "a" (OriginalValue), // input: OriginalValue is loaded into EAX.
"b" (ReversedValue) // input: ReversedValue (0) is loaded into EBX.
: // no clobbered registers
);
return ReversedValue;
}
更多回答
Nice! So the last instruction that affects ZF is shrl $1, %%eax
. The jnz
checks if EAX has become zero after the shift operation. If it's zero, all the bits from the original number have been processed, and we can move on. If it's non-zero, there are still bits left to reverse, so we keep looping. The final instruction rcll $1, %%ebx
processes the last bit still in the carry flag (CF) to ensure that the final bit from EAX is shifted into EBX. Very clever! :)
好的!因此,影响ZF的最后一条指令是SHRL$1,%%eax。JNZ检查在移位操作之后EAX是否已变为零。如果它是零,则来自原始数字的所有位都已被处理,我们可以继续。如果它不是零,则仍有位需要反转,因此我们继续循环。最终指令RCLL$1,%%EBX处理进位标志(CF)中的最后一位,以确保来自EAX的最后一位被移位到EBX。非常聪明!:)
I generally prefer adc reg, reg
to rcl reg, 1
, but it does not matter for code size here (+1)
我通常更喜欢ADC reg,reg而不是RCL reg,1,但这里的代码大小(+1)无关紧要
Oh, RCL does not affect ZF. Didn't know that.
哦,RCL不影响ZF。我不知道这一点。
I was surprised too that RCL does not affect the SF, ZF, AF, and PF flags. Credit really goes to your ideas. Thanks!
我也很惊讶,RCL不影响SF、ZF、AF和PF旗帜。功劳真的归功于你的想法。谢谢!
我是一名优秀的程序员,十分优秀!