I'm starting to code more on Linux and trying to get a better feel for its environment and APIs, which are very different from Windows. I'm experimenting with shared libraries (.so, the counterpart of a Windows .dll) and noticed that when a shared object is loaded into memory, it is 7752 bytes larger than it is on disk. I expected the image on disk to match the image in memory. Or is there a bug in my demo code below?
Example from godbolt shows this output:
Program returned: 0
Program stdout
Loaded: linux-vdso.so.1
Loaded: /lib/x86_64-linux-gnu/libc.so.6
---------------------MEMORY-------------------------
libc.so.6 size: 2037344 bytes
7fb5733e4000 7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00 .ELF............
7fb5733e4010 03 00 3e 00 01 00 00 00 c0 41 02 00 00 00 00 00 ..>......A......
7fb5733e4020 40 00 00 00 00 00 00 00 18 e7 1e 00 00 00 00 00 @...............
...
7fb5735d5630 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
7fb5735d5640 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
7fb5735d5650 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
---------------------DISK-------------------------
libc.so.6 size: 2029592 bytes
7fb5731f2010 7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00 .ELF............
7fb5731f2020 03 00 3e 00 01 00 00 00 c0 41 02 00 00 00 00 00 ..>......A......
7fb5731f2030 40 00 00 00 00 00 00 00 18 e7 1e 00 00 00 00 00 @...............
...
7fb5733e17f8 00 00 00 00 00 00 00 00 c8 e2 1e 00 00 00 00 00 ................
7fb5733e1808 4b 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 K...............
7fb5733e1818 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Question
I'm curious: why does the size of a shared library differ between disk and memory?
More answers
In addition to globals, as pointed out by Misha below, segments are page-aligned when mapped into memory, while the file on disk typically uses smaller padding between sections.
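To make the page-alignment point concrete, here is a minimal sketch of how a size is rounded up to a page boundary. The helper name round_up_to_page is hypothetical, and the 4096-byte page size used in the usage note is an assumption (it is common on x86-64 Linux, but the real value should be queried with sysconf(_SC_PAGESIZE)).

```c
#include <stddef.h>

/* Rounds size up to the next multiple of page.
   Assumption: page is a power of two, as page sizes normally are. */
size_t round_up_to_page(size_t size, size_t page)
{
    return (size + page - 1) & ~(page - 1);
}
```

For instance, round_up_to_page(2029592, 4096) yields 2031616, so padding a file of that size out to whole pages would already add about 2 KB. This only illustrates the rounding itself; it is not a claim about how the loader laid out libc in the output above.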
Another possibility is that the library has a rather large .bss section (uninitialized static and global variables).
For example, the function:
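#include <stddef.h>
#include <stdlib.h>

void foo(char *dst, size_t len)
{
    /* Neither static has an initializer, so both are placed in .bss. */
    static char table[4096];
    static int initialized;

    if (!initialized)
    {
        for (size_t i = 0; i < sizeof table; i++)
            table[i] = (char)rand();
        initialized = 1; /* fill the table only once */
    }
    for (size_t i = 0; i < len; i++)
        dst[i] &= table[i % sizeof table];
}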
Why is a linux shared library .so possibly larger in memory than on disk?
A shared object is basically a program that has several entry points instead of one. As such, it has a .text section, a .data section, and a .bss section, just like an executable. When the dynamic loader loads it, the contents of .text and .data (initialized data) are taken from the file, but .bss (uninitialized data) is backed by zero-filled memory, which has no representation in the file the library is stored in.
For this reason, a library with a large amount of uninitialized global data (a large .bss section) can be larger when loaded than the file it was loaded from.
This is also why a program's size in memory is normally larger than the file it is stored in. Didn't you realize? :)
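One way to see this gap for a given library is to compare the file size and memory size recorded in its loadable program headers. The following is a rough sketch, not the loader's actual logic: the function name elf64_load_sizes is hypothetical, it handles only 64-bit ELF files, and it assumes the file's byte order matches the host's. It relies on the Elf64_Ehdr and Elf64_Phdr types from <elf.h> on Linux.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sums p_filesz and p_memsz over all PT_LOAD program headers of a
   64-bit ELF file. Returns 0 on success, -1 on any error.
   Assumption: the file uses the host's byte order. */
int elf64_load_sizes(const char *path, uint64_t *file_bytes, uint64_t *mem_bytes)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int rc = -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        goto out;

    *file_bytes = 0;
    *mem_bytes = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (uint64_t)i * sizeof ph);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        if (ph.p_type == PT_LOAD) {
            *file_bytes += ph.p_filesz; /* bytes backed by the file */
            *mem_bytes += ph.p_memsz;   /* bytes occupied once loaded */
        }
    }
    rc = 0;
out:
    fclose(f);
    return rc;
}
```

Calling it with a path such as /lib/x86_64-linux-gnu/libc.so.6 fills in the two totals. Any excess of the memory total over the file total is zero-filled space such as .bss. The totals ignore page alignment and the gaps between segments, so they will not necessarily reproduce the exact numbers printed by the demo in the question.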
More answers
Are you saying that a static char table[4096]; occupies minimal space in the .bss section on disk, but expands to 4096 bytes when loaded into memory, making the memory image larger than the one on disk? Is that a correct interpretation? Thanks.
@vengy Exactly. The BSS section takes up minimal space on disk: just two numbers, its start address and its length. At startup this memory is set to zero. Static variables simply live in this area of memory (i.e. their addresses are inside BSS); their contents aren't loaded from the file.
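As a small illustration of that point, the sketch below declares the same kind of array and checks that it reads as all zeros without any initializer in the source. The function name count_nonzero_bytes is hypothetical, and the placement in .bss is what typical ELF toolchains do rather than something the C standard requires.

```c
#include <stddef.h>

/* No initializer, so typical ELF toolchains place this array in .bss:
   the file records only its address and size, not 4096 zero bytes. */
static char table[4096];

/* Counts the bytes of table that are not zero. */
size_t count_nonzero_bytes(void)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof table; i++)
        if (table[i] != 0)
            n++;
    return n;
}
```

Before anything writes to table, count_nonzero_bytes() returns 0. Running the binutils size tool on the compiled object should show the 4096 bytes counted under the bss column rather than under data.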
The GNU toolchain supports a way to compress individual ELF sections, renaming a section from .debug* to .zdebug*, by using objcopy --compress-debug-sections ... so that would be another reason why a disk image would be smaller than its memory footprint.
Thanks @vengy, but I'm talking about something general, not about the GNU toolchain.