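I'm trying to find a regular expression in a large memory-mapped file using the regexec() function. I've found that the program crashes when the file size is a multiple of the page size.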
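Is there a regexec() function that takes the length of the string as an additional argument?
Or:
How can I search for a regular expression in a memory-mapped file?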
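On glibc (and on the BSDs and macOS), regexec() accepts the REG_STARTEND flag, an extension rather than a POSIX flag: when it is set, pmatch[0].rm_so and pmatch[0].rm_eo delimit the bytes to search, so strlen() is never called and no terminating NUL is needed. The sketch below assumes a libc that provides REG_STARTEND (it refuses to compile otherwise); the helper name search_mapped_file is hypothetical. Because regoff_t is a plain int on glibc by default, it also assumes the file is smaller than 2 GiB.

```c
#define _GNU_SOURCE /* make sure REG_STARTEND is exposed on glibc */
#include <fcntl.h>
#include <limits.h>
#include <regex.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef REG_STARTEND
#error "this sketch needs the REG_STARTEND extension"
#endif

/* Hypothetical helper: returns 1 if `pattern` matches somewhere in the file
 * at `path`, 0 if it does not, -1 on error (empty files count as errors
 * because mmap() rejects a length of 0). The mapping is never treated as a
 * NUL-terminated string. */
int search_mapped_file(const char *path, const char *pattern)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    /* Assumption: regoff_t may be a 32-bit int (the glibc default). */
    if (fstat(fd, &st) == -1 || st.st_size <= 0 || st.st_size > INT_MAX) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    char *text = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (text == MAP_FAILED)
        return -1;

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        munmap(text, size);
        return -1;
    }

    /* REG_STARTEND: search only text[rm_so .. rm_eo), i.e. the file bytes. */
    regmatch_t m[1];
    m[0].rm_so = 0;
    m[0].rm_eo = (regoff_t)size;
    int rc = regexec(&re, text, 1, m, REG_STARTEND);

    regfree(&re);
    munmap(text, size);
    return rc == 0 ? 1 : (rc == REG_NOMATCH ? 0 : -1);
}
```

For example, search_mapped_file("ttt.txt", "XXXXX") would scan exactly the 409600 bytes of the file used below, whatever follows the mapping in memory. Since REG_STARTEND is not available everywhere, portable code still needs the zero-page approach.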
Here is a minimal example that always crashes (if I run fewer than 3 threads, the program does not crash):
ls -la ttt.txt
-rwx------ 1 bob bob 409600 Jun 14 18:16 ttt.txt
gcc -Wall mal.c -o mal -lpthread -g && ./mal
[1] 11364 segmentation fault (core dumped) ./mal
The program is:
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdio.h>
#include <assert.h>
#include <pthread.h>
#include <regex.h>

void* f(void* arg) {
    int size = 409600; /* size of ttt.txt: exactly 100 pages of 4096 bytes */
    int fd = open("ttt.txt", O_RDONLY);
    char* text = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    /* Map one page of zeros right after the file so regexec() finds a NUL. */
    fd = open("/dev/zero", O_RDONLY);
    char* end = mmap(text + size, 4096, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
    close(fd);
    assert(text + size == end);
    regex_t myre;
    regcomp(&myre, "XXXXX", REG_EXTENDED);
    regexec(&myre, text, 0, NULL, 0);
    regfree(&myre);
    return NULL;
}

int main(int argc, char* argv[]) {
    int n = 10;
    int i;
    pthread_t t[n];
    for (i = 0; i < n; ++i) {
        /* t[i], not t[n]: t[n] is one past the end of the array */
        pthread_create(&t[i], NULL, f, NULL);
    }
    for (i = 0; i < n; ++i) {
        pthread_join(t[i], NULL);
    }
    return 0;
}
P.S. Here is the gdb output:
gdb ./mal
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/bob/prog/c/mal...done.
(gdb) r
Starting program: /home/srdjan/prog/c/mal
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff77ff700 (LWP 11817)]
[New Thread 0x7ffff6ffe700 (LWP 11818)]
[New Thread 0x7ffff6799700 (LWP 11819)]
[New Thread 0x7fffeffff700 (LWP 11820)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6799700 (LWP 11819)]
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:72
72 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:72
#1 0x00007ffff78df254 in __regexec (preg=0x7ffff6798e80, string=0x7fffef79b000 'a' <repeats 200 times>..., nmatch=<optimized out>,
pmatch=0x0, eflags=<optimized out>) at regexec.c:245
#2 0x00000000004008e6 in f (arg=0x0) at mal.c:24
#3 0x00007ffff7bc4e9a in start_thread (arg=0x7ffff6799700) at pthread_create.c:308
#4 0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()
(gdb)
Best answer
Celada correctly identified the problem: the file data does not necessarily contain a null terminator.
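Why the page size matters can be shown with a little arithmetic. POSIX requires the system to zero-fill any partial page at the end of a mapped object, so when the file size is not a page multiple, the bytes after the data in the last page happen to be zero and strlen() stops there. The file in the question is 409600 bytes, which is exactly 100 pages of 4096 bytes (assuming the usual x86_64 page size), so there is no zero-filled slack at all and strlen() walks off the end of the mapping. The helper name zero_slack_after_file is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of zero-filled bytes that follow `size` bytes
 * of file data in the last mapped page. If `size` is an exact multiple of
 * the page size there are none, so nothing guarantees a NUL after the data. */
size_t zero_slack_after_file(size_t size, size_t page_size)
{
    size_t rem = size % page_size;
    return rem == 0 ? 0 : page_size - rem;
}
```

For example, zero_slack_after_file(409600, 4096) is 0, while zero_slack_after_file(409599, 4096) is 1: one byte shorter and the file would have been followed by an accidental NUL.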
You can work around the missing terminator by mapping a page of zeros immediately after the file:
int fd;
char *text;
fd = open("ttt.txt", O_RDONLY);
text = mmap(NULL, 409600, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
fd = open("/dev/zero", O_RDONLY);
mmap(text + 409600, 4096, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
close(fd);
(Note that you can close fd immediately after mmap(), because mmap() adds a reference to the open file description.)
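You should of course add error checking to the above. Also, many UNIX systems support the MAP_ANONYMOUS flag, which you can use instead of opening /dev/zero (this flag was not in POSIX when this answer was written).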
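Here is a sketch of the same zero-page idea with error checking and MAP_ANONYMOUS; it is a variation, not the code from the answer above. Instead of mapping the extra page directly at text + size, it first reserves the file size plus one page with an anonymous, zero-filled mapping and only then maps the file over the start of that range with MAP_FIXED. MAP_FIXED silently replaces whatever is already mapped at the target address, and in a multithreaded program the page right after one thread's file may belong to another thread's mapping or stack; that could explain why the threaded example in the question still crashes, but that is an assumption, not something the answer states. The helper name map_with_nul is hypothetical.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS under strict -std= flags on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the file at `path` read-only and guarantees at
 * least one zero byte after the data, so the result can go to strlen() or
 * regexec(). Stores the total mapping length (for munmap) in *map_len and
 * the file size in *file_size. Returns NULL on error or for empty files. */
char *map_with_nul(const char *path, size_t *map_len, size_t *file_size)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Round the file size up to whole pages, then add one page of zeros. */
    size_t len = (size + page - 1) / page * page + page;

    /* Step 1: reserve the whole range; anonymous pages read as zero. */
    char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    /* Step 2: map the file over the start of our own reservation. MAP_FIXED
     * can only replace pages we reserved in step 1, never someone else's. */
    char *text = mmap(base, size, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
    close(fd); /* the mapping holds its own reference to the file */
    if (text == MAP_FAILED) {
        munmap(base, len);
        return NULL;
    }

    *map_len = len;
    *file_size = size;
    return text; /* equal to base; release with munmap(text, *map_len) */
}
```

Each thread in the question could call map_with_nul("ttt.txt", &len, &size), pass the result to regexec() unchanged, and finish with munmap(text, len).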
For the question "c - How to use regexec with memory-mapped files?", see the original on Stack Overflow: https://stackoverflow.com/questions/11037521/