gpt4 book ai didi

C:连接字符串的最佳和最快方法是什么

转载 作者:太空狗 更新时间:2023-10-29 16:28:19 25 4
gpt4 key购买 nike

我目前使用 string.h 库中的 strcat() 函数在 c 语言中连接字符串。

我想了想,我得出的结论是它应该是一个非常昂贵的函数,因为在它开始连接之前,它必须遍历 char 数组直到找到 '\0' 字符。

例如,如果我使用 strcat() 将字符串 "horses" 连接 1000 次,我将不得不支付(1 + 2 + 3 + ... + 1000) * strlen("马") = (1000*1001)/2 * 6 = 3003000

我想到了非标准的方法,用字符串长度维护一个整数,然后将指向字符串末尾的指针发送到 strcat():

strcat(dest + dest_len, "string");

在这种情况下,我将只支付 1000 * strlen("horses") = 1000 * 6 = 6000

60003003000 小 500 倍,因此如果您进行大量此类串联,它对性能非常关键。

有没有比我的解决方案看起来更好的更标准的方法?

最佳答案

Joel Spolsky,在他的 Back to Basics 中文章,描述了strcat 的低效字符串连接问题,作为Shlemiel the painter's algorithm(阅读文章,相当不错)。作为低效代码的一个例子,他给出了这个例子,它在 O(n2) 时间内运行:

char bigString[1000];     /* I never know how much to allocate... */
bigString[0] = '\0';
strcat(bigString,"John, ");
strcat(bigString,"Paul, ");
strcat(bigString,"George, ");
strcat(bigString,"Joel ");

第一次遍历第一个字符串并不是真正的问题;因为我们已经遍历了第二个字符串,one strcat 的运行时间与结果的长度成线性关系。但是,多个 strcat 是有问题的,因为我们一次又一次地遍历之前连接的结果。他提供了这个替代方案:

How do we fix this? A few smart C programmers implemented their own mystrcat as follows:

char* mystrcat( char* dest, char* src )
{
while (*dest) dest++;
while (*dest++ = *src++);
return --dest;
}

What have we done here? At very little extra cost we're returning a pointer to the end of the new, longer string. That way the code that calls this function can decide to append further without rescanning the string:

char bigString[1000];     /* I never know how much to allocate... */
char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,"John, ");
p = mystrcat(p,"Paul, ");
p = mystrcat(p,"George, ");
p = mystrcat(p,"Joel ");

This is, of course, linear in performance, not n-squared, so it doesn't suffer from degradation when you have a lot of stuff to concatenate.

当然,如果你想使用标准的 C 字符串,你可以这样做。您描述的缓存字符串长度和使用特殊连接函数(例如,使用略有不同的参数调用 strcat )的替代方法是 Pascal 字符串的一种变体,Joel 也提到了:

The designers of Pascal were aware of this problem and "fixed" it by storing a byte count in the first byte of the string. These are called Pascal Strings. They can contain zeros and are not null terminated. Because a byte can only store numbers between 0 and 255, Pascal strings are limited to 255 bytes in length, but because they are not null terminated they occupy the same amount of memory as ASCIZ strings. The great thing about Pascal strings is that you never have to have a loop just to figure out the length of your string. Finding the length of a string in Pascal is one assembly instruction instead of a whole loop. It is monumentally faster.

For a long time, if you wanted to put a Pascal string literal in your C code, you had to write:

char* str = "\006Hello!";

Yep, you had to count the bytes by hand, yourself, and hardcode it into the first byte of your string. Lazy programmers would do this, and have slow programs:

char* str = "*Hello!";
str[0] = strlen(str) - 1;

关于C:连接字符串的最佳和最快方法是什么,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21880730/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com