
C: searching for a word in a string

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:03:56

I hope someone can help me. I think this is a simple problem: I want to write a program that searches for a word in a file.

char *such = "Ingo";
char *fund;
FILE *datei;
char text[100];

datei = fopen("names.txt", "r");

if (datei == NULL) {
    printf("Fehler\n");
}
else
{
    fscanf(datei, "%100c", text);
    text[100] = '\0';
    //i think this dont work
    if (fgets(text, 100, datei) != NULL)
    {
        printf("%s \n", text);
    }
}

return 0;

The file contains the following:

Ingo Test Test 123 Test Ingo Ingo

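Now I want to find out how often the name "Ingo" occurs in the file.

Is it possible to search for more words, such as "ingo" and "test", and count those as well?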

Best answer

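There are many conditions you should test to ensure you match only whole words. Below is one approach that searches for jury and matches jury and jury's, but not injury. You should also consider whether you want to match plural forms of a word (e.g. review and reviews). In the code below, a single set of delimiters (delim) is used to ensure you match whole words. If you want to match plurals or various other suffixes, you can easily split it into two sets, one for the start of the word and one for the end.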

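The code expects the name of the file to search as the first argument and the search term (sterm) as the second. (If no arguments are given, it searches for 'the' in the text on stdin.) The code reads each line of the file into a temporary buffer called line and then scans each character in line for the first character of sterm. If one is found, the preceding character is checked to ensure it is a delimiter (unless the match is at the start of the line), and then the character following the word (at an offset of the length of sterm) is checked to ensure it is also a delimiter. If the candidate begins with the same character as sterm and is delimited before and after, its content is then compared using strncmp.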

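If all conditions are met, count is incremented and the result is printed together with the line number and the zero-based position of the match within line. This is only a basic whole-word search and is not optimized, but it should give you a starting point for distinguishing whole words from lesser-included substrings (i.e. a search for 'the' will not also match 'them', 'then', 'they', etc.). You could also turn this code into a function that saves the line number and position of each match in an array of structs and returns a pointer to that array. That way you could parse your text and get back a pointer to an array holding the line and position of every match. (That is for another day.)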

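Look over the code and let me know if you have any questions. If you are not concerned with matching only whole words, you can simply call strstr repeatedly on each line, advancing the pointer after each hit, to count the occurrences of the search term. Use whichever best fits your needs.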

#include <stdio.h>
#include <string.h>

#define MAXS 256

int main (int argc, char **argv)
{
    char line[MAXS] = {0};      /* line buffer for fgets */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    char *sterm = argc > 2 ? argv[2] : "the";
    char *delim = " \t\n\'\".";
    size_t count = 0, idx = 0, slen = strlen (sterm);

    if (!fp) {
        fprintf (stderr, "error: file open failed '%s'\n", argv[1]);
        return 1;
    }

    while (fgets (line, MAXS, fp))
    {
        size_t i, llen = strlen (line);
        idx++;                  /* 1-based line number */

        if (llen < slen + 1)
            continue;   /* line not longer than search term + \n */

        for (i = 0; i < llen - slen + 1; i++) {

            if (line[i] != *sterm)
                continue;   /* char != first char in sterm */
            if (i && !strchr (delim, line[i-1]))
                continue;   /* prior char is not a delim   */
            if (!strchr (delim, line[i+slen]))
                continue;   /* next char is not a delim    */
            if (strncmp (&line[i], sterm, slen))
                continue;   /* chars don't match sterm     */

            /* i is the zero-based offset of the match within line */
            printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                    idx, ++count, sterm, i);
        }
    }
    if (fp != stdin) fclose (fp);

    printf ("\n total occurrences of '%s' in '%s' : %zu\n\n",
            sterm, argc > 1 ? argv[1] : "stdin", count);

    return 0;
}

Example file

$ cat dat/damages.txt
Personal injury damage awards are unliquidated
and are not capable of certain measurement; thus, the
jury has broad discretion in assessing the amount of
damages in a personal injury case. Yet, at the same
time, a factual sufficiency review insures that the
evidence supports the jury's award; and, although
difficult, the law requires appellate courts to conduct
factual sufficiency reviews on damage awards in
personal injury cases. Thus, while a jury has latitude in
assessing intangible damages in personal injury cases,
a jury's damage award does not escape the scrutiny of
appellate review.

Because Texas law applies no physical manifestation
rule to restrict wrongful death recoveries, a
trial court in a death case is prudent when it chooses
to submit the issues of mental anguish and loss of
society and companionship. While there is a
presumption of mental anguish for the wrongful death
beneficiary, the Texas Supreme Court has not indicated
that reviewing courts should presume that the mental
anguish is sufficient to support a large award. Testimony
that proves the beneficiary suffered severe mental
anguish or severe grief should be a significant and
sometimes determining factor in a factual sufficiency
analysis of large non-pecuniary damage awards.

Output

$ ./bin/searchterm dat/damages.txt jury
line[ 3] match 1. 'jury' at location 0
line[ 6] match 2. 'jury' at location 22
line[ 9] match 3. 'jury' at location 37
line[11] match 4. 'jury' at location 2

total occurrences of 'jury' in 'dat/damages.txt' : 4

$ ./bin/searchterm <dat/damages.txt
line[ 2] match 1. 'the' at location 50
line[ 3] match 2. 'the' at location 39
line[ 4] match 3. 'the' at location 43
line[ 5] match 4. 'the' at location 48
line[ 6] match 5. 'the' at location 18
line[ 7] match 6. 'the' at location 11
line[11] match 7. 'the' at location 38
line[17] match 8. 'the' at location 10
line[19] match 9. 'the' at location 34
line[20] match 10. 'the' at location 13
line[21] match 11. 'the' at location 42
line[23] match 12. 'the' at location 12

total occurrences of 'the' in 'stdin' : 12

Using pointers instead of array-index notation

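You may find using pointers more natural than array-index notation (e.g. using char *p = line; and advancing p, rather than using line[X] notation). If so, you can replace the read loop with the following: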

    while (fgets (line, MAXS, fp))
    {
        char *p = line;
        size_t llen = strlen (line);
        idx++;                  /* 1-based line number */

        if (llen < slen + 1)
            continue;   /* line not longer than search term + \n */

        for (; p < (line + llen - slen + 1); p++) {

            if (*p != *sterm)
                continue;   /* char != first char in sterm */
            if (p > line && !strchr (delim, *(p - 1)))
                continue;   /* prior char is not a delim   */
            if (!strchr (delim, *(p + slen)))
                continue;   /* next char is not a delim    */
            if (strncmp (p, sterm, slen))
                continue;   /* chars don't match sterm     */

            /* p - line is a ptrdiff_t; cast it to match %zu */
            printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                    idx, ++count, sterm, (size_t)(p - line));
        }
    }

Pointer notation can feel a little more natural in C. Let me know if you have any questions.

Regarding searching for a word in a string in C, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34907493/
