- c - 在位数组中找到第一个零
- linux - Unix 显示有关匹配两种模式之一的文件的信息
- 正则表达式替换多个文件
- linux - 隐藏来自 xtrace 的命令
考虑到这段计算所有出现次数的代码,您如何删除常用词?
For example, if the word is from the top 100 English words then, don't count that word.
如果您根据维基百科取最常见的 100 个单词,您如何将它们添加到数组中并检查以不将它们计算在列表中: https://en.wikipedia.org/wiki/Most_common_words_in_English
数组形式的前 100 个最常见的单词:
#define NUMBER_OF_STRING 100
#define MAX_STRING_SIZE 50
char commonWords[NUMBER_OF_STRING][MAX_STRING_SIZE] = {"the", "be", "to", "of", "and", "a", "in", "that", "have", "I", "it", "for", "not", "on", "with", "he", "as", "you", "do", "at", "this", "but", "his", "by", "from", "they", "we", "say", "her", "she", "or", "an", "will", "my", "one", "all", "would", "there", "their", "what", "so", "up", "out", "if", "about", "who", "get", "which", "go", "me", "when", "make", "can", "like", "time", "no", "just", "him", "know", "take", "people", "into", "year", "your", "good", "some", "could", "them", "see", "other", "than", "then", "now", "look", "only", "come", "its", "over", "think", "also", "back", "after", "use", "two", "how", "our", "work", "first", "well", "way", "even", "new", "want", "because", "any", "these", "give", "day", "most", "us"};
代码示例:
/**
* C program to count occurrences of all words in a file.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <limits.h>
#define MAX_WORD 20000 /* max word size */
#define MAX_WORDS 8 /* initial number of struct to allocate */
#ifndef PATH_MAX
#define PATH_MAX 2048 /* max path (defined for Linux in limits.h) */
#endif
typedef struct { /* use a struct to hold */
char word[MAX_WORD]; /* lowercase word, and */
int cap, count; /* if it appeast capitalized, and its count */
} words_t;
char *strlwr (char *str) /* no need for unsigned char */
{
char *p = str;
while (*p) {
*p = tolower(*p);
p++;
}
return str;
}
int main (void) {
FILE *fptr;
char path[PATH_MAX], word[MAX_WORD];
size_t i, len, index = 0, max_words = MAX_WORDS;
/* pointer to allocated block of max_words struct initialized zero */
words_t *words = calloc (max_words, sizeof *words);
if (!words) { /* valdiate every allocation */
perror ("calloc-words");
exit (EXIT_FAILURE);
}
/* Input file path */
printf ("Enter file path: ");
if (scanf ("%s", path) != 1) { /* validate every input */
fputs ("error: invalid file path or cancellation.\n", stderr);
return 1;
}
fptr = fopen (path, "r"); /* open file */
if (fptr == NULL) { /* validate file open */
fputs ( "Unable to open file.\n"
"Please check you have read privileges.\n", stderr);
exit (EXIT_FAILURE);
}
while (fscanf (fptr, "%s", word) == 1) { /* while valid word read */
int iscap = 0, isunique = 1; /* is captial, is unique flags */
if (isupper (*word)) /* is the word uppercase */
iscap = 1;
/* remove all trailing punctuation characters */
len = strlen (word); /* get length */
while (len && ispunct(word[len - 1])) /* only if len > 0 */
word[--len] = 0;
strlwr (word); /* convert word to lowercase */
/* check if word exits in list of all distinct words */
for (i = 0; i < index; i++) {
if (strcmp(words[i].word, word) == 0) {
isunique = 0; /* set unique flag zero */
if (iscap) /* if capital flag set */
words[i].cap = iscap; /* set capital flag in struct */
words[i].count++; /* increment word count */
break; /* bail - done */
}
}
if (isunique) { /* if unique, add to array, increment index */
if (index == max_words) { /* is realloc needed? */
/* always use a temporary pointer with realloc */
void *tmp = realloc (words, 2 * max_words * sizeof *words);
if (!tmp) { /* validate every allocation */
perror ("realloc-words");
break; /* don't exit, original data still valid */
}
words = tmp; /* assign reallocated block to words */
/* (optional) set all new memory to zero */
memset (words + max_words, 0, max_words * sizeof *words);
max_words *= 2; /* update max_words to reflect new limit */
}
memcpy (words[index].word, word, len + 1); /* have len */
if (iscap) /* if cap flag set */
words[index].cap = iscap; /* set capital flag in struct */
words[index++].count++; /* increment count & index */
}
}
fclose (fptr); /* close file */
/*
* Print occurrences of all words in file.
*/
puts ("\nOccurrences of all distinct words with Cap in file:");
for (i = 0; i < index; i++) {
if (words[i].cap) {
strcpy (word, words[i].word);
*word = toupper (*word);
/*
* %-15s prints string in 15 character width.
* - is used to print string left align inside
* 15 character width space.
*/
printf("%-8d %s\n", words[i].count, word);
}
}
free (words);
return 0;
}
要测试的文本文件:(cars.txt)
A car (or automobile) is a wheeled motor vehicle used for transportation. Most definitions of car say they run primarily on roads, seat one to eight people, have four tires, and mainly transport people rather than goods.[2][3]
Cars came into global use during the 20th century, and developed economies depend on them. The year 1886 is regarded as the birth year of the modern car when German inventor Karl Benz patented his Benz Patent-Motorwagen. Cars became widely available in the early 20th century. One of the first cars accessible to the masses was the 1908 Model T, an American car manufactured by the Ford Motor Company. Cars were rapidly adopted in the US, where they replaced animal-drawn carriages and carts, but took much longer to be accepted in Western Europe and other parts of the world.
Cars have controls for driving, parking, passenger comfort, and a variety of lights. Over the decades, additional features and controls have been added to vehicles, making them progressively more complex. These include rear reversing cameras, air conditioning, navigation systems, and in-car entertainment. Most cars in use in the 2010s are propelled by an internal combustion engine, fueled by the combustion of fossil fuels. Electric cars, which were invented early in the history of the car, began to become commercially available in 2008.
There are costs and benefits to car use. The costs include acquiring the vehicle, interest payments (if the car is financed), repairs and maintenance, fuel, depreciation, driving time, parking fees, taxes, and insurance.[4] The costs to society include maintaining roads, land use, road congestion, air pollution, public health, health care, and disposing of the vehicle at the end of its life. Road traffic accidents are the largest cause of injury-related deaths worldwide.[5]
The benefits include on-demand transportation, mobility, independence, and convenience.[6] The societal benefits include economic benefits, such as job and wealth creation from the automotive industry, transportation provision, societal well-being from leisure and travel opportunities, and revenue generation from the taxes. People's ability to move flexibly from place to place has far-reaching implications for the nature of societies.[7] There are around 1 billion cars in use worldwide. The numbers are increasing rapidly, especially in China, India and other newly industrialized countries.[8]
当前输出:
Occurrences of all distinct words with Cap in file:
3 A
2 Motor
2 Most
2 One
8 Cars
29 The
1 German
1 Karl
2 Benz
1 Patent-motorwagen
1 Model
1 T
1 American
1 Ford
1 Company
1 Us
1 Western
1 Europe
1 Over
1 These
1 Electric
2 There
2 Road
1 People's
1 China
1 India
预期输出:(仅示例)
2 Motor
1 German
1 Karl
2 Benz
1 Patent-motorwagen
1 Model
1 T
1 American
1 Ford
1 Company
编辑更新:可能的解决方案:
// skip the word if it is a common word
for (int i = 0; i < NUMBER_OF_STRING; i++) {
if (strcmp(word, commonWords[i])==0) {
continue;
}
}
最佳答案
一种稍微更有效的方法是使用对 strstr
的单个调用,而不是尝试与前 100 个最常见的单词中的每一个进行比较。由于您知道 100 个最常见的单词,并且它们不会改变,因此您可以轻松确定最长的是 7 个字符。换句话说,您只需要测试 word
是否是最常见的词之一,如果它小于:
#define TOP_LEN 8 /* longest string in TOP100 + nul-character */
由于单词没有改变,您可以继续:
const char TOP100[] = " the be to of and a in that have i it for not on with"
" he as you do at this but his by from they we say her she or"
" an will my one all would there their what so up out if about"
" who get which go me when make can like time no just him know"
" take people into year your good some could them see other"
" than then now look only come its over think also back after"
" use two how our work first well way even new want because"
" any these give day most us ";
(注意每个单词之前的空格
和之后的空格
,这允许您创建一个teststr
通过在单词的两边包含一个空格来使用 strstr
进行搜索。'I'
已转换为小写以在 strlwr (word) 之后工作;
)
(另请注意:您还可以在 #define TOP100 "the ... us "
中使用常量文字,但它会换行并可怕地滚动到页面之外在这里——由你决定)
有了 100 个最常用单词的常量字符串,唯一需要添加的是:
...
strlwr (word); /* convert word to lowercase */
/* check against 100 most common words (TOP100) */
if (len < TOP_LEN) { /* word less than TOP_LEN? */
char teststr[TOP_LEN * 2]; /* buffer for " word " */
sprintf (teststr, " %s ", word); /* create teststr */
if (strstr (TOP100, teststr)) /* check if in TOP100 */
continue; /* if so, get next word */
}
...
如上所示,您检查单词是否为 7 个字符或更少(否则无需检查最常见的字符)。然后你声明一个 teststr
来保存你的字符串,两端都有一个空格。 (因为最长的常用词是 7 个字符,那么 7 个字符加上 2 个空格是 9 个字符,再加上 nul 字符是 10 个字符,所以 16 个字符在这里绰绰有余。)
只需调用 sprintf
即可将空格放在 word
的每一端,然后调用 strstr
是查看 word
是否在前 100 个最常见单词中所需的全部内容。如果是,则无需继续,只需继续
并获取下一个单词即可。
将其完全放入您的代码中您将拥有:
/**
* C program to count occurrences of all words in a file.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <limits.h>
#define MAX_WORD 20000 /* max word size */
#define MAX_WORDS 8 /* initial number of struct to allocate */
#define TOP_LEN 8 /* longest string in TOP100 */
#ifndef PATH_MAX
#define PATH_MAX 2048 /* max path (defined for Linux in limits.h) */
#endif
const char TOP100[] = " the be to of and a in that have i it for not on with"
" he as you do at this but his by from they we say her she or"
" an will my one all would there their what so up out if about"
" who get which go me when make can like time no just him know"
" take people into year your good some could them see other"
" than then now look only come its over think also back after"
" use two how our work first well way even new want because"
" any these give day most us ";
typedef struct { /* use a struct to hold */
char word[MAX_WORD]; /* lowercase word, and */
int cap, count; /* if it appeast capitalized, and its count */
} words_t;
char *strlwr (char *str) /* no need for unsigned char */
{
char *p = str;
while (*p) {
*p = tolower(*p);
p++;
}
return str;
}
int main (void) {
FILE *fptr;
char path[PATH_MAX], word[MAX_WORD];
size_t i, len, index = 0, max_words = MAX_WORDS;
/* pointer to allocated block of max_words struct initialized zero */
words_t *words = calloc (max_words, sizeof *words);
if (!words) { /* valdiate every allocation */
perror ("calloc-words");
exit (EXIT_FAILURE);
}
/* Input file path */
printf ("Enter file path: ");
if (scanf ("%s", path) != 1) { /* validate every input */
fputs ("error: invalid file path or cancellation.\n", stderr);
return 1;
}
fptr = fopen (path, "r"); /* open file */
if (fptr == NULL) { /* validate file open */
fputs ( "Unable to open file.\n"
"Please check you have read privileges.\n", stderr);
exit (EXIT_FAILURE);
}
while (fscanf (fptr, "%s", word) == 1) { /* while valid word read */
int iscap = 0, isunique = 1; /* is captial, is unique flags */
if (isupper (*word)) /* is the word uppercase */
iscap = 1;
/* remove all trailing punctuation characters */
len = strlen (word); /* get length */
while (len && ispunct(word[len - 1])) /* only if len > 0 */
word[--len] = 0;
strlwr (word); /* convert word to lowercase */
/* check against 100 most common words (TOP100) */
if (len < TOP_LEN) { /* word less than TOP_LEN? */
char teststr[TOP_LEN * 2]; /* buffer for " word " */
sprintf (teststr, " %s ", word); /* create teststr */
if (strstr (TOP100, teststr)) /* check if in TOP100 */
continue; /* if so, get next word */
}
/* check if word exits in list of all distinct words */
for (i = 0; i < index; i++) {
if (strcmp(words[i].word, word) == 0) {
isunique = 0; /* set unique flag zero */
if (iscap) /* if capital flag set */
words[i].cap = iscap; /* set capital flag in struct */
words[i].count++; /* increment word count */
break; /* bail - done */
}
}
if (isunique) { /* if unique, add to array, increment index */
if (index == max_words) { /* is realloc needed? */
/* always use a temporary pointer with realloc */
void *tmp = realloc (words, 2 * max_words * sizeof *words);
if (!tmp) { /* validate every allocation */
perror ("realloc-words");
break; /* don't exit, original data still valid */
}
words = tmp; /* assign reallocated block to words */
/* (optional) set all new memory to zero */
memset (words + max_words, 0, max_words * sizeof *words);
max_words *= 2; /* update max_words to reflect new limit */
}
memcpy (words[index].word, word, len + 1); /* have len */
if (iscap) /* if cap flag set */
words[index].cap = iscap; /* set capital flag in struct */
words[index++].count++; /* increment count & index */
}
}
fclose (fptr); /* close file */
/*
* Print occurrences of all words in file.
*/
puts ("\nOccurrences of all distinct words with Cap in file:");
for (i = 0; i < index; i++) {
if (words[i].cap) {
strcpy (word, words[i].word);
*word = toupper (*word);
/*
* %-15s prints string in 15 character width.
* - is used to print string left align inside
* 15 character width space.
*/
printf("%-8d %s\n", words[i].count, word);
}
}
free (words);
return 0;
}
示例使用/输出
与上次一样,您的预期输出:(仅示例) 是错误的,因为您的代码中没有任何内容可以删除复数、所有 或复数所有格,因此您的 cars.txt
文件的输出将是:
$ ./bin/unique_words_exclude_top_100
Enter file path: dat/cars.txt
Occurrences of all distinct words with Cap in file:
2 Motor
8 Cars
1 German
1 Karl
2 Benz
1 Patent-motorwagen
1 Model
1 T
1 American
1 Ford
1 Company
1 Western
1 Europe
1 Electric
2 Road
1 People's
1 China
1 India
检查一下,如果您还有其他问题,请告诉我。
关于c - 如果单词出现在单词数组中,则排除该单词,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/56165180/
#include using namespace std; class C{ private: int value; public: C(){ value = 0;
这个问题已经有答案了: What is the difference between char a[] = ?string?; and char *p = ?string?;? (8 个回答) 已关闭
关闭。此题需要details or clarity 。目前不接受答案。 想要改进这个问题吗?通过 editing this post 添加详细信息并澄清问题. 已关闭 7 年前。 此帖子已于 8 个月
除了调试之外,是否有任何针对 c、c++ 或 c# 的测试工具,其工作原理类似于将独立函数复制粘贴到某个文本框,然后在其他文本框中输入参数? 最佳答案 也许您会考虑单元测试。我推荐你谷歌测试和谷歌模拟
我想在第二台显示器中移动一个窗口 (HWND)。问题是我尝试了很多方法,例如将分辨率加倍或输入负值,但它永远无法将窗口放在我的第二台显示器上。 关于如何在 C/C++/c# 中执行此操作的任何线索 最
我正在寻找 C/C++/C## 中不同类型 DES 的现有实现。我的运行平台是Windows XP/Vista/7。 我正在尝试编写一个 C# 程序,它将使用 DES 算法进行加密和解密。我需要一些实
很难说出这里要问什么。这个问题模棱两可、含糊不清、不完整、过于宽泛或夸夸其谈,无法以目前的形式得到合理的回答。如需帮助澄清此问题以便重新打开,visit the help center . 关闭 1
有没有办法强制将另一个 窗口置于顶部? 不是应用程序的窗口,而是另一个已经在系统上运行的窗口。 (Windows, C/C++/C#) 最佳答案 SetWindowPos(that_window_ha
假设您可以在 C/C++ 或 Csharp 之间做出选择,并且您打算在 Windows 和 Linux 服务器上运行同一服务器的多个实例,那么构建套接字服务器应用程序的最明智选择是什么? 最佳答案 如
你们能告诉我它们之间的区别吗? 顺便问一下,有什么叫C++库或C库的吗? 最佳答案 C++ 标准库 和 C 标准库 是 C++ 和 C 标准定义的库,提供给 C++ 和 C 程序使用。那是那些词的共同
下面的测试代码,我将输出信息放在注释中。我使用的是 gcc 4.8.5 和 Centos 7.2。 #include #include class C { public:
很难说出这里问的是什么。这个问题是含糊的、模糊的、不完整的、过于宽泛的或修辞性的,无法以目前的形式得到合理的回答。如需帮助澄清此问题以便重新打开它,visit the help center 。 已关
我的客户将使用名为 annoucement 的结构/类与客户通信。我想我会用 C++ 编写服务器。会有很多不同的类继承annoucement。我的问题是通过网络将这些类发送给客户端 我想也许我应该使用
我在 C# 中有以下函数: public Matrix ConcatDescriptors(IList> descriptors) { int cols = descriptors[0].Co
我有一个项目要编写一个函数来对某些数据执行某些操作。我可以用 C/C++ 编写代码,但我不想与雇主共享该函数的代码。相反,我只想让他有权在他自己的代码中调用该函数。是否可以?我想到了这两种方法 - 在
我使用的是编写糟糕的第 3 方 (C/C++) Api。我从托管代码(C++/CLI)中使用它。有时会出现“访问冲突错误”。这使整个应用程序崩溃。我知道我无法处理这些错误[如果指针访问非法内存位置等,
关闭。这个问题不符合Stack Overflow guidelines .它目前不接受答案。 我们不允许提问寻求书籍、工具、软件库等的推荐。您可以编辑问题,以便用事实和引用来回答。 关闭 7 年前。
已关闭。此问题不符合Stack Overflow guidelines 。目前不接受答案。 要求我们推荐或查找工具、库或最喜欢的场外资源的问题对于 Stack Overflow 来说是偏离主题的,因为
我有一些 C 代码,将使用 P/Invoke 从 C# 调用。我正在尝试为这个 C 函数定义一个 C# 等效项。 SomeData* DoSomething(); struct SomeData {
这个问题已经有答案了: Why are these constructs using pre and post-increment undefined behavior? (14 个回答) 已关闭 6
我是一名优秀的程序员,十分优秀!