
c++ - How to tokenize (words), classifying punctuation as whitespace

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 00:05:50

Based on this question, which was closed very quickly:
Trying to create a program to read a users input then break the array into seperate words are my pointers all valid?

I think some extra work could be done to help the OP clarify the question, rather than just closing it.

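Question:

I want to tokenize user input and store the tokens in an array of words.
I want to use punctuation (.,-) as delimiters, so that it is removed from the token stream.

In C, I would use strtok() to break the input into tokens and then build the array manually.
Like this:

Main function: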

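#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>

using namespace std;

char **findwords(char *str);

int main()
{
    int test;
    char words[100]; // an array of chars to hold the string given by the user
    char **word;     // pointer to a null-terminated list of words
    int index = 0;   // position in the buffer, later the index of the word being printed
    int c;           // int rather than char so that EOF can be detected

    cout << "die monster !";

    // copy the characters the user typed into the array, stopping at the
    // newline, at end of input, or when the buffer is full
    while (index < 99 && (c = getchar()) != EOF && c != '\n')
    {
        words[index++] = (char)c;
    }
    words[index] = '\0'; // strtok() needs a null-terminated string

    word = findwords(words);
    if (word == 0) // the list could not be allocated
    {
        return 1;
    }

    index = 0; // start again at the first word
    while (word[index] != 0) // loop through the list of words until the end of the list
    {
        printf("%s\n", word[index]); // print the current word
        index++;                     // move on to the next word
    }

    // free the list since it was dynamically allocated
    free(word);
    cin >> test;

    return 0;
}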

Line tokenizer:

char **findwords(char *str)
{
    int size = 20;  // initial capacity of the list
    char *newword;  // pointer to the next word from strtok()
    int index = 0;  // our current location in words
    char **words = (char **)malloc(sizeof(char *) * (size + 1)); // this is the actual list of words
    if (words == 0)
    {
        return 0;
    }

    /* Get the initial word, and pass in the original string we want strtok() *
     * to work on. Here, we are separating words based on spaces, commas,     *
     * periods, and dashes. I.e., if they are found, a new word is created.   */

    newword = strtok(str, " ,.-");

    while (newword != 0) // loop through the string until strtok() finds no more words
    {
        if (index == size)
        {
            // the list is full: grow it with realloc() so the existing entries are kept
            size += 10;
            char **bigger = (char **)realloc(words, sizeof(char *) * (size + 1));
            if (bigger == 0)
            {
                free(words);
                return 0;
            }
            words = bigger;
        }
        // store the pointer to the current word
        words[index] = newword;
        // get the next word in the string
        newword = strtok(0, " ,.-");
        // increment the index to get to the next slot
        ++index;
    }
    words[index] = 0; // a null pointer marks the end of the list

    return words;
}

Any comments on the code above would be appreciated.
But, in addition, what is the best technique for achieving this in C++?
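
For reference, the idea in the title (classifying punctuation as whitespace) can be sketched with the standard library alone. The sketch below is only one possible reading of that idea, not necessarily what the asker had in mind: it installs a custom std::ctype<char> facet on a string stream so that operator>> skips '.', ',' and '-' the same way it skips spaces. The names PunctAsSpace and tokenize are made up for this illustration.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: a character table in which '.', ',' and '-'
// are classified as whitespace instead of punctuation.
class PunctAsSpace : public std::ctype<char>
{
public:
    PunctAsSpace() : std::ctype<char>(makeTable()) {}

private:
    static const mask* makeTable()
    {
        // Start from the classic "C" table so ordinary whitespace is unchanged.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        const char delimiters[] = ".,-";
        for (std::size_t i = 0; delimiters[i] != '\0'; ++i)
        {
            table[static_cast<unsigned char>(delimiters[i])] = space;
        }
        return table.data();
    }
};

// Hypothetical helper: split a line into words using stream extraction.
std::vector<std::string> tokenize(const std::string& line)
{
    std::istringstream in(line);
    // The locale takes ownership of the facet and deletes it when unused.
    in.imbue(std::locale(std::locale::classic(), new PunctAsSpace));

    std::vector<std::string> words;
    std::string word;
    while (in >> word) // operator>> now treats '.', ',' and '-' as separators
    {
        words.push_back(word);
    }
    return words;
}
```

With this in place, a call such as tokenize(someLine) returns a std::vector<std::string>, so no manual memory management is needed. Characters outside the three listed delimiters, such as an apostrophe, stay part of the word.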

Best answer

Take a look at boost tokenizer for something that is much better than strtok() in a C++ context.

Regarding "c++ - How to tokenize (words), classifying punctuation as whitespace", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6154204/
