c++ - Strings and unordered_map running slowly

Reposted. Author: 行者123. Updated: 2023-11-30 04:20:31

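I have 2 functions in my code that run really slowly. Basically, I read in a document name, open the document, and then process it one word at a time. I need to split the document into sentences and give each sentence a hash table that records how many times each word appears in the sentence. I also need to keep track of all new words, plus a hash table for the whole document.

When I run my code now on 10 documents, with a total of 8000 words and 2100 unique words, it takes more than 8000 seconds to run... almost 1 second per word.

Can you tell me how long if(istream.good()) should take?

Or, if you can tell, what is delaying my code? Please let me know if a section isn't clear and I'll help.

P.S. You can see in the code where I have start = clock() and end = clock() commented out; it constantly returns < 1 ms. That is mind-boggling.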

void DocProcess::indexString(string sentenceString, hash * sent){

    stringstream iss;

    string word;
    iss.clear();
    iss << sentenceString;

    while(iss.good())
    {
        iss >> word;
        word = formatWord(word);

        std::unordered_map<std::string,int>::const_iterator IsNewWord = words.find(word);

        if(IsNewWord == words.end())
        {
            std::pair<std::string,int> newWordPair (word,0);
            std::pair<std::string,int> newWordPairPlusOne (word,1);

            words.insert(newWordPair);
            sent->insert(newWordPairPlusOne);
        }
        else
        {
            std::pair<std::string,int> newWordPairPlusOne (word,1);
            sent->insert(newWordPairPlusOne);
        }
    }

}

void DocProcess::indexFile(string iFileName){

    hash newDocHash;
    hash newSentHash;
    scoreAndInfo sentenceScore;
    scoreAndInfo dummy;

    fstream iFile;
    fstream dFile;
    string word;
    string newDoc;
    string fullDoc;
    int minSentenceLength = 5;
    int docNumber = 1;
    int runningLength = 0;
    int ProcessedWords = 0;
    stringstream iss;

    iFile.open(iFileName.c_str());

    if(iFile.is_open())
    {
        while(iFile.good())
        {
            iFile >> newDoc;
            dFile.open(newDoc.c_str());
            DocNames.push_back(newDoc);

            if(dFile.is_open())
            {
                scoreAndInfo documentScore;
                //iss << dFile.rdbuf();
                while(dFile.good())
                {
                    //start = clock();
                    dFile >> word;
                    ++ProcessedWords;

                    std::unordered_map<std::string,int>::const_iterator IsStopWord = stopWords.find(word);

                    if(runningLength >= minSentenceLength && IsStopWord != stopWords.end() || word[word.length()-1] == '.')
                    {
                        /* word is in the stop list, process the string*/
                        documentScore.second.second.append(" "+word);
                        sentenceScore.second.second.append(" "+word);

                        indexString(sentenceScore.second.second, &sentenceScore.second.first);

                        sentenceScore.first=0.0;
                        SentList.push_back(sentenceScore);
                        sentenceScore.second.first.clear(); //Clear hash
                        sentenceScore.second.second.clear(); // clear string
                        //sentenceScore = dummy;
                        runningLength = 0;
                    }
                    else
                    {
                        ++runningLength;
                        sentenceScore.second.second.append(" "+word);
                        documentScore.second.second.append(" "+word);
                    }
                    //end = clock();
                    system("cls");
                    cout << "Processing doc number: " << docNumber << endl
                         << "New Word count: " << words.size() << endl
                         << "Total words: " << ProcessedWords << endl;
                    //<< "Last process time****: " << double(diffclock(end,start)) << " ms"<< endl;
                }
                indexString(documentScore.second.second, &documentScore.second.first);
                documentScore.first=0.0;
                DocList.push_back(documentScore);
                dFile.close();
                //iss.clear();
                //documentScore = dummy;
                ++docNumber;
                //end = clock();
                system("cls");
                cout << "Processing doc number: " << docNumber << endl
                     << "Word count: " << words.size();
                //<< "Last process time: " << double(diffclock(end,start)) << " ms"<< endl;
            }
        }

        iFile.close();
    }
    else{ cout << "Unable to open index file: "<<endl <<iFileName << endl;}

}
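
Two details in the code above are separate from the speed problem but affect the counts. A loop of the form while(stream.good()) followed by an extraction can run its body once more with a stale word when the input ends in whitespace, because the failed extraction is only noticed on the next check. Also, insert leaves the value unchanged for a key that is already present, so a repeated word stays at 1. The following is a minimal sketch of per-sentence counting that avoids both. It assumes hash is a std::unordered_map<std::string,int>, which the question does not show; WordCounts and countWords are hypothetical names, and formatWord is left out because its definition is not given.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the question's `hash` type (an assumption).
using WordCounts = std::unordered_map<std::string, int>;

// Counts how often each whitespace-separated word occurs in `sentence`.
WordCounts countWords(const std::string& sentence) {
    WordCounts counts;
    std::istringstream iss(sentence);
    std::string word;
    // Extraction is the loop condition, so the body only runs for a word
    // that was actually read.
    while (iss >> word) {
        ++counts[word];  // operator[] value-initializes a missing key to 0
    }
    return counts;
}
```

The same while (stream >> word) shape would apply to the file-reading loops, where a trailing newline is the usual case.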

Best answer

Can you try it without

                system("cls");

in any of the loops? That certainly isn't helping; it is an expensive call.

Regarding c++ - strings and unordered_map running slowly, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/15235732/
