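I am trying to classify the sentiment of tweets with a Naive Bayes classifier. When I run the code below, I get this error:
ValueError: A ELE probability distribution must have at least one bin.
The code is as follows: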
import re,nltk
# start process_tweet
def processTweet(tweet):
    # process the tweets
    # Convert to lower case
    tweet = tweet.lower()
    # Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    # Convert @username to AT_USER
    tweet = re.sub('@[^\s]+', 'AT_USER', tweet)
    # Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    # Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    # trim
    tweet = tweet.strip('\'"')
    return tweet
# end
# Read the tweets one by one and process it
fp = open('/home/ashish/PyCharm_proj/twitter_sentiment/data/sampleData.txt', 'r')
line = fp.readline()
print "Processed tweets\n"
while line:
    processedTweet = processTweet(line)
    print processedTweet
    line = fp.readline()
# end loop
#start getfeatureVector
def getFeatureVector(tweet):
    featureVector = []
    #split tweet into words
    words = tweet.split()
    for w in words:
        #replace two or more with two occurrences
        w = replaceTwoOrMore(w)
        #strip punctuation
        w = w.strip('\'"?,.')
        #check if the word starts with a letter
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*$", w)
        #ignore if it is a stop word
        if(w in stopWords or val is None):
            continue
        else:
            featureVector.append(w.lower())
    return featureVector
#end
#fp.close()
# initialize stopWords
stopWords = []
inpTweets=fp
featureList=[]
#Read the tweets one by one and process it
tweets = []
for row in inpTweets:
    sentiment = row[0]
    tweet = row[1]
    processedTweet = processTweet(tweet)
    featureVector = getFeatureVector(processedTweet, stopWords)
    featureList.extend(featureVector)
    tweets.append((featureVector, sentiment));
#end loop
#start extract_features
def extract_features(tweet):
    tweet_words = set(tweet)
    features = {}
    for word in featureList:
        features['contains(%s)' % word] = (word in tweet_words)
    #print "Features are: "+features
    return features
#end
#print "Feature List is:"+"\n"+featureList
# Remove featureList duplicates
featureList = list(set(featureList))
training_set = nltk.classify.util.apply_features(extract_features, tweets)
# start replaceTwoOrMore
def replaceTwoOrMore(s):
    # look for 2 or more repetitions of a character and replace them with two occurrences of that character
    pattern = re.compile(r"(.)\1{1,}", re.DOTALL)
    return pattern.sub(r"\1\1", s)
# end
# start getStopWordList
def getStopWordList(stopWordListFileName):
    # read the stopwords file and build a list
    stopWords = []
    stopWords.append('AT_USER')
    stopWords.append('URL')
    fp = open(stopWordListFileName, 'r')
    line = fp.readline()
    while line:
        word = line.strip()
        stopWords.append(word)
        line = fp.readline()
    fp.close()
    return stopWords
# end
# start getfeatureVector
def getFeatureVector(tweet):
    featureVector = []
    # split tweet into words
    words = tweet.split()
    for w in words:
        # replace two or more with two occurrences
        w = replaceTwoOrMore(w)
        # strip punctuation
        w = w.strip('\'"?,.')
        # check if the word starts with a letter
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*$", w)
        # ignore if it is a stop word
        if (w in stopWords or val is None):
            continue
        else:
            featureVector.append(w.lower())
    return featureVector
# Train the classifier
NBClassifier = nltk.NaiveBayesClassifier.train(training_set)
# Test the classifier
testTweet = 'Congrats @ashish, The classifier works'
processedTestTweet = processTweet(testTweet)
print NBClassifier.classify(extract_features(getFeatureVector(processedTestTweet)))
# end
# Read the tweets one by one and process it
fp = open('/home/ashish/PyCharm_proj/twitter_sentiment/data/sampleData.txt', 'r')
line = fp.readline()
stopWords = getStopWordList('/home/ashish/PyCharm_proj/twitter_sentiment/data/feature_list/stopwords.txt')
print "\n Feature vectors are:\n "
while line:
    processedTweet = processTweet(line)
    featureVector = getFeatureVector(processedTweet)
    print featureVector
    line = fp.readline()
# end loop
fp.close()
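Best answer
You first need to put the training data into the dictionary format the classifier expects: NaiveBayesClassifier.train() takes a list of (featureset, label) pairs, where each featureset is a dict mapping feature names to values. The documentation for .train() covers this in detail.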
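As a rough illustration of that format, here is a minimal sketch in Python 3 (the question's code is Python 2). The helper `build_training_set`, the `(label, tokens)` input shape, and the sample rows are assumptions made up for this example, not part of the original code. The sketch also guards against an empty training set, which appears to be what triggers the error: in the code as posted, `fp` has already been read to the end by the first `while` loop, so `for row in inpTweets` most likely never runs and `tweets` stays empty.

```python
def extract_features(words, vocabulary):
    """Map a list of tokens to a dict featureset: feature name -> bool."""
    present = set(words)
    return {"contains(%s)" % w: (w in present) for w in vocabulary}


def build_training_set(labelled_tweets):
    """Build [(featureset_dict, label), ...] from (label, tokens) pairs.

    The (label, tokens) input shape is a hypothetical choice for this sketch.
    """
    rows = list(labelled_tweets)
    # Vocabulary = every distinct token seen in the labelled data.
    vocabulary = sorted({w for _, words in rows for w in words})
    training_set = [(extract_features(words, vocabulary), label)
                    for label, words in rows]
    if not training_set:
        # Fail early with a clearer message than the one raised during training.
        raise ValueError("training set is empty; check that the input was actually read")
    return training_set, vocabulary


# Made-up sample data, only to show the shape of the result.
sample = [
    ("positive", ["great", "day"]),
    ("negative", ["bad", "day"]),
]
training_set, vocabulary = build_training_set(sample)
print(training_set[0])

# With NLTK installed, this list can be passed to the trainer:
# import nltk
# classifier = nltk.NaiveBayesClassifier.train(training_set)
```

If the data comes from a file, reopening it or calling `fp.seek(0)` before the second pass, and splitting each line into a label and a tweet, should give the loop something to iterate over.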
Regarding python - ValueError: A ELE probability distribution must have at least one bin, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31937797/