python - ValueError: A ELE probability distribution must have at least one bin


Reposted by: 行者123  Updated: 2023-12-02 03:23:06

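I am trying to classify the sentiment of tweets using a Naive Bayes classifier. When I run the code below, I get this error:

ValueError: A ELE probability distribution must have at least one bin.

The code is as follows: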

import re,nltk

# start process_tweet
def processTweet(tweet):
    # process the tweets

    # Convert to lower case
    tweet = tweet.lower()
    # Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    # Convert @username to AT_USER
    tweet = re.sub('@[^\s]+', 'AT_USER', tweet)
    # Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    # Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    # trim
    tweet = tweet.strip('\'"')
    return tweet


# end
# Read the tweets one by one and process it
fp = open('/home/ashish/PyCharm_proj/twitter_sentiment/data/sampleData.txt', 'r')

line = fp.readline()
print "Processed tweets\n"
while line:
    processedTweet = processTweet(line)
    print processedTweet
    line = fp.readline()
# end loop

#start getfeatureVector
def getFeatureVector(tweet):
    featureVector = []
    #split tweet into words
    words = tweet.split()
    for w in words:
        #replace two or more with two occurrences
        w = replaceTwoOrMore(w)
        #strip punctuation
        w = w.strip('\'"?,.')
        #check if the word starts with a letter
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*$", w)
        #ignore if it is a stop word
        if(w in stopWords or val is None):
            continue
        else:
            featureVector.append(w.lower())
    return featureVector
#end

#fp.close()
# initialize stopWords
stopWords = []

inpTweets=fp
featureList=[]
#Read the tweets one by one and process it
tweets = []
for row in inpTweets:
    sentiment = row[0]
    tweet = row[1]
    processedTweet = processTweet(tweet)
    featureVector = getFeatureVector(processedTweet, stopWords)
    featureList.extend(featureVector)
    tweets.append((featureVector, sentiment));
#end loop

#start extract_features
def extract_features(tweet):
    tweet_words = set(tweet)
    features = {}
    for word in featureList:
        features['contains(%s)' % word] = (word in tweet_words)
    #print "Features are: "+features
    return features

#end

#print "Feature List is:"+"\n"+featureList

# Remove featureList duplicates
featureList = list(set(featureList))
training_set = nltk.classify.util.apply_features(extract_features, tweets)
# start replaceTwoOrMore
def replaceTwoOrMore(s):
    # look for 2 or more repetitions of character and replace with the character itself
    pattern = re.compile(r"(.)\1{1,}", re.DOTALL)
    return pattern.sub(r"\1\1", s)


# end

# start getStopWordList
def getStopWordList(stopWordListFileName):
    # read the stopwords file and build a list
    stopWords = []
    stopWords.append('AT_USER')
    stopWords.append('URL')

    fp = open(stopWordListFileName, 'r')
    line = fp.readline()
    while line:
        word = line.strip()
        stopWords.append(word)
        line = fp.readline()
    fp.close()
    return stopWords


# end

# start getfeatureVector
def getFeatureVector(tweet):
    featureVector = []
    # split tweet into words
    words = tweet.split()
    for w in words:
        # replace two or more with two occurrences
        w = replaceTwoOrMore(w)
        # strip punctuation
        w = w.strip('\'"?,.')
        # check if the word starts with a letter
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*$", w)
        # ignore if it is a stop word
        if (w in stopWords or val is None):
            continue
        else:
            featureVector.append(w.lower())
    return featureVector


# Train the classifier
NBClassifier = nltk.NaiveBayesClassifier.train(training_set)

# Test the classifier
testTweet = 'Congrats @ashish, The classifier works'
processedTestTweet = processTweet(testTweet)
print NBClassifier.classify(extract_features(getFeatureVector(processedTestTweet)))


# end

# Read the tweets one by one and process it
fp = open('/home/ashish/PyCharm_proj/twitter_sentiment/data/sampleData.txt', 'r')

line = fp.readline()

stopWords = getStopWordList('/home/ashish/PyCharm_proj/twitter_sentiment/data/feature_list/stopwords.txt')
print "\n Feature vectors are:\n "
while line:
    processedTweet = processTweet(line)
    featureVector = getFeatureVector(processedTweet)
    print featureVector
    line = fp.readline()
# end loop
fp.close()

How can I solve this problem?
Thanks.

Best answer

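You first have to put the training data into dictionary format, that is, each training item should be a (feature dictionary, label) pair. The documentation for .train() covers this in detail.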
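As a rough illustration of that format, here is a minimal sketch. It assumes a hypothetical two-column "sentiment,tweet" CSV layout (the question does not show what sampleData.txt looks like), and the helpers tokenize, load_labelled_tweets, build_training_set and train are made-up names, not part of NLTK. It also guards against an empty training set, which is what NLTK reports as "must have at least one bin". In the question's code this appears to happen because the first while loop has already read fp to the end, so the later `for row in inpTweets` loop likely never runs and tweets stays empty.

```python
import csv
import io
import re


def tokenize(tweet):
    """Small stand-in for the question's processTweet/getFeatureVector."""
    return re.findall(r"[a-z][a-z0-9]*", tweet.lower())


def load_labelled_tweets(fileobj):
    """Read (sentiment, tweet) rows; assumes a two-column CSV layout."""
    return [(row[0], row[1]) for row in csv.reader(fileobj) if len(row) >= 2]


def build_training_set(labelled_tweets):
    """Return [(feature_dict, label), ...] for NaiveBayesClassifier.train."""
    tokenized = [(tokenize(text), label) for label, text in labelled_tweets]
    vocabulary = sorted({word for words, _ in tokenized for word in words})
    training_set = []
    for words, label in tokenized:
        present = set(words)
        features = {"contains(%s)" % w: (w in present) for w in vocabulary}
        training_set.append((features, label))
    if not training_set:
        # An empty list here is what later surfaces as the ELE error in NLTK.
        raise ValueError("no training examples: check that the input was read")
    return training_set


def train(training_set):
    """Train the classifier; NLTK is imported lazily, only when this runs."""
    import nltk
    return nltk.NaiveBayesClassifier.train(training_set)


# Hypothetical in-memory data standing in for sampleData.txt.
sample = io.StringIO("positive,I love this phone\nnegative,I hate this weather\n")
training_set = build_training_set(load_labelled_tweets(sample))

# The handle is now exhausted: reading it a second time yields no rows,
# which mirrors reusing fp after the first loop in the question.
leftover = load_labelled_tweets(sample)
```

With NLTK installed, train(training_set) should return a classifier whose classify method takes a feature dictionary built the same way. If the same file has to be read twice, reopening it or calling fp.seek(0) before the second pass avoids the empty result.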

Regarding python - ValueError: A ELE probability distribution must have at least one bin, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31937797/
