python - Scoring multiple responses from different users

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 04:04:21

I want to score/rate the responses that different users type in. For this I used Multinomial Naive Bayes. My code is below.

# use natural language toolkit
import nltk
from nltk.stem.lancaster import LancasterStemmer
import os
import json
import datetime

stemmer = LancasterStemmer()

# 3 classes of training data (omitted from this post); each item is a dict
# with a 'class' key and a 'sentence' key
training_data = []

# capture unique stemmed words in the training corpus
class_words = {}
corpus_words = {}
classes = list(set([a['class'] for a in training_data]))
for c in classes:
    class_words[c] = []

for data in training_data:
    # tokenize each sentence into words
    for word in nltk.word_tokenize(data['sentence']):
        # ignore a few things
        if word not in ["?", "'s"]:
            # stem and lowercase each word
            stemmed_word = stemmer.stem(word.lower())
            if stemmed_word not in corpus_words:
                corpus_words[stemmed_word] = 1
            else:
                corpus_words[stemmed_word] += 1

            class_words[data['class']].extend([stemmed_word])

# we now have each word and the number of occurrences of the word in our training corpus (the word's commonality)
print("Corpus words and counts: %s" % corpus_words)
# also we have all words in each class
print("Class words: %s" % class_words)

sentence = "The biggest advantages to a JavaScript having a ability to support all modern browser and produce the same result."

# calculate a score for a given class: one point per matching word
def calculate_class_score(sentence, class_name):
    score = 0
    for word in nltk.word_tokenize(sentence):
        # stem and lowercase so the word matches the stemmed entries built above
        stemmed_word = stemmer.stem(word.lower())
        if stemmed_word in class_words[class_name]:
            score += 1
    return score

for c in class_words.keys():
    print("Class: %s Score: %s" % (c, calculate_class_score(sentence, c)))

# calculate a score for a given class taking into account word commonality
def calculate_class_score_commonality(sentence, class_name):
    score = 0
    for word in nltk.word_tokenize(sentence):
        # stem and lowercase so the word matches the stemmed entries built above
        stemmed_word = stemmer.stem(word.lower())
        if stemmed_word in class_words[class_name]:
            # rarer words in the corpus contribute more to the score
            score += (1 / corpus_words[stemmed_word])
    return score

for c in class_words.keys():
    print("Class: %s Score: %s" % (c, calculate_class_score_commonality(sentence, c)))

# now we can find the class with the highest score
def find_class(sentence):
    high_class = None
    high_score = 0
    for c in class_words.keys():
        score = calculate_class_score_commonality(sentence, c)
        if score > high_score:
            high_class = c
            high_score = score
    return high_class, high_score

Note: I have not included the training data here.

When I give the input

find_class(
    "the biggest advantages to a JavaScript having a ability to "
    "support all modern browser and produce the same result.JavaScript "
    "small bit of code you can test"
)

the output I get is

('Advantages', 5.07037037037037)

But when I give the input

find_class(
    "JavaScript can be executed within the user's browser "
    "without having to communicate with the server, saving on bandwidth"
)

the response/output I get is

('Advantages', 2.0454545)

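I am building this for JavaScript interview/viva questions. When users type the same answer in different ways, as in the two inputs above, I get different scores. I want the scores to be accurate. How can I do that?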

Best answer

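Multinomial Naive Bayes compares word occurrence counts. It does not take word order into account, because it treats every feature as independent of the others. Semantic similarity (different sentences, same meaning) is therefore not always an easy problem to solve with Naive Bayes.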
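To make the point about word order concrete, here is a minimal pure-Python sketch of multinomial Naive Bayes with add-one smoothing. It is not the asker's code: the training sentences, the whitespace tokenizer, and the names `TRAINING_DATA`, `train`, and `log_scores` are all assumptions invented for illustration. It scores a sentence and a shuffled copy of it, which should come out the same because only word counts enter the formula.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data, invented for this sketch only.
TRAINING_DATA = [
    {"class": "Advantages", "sentence": "javascript runs in every modern browser"},
    {"class": "Advantages", "sentence": "javascript saves bandwidth on the server"},
    {"class": "Disadvantages", "sentence": "javascript code is visible to every user"},
    {"class": "Disadvantages", "sentence": "javascript can be disabled in the browser"},
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(data):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for item in data:
        class_counts[item["class"]] += 1
        word_counts[item["class"]].update(tokenize(item["sentence"]))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary


def log_scores(sentence, word_counts, class_counts, vocabulary):
    """Return log P(class) + sum of log P(word | class), with add-one smoothing."""
    total_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        total_words = sum(word_counts[c].values())
        terms = [math.log(class_counts[c] / total_docs)]
        for word in tokenize(sentence):
            if word not in vocabulary:
                continue  # skip words never seen in training
            terms.append(
                math.log((word_counts[c][word] + 1) / (total_words + len(vocabulary)))
            )
        # fsum returns a correctly rounded sum, so term order does not matter
        scores[c] = math.fsum(terms)
    return scores


model = train(TRAINING_DATA)
original = log_scores("javascript runs in the browser", *model)
shuffled = log_scores("browser the in runs javascript", *model)
print(original)
print(shuffled)
```

Both calls see the same multiset of words, so the model cannot tell them apart. The reverse also holds: two answers with the same meaning but different vocabulary get different scores, which is the behavior described in the question.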

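If, in your case, semantic similarity is fairly directly related to which words appear (so that word order can be ignored to some extent), then you can try the following: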

  1. Play with the data. See what stop-word removal or techniques such as TF-IDF give you.
  2. See whether Word2Vec (or Doc2Vec) gets you better results.
  3. Use more training data.
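As a rough sketch of the first suggestion, the snippet below removes stop words, weights the remaining words with TF-IDF, and compares each answer to a reference answer using cosine similarity. Several things here are assumptions rather than part of the original question or answer: the short `STOP_WORDS` list, the smoothed IDF variant `log((1 + N) / (1 + df)) + 1`, the `reference` sentence, and the choice of cosine similarity as the score. Cosine similarity is used because it stays between 0 and 1 for these non-negative vectors, so a longer answer does not score higher just by containing more words.

```python
import math
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "to", "and", "of", "on", "in", "be", "can", "with", "all"}


def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stop words."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return one {word: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # document frequency: in how many documents each word appears
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical reference answer and user answers, for illustration only.
reference = "JavaScript runs in all modern browsers and produces the same result"
answers = [
    "The biggest advantage of JavaScript is support for all modern browsers",
    "JavaScript can be executed within the user's browser, saving on bandwidth",
]

vectors = tfidf_vectors([reference] + answers)
for answer, vector in zip(answers, vectors[1:]):
    print(round(cosine_similarity(vectors[0], vector), 3), answer)
```

This still only matches surface words, so "browser" and "browsers" count as different terms unless a stemmer is applied first, and answers that paraphrase the reference with different vocabulary will score low. That limitation is what the Word2Vec/Doc2Vec suggestion is meant to address.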

These are fairly lazy suggestions; they are what I can offer without knowing much about what your data looks like.

Regarding "python - Scoring multiple responses from different users", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53083136/
