I want to run sentiment analysis on German tweets. The code I am using works for English, but when I load the German word list, every score comes out as zero. My guess is that this has to do with the different structure of the word lists. So what I need to know is how to adapt my code to the structure of the German word list. Could someone take a look at the two lists?
English Wordlist
German Wordlist
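(For context, the two files are laid out quite differently. The sketch below is only an illustration with made-up entries, not lines copied from the actual files: the English Hu & Liu lists hold one plain term per line with ';' marking comments, while each SentiWS line bundles a lemma, POS tag, polarity weight and inflected forms.)

# English list: one term per line, e.g.
#   good
# SentiWS list: lemma|POS <tab> weight <tab> inflected forms, e.g.
#   Vertrauen|NN	0.004	Vertrauens
# (example entries and the weight above are illustrative only)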
# load the wordlists
pos.words = scan("~/positive-words.txt", what='character', comment.char=';')
neg.words = scan("~/negative-words.txt", what='character', comment.char=';')

# bring in the sentiment analysis algorithm
# we got a vector of sentences. plyr will handle a list or a vector as an "l"
# we want a simple array of scores back, so we use "l" + "a" + "ply" = laply:
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  require(plyr)
  require(stringr)

  scores = laply(sentences, function(sentence, pos.words, neg.words)
  {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)

    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)

    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)

    # match() returns the position of the matched term or NA
    # we just want a TRUE/FALSE:
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)

    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  },
  pos.words, neg.words, .progress=.progress)

  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}

# and to see if it works, there should be a score...either in German or in English
sample = c("ich liebe dich. du bist wunderbar", "I hate you. Die!"); sample
test.sample = score.sentiment(sample, pos.words, neg.words); test.sample
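(A quick way to see why every German sentence scores zero with this setup; a minimal check, assuming pos.words was loaded from the English-only file as above:)

# German tokens are looked up in the English-only list, so match() finds nothing
match(c("liebe", "wunderbar"), pos.words)
# NA values here mean each German word contributes 0 to the score,
# so purely German sentences always end up with a total of 0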
Best answer
This might work for you:
readAndflattenSentiWS <- function(filename) {
  words = readLines(filename, encoding="UTF-8")
  words <- sub("\\|[A-Z]+\t[0-9.-]+\t?", ",", words)
  words <- unlist(strsplit(words, ","))
  words <- tolower(words)
  return(words)
}

pos.words <- c(scan("positive-words.txt", what='character', comment.char=';', quiet=T),
               readAndflattenSentiWS("SentiWS_v1.8c_Positive.txt"))
neg.words <- c(scan("negative-words.txt", what='character', comment.char=';', quiet=T),
               readAndflattenSentiWS("SentiWS_v1.8c_Negative.txt"))

score.sentiment = function(sentences, pos.words, neg.words, .progress='none') {
  # ... see OP ...
}

sample <- c("ich liebe dich. du bist wunderbar",
            "Ich hasse dich, geh sterben!",
            "i love you. you are wonderful.",
            "i hate you, die.")
(test.sample <- score.sentiment(sample,
                                pos.words,
                                neg.words))
#   score                              text
# 1     2 ich liebe dich. du bist wunderbar
# 2    -2      ich hasse dich, geh sterben!
# 3     2    i love you. you are wonderful.
# 4    -2                  i hate you, die.
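(To see what readAndflattenSentiWS does to a single entry, here is one SentiWS-style line, made up for illustration, run through the same substitution:)

x <- "Vertrauen|NN\t0.004\tVertrauens"
sub("\\|[A-Z]+\t[0-9.-]+\t?", ",", x)
# "Vertrauen,Vertrauens" -- strsplit(..., ",") and tolower() then yield
# c("vertrauen", "vertrauens"): a flat word vector shaped like the English lists,
# which is exactly what score.sentiment() expects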
Regarding "r - Twitter sentiment analysis in R using the German word list SentiWS", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22116938/