作者热门文章
- c - 在位数组中找到第一个零
- linux - Unix 显示有关匹配两种模式之一的文件的信息
- 正则表达式替换多个文件
- linux - 隐藏来自 xtrace 的命令
我正在尝试进行照应解析,下面是我的代码。
首先,我导航到我下载 stanford 模块的文件夹。然后我在命令提示符下运行命令来初始化 stanford nlp 模块
java -mx4g -cp "*;stanford-corenlp-full-2017-06-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
然后我在 Python 中执行下面的代码
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
我想改变句子 Tom is a smart boy。他知道很多事情。
到 Tom 是个聪明的男孩。 Tom 知道很多事情。
并且在 Python 中没有可用的教程或任何帮助。
我所能做的就是在 Python 中通过以下代码进行注释
共指消解
output = nlp.annotate(sentence, properties={'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})
并通过解析 coref
coreferences = output['corefs']
我得到以下 JSON
coreferences
{u'1': [{u'animacy': u'ANIMATE',
u'endIndex': 2,
u'gender': u'MALE',
u'headIndex': 1,
u'id': 1,
u'isRepresentativeMention': True,
u'number': u'SINGULAR',
u'position': [1, 1],
u'sentNum': 1,
u'startIndex': 1,
u'text': u'Tom',
u'type': u'PROPER'},
{u'animacy': u'ANIMATE',
u'endIndex': 6,
u'gender': u'MALE',
u'headIndex': 5,
u'id': 2,
u'isRepresentativeMention': False,
u'number': u'SINGULAR',
u'position': [1, 2],
u'sentNum': 1,
u'startIndex': 3,
u'text': u'a smart boy',
u'type': u'NOMINAL'},
{u'animacy': u'ANIMATE',
u'endIndex': 2,
u'gender': u'MALE',
u'headIndex': 1,
u'id': 3,
u'isRepresentativeMention': False,
u'number': u'SINGULAR',
u'position': [2, 1],
u'sentNum': 2,
u'startIndex': 1,
u'text': u'He',
u'type': u'PRONOMINAL'}],
u'4': [{u'animacy': u'INANIMATE',
u'endIndex': 7,
u'gender': u'NEUTRAL',
u'headIndex': 4,
u'id': 4,
u'isRepresentativeMention': True,
u'number': u'SINGULAR',
u'position': [2, 2],
u'sentNum': 2,
u'startIndex': 3,
u'text': u'a lot of thing',
u'type': u'NOMINAL'}]}
有什么帮助吗?
最佳答案
这是一种可能的解决方案,它使用 CoreNLP 输出的数据结构。提供了所有信息。这并不是一个完整的解决方案,可能需要扩展来处理所有情况,但这是一个很好的起点。
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
def resolve(corenlp_output):
""" Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
for coref in corenlp_output['corefs']:
mentions = corenlp_output['corefs'][coref]
antecedent = mentions[0] # the antecedent is the first mention in the coreference chain
for j in range(1, len(mentions)):
mention = mentions[j]
if mention['type'] == 'PRONOMINAL':
# get the attributes of the target mention in the corresponding sentence
target_sentence = mention['sentNum']
target_token = mention['startIndex'] - 1
# transfer the antecedent's word form to the appropriate token in the sentence
corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']
def print_resolved(corenlp_output):
""" Print the "resolved" output """
possessives = ['hers', 'his', 'their', 'theirs']
for sentence in corenlp_output['sentences']:
for token in sentence['tokens']:
output_word = token['word']
# check lemmas as well as tags for possessive pronouns in case of tagging errors
if token['lemma'] in possessives or token['pos'] == 'PRP$':
output_word += "'s" # add the possessive morpheme
output_word += token['after']
print(output_word, end='')
text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
"hers is blue. It is older than hers. The big cat ate its dinner."
output = nlp.annotate(text, properties= {'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})
resolve(output)
print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)
这给出了以下输出:
Original: Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but hers is blue. It is older than hers. The big cat ate his dinner.
Resolved: Tom and Jane are good friends. Tom and Jane are cool. Tom knows a lot of things and so does Jane. Tom's car is red, but Jane's is blue. His car is older than Jane's. The big cat ate The big cat's dinner.
如您所见,当代词有句首(标题大写)先行词(“The big cat”而不是最后一句中的“the big cat”)时,此解决方案不处理纠正大小写.这取决于先行词的类别——普通名词先行词需要小写,而专有名词先行词则不需要。可能需要进行一些其他的临时处理(至于我测试句子中的所有格)。它还假定您不想重用原始输出标记,因为它们已被此代码修改。解决此问题的方法是复制原始数据结构或创建新属性并相应地更改 print_resolved
函数。纠正任何分辨率错误也是另一个挑战!
关于python - 使用 python 在 stanford-nlp 中的回指解析,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50004797/
我是一名优秀的程序员,十分优秀!