
python - Using Stanford Tregex in Python

Reposted. Author: 太空狗. Updated: 2023-10-30 02:26:59

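I'm new to NLP and Python. I'm trying to use the Tregex tool and Python's subprocess library to extract a subset of noun phrases from the parse trees produced by StanfordCoreNLP. Specifically, I'm trying to find and extract noun phrases that match the following pattern in Tregex syntax: '(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP] >S)|(NP\n[$VP]>S\n)'.

For example, here is the original text, saved in a string named "text":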

text = ('Pusheen and Smitha walked along the beach. "I want to surf", said Smitha, the CEO of Tesla. However, she fell off the surfboard')

After running the StanfordCoreNLP parser through a Python wrapper, I got the following 3 trees for the 3 sentences:

output1['sentences'][0]['parse']

Out[58]: '(ROOT\n (S\n (NP (NNP Pusheen)\n (CC and)\n (NNP Smitha))\n (VP (VBD walked)\n (PP (IN along)\n (NP (DT the) (NN beach))))\n (. .)))'

output1['sentences'][1]['parse']

Out[59]: "(ROOT\n (SINV (`` ``)\n (S\n (NP (PRP I))\n (VP (VBP want)\n (PP (TO to)\n (NP (NN surf) ('' '')))))\n (, ,)\n (VP (VBD said))\n (NP\n (NP (NNP Smitha))\n (, ,)\n (NP\n (NP (DT the) (NNP CEO))\n (PP (IN of)\n (NP (NNP Tesla)))))\n (. .)))"

output1['sentences'][2]['parse']

Out[60]: '(ROOT\n (S\n (ADVP (RB However))\n (, ,)\n (NP (PRP she))\n (VP (VBD fell)\n (PRT (RP off))\n (NP (DT the) (NN surfboard)))))'

I would like to extract the following 3 noun phrases (one per sentence) and save them as variables (or lists of tokens) in Python:

  • (NP (NNP Pusheen)\n (CC and)\n (NNP Smitha))
  • (NP (PRP I))
  • (NP (PRP she))
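
Once a bracketed string like these is available, turning it into a list of tokens needs only a regular expression. The following is a minimal sketch, assuming every leaf has the form (TAG word) as in the trees above; the names LEAF and leaves are made up for illustration and do not come from any Stanford library.

```python
import re

# One leaf looks like "(TAG word)": a parenthesis pair with no nested parentheses.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def leaves(tree_string):
    """Return the words of a bracketed tree string, left to right."""
    return [word for _tag, word in LEAF.findall(tree_string)]

np_tree = "(NP (NNP Pusheen)\n (CC and)\n (NNP Smitha))"
print(leaves(np_tree))
# ['Pusheen', 'and', 'Smitha']
```

Because the expression skips any parenthesis that contains another parenthesis, phrase-level nodes such as NP or VP are ignored and only the words remain.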

For your information, I have used tregex from the command line with the following code:

cd stanford-tregex-2016-10-31
java -cp 'stanford-tregex.jar:' edu.stanford.nlp.trees.tregex.TregexPattern -f -s '(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)' /Users/AS/stanford-tregex-2016-10-31/exampletree.txt

The output is:

Pattern string:
(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)
Parsed representation:
or
Root NP
and
$ VP
> S
Root NP
and
$ VP
> S\n
Root NP\n
and
$ VP
> S
Root NP\n
and
$ VP
> S\n
Reading trees from file(s) file path
# /Users/AS/stanford-tregex-2016-10-31/exampletree.txt
(NP (NNP Pusheen) \n (CC and) \n (NNP Smitha))
# /Users/AS/stanford-tregex-2016-10-31/exampletree.txt
(NP\n (NP (NNP Smitha)) \n (, ,) \n (NP\n (NP (DT the) (NN spokesperson)) \n (PP (IN of) \n (NP (DT the) (NNP CIA)))) \n (, ,))
# /Users/AS/stanford-tregex-2016-10-31/exampletree.txt
(NP (PRP They))
There were 3 matches in total.

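How can I replicate this result in Python?

For reference, I found the following post through Google, which is related to my question but out of date (https://mailman.stanford.edu/pipermail/parser-user/2010-July/000606.html):

[parser-user] Variable input to Tregex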

Christopher Manning manning at stanford.edu
Wed Jul 7 17:41:32 2010 (Pacific time)

Hi Haiyang,

Sorry for the slow reply; the end of the academic year has been too busy.

On Jun 1, 2010, at 8:56 PM, Haiyang AI wrote:

Dear All,

I hope this is the right place to seek help.

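Yes, although we can only offer very limited help on any Python-specific problems...

But this one seems straightforward (I think).

If you want the pattern to run on trees fed in through standard input, you need to add the flag "-filter" before "NP" in the argument list.

If no file is specified after the pattern and the flag "-filter" is not given, then it runs the pattern on a fixed default sentence....

Chris.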

I'm working on a project related to Tregex. I'm trying to call Tregex from python, but I don't know how to feed data into Tregex, not from conventional file, but from a variable. For example, I'm trying to count the number of "NP" from a given variable (e.g. text, already parsed tree, using Stanford Parser), with the following code,

def tregex(text):
    tregex_dir = "/root/nlp/stanford-tregex-2009-08-30/"
    op = Popen(["java", "-mx900m", "-cp", "stanford-tregex.jar:", "edu.stanford.nlp.trees.tregex.TregexPattern", "NP"], cwd = tregex_dir, stdout = PIPE, stdin = PIPE, stderr = STDOUT)
    res = op.communicate(input=text)[0]
    return res

The results are like the following. It didn't search the content from the variable, but somehow falling back to "using default tree". Can anyone give me a hand? I have been stuck here for quite a long time. Really appreciate your time and help.

Pattern string:
NP
Parsed representation:
Root NP
using default tree
(NP (NP (DT this) (NN wine)) (CC and) (NP (DT these) (NNS snails)))

(NP (DT this) (NN wine))

(NP (DT these) (NNS snails))

There were 3 matches in total.

-- Haiyang AI, Ph.D. student Department of Applied Linguistics The Pennsylvania State University


parser-user mailing list parser-user at lists.stanford.edu https://mailman.stanford.edu/mailman/listinfo/parser-user
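
Applying Chris's suggestion to the function in the quoted message might look like the sketch below. It assumes Python 3, that java is on the PATH, and that stanford-tregex.jar sits in a Tregex directory you supply. The helper names build_tregex_command and run_tregex are illustrative, not part of any library. The only change that matters is that "-filter" is placed before the pattern in the argument list, so the trees are read from standard input.

```python
import subprocess

def build_tregex_command(pattern, extra_flags=("-filter",)):
    """Return the argv list for TregexPattern, with flags placed before the pattern."""
    return [
        "java", "-mx900m",
        "-cp", "stanford-tregex.jar:",  # classpath as used in the post; relative to cwd
        "edu.stanford.nlp.trees.tregex.TregexPattern",
        *extra_flags,                   # "-filter" makes Tregex read trees from stdin
        pattern,
    ]

def run_tregex(tree_text, pattern, tregex_dir):
    """Feed tree_text to Tregex on stdin and return everything it prints."""
    result = subprocess.run(
        build_tregex_command(pattern),
        cwd=tregex_dir,                 # directory containing the jar (an assumption)
        input=tree_text,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,       # merge diagnostics into the output, as in the post
        universal_newlines=True,        # exchange str instead of bytes
    )
    return result.stdout
```

A call such as run_tregex(output1['sentences'][0]['parse'], "NP", "/path/to/tregex") would then pass one parse tree from the question to the tool; the directory shown here is a placeholder.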

Best answer

Why not use the Stanford CoreNLP server?

1.) Start the server!

java -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

2.) Make a Python request!

import requests

url = "http://localhost:9000/tregex"
request_params = {"pattern": "(NP[$VP]>S)|(NP[$VP]>S\\n)|(NP\\n[$VP]>S)|(NP\\n[$VP]>S\\n)"}
text = "Pusheen and Smitha walked along the beach."
r = requests.post(url, data=text, params=request_params)
print(r.json())

3.) Here are the results!

{u'sentences': [{u'0': {u'namedNodes': [], u'match': u'(NP (NNP Pusheen)\n  (CC and)\n  (NNP Smitha))\n'}}]}
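
To get the matches into plain Python variables, the JSON above can be walked directly. This is a minimal sketch, assuming the response always has the shape shown in the result (a 'sentences' list of dicts keyed by match index, each holding a 'match' string); extract_matches is a hypothetical helper name, not part of CoreNLP or requests.

```python
def extract_matches(response_json, flatten=True):
    """Collect the 'match' strings from a /tregex response, sentence by sentence."""
    matches = []
    for sentence in response_json.get("sentences", []):
        # Keys are match indices stored as strings ("0", "1", ...).
        for key in sorted(sentence, key=int):
            tree = sentence[key]["match"]
            if flatten:
                # Collapse newlines and indentation into single spaces.
                tree = " ".join(tree.split())
            matches.append(tree)
    return matches

sample = {"sentences": [{"0": {"namedNodes": [],
                               "match": "(NP (NNP Pusheen)\n  (CC and)\n  (NNP Smitha))\n"}}]}
print(extract_matches(sample))
# ['(NP (NNP Pusheen) (CC and) (NNP Smitha))']
```

With the request from step 2, extract_matches(r.json()) would give one string per matched noun phrase; pass flatten=False to keep the server's original line breaks.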

Regarding python - Using Stanford Tregex in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42802406/
