
python - Setting up NLTK with Stanford NLP (StanfordNERTagger and StanfordPOSTagger) for Spanish

Reposted. Author: 太空狗. Updated: 2023-10-30 01:48:10

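The NLTK documentation is rather poor when it comes to this integration. The steps I followed were: download stanford-postagger-full-2015-04-20.zip and stanford-spanish-corenlp-2015-01-08-models.jar into /home/me/stanford.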

Then, in an IPython console:

In [11]: import nltk

In [12]: nltk.__version__
Out[12]: '3.1'

In [13]: from nltk.tag import StanfordNERTagger

Then:

st = StanfordNERTagger('/home/me/stanford/stanford-postagger-full-2015-04-20.zip', '/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar')

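But when I try to run it, I get the following (the first line is Java's Spanish-locale message for "Error: Could not find or load main class edu.stanford.nlp.ie.crf.CRFClassifier"):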

st.tag('Adolfo se la pasa corriendo'.split())
Error: no se ha encontrado o cargado la clase principal edu.stanford.nlp.ie.crf.CRFClassifier

---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-14-0c1a96b480a6> in <module>()
----> 1 st.tag('Adolfo se la pasa corriendo'.split())

/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/tag/stanford.py in tag(self, tokens)
64 def tag(self, tokens):
65 # This function should return list of tuple rather than list of list
---> 66 return sum(self.tag_sents([tokens]), [])
67
68 def tag_sents(self, sentences):

/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/tag/stanford.py in tag_sents(self, sentences)
87 # Run the tagger and get the output
88 stanpos_output, _stderr = java(cmd, classpath=self._stanford_jar,
---> 89 stdout=PIPE, stderr=PIPE)
90 stanpos_output = stanpos_output.decode(encoding)
91

/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/__init__.py in java(cmd, classpath, stdin, stdout, stderr, blocking)
132 if p.returncode != 0:
133 print(_decode_stdoutdata(stderr))
--> 134 raise OSError('Java command failed : ' + str(cmd))
135
136 return (stdout, stderr)

OSError: Java command failed : ['/usr/bin/java', '-mx1000m', '-cp', '/home/nanounanue/Descargas/stanford-spanish-corenlp-2015-01-08-models.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', '/home/nanounanue/Descargas/stanford-postagger-full-2015-04-20.zip', '-textFile', '/tmp/tmp6y169div', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']

The same thing happens with StanfordPOSTagger.

Note: I need this to work with the Spanish version. Note: I am running this on Python 3.4.3.
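The failing command in the traceback puts only the Spanish models jar on the classpath (-cp) and passes the POS tagger zip as -loadClassifier, so Java cannot find the class edu.stanford.nlp.ie.crf.CRFClassifier. One way to confirm this kind of mistake is to look inside the jar. The sketch below is a hypothetical helper (jar_contains_class is not part of NLTK) that relies only on the fact that a jar is a zip archive.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the compiled class.

    class_name uses dotted Java notation, for example
    'edu.stanford.nlp.ie.crf.CRFClassifier'.
    """
    # Java stores classes as path/to/ClassName.class inside the archive.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, calling jar_contains_class('/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier') checks whether the jar passed as the classpath actually ships the classifier code, as opposed to only model files.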

Best answer

Try:

# StanfordPOSTagger
from nltk.tag.stanford import StanfordPOSTagger
stanford_dir = '/home/me/stanford/stanford-postagger-full-2015-04-20/'
# The trained model file lives under models/ in the unzipped POS tagger download
modelfile = stanford_dir + 'models/english-bidirectional-distsim.tagger'
# The jar contains the Java code that NLTK puts on the classpath
jarfile = stanford_dir + 'stanford-postagger.jar'

st = StanfordPOSTagger(model_filename=modelfile, path_to_jar=jarfile)


# NERTagger
from nltk.tag.stanford import StanfordNERTagger
stanford_dir = '/home/me/stanford/stanford-ner-2015-04-20/'
jarfile = stanford_dir + 'stanford-ner.jar'
# NER models (CRF classifiers) live under classifiers/ in the unzipped NER download
modelfile = stanford_dir + 'classifiers/english.all.3class.distsim.crf.ser.gz'

st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
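The paths above use English models, while the question needs Spanish ones. Which Spanish model files are present depends on the download, so rather than hard-coding a file name, a small hypothetical helper (find_stanford_models, not an NLTK function) can list candidate models in a directory by name. The suffixes .tagger and .crf.ser.gz are taken from the model paths above; the helper only matches file names and does not validate the models.

```python
import os


def find_stanford_models(models_dir, language, suffixes=(".tagger", ".crf.ser.gz")):
    """List model files in models_dir whose names mention `language`.

    Matching is case-insensitive on the language and only looks at file
    names; it does not check that a file is a valid Stanford model.
    """
    if not os.path.isdir(models_dir):
        raise FileNotFoundError("No such directory: " + models_dir)
    language = language.lower()
    return sorted(
        os.path.join(models_dir, name)
        for name in os.listdir(models_dir)
        if language in name.lower() and name.endswith(suffixes)
    )
```

For example, find_stanford_models('/home/me/stanford/stanford-postagger-full-2015-04-20/models', 'spanish') lists whatever Spanish POS models that download contains; any returned path can then be passed as model_filename to StanfordPOSTagger together with path_to_jar pointing at stanford-postagger.jar.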

For details on the NLTK API for the Stanford tools, see: https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software#stanford-tagger-ner-tokenizer-and-parser

Note: the NLTK API is designed for the individual Stanford tools. If you are using Stanford CoreNLP, you are better off following @dimazest's instructions at http://www.eecs.qmul.ac.uk/~dm303/stanford-dependency-parser-nltk-and-anaconda.html


EDITED

As for Spanish NER tagging, I strongly recommend using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml) instead of the Stanford NER package (http://nlp.stanford.edu/software/CRF-NER.shtml), and following @dimazest's solution for reading the JSON output file.
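Reading CoreNLP's JSON output only needs the standard json module. This sketch is not @dimazest's code: it assumes the layout produced by CoreNLP's JSON output format in recent releases, namely a top-level "sentences" list whose "tokens" entries carry "word" and "ner" fields. Check those field names against an actual output file before relying on it; ner_pairs_from_corenlp_json is a hypothetical helper.

```python
import json


def ner_pairs_from_corenlp_json(text):
    """Return one list of (word, ner_label) tuples per sentence.

    Assumes the layout {"sentences": [{"tokens": [{"word": ..., "ner": ...}]}]}.
    Tokens without an "ner" field (e.g. if the NER annotator was not run)
    are labelled "O".
    """
    doc = json.loads(text)
    return [
        [(tok["word"], tok.get("ner", "O")) for tok in sentence["tokens"]]
        for sentence in doc.get("sentences", [])
    ]
```

Pass it the contents of the JSON file CoreNLP wrote; the file name depends on how CoreNLP was invoked.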

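Alternatively, if you must use the NER package, you can try following the instructions at https://github.com/alvations/nltk_cli (disclaimer: this repo is not officially affiliated with NLTK). On the Unix command line, run: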

cd $HOME
wget http://nlp.stanford.edu/software/stanford-spanish-corenlp-2015-01-08-models.jar
unzip stanford-spanish-corenlp-2015-01-08-models.jar -d stanford-spanish
cp stanford-spanish/edu/stanford/nlp/models/ner/* /home/me/stanford/stanford-ner-2015-04-20/classifiers/
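If unzip is not available, the extract-and-copy step can also be done from Python with the standard zipfile module, since the models .jar is a zip archive. This is a minimal sketch, assuming the NER models sit directly under edu/stanford/nlp/models/ner/ inside the jar, as the cp command implies; extract_ner_models is a hypothetical helper. Like cp with a *, it copies only files directly inside that folder, not subfolders.

```python
import os
import shutil
import zipfile


def extract_ner_models(jar_path, dest_dir, prefix="edu/stanford/nlp/models/ner/"):
    """Copy the files directly under `prefix` in the jar into dest_dir.

    Returns the list of written file paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            name = info.filename
            if not name.startswith(prefix):
                continue
            rest = name[len(prefix):]
            # Skip the folder entry itself and anything in a subfolder.
            if not rest or "/" in rest:
                continue
            target = os.path.join(dest_dir, rest)
            # Stream the entry to disk without loading it all into memory.
            with jar.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

For example: extract_ner_models(os.path.expanduser('~/stanford-spanish-corenlp-2015-01-08-models.jar'), '/home/me/stanford/stanford-ner-2015-04-20/classifiers').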

Then, in Python:

# NERTagger
from nltk.tag.stanford import StanfordNERTagger
stanford_dir = '/home/me/stanford/stanford-ner-2015-04-20/'
jarfile = stanford_dir + 'stanford-ner.jar'
# Spanish model copied into classifiers/ by the cp command above
modelfile = stanford_dir + 'classifiers/spanish.ancora.distsim.s512.crf.ser.gz'

st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
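Once the tagger loads, st.tag('Adolfo se la pasa corriendo'.split()) returns a list of (token, label) tuples, per the comment in the tag method shown in the question's traceback. NER labels are per token, so a small hypothetical helper (group_entities, not part of NLTK) can merge runs of tokens that share a label into entity strings. It treats "O" as "no entity", the usual Stanford NER convention, and does not depend on the Spanish model's specific label set.

```python
from itertools import groupby


def group_entities(tagged, outside="O"):
    """Merge consecutive (token, label) pairs that share the same label.

    Returns (entity_text, label) tuples and skips the `outside` label.
    Caveat: two adjacent entities with the same label merge into one.
    """
    entities = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label != outside:
            entities.append((" ".join(token for token, _ in run), label))
    return entities
```

Usage: group_entities(st.tag('Adolfo se la pasa corriendo'.split())). The labels you get back are whatever the loaded model emits.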

Regarding "python - Setting up NLTK with Stanford NLP (StanfordNERTagger and StanfordPOSTagger) for Spanish", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34037094/
