gpt4 book ai didi

java - Stanford POS Tagger,Java堆空间内存

转载 作者:太空狗 更新时间:2023-10-30 01:22:03 27 4
gpt4 key购买 nike

我在 Ubuntu 14.04 上使用 Python3,并在 67 篇原始文本文章的语料库上运行 Stanford POSTagger,经过编辑的 python 脚本如下:

from nltk.tag.stanford import POSTagger

with open('the_file.txt','r') as file:
G=file.readlines()

stan=[]

english_postagger = POSTagger('models/english-bidirectional-distsim.tagger', 'stanford-postagger.jar')

for line in g:
stan.append(english_postagger.tag(tokenize_fast(line)))

经过多次迭代后,出现以下错误:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:109)
at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:31)
at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:322)
at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:312)
at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:135)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:998)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagCoreLabelsOrHasWords(MaxentTagger.java:1788)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagAndOutputSentence(MaxentTagger.java:1798)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1709)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1770)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1543)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1499)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1842)

我还从命令行运行了 stanford postagger:

java -mx300m -classpath stanford-postagger.jar   edu.stanford.nlp.tagger.maxent.MaxentTagger -model models/wsj-0-18-bidirectional-distsim.tagger -textFile sample-input.txt > sample-tagged.txt

有类似的错误。我什至通过了 Java 2 GB 内存,但仍然没有运气。

非常欢迎任何想法/想法或 hacky 类型的解决方案!

很好地发现@nsanglar,所以我尝试了:

java -Xmx2g -classpath stanford-postagger.jar   edu.stanford.nlp.tagger.maxent.MaxentTagger -model models/wsj-0-18-bidirectional-distsim.tagger -textFile raw_text.txt > sample-tagged.txt

我收到一条错误日志消息,标题如下:

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 283639808 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.

# Out of Memory Error (os_linux.cpp:2798), pid=25677, tid=140571167794944

# JRE version: OpenJDK Runtime Environment (7.0_65-b32) (build 1.7.0_65-b32)
# Java VM: OpenJDK 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 2.5.2
# Distribution: Ubuntu 14.04 LTS, package 7u65-2.5.2-3~14.04
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

最佳答案

好吧,原来是 RAM 问题,我只是没有足够的内存来执行命令。在服务器上运行标记器就可以了。

关于java - Stanford POS Tagger,Java堆空间内存,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/26463783/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com