java - GATE ANNIE error: No sentences or tokens to process in document; run a sentence splitter and tokeniser first

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 20:03:29

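I have a question about using the GATE API with the ANNIE plugin. I use the GATE API in a Java program, and it works fine for around 50 documents. But when I run it on more than 50 documents, it gives the following error: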

Exception in thread "main" gate.creole.ExecutionException: No sentences or tokens to process in document GATE Document_0003D
Please run a sentence splitter and tokeniser first!
at gate.creole.POSTagger.execute(POSTagger.java:257)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:163)
at gate.creole.SerialController.executeImpl(SerialController.java:157)
at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:244)
at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:139)

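I tried loading each component separately, but it still gives the same error. I also tried cleaning up the corpus after every 10 documents during processing, but the error persists.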

The code is:

public class MyGate  {
    private CorpusController annieController;

    /**
     * Initialise the ANNIE system. This creates a "corpus pipeline"
     * application that can be used to run sets of documents through
     * the extraction system.
     */
    public void initAnnie() throws GateException, IOException {
        Out.prln("Initialising ANNIE...");

        // load the ANNIE application from the saved state in plugins/ANNIE
        File pluginsHome = Gate.getPluginsHome();
        File anniePlugin = new File(pluginsHome, "ANNIE");
        File annieGapp = new File(anniePlugin, "ANNIE_with_defaults.gapp");
        annieController =
            (CorpusController) PersistenceManager.loadObjectFromFile(annieGapp);
        Out.prln("...ANNIE loaded");
    } // initAnnie()

    public void cleanUp(){
        Corpus corp= annieController.getCorpus();
        if(!corp.isEmpty()){
            for(int i=0;i<corp.size();i++){
                Document doc1 = (Document)corp.remove(i);
                corp.unloadDocument(doc1);
                Factory.deleteResource(corp);
                Factory.deleteResource(doc1);
            }
        }
    }

    /** Tell ANNIE's controller about the corpus you want to run on */
    public void setCorpus(Corpus corpus) {
        annieController.setCorpus(corpus);
    } // setCorpus

    /** Run ANNIE */
    public void execute() throws GateException {

        Out.prln("Running ANNIE...");

        annieController.execute();
        Out.prln("...ANNIE complete");
    } // execute()


    //////-------------------------------MAIN--------------------------------------///////
    public static void main(String args[]) throws GateException, IOException {
        ArrayList<CreateHashMap> train_data_list = new ArrayList<CreateHashMap>();

        String workingDir = System.getProperty("user.dir");
        System.out.println("Current working directory : " + workingDir);
        String trainpath=workingDir+"/input/test.json/test.json";
        /*********************************************/
        try {
            // read the json file
            FileReader reader = new FileReader(trainpath);

            JSONParser jsonParser = new JSONParser();


            JSONArray a = (JSONArray) jsonParser.parse(new FileReader(trainpath));
            int g=0;
            for (Object o : a)
            {
                if(g<=100){
                    CreateHashMap new_hash_item =new CreateHashMap();
                    JSONObject person = (JSONObject) o;

                    String rid = (String) person.get("request_id");
                    System.out.println(rid);

                    double date=(Double) person.get("times_request");
                    java.util.Date time=new java.util.Date((long)date*1000);

                    int day=time.getDate();

                    new_hash_item.createList(rid,day);
                    train_data_list.add(new_hash_item);

                }
                g++;
            }

        } catch (FileNotFoundException ex) {
            ex.printStackTrace();
        } catch (IOException ex) {
            ex.printStackTrace();
        } catch (ParseException ex) {
            ex.printStackTrace();
        } catch (NullPointerException ex) {
            ex.printStackTrace();
        }

        /****************************************/



        // initialise the GATE library
        Out.prln("Initialising GATE...");
        Gate.setGateHome(new File("C:/Program Files/GATE_Developer_8.0"));
        Gate.init();
        Out.prln("...GATE initialised");

        // initialise ANNIE (this may take several minutes)
        StandAloneAnnie annie = new StandAloneAnnie();
        annie.initAnnie();

        // create a GATE corpus and add a document for each command-line
        // argument

        Corpus corpus = Factory.newCorpus("StandAloneAnnie corpus");
        String pathdoc=workingDir+"/input/test.json/";
        SentenceSplitter sp= new SentenceSplitter();
        int countdoc=0;
        for(int i = 0; i < train_data_list.size()/*args.length*/; i++) {
            Out.prln("here we go.............");
            FeatureMap params = Factory.newFeatureMap();
            String text=train_data_list.get(i).get_Request_text();
            params.put(gate.Document.DOCUMENT_STRING_CONTENT_PARAMETER_NAME, text);
            Document doc=(gate.Document)Factory.createResource("gate.corpora.DocumentImpl",params);

            params.put("preserveOriginalContent", new Boolean(true));
            params.put("collectRepositioningInfo", new Boolean(true));
            corpus.add(doc);
            countdoc++;

            annie.setCorpus(corpus);
            annie.execute();
            if(countdoc==10)
            {
                corpus.cleanup();
                System.out.println("...............cleanup....................");
            }


        } // for each of args


    } // main


} // class MyGate

I get the error on this line:

annie.execute();

Please help me. I can't figure out what the problem is.

Best answer

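Usually this means that the `String text` contains no tokens at all; it may consist only of special characters or whitespace. Print out the text being processed (or the file name) and verify that it actually has some reasonable content.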
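One way to act on this is to screen each text before a GATE document is created from it. The following is a minimal sketch, not part of the GATE API: `TextGuard`, `hasTokenizableContent`, and `keepProcessable` are hypothetical names, and the "at least one letter or digit" rule is an assumed heuristic that is stricter than whatever the tokeniser itself requires. It logs the skipped entries so that the offending input can be inspected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a GATE class): filters out texts with nothing to tokenise. */
class TextGuard {

    /** Assumed heuristic: a text is worth processing if it has at least one letter or digit. */
    static boolean hasTokenizableContent(String text) {
        if (text == null) {
            return false;
        }
        return text.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    /** Returns the texts that pass the check and reports the index of each skipped one. */
    static List<String> keepProcessable(List<String> texts) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            String text = texts.get(i);
            if (hasTokenizableContent(text)) {
                kept.add(text);
            } else {
                System.err.println("Skipping document " + i
                        + ": no letters or digits in [" + text + "]");
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample inputs: normal text, whitespace only, null, punctuation only, text with digits.
        List<String> texts = Arrays.asList("Hello world.", "   ", null, "?!...", "Room 42");
        System.out.println(keepProcessable(texts));
    }
}
```

In the question's loop, a check like this would go right after `text` is read and before `Factory.createResource` is called, so that empty entries never reach `annie.execute()`.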

Regarding java - GATE ANNIE error: No sentences or tokens to process in document; run a sentence splitter and tokeniser first, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25341573/
