作者热门文章
- android - RelativeLayout 背景可绘制重叠内容
- android - 如何链接 cpufeatures lib 以获取 native android 库?
- java - OnItemClickListener 不起作用,但 OnLongItemClickListener 在自定义 ListView 中起作用
- java - Android 文件转字符串
我想在 Hadoop 上为意大利语实现一个自然语言处理算法
我有两个问题;
这是我的代码
String pathSent=...tagged sentences...;
String pathChunk=....chunked train path....;
File fileSent=new File(pathSent);
File fileChunk=new File(pathChunk);
InputStream inSent=null;
InputStream inChunk=null;
inSent = new FileInputStream(fileSent);
inChunk = new FileInputStream(fileChunk);
POSModel posModel=POSTaggerME.train("it", new WordTagSampleStream((
new InputStreamReader(inSent))), ModelType.MAXENT, null, null, 3, 3);
ObjectStream stringStream =new PlainTextByLineStream(new InputStreamReader(inChunk));
ObjectStream chunkStream = new ChunkSampleStream(stringStream);
ChunkerModel chunkModel=ChunkerME.train("it",chunkStream ,1, 1);
this.tagger= new POSTaggerME(posModel);
this.chunker=new ChunkerME(chunkModel);
inSent.close();
inChunk.close();
最佳答案
你需要一个语法句子引擎:
"io voglio andare a casa"
io, sostantivo
volere, verbo
andare, verbo
a, preposizione semplice
casa, oggetto
标记句子后,您就可以教授 OpenNLP。
在 Hadoop 上创建自定义 map
public class Map extends Mapper<longwritable,
intwritable="" text,=""> {
private final static IntWritable one =
new IntWritable(1);
private Text word = new Text();
@Override public void map(LongWritable key, Text value,
Context context)
throws IOException, InterruptedException {
//your code here
}
}
在 Hadoop 上创建自定义 reduce
public class Reduce extends Reducer<text,
intwritable,="" intwritable="" text,=""> {
@Override
protected void reduce(
Text key,
java.lang.Iterable<intwritable> values,
org.apache.hadoop.mapreduce.Reducer<text,
intwritable,="" intwritable="" text,="">.Context context)
throws IOException, InterruptedException {
// your reduce here
}
}
同时配置
public static void main(String[] args)
throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "opennlp");
job.setJarByClass(CustomOpenNLP.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
关于java - 如何在 Hadoop 上的 OpenNLP 中训练意大利语模型?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/30535885/
我是一名优秀的程序员,十分优秀!