java - Integrating calls to a jar file | cut | awk and a Java program into one unified process

Reposted. Author: 行者123. Updated: 2023-12-01 11:59:04
I am currently running a fairly complex data preprocessing operation, namely:

cat large_file.txt \
| ./reverb -q | cut --fields=16,17,18 | awk -F\\t -vq="'" 'function quote(token) { gsub(q, "\\"q, token); return q token q } { print quote($2) "(" quote($3) ", " quote($1) ")." }' >> output.txt

As you can see, this is rather convoluted: first cat, then ./reverb, then cut, and finally awk.

Next I want to pass the output to a Java program, namely:

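import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) throws IOException
{
    Ontology ontology = new Ontology();
    // Matches facts such as 'contain'('vitamin c','oranges').
    // The \\s* tolerates the space that the awk step prints after the comma.
    Pattern p = Pattern.compile("'(.*?)'\\('(.*?)',\\s*'(.*?)'\\)\\.");

    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader("/home/matthias/Workbench/SUTD/2_January/Prolog/horn_data_test.pl")))
    {
        String line;
        while ((line = br.readLine()) != null)
        {
            Matcher m = p.matcher(line);
            if( m.matches() )
            {
                String verb = m.group(1);
                String object = m.group(2);
                String subject = m.group(3);
                ontology.addSentence( new Sentence( verb, object, subject ) );
            }
        }
    }

    // For each joint, pair every sentence indexed under it as subject
    // with every sentence indexed under it as object.
    for( String joint: ontology.getJoints() )
    {
        for( Integer subind: ontology.getSubjectIndices( joint ) )
        {
            Sentence xaS = ontology.getSentence( subind );
            for( Integer obind: ontology.getObjectIndices( joint ) )
            {
                Sentence yOb = ontology.getSentence( obind );
                // New sentence: verb and object from xaS, subject from yOb
                Sentence s = new Sentence( xaS.getVerb(),
                                           xaS.getObject(),
                                           yOb.getSubject() );
                System.out.println( s );
            }
        }
    }
}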

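What is the best way to consolidate this process into one coherent operation? Ideally, I would just specify an input file and an output file and run it once. As it stands, the whole process is rather convoluted.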

Maybe I could put all of these calls into a bash script? Is that feasible?

The input originally consists of English sentences, one per line, like this:

Oranges are delicious and contain vitamin c.
Brilliant scientists learned that we can prevent scurvy by imbibing vitamin c.
Colorless green ideas sleep furiously.
...

Preprocessing turns it into this:

'contain'('vitamin c','oranges').
'prevent'('scurvy','vitamin c').
'sleep'('furiously','ideas').
...

The Java program learns "rules" by inference: if the processed data contains 'contain'('vitamin c','oranges'). and 'prevent'('scurvy','vitamin c'). then the Java code emits 'prevent'('scurvy','oranges').
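
To make the inference step concrete, here is a minimal, self-contained sketch of the join described above. The class `JoinSketch` and its nested `Fact` type are hypothetical stand-ins for the `Ontology` and `Sentence` classes, whose source is not shown, and the sketch assumes that a "joint" is any term that appears as the subject of one fact and the object of another. It reads facts from standard input, so it could sit at the end of the pipeline instead of reading a hard-coded file.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Ontology/Sentence classes of the question.
final class JoinSketch {

    // One fact of the form 'verb'('object','subject').
    static final class Fact {
        final String verb, object, subject;

        Fact(String verb, String object, String subject) {
            this.verb = verb;
            this.object = object;
            this.subject = subject;
        }

        @Override
        public String toString() {
            return "'" + verb + "'('" + object + "','" + subject + "').";
        }
    }

    // Same shape as the pattern in the question, plus optional whitespace after the comma.
    private static final Pattern FACT =
            Pattern.compile("'(.*?)'\\('(.*?)',\\s*'(.*?)'\\)\\.");

    // Parse every line that looks like a fact; silently skip the rest.
    static List<Fact> parse(List<String> lines) {
        List<Fact> facts = new ArrayList<>();
        for (String line : lines) {
            Matcher m = FACT.matcher(line.trim());
            if (m.matches()) {
                facts.add(new Fact(m.group(1), m.group(2), m.group(3)));
            }
        }
        return facts;
    }

    // Whenever x's subject equals y's object, emit x.verb(x.object, y.subject).
    static List<Fact> infer(List<Fact> facts) {
        List<Fact> inferred = new ArrayList<>();
        for (Fact x : facts) {
            for (Fact y : facts) {
                if (x.subject.equals(y.object)) {
                    inferred.add(new Fact(x.verb, x.object, y.subject));
                }
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        // Read all facts from stdin so the program can be the last stage of a pipe.
        List<String> lines = new BufferedReader(new InputStreamReader(System.in))
                .lines()
                .collect(Collectors.toList());
        for (Fact f : infer(parse(lines))) {
            System.out.println(f);
        }
    }
}
```

The nested loop is quadratic in the number of facts, which keeps the sketch short; an index from term to facts, as the `Ontology` class appears to maintain, would be the natural choice for large inputs.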

Best answer

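I looked at the ReVerb source code, and I think it would be easy to adapt it to produce the output you want. If you look at the ReVerb class CommandLineReverb.java, it has the following two methods: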

private void extractFromSentReader(ChunkedSentenceReader reader)
        throws ExtractorException {
    long start;

    ChunkedSentenceIterator sentenceIt = reader.iterator();

    while (sentenceIt.hasNext()) {
        // get the next chunked sentence
        ChunkedSentence sent = sentenceIt.next();
        chunkTime += sentenceIt.getLastComputeTime();

        numSents++;

        // make the extractions
        start = System.nanoTime();
        Iterable<ChunkedBinaryExtraction> extractions = extractor
                .extract(sent);
        extractTime += System.nanoTime() - start;

        for (ChunkedBinaryExtraction extr : extractions) {
            numExtrs++;

            // run the confidence function
            start = System.nanoTime();
            double conf = getConf(extr);
            confTime += System.nanoTime() - start;

            NormalizedBinaryExtraction extrNorm = normalizer
                    .normalize(extr);
            printExtr(extrNorm, conf);
        }
        if (numSents % messageEvery == 0)
            summary();
    }
}

private void printExtr(NormalizedBinaryExtraction extr, double conf) {
    String arg1 = extr.getArgument1().toString();
    String rel = extr.getRelation().toString();
    String arg2 = extr.getArgument2().toString();

    ChunkedSentence sent = extr.getSentence();
    String toks = sent.getTokensAsString();
    String pos = sent.getPosTagsAsString();
    String chunks = sent.getChunkTagsAsString();
    String arg1Norm = extr.getArgument1Norm().toString();
    String relNorm = extr.getRelationNorm().toString();
    String arg2Norm = extr.getArgument2Norm().toString();

    Range arg1Range = extr.getArgument1().getRange();
    Range relRange = extr.getRelation().getRange();
    Range arg2Range = extr.getArgument2().getRange();
    String a1s = String.valueOf(arg1Range.getStart());
    String a1e = String.valueOf(arg1Range.getEnd());
    String rs = String.valueOf(relRange.getStart());
    String re = String.valueOf(relRange.getEnd());
    String a2s = String.valueOf(arg2Range.getStart());
    String a2e = String.valueOf(arg2Range.getEnd());

    String row = Joiner.on("\t").join(
            new String[] { currentFile, String.valueOf(numSents), arg1,
                    rel, arg2, a1s, a1e, rs, re, a2s, a2e,
                    String.valueOf(conf), toks, pos, chunks, arg1Norm,
                    relNorm, arg2Norm });

    System.out.println(row);
}

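The first method is called for each sentence and performs the extraction. It then calls the second method, which prints tab-separated values to the output stream. I think all you have to do is implement your own version of the second method, printExtr().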
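
As a rough illustration of what a replacement printExtr() body might compute, the sketch below reproduces the cut and awk stages in plain Java. The tab-separated row built by printExtr() ends with `arg1Norm`, `relNorm`, `arg2Norm`, which are the fields 16 to 18 that `cut` selects, and the awk script prints them as relation(arg2, arg1). The class `FactFormatSketch` and its methods are hypothetical names, not part of ReVerb, and the sketch deliberately omits the space after the comma so that the result matches the sample output shown in the question.

```java
// Hypothetical helper, not part of ReVerb: formats three normalized strings
// as a Prolog-style fact, in the way the cut | awk stages of the pipeline do.
final class FactFormatSketch {

    // Intended to mirror the awk quote() function: backslash-escape single
    // quotes, then wrap the token in single quotes.
    static String quote(String token) {
        return "'" + token.replace("'", "\\'") + "'";
    }

    // Intended to mirror the awk print statement: relation(arg2, arg1).
    static String toFact(String arg1Norm, String relNorm, String arg2Norm) {
        return quote(relNorm) + "(" + quote(arg2Norm) + "," + quote(arg1Norm) + ").";
    }

    public static void main(String[] args) {
        // Inside a custom printExtr(), the three arguments would be the
        // arg1Norm, relNorm and arg2Norm strings that method already computes.
        System.out.println(toFact("oranges", "contain", "vitamin c"));
    }
}
```

With a method like this called from printExtr(), the `cut` and `awk` stages would no longer be needed, and the remaining command would reduce to ReVerb reading the input file and writing facts to the output file.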

Regarding "java - Integrating calls to a jar file | cut | awk and a Java program into one unified process", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28123935/