Java 8 Stream of Sentences

Reposted. Author: 行者123. Updated: 2023-12-02 16:20:08

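I want to use Java 8 streams to take a stream of strings (for example, read from a plain text file) and turn it into a stream of sentences. I assume that sentences can cross line boundaries.

For example, I want to go from: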

"This is the", "first sentence.  This is the", "second sentence."

to:

"This is the first sentence.", "This is the second sentence."

I can see that it is possible to get a stream of sentence parts like this:

Pattern p = Pattern.compile("\\.");
Stream<String> lines
= Stream.of("This is the", "first sentence. This is the", "second sentence.");

Stream<String> result = lines.flatMap(s -> p.splitAsStream(s));

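But I am not sure how to produce a stream that joins the fragments into sentences. I want to do this lazily, so that only as much of the original stream is read as is needed. Any ideas?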
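
To make the gap concrete, the sketch below collects what the flatMap/splitAsStream approach above yields for the sample input. The class name FragmentDemo and the helper fragments are made up for illustration. The point is only that the result is four fragments with the periods stripped, not two sentences.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FragmentDemo {

    // Hypothetical helper: splits every line on "." and collects the pieces.
    static List<String> fragments(Stream<String> lines) {
        Pattern p = Pattern.compile("\\.");
        return lines.flatMap(p::splitAsStream).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Brackets make leading whitespace in a fragment visible.
        fragments(Stream.of("This is the", "first sentence. This is the", "second sentence."))
                .forEach(s -> System.out.println("[" + s + "]"));
    }
}
```

The fragments come out as "This is the", "first sentence", " This is the" and "second sentence": the delimiter is gone and nothing records which pieces belong together.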

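Best answer

Breaking text into sentences is not as easy as just looking for dots. For example, you do not want to split in the middle of "Mr. Smith"...

Fortunately, there is already a JRE class that takes care of this, BreakIterator. It has no Stream support, so using it with streams requires some support code: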

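import java.nio.CharBuffer;
import java.text.BreakIterator;
import java.util.Locale;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class SentenceStream extends Spliterators.AbstractSpliterator<String>
        implements Consumer<CharSequence> {

    public static Stream<String> sentences(Stream<? extends CharSequence> s) {
        return StreamSupport.stream(new SentenceStream(s.spliterator()), false);
    }

    Spliterator<? extends CharSequence> source;
    CharBuffer buffer;       // unconsumed text, kept in read mode between calls
    BreakIterator iterator;  // finds sentence boundaries in the buffered text

    public SentenceStream(Spliterator<? extends CharSequence> source) {
        super(Long.MAX_VALUE, ORDERED|NONNULL);
        this.source = source;
        iterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        buffer = CharBuffer.allocate(100);
        buffer.flip(); // start out empty: position == limit == 0
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        for(;;) {
            int next = iterator.next();
            // A boundary at the very end of the buffer may only mark the end of
            // the data read so far, so it is not treated as a sentence end yet.
            if(next != BreakIterator.DONE && next != buffer.limit()) {
                // subSequence is relative to the current position
                action.accept(buffer.subSequence(0, next-buffer.position()).toString());
                buffer.position(next);
                return true;
            }
            // No complete sentence buffered: pull one more chunk via accept()
            if(!source.tryAdvance(this)) {
                // Source exhausted: whatever is left forms the last sentence
                if(buffer.hasRemaining()) {
                    action.accept(buffer.toString());
                    buffer.position(0).limit(0);
                    return true;
                }
                return false;
            }
            iterator.setText(buffer.toString());
        }
    }

    @Override
    public void accept(CharSequence t) {
        buffer.compact(); // move unread text to the front, switch to write mode
        if(buffer.remaining() < t.length()) {
            // Not enough room: grow the buffer and copy the unread text over
            CharBuffer bigger = CharBuffer.allocate(
                Math.max(buffer.capacity()*2, buffer.position()+t.length()));
            buffer.flip();
            bigger.put(buffer);
            buffer = bigger;
        }
        buffer.append(t).flip(); // back to read mode, position 0
    }
}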
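
The class above leans on BreakIterator's boundary protocol: setText, then repeated next() calls until DONE is returned. As a point of reference, here is a minimal standalone sketch of that protocol on a single in-memory String, without any streaming. SentenceSplitDemo and split are hypothetical names, and the trim() call is an addition of this sketch, since the boundaries reported for this input include the whitespace that follows a sentence.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitDemo {

    // Hypothetical helper: eagerly splits one String into trimmed sentences.
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Each next() call returns the following boundary, or DONE at the end.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end).trim());
        }
        return sentences;
    }

    public static void main(String[] args) {
        split("This is the first sentence. This is the second sentence.")
                .forEach(System.out::println);
    }
}
```

This eager form is enough when the whole text already fits in memory. The spliterator is only needed when the input should be consumed piece by piece.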

With this support class, you can simply say, for example:

Stream<String> lines = Stream.of(
    "This is the ", "first sentence. This is the ", "second sentence.");
SentenceStream.sentences(lines).forEachOrdered(System.out::println);
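
One way to check the laziness requirement from the question is to count how many source chunks are consumed before the first sentence arrives. The sketch below is self-contained and therefore does not reuse SentenceStream. It swaps in a simpler variant built on Iterator and StringBuilder (hypothetical class LazySentences) that re-scans the pending text on every step, which is acceptable for a demonstration but not tuned for large inputs. The trim() step is also an addition of this sketch.

```java
import java.text.BreakIterator;
import java.util.Iterator;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LazySentences {

    // Hypothetical helper: regroups text chunks into sentences on demand.
    static Stream<String> of(Stream<String> chunks) {
        Iterator<String> in = chunks.iterator();
        BreakIterator breaks = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        Iterator<String> out = new Iterator<String>() {
            private final StringBuilder pending = new StringBuilder();
            private String next;

            @Override
            public boolean hasNext() {
                while (next == null) {
                    breaks.setText(pending.toString());
                    int end = breaks.next();
                    if (end != BreakIterator.DONE && end < pending.length()) {
                        // A boundary strictly inside the text: a full sentence.
                        next = pending.substring(0, end);
                        pending.delete(0, end);
                    } else if (in.hasNext()) {
                        pending.append(in.next()); // pull exactly one more chunk
                    } else if (pending.length() > 0) {
                        next = pending.toString(); // source done: flush the rest
                        pending.setLength(0);
                    } else {
                        return false;
                    }
                }
                return true;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String s = next;
                next = null;
                return s;
            }
        };
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
                out, Spliterator.ORDERED | Spliterator.NONNULL), false)
                .map(String::trim);
    }

    public static void main(String[] args) {
        AtomicInteger consumed = new AtomicInteger();
        Stream<String> chunks = Stream.of("This is the ", "first sentence. This is the ",
                "second sentence.", "Unread chunk.").peek(c -> consumed.incrementAndGet());
        System.out.println(of(chunks).findFirst().orElse("<none>"));
        System.out.println("chunks consumed: " + consumed.get());
    }
}
```

With findFirst(), only the first two chunks are pulled from the source here, because the boundary after "first sentence. " becomes visible as soon as the second chunk is appended.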

Regarding a Java 8 stream of sentences, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31148693/
