
java - Generating N-grams from a sentence

Reposted. Author: 行者123. Updated: 2023-12-02 22:33:58

How can I generate the n-grams of a string, for example:

String Input="This is my car."

I want to generate n-grams from this input:

Input Ngram size = 3

The output should be:

This
is
my
car

This is
is my
my car

This is my
is my car

Please suggest how to implement this in Java, or whether a library is available for it.

I tried this NGramTokenizer, but it produces n-grams over character sequences, whereas I want n-grams over word sequences.

Best answer

I believe this does what you want:

import java.util.*;

public class Test {

    // Returns every run of n consecutive words in str, in order of appearance.
    public static List<String> ngrams(int n, String str) {
        List<String> ngrams = new ArrayList<String>();
        String[] words = str.split(" ");
        for (int i = 0; i < words.length - n + 1; i++)
            ngrams.add(concat(words, i, i+n));
        return ngrams;
    }

    // Joins words[start] through words[end - 1] with single spaces.
    public static String concat(String[] words, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++)
            sb.append((i > start ? " " : "") + words[i]);
        return sb.toString();
    }

    // Prints the 1-grams, 2-grams and 3-grams, with a blank line after each group.
    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            for (String ngram : ngrams(n, "This is my car."))
                System.out.println(ngram);
            System.out.println();
        }
    }
}

Output:

This
is
my
car.

This is
is my
my car.

This is my
is my car.
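
Note that the output above keeps the trailing period on "car.", whereas the expected output in the question shows "car". Below is a minimal sketch of one way to close that gap, assuming it is acceptable to strip leading and trailing punctuation from each token and to split on runs of whitespace. The class name WordNgrams and the normalization rule are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class WordNgrams {

    // Splits on runs of whitespace and strips punctuation from both ends of each token.
    static List<String> tokenize(String str) {
        List<String> tokens = new ArrayList<>();
        for (String raw : str.trim().split("\\s+")) {
            // \p{Punct} matches ASCII punctuation only (an assumption about the input).
            String token = raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!token.isEmpty())
                tokens.add(token);
        }
        return tokens;
    }

    // Returns all word n-grams of size n, in order of appearance.
    static List<String> ngrams(int n, String str) {
        if (n < 1)
            throw new IllegalArgumentException("n must be at least 1");
        List<String> words = tokenize(str);
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++)
            result.add(String.join(" ", words.subList(i, i + n)));
        return result;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            for (String ngram : ngrams(n, "This is my car."))
                System.out.println(ngram);
            System.out.println();
        }
    }
}
```

Punctuation inside a token (for example the apostrophe in "don't") is left untouched, since only the two ends of each token are trimmed.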

An "on-demand" solution implemented as an Iterator:

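import java.util.Iterator;
import java.util.NoSuchElementException;

class NgramIterator implements Iterator<String> {

    String[] words;
    int pos = 0, n;

    public NgramIterator(int n, String str) {
        this.n = n;
        words = str.split(" ");
    }

    // True while at least n words remain starting at pos.
    public boolean hasNext() {
        return pos < words.length - n + 1;
    }

    // Builds the n-gram starting at pos, then advances by one word.
    public String next() {
        if (!hasNext())
            throw new NoSuchElementException();
        StringBuilder sb = new StringBuilder();
        for (int i = pos; i < pos + n; i++)
            sb.append((i > pos ? " " : "") + words[i]);
        pos++;
        return sb.toString();
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}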
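
An Iterator cannot be used directly in a for-each loop, so one option is to wrap the same logic in an Iterable. The following is a sketch only: the class name LazyNgrams is hypothetical, and the iteration logic is repeated here so that the block compiles on its own. It relies on the default Iterator.remove() of Java 8 and later, which throws UnsupportedOperationException.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper, not part of the original answer.
class LazyNgrams implements Iterable<String> {

    private final int n;
    private final String[] words;

    LazyNgrams(int n, String str) {
        this.n = n;
        this.words = str.split(" "); // same single-space split as the answer above
    }

    @Override
    public Iterator<String> iterator() {
        // Each call starts a fresh pass, so the same object can be looped over repeatedly.
        return new Iterator<String>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos + n <= words.length;
            }

            @Override
            public String next() {
                if (!hasNext())
                    throw new NoSuchElementException();
                StringBuilder sb = new StringBuilder();
                for (int i = pos; i < pos + n; i++) {
                    if (i > pos)
                        sb.append(' ');
                    sb.append(words[i]);
                }
                pos++;
                return sb.toString();
            }
        };
    }

    public static void main(String[] args) {
        // Each bigram is built only when the loop asks for it.
        for (String ngram : new LazyNgrams(2, "This is my car."))
            System.out.println(ngram);
    }
}
```

Because n-grams are produced one at a time, a caller that stops early never pays for the n-grams it does not read.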

Regarding "java - Generating N-grams from a sentence", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46316674/
