
Usage of the org.languagetool.tokenizers.WordTokenizer.tokenize() method, with code examples

Repost. Author: 知者. Updated: 2024-03-21 16:03:05

This article collects code examples of the Java method org.languagetool.tokenizers.WordTokenizer.tokenize() and shows how it is used in practice. The examples were extracted from selected projects on platforms such as GitHub, Stack Overflow, and Maven, so they should serve as useful references. Details of the WordTokenizer.tokenize() method:
Package: org.languagetool.tokenizers.WordTokenizer
Class: WordTokenizer
Method: tokenize

About WordTokenizer.tokenize

No description available.
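Since the project provides no description, here is a rough standalone sketch of the observable behavior: tokenize() splits text into tokens while keeping whitespace and punctuation as tokens of their own. This approximation uses java.util.StringTokenizer with a small, assumed delimiter set; it is not the real LanguageTool implementation, which handles a much larger character set and special-cases URLs and e-mail addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeSketch {
    // Approximates WordTokenizer.tokenize(): split on whitespace and
    // punctuation, but keep the delimiter characters as tokens too
    // (returnDelims = true).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, " \u00A0\r\n\t,.;:", true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "This is\u00A0a test" yields 7 tokens, whitespace included
        System.out.println(tokenize("This is\u00A0a test"));
    }
}
```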

Code examples

Source: languagetool-org/languagetool

// Joins the resulting tokens with '|' so the tokenization output is easy to inspect and compare.
private String tokenize(String text) {
 List<String> tokens = wordTokenizer.tokenize(text);
 return String.join("|", tokens);
}

Source: languagetool-org/languagetool

@Override
public List<String> tokenize(String text) {
 List<String> tokens = super.tokenize(text);
 String prev = null;
 Stack<String> l = new Stack<>();
 for (String token : tokens) {
  if ("'".equals(prev) && token.equals("t")) {
   // merge a lone apostrophe with a following "t" into a single "'t" token
   l.pop();
   l.push("'t");
  } else {
   l.push(token);
  }
  prev = token;
 }
 return l;
}
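The override above post-processes the base tokenizer's output: whenever a lone "'" token is immediately followed by "t" (as in the clitic 't), the two are collapsed into one "'t" token. A self-contained sketch of just that merging pass, operating on an already-tokenized list so it runs without the LanguageTool dependency (the class and input are illustrative, not from the project):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Stack;

public class ApostropheMerge {
    // Merge a lone "'" token with a following "t" into "'t",
    // mirroring the overridden tokenize() shown above.
    static List<String> merge(List<String> tokens) {
        Stack<String> result = new Stack<>();
        String prev = null;
        for (String token : tokens) {
            if ("'".equals(prev) && token.equals("t")) {
                result.pop();       // drop the lone apostrophe
                result.push("'t");  // push the merged token instead
            } else {
                result.push(token);
            }
            prev = token;
        }
        return result;
    }

    public static void main(String[] args) {
        // ["Dat", " ", "'", "t", " ", "werkt"] → [Dat,  , 't,  , werkt]
        System.out.println(merge(Arrays.asList("Dat", " ", "'", "t", " ", "werkt")));
    }
}
```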

Source: languagetool-org/languagetool

@Test
public void testTokenize() {
 WordTokenizer wordTokenizer = new WordTokenizer();
 List <String> tokens = wordTokenizer.tokenize("This is\u00A0a test");
 assertEquals(7, tokens.size());
 assertEquals("[This,  , is, \u00A0, a,  , test]", tokens.toString());
 tokens = wordTokenizer.tokenize("This\rbreaks");
 assertEquals(3, tokens.size());
 assertEquals("[This, \r, breaks]", tokens.toString());
 tokens = wordTokenizer.tokenize("dev.all@languagetool.org");
 assertEquals(1, tokens.size());
 tokens = wordTokenizer.tokenize("dev.all@languagetool.org.");
 assertEquals(2, tokens.size());
 tokens = wordTokenizer.tokenize("dev.all@languagetool.org:");
 assertEquals(2, tokens.size());
 tokens = wordTokenizer.tokenize("Schreiben Sie Hr. Meier (meier@mail.com).");
 assertEquals(13, tokens.size());
 tokens = wordTokenizer.tokenize("Get more at languagetool.org/foo, and via twitter");
 assertEquals(14, tokens.size());
 assertTrue(tokens.contains("languagetool.org/foo"));
 tokens = wordTokenizer.tokenize("Get more at sub.languagetool.org/foo, and via twitter");
 assertEquals(14, tokens.size());
 assertTrue(tokens.contains("sub.languagetool.org/foo"));
}
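Note the pattern in the e-mail assertions above: "dev.all@languagetool.org" stays a single token, but a trailing "." or ":" is split off as its own token. A minimal standalone sketch of that trailing-punctuation split (a hypothetical helper, not the actual WordTokenizer code):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TrailingSplit {
    // Split one trailing punctuation character off an otherwise atomic
    // token such as an e-mail address or URL, so that
    // "dev.all@languagetool.org." becomes two tokens.
    static List<String> splitTrailing(String token) {
        char last = token.charAt(token.length() - 1);
        if (token.length() > 1 && ".:,;".indexOf(last) >= 0) {
            return Arrays.asList(token.substring(0, token.length() - 1),
                                 String.valueOf(last));
        }
        return Collections.singletonList(token);
    }

    public static void main(String[] args) {
        System.out.println(splitTrailing("dev.all@languagetool.org."));
        System.out.println(splitTrailing("dev.all@languagetool.org"));
    }
}
```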

Source: languagetool-org/languagetool

String content = StringTools.readStream(fis, "UTF-8");
WordTokenizer wordTokenizer = new WordTokenizer();
List<String> words = wordTokenizer.tokenize(content);
String prevPrevWord = null;
String prevWord = null;

Source: org.languagetool/language-eo

"(?<!')\\b([a-zA-ZĉĝĥĵŝŭĈĜĤĴŜŬ]+)'(?=[a-zA-ZĉĝĥĵŝŭĈĜĤĴŜŬ-])",
    "$1\u0001\u0001EO@APOS2\u0001\u0001 ");
List<String> tokenList = super.tokenize(replaced);
List<String> tokens = new ArrayList<>();
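The Esperanto snippet above protects word-internal apostrophes from the base tokenizer by first replacing them with a marker string ("\u0001\u0001EO@APOS2\u0001\u0001") that survives tokenization as part of a single token; the marker is presumably swapped back afterwards, though that step is not shown. A simplified standalone sketch of this protect-then-restore pattern, using a generic \p{L} letter class instead of the snippet's explicit Esperanto alphabet:

```java
public class PlaceholderDemo {
    // Marker taken from the snippet above; it contains control characters
    // that the base tokenizer does not treat as token boundaries.
    private static final String APOS_MARKER = "\u0001\u0001EO@APOS2\u0001\u0001";

    // Hide an apostrophe that sits between two letters, so a downstream
    // tokenizer will not split the word there. Simplified regex (assumption):
    // the real code restricts the letter class to the Esperanto alphabet.
    static String protect(String text) {
        return text.replaceAll("(\\p{L})'(?=\\p{L})", "$1" + APOS_MARKER);
    }

    // Restore the apostrophe in a token after tokenization.
    static String restore(String token) {
        return token.replace(APOS_MARKER, "'");
    }

    public static void main(String[] args) {
        String hidden = PlaceholderDemo.protect("l'afero");
        System.out.println(hidden.contains("'"));   // the apostrophe is hidden
        System.out.println(restore(hidden));        // and comes back on restore
    }
}
```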

Source: stackoverflow.com

System.out.println("load time: " + (System.currentTimeMillis() - time) + " ms");
String[] words = tokenizer.tokenize("弹道导弹");
print(words);
assertEquals(1, words.length);
words = tokenizer.tokenize("美国人的文化.dog");
print(words);
assertEquals(3, words.length);
words = tokenizer.tokenize("我是美国人");
print(words);
assertEquals(3, words.length);
words = tokenizer.tokenize("政府依照法律行使执法权,如果超出法律赋予的权限范围,就是“滥用职权”;如果没有完全行使执法权,就是“不作为”。两者都是政府的错误。");
print(words);
words = tokenizer.tokenize("国家都有自己的政府。政府是税收的主体,可以实现福利的合理利用。");
print(words);
