
Usage of the edu.stanford.nlp.process.WordToSentenceProcessor.wordsToSentences() method, with code examples

Reposted. Author: 知者. Updated: 2024-03-23 23:49:05

This article collects code examples of the Java method edu.stanford.nlp.process.WordToSentenceProcessor.wordsToSentences() and shows how WordToSentenceProcessor.wordsToSentences() is used in practice. The examples were gathered from platforms such as GitHub, Stack Overflow, and Maven, and were extracted from a selection of open-source projects, so they should serve as useful references. Details of the WordToSentenceProcessor.wordsToSentences() method:
Class path: edu.stanford.nlp.process.WordToSentenceProcessor
Class name: WordToSentenceProcessor
Method name: wordsToSentences

Introduction to WordToSentenceProcessor.wordsToSentences

Returns a List of Lists where each element is built from a run of Words in the input Document. Specifically, reads through each word in the input document and breaks off a sentence after finding a valid sentence boundary token or end of file. Note that for this to work, the words in the input document must have been tokenized with a tokenizer that makes sentence boundary tokens their own tokens (e.g., PTBTokenizer).
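As a quick end-to-end illustration of the behavior described above, here is a minimal sketch (not taken from the projects referenced below; the class name WordsToSentencesDemo and the sample text are this article's own assumptions). It tokenizes a string with PTBTokenizer, so that sentence-final punctuation becomes its own token, and then passes the token list to wordsToSentences(). In recent stanford-corenlp releases this method is public; if your version hides it, process() delegates to it, as the examples below show.

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.WordToSentenceProcessor;

import java.io.StringReader;
import java.util.List;

public class WordsToSentencesDemo {
 public static void main(String[] args) {
  String text = "Stanford is in California. It was founded in 1885.";

  // Tokenize so that "." and other sentence-final punctuation become their
  // own tokens, which is what wordsToSentences() keys on.
  PTBTokenizer<CoreLabel> tokenizer = new PTBTokenizer<>(
    new StringReader(text), new CoreLabelTokenFactory(), "");
  List<CoreLabel> tokens = tokenizer.tokenize();

  // Split the flat token list into one List<CoreLabel> per sentence.
  WordToSentenceProcessor<CoreLabel> splitter = new WordToSentenceProcessor<>();
  List<List<CoreLabel>> sentences = splitter.wordsToSentences(tokens);

  for (List<CoreLabel> sentence : sentences) {
   System.out.println(sentence);
  }
 }
}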

Code examples

Code example origin: stanfordnlp/CoreNLP

/**
 * Returns a List of Lists where each element is built from a run
 * of Words in the input Document. Specifically, reads through each word in
 * the input document and breaks off a sentence after finding a valid
 * sentence boundary token or end of file.
 * Note that for this to work, the words in the
 * input document must have been tokenized with a tokenizer that makes
 * sentence boundary tokens their own tokens (e.g., {@link PTBTokenizer}).
 *
 * @param words A list of already tokenized words (must implement HasWord or be a String).
 * @return A list of sentences.
 * @see #WordToSentenceProcessor(String, String, Set, Set, String, NewlineIsSentenceBreak, SequencePattern, Set, boolean, boolean)
 */
// todo [cdm 2016]: Should really sort out generics here so don't need to have extra list copying
@Override
public List<List<IN>> process(List<? extends IN> words) {
 if (isOneSentence) {
  // put all the words in one sentence
  List<List<IN>> sentences = Generics.newArrayList();
  sentences.add(new ArrayList<>(words));
  return sentences;
 } else {
  return wordsToSentences(words);
 }
}
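A note on the isOneSentence branch in the snippet above: when the processor is built in one-sentence mode, process() wraps the entire token list into a single sentence instead of delegating to wordsToSentences(). The sketch below is a hypothetical illustration of that contrast; the boolean WordToSentenceProcessor(boolean isOneSentence) constructor is assumed here, so verify it against the CoreNLP version you use.

import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.process.WordToSentenceProcessor;

import java.util.Arrays;
import java.util.List;

public class OneSentenceDemo {
 public static void main(String[] args) {
  // Word implements HasWord, which is all the processor needs.
  List<Word> tokens = Arrays.asList(
    new Word("Hello"), new Word("."), new Word("Goodbye"), new Word("."));

  // Assumed boolean constructor: one-sentence mode keeps all tokens together.
  WordToSentenceProcessor<Word> noSplit = new WordToSentenceProcessor<>(true);
  System.out.println(noSplit.process(tokens).size());  // 1

  // Default mode: the "." boundary tokens yield two sentences.
  WordToSentenceProcessor<Word> splitter = new WordToSentenceProcessor<>();
  System.out.println(splitter.process(tokens).size()); // 2
 }
}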

Code example origin: edu.stanford.nlp/corenlp

public List<List<IN>> process(List<? extends IN> words) {
 if (isOneSentence) {
  List<List<IN>> sentences = Generics.newArrayList();
  sentences.add(new ArrayList<IN>(words));
  return sentences;
 } else {
  return wordsToSentences(words);
 }
}

Code example origin: com.guokr/stan-cn-com

@Override
public List<List<IN>> process(List<? extends IN> words) {
 if (isOneSentence) {
  // put all the words in one sentence
  List<List<IN>> sentences = Generics.newArrayList();
  sentences.add(new ArrayList<IN>(words));
  return sentences;
 } else {
  return wordsToSentences(words);
 }
}

Code example origin: edu.stanford.nlp/stanford-parser

/**
 * Returns a List of Lists where each element is built from a run
 * of Words in the input Document. Specifically, reads through each word in
 * the input document and breaks off a sentence after finding a valid
 * sentence boundary token or end of file.
 * Note that for this to work, the words in the
 * input document must have been tokenized with a tokenizer that makes
 * sentence boundary tokens their own tokens (e.g., {@link PTBTokenizer}).
 *
 * @param words A list of already tokenized words (must implement HasWord or be a String).
 * @return A list of sentences.
 * @see #WordToSentenceProcessor(String, String, Set, Set, String, NewlineIsSentenceBreak, SequencePattern, Set, boolean, boolean)
 */
// todo [cdm 2016]: Should really sort out generics here so don't need to have extra list copying
@Override
public List<List<IN>> process(List<? extends IN> words) {
 if (isOneSentence) {
  // put all the words in one sentence
  List<List<IN>> sentences = Generics.newArrayList();
  sentences.add(new ArrayList<>(words));
  return sentences;
 } else {
  return wordsToSentences(words);
 }
}

Code example origin: edu.stanford.nlp/stanford-corenlp

/**
 * Returns a List of Lists where each element is built from a run
 * of Words in the input Document. Specifically, reads through each word in
 * the input document and breaks off a sentence after finding a valid
 * sentence boundary token or end of file.
 * Note that for this to work, the words in the
 * input document must have been tokenized with a tokenizer that makes
 * sentence boundary tokens their own tokens (e.g., {@link PTBTokenizer}).
 *
 * @param words A list of already tokenized words (must implement HasWord or be a String).
 * @return A list of sentences.
 * @see #WordToSentenceProcessor(String, String, Set, Set, String, NewlineIsSentenceBreak, SequencePattern, Set, boolean, boolean)
 */
// todo [cdm 2016]: Should really sort out generics here so don't need to have extra list copying
@Override
public List<List<IN>> process(List<? extends IN> words) {
 if (isOneSentence) {
  // put all the words in one sentence
  List<List<IN>> sentences = Generics.newArrayList();
  sentences.add(new ArrayList<>(words));
  return sentences;
 } else {
  return wordsToSentences(words);
 }
}
