
Usage of the edu.stanford.nlp.process.WordToSentenceProcessor.<init>() method, with code examples

Reposted. Author: 知者. Updated: 2024-03-24 00:03:05

This article collects code examples of the Java method edu.stanford.nlp.process.WordToSentenceProcessor.<init>() and shows how WordToSentenceProcessor.<init>() is used in practice. The examples were extracted from selected projects on platforms such as GitHub, Stack Overflow, and Maven, so they should be useful references. Details of the WordToSentenceProcessor.<init>() method:
Package: edu.stanford.nlp.process.WordToSentenceProcessor
Class: WordToSentenceProcessor
Method: <init>

About WordToSentenceProcessor.<init>

Create a WordToSentenceProcessor using a sensible default list of sentence-ending tokens for English/Latin writing systems. The default set is {".", "?", "!"} plus any run of ! or ?, as in !!!?!?!?!!!?!!?!!!. A sequence of two or more consecutive line breaks is treated as a paragraph break, which also splits sentences. This is the usual constructor for sentence-breaking reasonable text; it uses hard line breaking, so two blank lines indicate a paragraph break. This is the constructor most people use.
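To illustrate the default boundary rule described above, here is a self-contained sketch (this is not CoreNLP code; the class and method names are hypothetical): a token ends a sentence if it is "." or any run of ! and ?, and tokens are grouped into sentences accordingly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DefaultBoundarySketch {
    // Mimics the default rule: a token is a sentence boundary if it is
    // ".", or any non-empty run of '!' and '?' (e.g. "!", "?", "!?!").
    static boolean isBoundary(String token) {
        return token.equals(".") || token.matches("[!?]+");
    }

    // Groups a flat token list into sentences; each boundary token closes
    // the current sentence and is kept inside it.
    static List<List<String>> toSentences(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String t : tokens) {
            current.add(t);
            if (isBoundary(t)) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) sentences.add(current); // trailing material
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Hello", "world", ".", "Really", "!?!", "Yes");
        // "Hello world ." / "Really !?!" / "Yes" -> 3 sentences
        System.out.println(toSentences(tokens).size());
    }
}
```

The real class does considerably more (paragraph breaks on blank lines, discardable boundary tokens, multi-token boundary patterns), but the grouping loop is the core idea.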

Code examples

Code example source: stanfordnlp/CoreNLP

public WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex,
                 Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard,
                 String newlineIsSentenceBreak, String boundaryMultiTokenRegex,
                 Set<String> tokenRegexesToDiscard) {
 this(verbose, false,
     new WordToSentenceProcessor<>(boundaryTokenRegex, null,
         boundaryToDiscard, htmlElementsToDiscard,
         WordToSentenceProcessor.stringToNewlineIsSentenceBreak(newlineIsSentenceBreak),
         (boundaryMultiTokenRegex != null) ? TokenSequencePattern.compile(boundaryMultiTokenRegex) : null, tokenRegexesToDiscard));
}

Code example source: stanfordnlp/CoreNLP

/** Return a WordsToSentencesAnnotator that never splits the token stream. You just get one sentence.
 *
 *  @return A WordsToSentenceAnnotator.
 */
public static WordsToSentencesAnnotator nonSplitter() {
 WordToSentenceProcessor<CoreLabel> wts = new WordToSentenceProcessor<>(true);
 return new WordsToSentencesAnnotator(false, false, wts);
}

Code example source: stanfordnlp/CoreNLP

wts = new WordToSentenceProcessor<>();

Code example source: stanfordnlp/CoreNLP

/**
  * For internal debugging purposes only.
  */
 public static void main(String[] args) {
  new BasicDocument<String>();
  Document<String, Word, Word> htmlDoc = BasicDocument.init("top text <h1>HEADING text</h1> this is <p>new paragraph<br>next line<br/>xhtml break etc.");
  System.out.println("Before:");
  System.out.println(htmlDoc);
  Document<String, Word, Word> txtDoc = new StripTagsProcessor<String, Word>(true).processDocument(htmlDoc);
  System.out.println("After:");
  System.out.println(txtDoc);
  Document<String, Word, List<Word>> sentences = new WordToSentenceProcessor<Word>().processDocument(txtDoc);
  System.out.println("Sentences:");
  System.out.println(sentences);
 }
}

Code example source: stanfordnlp/CoreNLP

/** Return a WordsToSentencesAnnotator that splits on newlines (only), which are then deleted.
 *  This constructor counts the lines by putting in empty token lists for empty lines.
 *  It tells the underlying splitter to return empty lists of tokens
 *  and then treats those empty lists as empty lines.  We don't
 *  actually include empty sentences in the annotation, though. But they
 *  are used in numbering the sentence. Only this constructor leads to
 *  empty sentences.
 *
 *  @param  nlToken Zero or more new line tokens, which might be a {@literal \n} or the fake
 *                 newline tokens returned from the tokenizer.
 *  @return A WordsToSentenceAnnotator.
 */
public static WordsToSentencesAnnotator newlineSplitter(String... nlToken) {
 // this constructor will keep empty lines as empty sentences
 WordToSentenceProcessor<CoreLabel> wts =
     new WordToSentenceProcessor<>(ArrayUtils.asImmutableSet(nlToken));
 return new WordsToSentencesAnnotator(false, true, wts);
}
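The newline-splitting behavior documented above — split on newline tokens, discard the newline tokens themselves, and keep empty lines as empty sentences for line numbering — can be sketched without CoreNLP as follows (class and method names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NewlineSplitterSketch {
    // Splits a token stream on newline tokens. The newline tokens are
    // discarded; an empty line yields an empty "sentence", which the
    // annotator uses for counting lines but never emits as a sentence.
    static List<List<String>> splitOnNewlines(List<String> tokens, Set<String> nlTokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String t : tokens) {
            if (nlTokens.contains(t)) {
                sentences.add(current);      // may be empty: a blank line
                current = new ArrayList<>();
            } else {
                current.add(t);
            }
        }
        if (!current.isEmpty()) sentences.add(current);
        return sentences;
    }

    public static void main(String[] args) {
        Set<String> nl = new HashSet<>(Arrays.asList("\n"));
        List<String> tokens = Arrays.asList("a", "b", "\n", "\n", "c");
        // ["a","b"] / [] (blank line) / ["c"] -> 3 entries
        System.out.println(splitOnNewlines(tokens, nl).size());
    }
}
```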

Code example source: stanfordnlp/CoreNLP

public static void addEnhancedSentences(Annotation doc) {
 //for every sentence that begins a paragraph: append this sentence and the previous one and see if sentence splitter would make a single sentence out of it. If so, add as extra sentence.
 //for each sieve that potentially uses augmentedSentences in original:
 List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
 WordToSentenceProcessor wsp =
     new WordToSentenceProcessor(WordToSentenceProcessor.NewlineIsSentenceBreak.NEVER); //create SentenceSplitter that never splits on newline
 int prevParagraph = 0;
 for(int i = 1; i < sentences.size(); i++) {
  CoreMap sentence = sentences.get(i);
  CoreMap prevSentence = sentences.get(i-1);
  List<CoreLabel> tokensConcat = new ArrayList<>();
  tokensConcat.addAll(prevSentence.get(CoreAnnotations.TokensAnnotation.class));
  tokensConcat.addAll(sentence.get(CoreAnnotations.TokensAnnotation.class));
  List<List<CoreLabel>> sentenceTokens = wsp.process(tokensConcat);
  if(sentenceTokens.size() == 1) { //wsp would have put them into a single sentence --> add enhanced sentence.
   sentence.set(EnhancedSentenceAnnotation.class, constructSentence(sentenceTokens.get(0), prevSentence, sentence));
  }
 }
}

Code example source: stanfordnlp/CoreNLP

new WordToSentenceProcessor<>(ArrayUtils.asImmutableSet(new String[]{"\n"}));
  this.countLineNumbers = true;
  this.wts = wts1;
      new WordToSentenceProcessor<>(ArrayUtils.asImmutableSet(new String[]{System.lineSeparator(), "\n"}));
  this.countLineNumbers = true;
  this.wts = wts1;
     new WordToSentenceProcessor<>(ArrayUtils.asImmutableSet(new String[]{PTBTokenizer.getNewlineToken()}));
 this.countLineNumbers = true;
 this.wts = wts1;
if (Boolean.parseBoolean(isOneSentence)) { // this method treats null as false
 WordToSentenceProcessor<CoreLabel> wts1 = new WordToSentenceProcessor<>(true);
 this.countLineNumbers = false;
 this.wts = wts1;
 this.wts = new WordToSentenceProcessor<>(boundaryTokenRegex, boundaryFollowersRegex,
   boundariesToDiscard, htmlElementsToDiscard,
   WordToSentenceProcessor.stringToNewlineIsSentenceBreak(nlsb),

Code example source: edu.stanford.nlp/corenlp

public WordsToSentencesAnnotator(boolean verbose) {
 VERBOSE = verbose;
 wts = new WordToSentenceProcessor<CoreLabel>();
}

Code example source: com.guokr/stan-cn-com

public WordsToSentencesAnnotator(boolean verbose) {
 this(verbose, false, new WordToSentenceProcessor<CoreLabel>());
}

Code example source: com.guokr/stan-cn-com

/** Return a WordsToSentencesAnnotator that never splits the token stream. You just get one sentence.
 *
 *  @param verbose Whether it is verbose.
 *  @return A WordsToSentenceAnnotator.
 */
public static WordsToSentencesAnnotator nonSplitter(boolean verbose) {
 WordToSentenceProcessor<CoreLabel> wts = new WordToSentenceProcessor<CoreLabel>(true);
 return new WordsToSentencesAnnotator(verbose, false, wts);
}

Code example source: edu.stanford.nlp/stanford-corenlp

public WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex,
                 Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard,
                 String newlineIsSentenceBreak, String boundaryMultiTokenRegex,
                 Set<String> tokenRegexesToDiscard) {
 this(verbose, false,
     new WordToSentenceProcessor<>(boundaryTokenRegex, null,
         boundaryToDiscard, htmlElementsToDiscard,
         WordToSentenceProcessor.stringToNewlineIsSentenceBreak(newlineIsSentenceBreak),
         (boundaryMultiTokenRegex != null) ? TokenSequencePattern.compile(boundaryMultiTokenRegex) : null, tokenRegexesToDiscard));
}

Code example source: edu.stanford.nlp/stanford-corenlp

/** Return a WordsToSentencesAnnotator that never splits the token stream. You just get one sentence.
 *
 *  @return A WordsToSentenceAnnotator.
 */
public static WordsToSentencesAnnotator nonSplitter() {
 WordToSentenceProcessor<CoreLabel> wts = new WordToSentenceProcessor<>(true);
 return new WordsToSentencesAnnotator(false, false, wts);
}

Code example source: com.guokr/stan-cn-com

public WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex,
                 Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard,
                 String newlineIsSentenceBreak) {
 this(verbose, false,
    new WordToSentenceProcessor<CoreLabel>(boundaryTokenRegex,
        boundaryToDiscard, htmlElementsToDiscard,
        WordToSentenceProcessor.stringToNewlineIsSentenceBreak(newlineIsSentenceBreak)));
}

Code example source: edu.stanford.nlp/corenlp

public static WordsToSentencesAnnotator newlineSplitter(boolean verbose) {
 WordToSentenceProcessor<CoreLabel> wts = 
  new WordToSentenceProcessor<CoreLabel>("", 
                      Collections.<String>emptySet(),
                      Collections.singleton("\n"));
 return new WordsToSentencesAnnotator(wts, verbose);
}

Code example source: stackoverflow.com

// Tokenize with PTBTokenizer (PTBLexer)
List<CoreLabel> tokens = PTBTokenizer.coreLabelFactory().getTokenizer(new StringReader(text)).tokenize();
// Split into sentences with Stanford's sentence splitter (WordToSentenceProcessor)
WordToSentenceProcessor<CoreLabel> processor = new WordToSentenceProcessor<>();
List<List<CoreLabel>> splitSentences = processor.process(tokens);
// For each sentence
for (List<CoreLabel> sentence : splitSentences) {
  // For each token
  for (CoreLabel token : sentence) {
    // token.value(), token.beginPosition(), and token.endPosition()
    // give the token text and its character offsets
  }
}

Code example source: com.guokr/stan-cn-com

public WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex,
                 Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard,
                 String newlineIsSentenceBreak, String boundaryMultiTokenRegex,
                 Set<String> tokenRegexesToDiscard) {
 this(verbose, false,
     new WordToSentenceProcessor<CoreLabel>(boundaryTokenRegex,
         boundaryToDiscard, htmlElementsToDiscard,
         WordToSentenceProcessor.stringToNewlineIsSentenceBreak(newlineIsSentenceBreak),
         (boundaryMultiTokenRegex != null)? TokenSequencePattern.compile(boundaryMultiTokenRegex):null, tokenRegexesToDiscard));
}

Code example source: edu.stanford.nlp/corenlp

/**
  * For internal debugging purposes only.
  */
 public static void main(String[] args) {
  new BasicDocument<String>();
  Document<String, Word, Word> htmlDoc = BasicDocument.init("top text <h1>HEADING text</h1> this is <p>new paragraph<br>next line<br/>xhtml break etc.");
  System.out.println("Before:");
  System.out.println(htmlDoc);
  Document<String, Word, Word> txtDoc = new StripTagsProcessor<String, Word>(true).processDocument(htmlDoc);
  System.out.println("After:");
  System.out.println(txtDoc);
  Document<String, Word, List<Word>> sentences = new WordToSentenceProcessor<Word>().processDocument(txtDoc);
  System.out.println("Sentences:");
  System.out.println(sentences);
 }
}

Code example source: edu.stanford.nlp/stanford-corenlp

/**
  * For internal debugging purposes only.
  */
 public static void main(String[] args) {
  new BasicDocument<String>();
  Document<String, Word, Word> htmlDoc = BasicDocument.init("top text <h1>HEADING text</h1> this is <p>new paragraph<br>next line<br/>xhtml break etc.");
  System.out.println("Before:");
  System.out.println(htmlDoc);
  Document<String, Word, Word> txtDoc = new StripTagsProcessor<String, Word>(true).processDocument(htmlDoc);
  System.out.println("After:");
  System.out.println(txtDoc);
  Document<String, Word, List<Word>> sentences = new WordToSentenceProcessor<Word>().processDocument(txtDoc);
  System.out.println("Sentences:");
  System.out.println(sentences);
 }
}

Code example source: edu.stanford.nlp/stanford-parser

/**
  * For internal debugging purposes only.
  */
 public static void main(String[] args) {
  new BasicDocument<String>();
  Document<String, Word, Word> htmlDoc = BasicDocument.init("top text <h1>HEADING text</h1> this is <p>new paragraph<br>next line<br/>xhtml break etc.");
  System.out.println("Before:");
  System.out.println(htmlDoc);
  Document<String, Word, Word> txtDoc = new StripTagsProcessor<String, Word>(true).processDocument(htmlDoc);
  System.out.println("After:");
  System.out.println(txtDoc);
  Document<String, Word, List<Word>> sentences = new WordToSentenceProcessor<Word>().processDocument(txtDoc);
  System.out.println("Sentences:");
  System.out.println(sentences);
 }
}

Code example source: com.guokr/stan-cn-com

/**
  * For internal debugging purposes only.
  */
 public static void main(String[] args) {
  new BasicDocument<String>();
  Document<String, Word, Word> htmlDoc = BasicDocument.init("top text <h1>HEADING text</h1> this is <p>new paragraph<br>next line<br/>xhtml break etc.");
  System.out.println("Before:");
  System.out.println(htmlDoc);
  Document<String, Word, Word> txtDoc = new StripTagsProcessor<String, Word>(true).processDocument(htmlDoc);
  System.out.println("After:");
  System.out.println(txtDoc);
  Document<String, Word, List<Word>> sentences = new WordToSentenceProcessor<Word>().processDocument(txtDoc);
  System.out.println("Sentences:");
  System.out.println(sentences);
 }
}

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号