This article collects a number of Java code examples for the edu.stanford.nlp.process.WordToSentenceProcessor.<init>() method, showing how WordToSentenceProcessor.<init>() is used in practice. The examples come mainly from GitHub, Stack Overflow, Maven artifacts, and similar sources, extracted from selected projects, so they should be useful references. Details of the WordToSentenceProcessor.<init>() method:
Package path: edu.stanford.nlp.process.WordToSentenceProcessor
Class name: WordToSentenceProcessor
Method name: <init>
Create a WordToSentenceProcessor using a sensible default list of tokens for sentence ending for English/Latin writing systems. The default set is: {".","?","!"} and any combination of ! or ?, as in !!!?!?!?!!!?!!?!!!. A sequence of two or more consecutive line breaks is taken as a paragraph break which also splits sentences. This is the usual constructor for sentence breaking reasonable text, which uses hard-line breaking, so two blank lines indicate a paragraph break. People commonly use this constructor.
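To illustrate the default boundary rule described above, here is a minimal, self-contained sketch in plain Java. It does not use CoreNLP; the class name and the simplified boundary regex are this sketch's own assumptions. A token ends a sentence if it is "." or any run of '!' and '?' characters. The real WordToSentenceProcessor handles many more cases (quotes, brackets, paragraph breaks), so treat this only as a rough model of the default behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Simplified model of the default sentence-boundary token set:
// "." or any combination of '!' and '?', e.g. "!?!".
public class DefaultBoundarySketch {
    private static final Pattern BOUNDARY = Pattern.compile("[.]|[!?]+");

    public static List<List<String>> split(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String tok : tokens) {
            current.add(tok);
            // A boundary token closes the current sentence.
            if (BOUNDARY.matcher(tok).matches()) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        // Trailing tokens without a final boundary still form a sentence.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<List<String>> s = split(Arrays.asList("Hello", "world", "!", "How", "are", "you", "?"));
        System.out.println(s.size() + " sentences: " + s);
    }
}
```

For example, the token list ["A", ".", "B", "!?!"] splits into two sentences under this rule.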
Code example source: stanfordnlp/CoreNLP

public WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex,
                                 Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard,
                                 String newlineIsSentenceBreak, String boundaryMultiTokenRegex,
                                 Set<String> tokenRegexesToDiscard) {
  this(verbose, false,
       new WordToSentenceProcessor<>(boundaryTokenRegex, null,
           boundaryToDiscard, htmlElementsToDiscard,
           WordToSentenceProcessor.stringToNewlineIsSentenceBreak(newlineIsSentenceBreak),
           (boundaryMultiTokenRegex != null) ? TokenSequencePattern.compile(boundaryMultiTokenRegex) : null,
           tokenRegexesToDiscard));
}
Code example source: stanfordnlp/CoreNLP

/** Return a WordsToSentencesAnnotator that never splits the token stream. You just get one sentence.
 *
 *  @return A WordsToSentencesAnnotator.
 */
public static WordsToSentencesAnnotator nonSplitter() {
  WordToSentenceProcessor<CoreLabel> wts = new WordToSentenceProcessor<>(true);
  return new WordsToSentencesAnnotator(false, false, wts);
}
Code example source: stanfordnlp/CoreNLP
wts = new WordToSentenceProcessor<>();
Code example source: stanfordnlp/CoreNLP

/**
 * For internal debugging purposes only.
 */
public static void main(String[] args) {
  new BasicDocument<String>();
  Document<String, Word, Word> htmlDoc = BasicDocument.init("top text <h1>HEADING text</h1> this is <p>new paragraph<br>next line<br/>xhtml break etc.");
  System.out.println("Before:");
  System.out.println(htmlDoc);
  Document<String, Word, Word> txtDoc = new StripTagsProcessor<String, Word>(true).processDocument(htmlDoc);
  System.out.println("After:");
  System.out.println(txtDoc);
  Document<String, Word, List<Word>> sentences = new WordToSentenceProcessor<Word>().processDocument(txtDoc);
  System.out.println("Sentences:");
  System.out.println(sentences);
}
Code example source: stanfordnlp/CoreNLP

/** Return a WordsToSentencesAnnotator that splits on newlines (only), which are then deleted.
 *  This constructor counts the lines by putting in empty token lists for empty lines.
 *  It tells the underlying splitter to return empty lists of tokens
 *  and then treats those empty lists as empty lines. We don't
 *  actually include empty sentences in the annotation, though. But they
 *  are used in numbering the sentence. Only this constructor leads to
 *  empty sentences.
 *
 *  @param nlToken Zero or more new line tokens, which might be a {@literal \n} or the fake
 *                 newline tokens returned from the tokenizer.
 *  @return A WordsToSentencesAnnotator.
 */
public static WordsToSentencesAnnotator newlineSplitter(String... nlToken) {
  // this constructor will keep empty lines as empty sentences
  WordToSentenceProcessor<CoreLabel> wts =
      new WordToSentenceProcessor<>(ArrayUtils.asImmutableSet(nlToken));
  return new WordsToSentencesAnnotator(false, true, wts);
}
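The newline-only splitting described in the Javadoc above can be modeled with a short, self-contained sketch (plain Java, not CoreNLP; the class name is this sketch's own): every newline token ends the current line, and consecutive newline tokens produce empty token lists, which a caller can then use for line numbering while skipping them as sentences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of newline-only splitting: newline tokens are delimiters
// (and are dropped), and blank lines survive as empty token lists.
public class NewlineSplitterSketch {
    public static List<List<String>> split(List<String> tokens, Set<String> nlTokens) {
        List<List<String>> lines = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String tok : tokens) {
            if (nlTokens.contains(tok)) {
                lines.add(current);          // may be empty, for a blank line
                current = new ArrayList<>();
            } else {
                current.add(tok);
            }
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }
        return lines;
    }
}
```

For instance, ["a", "\n", "\n", "b"] with "\n" as the newline token yields three lists, the middle one empty, mirroring how empty lines are kept for counting but excluded from the annotated sentences.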
Code example source: stanfordnlp/CoreNLP

public static void addEnhancedSentences(Annotation doc) {
  // for every sentence that begins a paragraph: append this sentence and the previous one
  // and see if the sentence splitter would make a single sentence out of it; if so, add as an extra sentence
  // for each sieve that potentially uses augmentedSentences in original:
  List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
  WordToSentenceProcessor wsp =
      new WordToSentenceProcessor(WordToSentenceProcessor.NewlineIsSentenceBreak.NEVER); // create a SentenceSplitter that never splits on newline
  int prevParagraph = 0;
  for (int i = 1; i < sentences.size(); i++) {
    CoreMap sentence = sentences.get(i);
    CoreMap prevSentence = sentences.get(i - 1);
    List<CoreLabel> tokensConcat = new ArrayList<>();
    tokensConcat.addAll(prevSentence.get(CoreAnnotations.TokensAnnotation.class));
    tokensConcat.addAll(sentence.get(CoreAnnotations.TokensAnnotation.class));
    List<List<CoreLabel>> sentenceTokens = wsp.process(tokensConcat);
    if (sentenceTokens.size() == 1) { // wsp would have put them into a single sentence -> add enhanced sentence
      sentence.set(EnhancedSentenceAnnotation.class, constructSentence(sentenceTokens.get(0), prevSentence, sentence));
    }
  }
}
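The core idea of addEnhancedSentences, checking whether two adjacent sentences would merge into one when re-split, can be sketched without CoreNLP (the class name and the simplified boundary regex below are this sketch's own assumptions, reusing the default-boundary model rather than the real splitter):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Concatenate the tokens of two adjacent sentences and re-split them;
// a single resulting sentence means the pair is a merge candidate.
public class EnhancedSentenceSketch {
    private static final Pattern BOUNDARY = Pattern.compile("[.]|[!?]+");

    static List<List<String>> split(List<String> tokens) {
        List<List<String>> out = new ArrayList<>();
        List<String> cur = new ArrayList<>();
        for (String t : tokens) {
            cur.add(t);
            if (BOUNDARY.matcher(t).matches()) {
                out.add(cur);
                cur = new ArrayList<>();
            }
        }
        if (!cur.isEmpty()) out.add(cur);
        return out;
    }

    static boolean wouldMerge(List<String> prev, List<String> next) {
        List<String> concat = new ArrayList<>(prev);
        concat.addAll(next);
        return split(concat).size() == 1;  // one sentence -> merge candidate
    }
}
```

A pair like ["Dr"] followed by ["Smith", "arrived", "."] merges under this rule, while two complete sentences each ending in "." do not.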
Code example source: stanfordnlp/CoreNLP

WordToSentenceProcessor<CoreLabel> wts1 =
    new WordToSentenceProcessor<>(ArrayUtils.asImmutableSet(new String[]{"\n"}));
this.countLineNumbers = true;
this.wts = wts1;
// ...
WordToSentenceProcessor<CoreLabel> wts1 =
    new WordToSentenceProcessor<>(ArrayUtils.asImmutableSet(new String[]{System.lineSeparator(), "\n"}));
this.countLineNumbers = true;
this.wts = wts1;
// ...
WordToSentenceProcessor<CoreLabel> wts1 =
    new WordToSentenceProcessor<>(ArrayUtils.asImmutableSet(new String[]{PTBTokenizer.getNewlineToken()}));
this.countLineNumbers = true;
this.wts = wts1;
// ...
if (Boolean.parseBoolean(isOneSentence)) { // this method treats null as false
  WordToSentenceProcessor<CoreLabel> wts1 = new WordToSentenceProcessor<>(true);
  this.countLineNumbers = false;
  this.wts = wts1;
// ...
this.wts = new WordToSentenceProcessor<>(boundaryTokenRegex, boundaryFollowersRegex,
    boundariesToDiscard, htmlElementsToDiscard,
    WordToSentenceProcessor.stringToNewlineIsSentenceBreak(nlsb),
Code example source: edu.stanford.nlp/corenlp

public WordsToSentencesAnnotator(boolean verbose) {
  VERBOSE = verbose;
  wts = new WordToSentenceProcessor<CoreLabel>();
}
Code example source: com.guokr/stan-cn-com

public WordsToSentencesAnnotator(boolean verbose) {
  this(verbose, false, new WordToSentenceProcessor<CoreLabel>());
}
Code example source: com.guokr/stan-cn-com

/** Return a WordsToSentencesAnnotator that never splits the token stream. You just get one sentence.
 *
 *  @param verbose Whether it is verbose.
 *  @return A WordsToSentencesAnnotator.
 */
public static WordsToSentencesAnnotator nonSplitter(boolean verbose) {
  WordToSentenceProcessor<CoreLabel> wts = new WordToSentenceProcessor<CoreLabel>(true);
  return new WordsToSentencesAnnotator(verbose, false, wts);
}
Code example source: edu.stanford.nlp/stanford-corenlp (the same WordsToSentencesAnnotator constructor as in the stanfordnlp/CoreNLP example above)
Code example source: edu.stanford.nlp/stanford-corenlp (the same nonSplitter() example as in the stanfordnlp/CoreNLP version above)
Code example source: com.guokr/stan-cn-com

public WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex,
                                 Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard,
                                 String newlineIsSentenceBreak) {
  this(verbose, false,
       new WordToSentenceProcessor<CoreLabel>(boundaryTokenRegex,
           boundaryToDiscard, htmlElementsToDiscard,
           WordToSentenceProcessor.stringToNewlineIsSentenceBreak(newlineIsSentenceBreak)));
}
Code example source: edu.stanford.nlp/corenlp

public static WordsToSentencesAnnotator newlineSplitter(boolean verbose) {
  WordToSentenceProcessor<CoreLabel> wts =
      new WordToSentenceProcessor<CoreLabel>("",
          Collections.<String>emptySet(),
          Collections.singleton("\n"));
  return new WordsToSentencesAnnotator(wts, verbose);
}
Code example source: stackoverflow.com

// split via PTBTokenizer (PTBLexer)
List<CoreLabel> tokens = PTBTokenizer.coreLabelFactory().getTokenizer(new StringReader(text)).tokenize();
// do the processing using the Stanford sentence splitter (WordToSentenceProcessor)
WordToSentenceProcessor<CoreLabel> processor = new WordToSentenceProcessor<>();
List<List<CoreLabel>> splitSentences = processor.process(tokens);
// for each sentence
for (List<CoreLabel> s : splitSentences) {
  // for each word
  for (CoreLabel token : s) {
    // here you can get the token value and position, e.g.
    // token.value(), token.beginPosition(), token.endPosition()
  }
}
Code example source: com.guokr/stan-cn-com

public WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex,
                                 Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard,
                                 String newlineIsSentenceBreak, String boundaryMultiTokenRegex,
                                 Set<String> tokenRegexesToDiscard) {
  this(verbose, false,
       new WordToSentenceProcessor<CoreLabel>(boundaryTokenRegex,
           boundaryToDiscard, htmlElementsToDiscard,
           WordToSentenceProcessor.stringToNewlineIsSentenceBreak(newlineIsSentenceBreak),
           (boundaryMultiTokenRegex != null) ? TokenSequencePattern.compile(boundaryMultiTokenRegex) : null,
           tokenRegexesToDiscard));
}
Code example sources: edu.stanford.nlp/corenlp, edu.stanford.nlp/stanford-corenlp, edu.stanford.nlp/stanford-parser, com.guokr/stan-cn-com — each of these artifacts contains the same main() debugging example shown above.