I have a large block of text. For example:
I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this.
Output:
I want to split a paragraph into sentences.
But, there is a problem.
My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2.
How do i split this.
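That is the output I want. Can anyone show me how to do this in Swift?
Thanks.
Best answer
Use NSLinguisticTagger. It gives the correct sentences for your input because it analyzes the text in actual linguistic terms.
Here is a rough draft (Swift 1.2; it won't compile in Swift 2.0):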
import Foundation

let s = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."
// r receives the range of every token the tagger finds.
var r = [Range<String.Index>]()
let t = s.linguisticTagsInRange(
    indices(s), scheme: NSLinguisticTagSchemeLexicalClass,
    options: nil, tokenRanges: &r)
var result = [String]()
// Keep the start index of each token tagged as a sentence terminator.
let ixs = Array(enumerate(t)).filter {
    $0.1 == "SentenceTerminator"
}.map {r[$0.0].startIndex}
var prev = s.startIndex
for ix in ixs {
    // Slice from the end of the previous sentence through the terminator.
    let r = prev...ix
    result.append(
        s[r].stringByTrimmingCharactersInSet(
            NSCharacterSet.whitespaceCharacterSet()))
    prev = advance(ix,1)
}
Here is a Swift 2.0 version (updated for Xcode 7 beta 6):
import Foundation

let s = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."
// r receives the range of every token the tagger finds.
var r = [Range<String.Index>]()
let t = s.linguisticTagsInRange(
    s.characters.indices, scheme: NSLinguisticTagSchemeLexicalClass,
    tokenRanges: &r)
var result = [String]()
// Keep the start index of each token tagged as a sentence terminator.
let ixs = t.enumerate().filter {
    $0.1 == "SentenceTerminator"
}.map {r[$0.0].startIndex}
var prev = s.startIndex
for ix in ixs {
    // Slice from the end of the previous sentence through the terminator.
    let r = prev...ix
    result.append(
        s[r].stringByTrimmingCharactersInSet(
            NSCharacterSet.whitespaceCharacterSet()))
    prev = ix.advancedBy(1)
}
Updated here for Swift 3:
import Foundation

let s = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."
// r receives the range of every token the tagger finds.
var r = [Range<String.Index>]()
let t = s.linguisticTags(
    in: s.startIndex..<s.endIndex,
    scheme: NSLinguisticTagSchemeLexicalClass,
    tokenRanges: &r)
var result = [String]()
// Keep the start index of each token tagged as a sentence terminator.
let ixs = t.enumerated().filter {
    $0.1 == "SentenceTerminator"
}.map {r[$0.0].lowerBound}
var prev = s.startIndex
for ix in ixs {
    // Slice from the end of the previous sentence through the terminator.
    let r = prev...ix
    result.append(
        s[r].trimmingCharacters(
            in: .whitespaces))
    prev = s.index(after: ix)
}
result is an array of four strings, one sentence per string:
["I want to split a paragraph into sentences.",
"But, there is a problem.",
"My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2.",
"How do i split this."]
For more on swift - splitting a paragraph into sentences, see the similar question on Stack Overflow: https://stackoverflow.com/questions/32168581/
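The answer's versions stop at Swift 3. For current Swift, the same idea (let the system's linguistic analysis find the sentence boundaries instead of splitting on ".") can be sketched with NLTokenizer from Apple's NaturalLanguage framework. This is a substitute technique, not the code from the answer: it assumes an Apple platform where NaturalLanguage is available (macOS 10.14 or iOS 12 and later), and the names splitIntoSentences and paragraph are made up for illustration. Where the tokenizer places boundaries around abbreviations and dates depends on the system's language model, so check the output on your own text.

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper: splits `text` into sentences with NLTokenizer.
/// Sentence boundaries are decided by the system tokenizer, not by this code.
func splitIntoSentences(_ text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = text

    var sentences = [String]()
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        // Each token range covers one sentence, usually with trailing whitespace.
        let sentence = text[range].trimmingCharacters(in: .whitespacesAndNewlines)
        if !sentence.isEmpty {
            sentences.append(sentence)
        }
        return true // keep enumerating
    }
    return sentences
}

// Sample input taken from the question.
let paragraph = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."
for sentence in splitIntoSentences(paragraph) {
    print(sentence)
}
```

Wrapping the tokenizer in a function keeps the trimming in one place and returns plain String values, so callers do not need to handle ranges or Substring themselves.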