
swift - Split a paragraph into sentences

Reposted. Author: 搜寻专家. Updated: 2023-11-01 05:53:52

I have a large block of text. For example:

I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this.

Output:

I want to split a paragraph into sentences.

But, there is a problem.

My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2.

How do i split this.

This is the output I want. Can anyone show me how to do this in Swift?

Thanks.
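To see why splitting on every period fails for this paragraph, here is a minimal sketch using only the standard library's `split(separator:)`. The name `paragraph` is illustrative and not part of the original post.

```swift
// The paragraph from the question.
let paragraph = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."

// Naive approach: treat every "." as a sentence boundary.
// split(separator:) omits empty pieces by default, so the final "." adds nothing.
let naivePieces = paragraph.split(separator: ".").map(String.init)

for piece in naivePieces {
    print("[\(piece)]")
}
```

This yields eight fragments instead of four, because the periods inside "Jan.13", "U.A.E" and "2.2" are treated as boundaries too (for example, "A" ends up as a fragment of its own).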

Best answer

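Use NSLinguisticTagger. It produces the correct sentences for your input because it analyzes the text linguistically: each token is tagged with a lexical class, and the punctuation that actually ends a sentence is tagged "SentenceTerminator", so the code only has to cut the string at those tokens.

Here is a rough draft (Swift 1.2; it will not compile in Swift 2.0):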

```swift
import Foundation

let s = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."
// Tag every token with its lexical class; r receives each token's range.
var r = [Range<String.Index>]()
let t = s.linguisticTagsInRange(
    indices(s), scheme: NSLinguisticTagSchemeLexicalClass,
    options: nil, tokenRanges: &r)
var result = [String]()
// Start index of every token tagged as a sentence terminator.
let ixs = Array(enumerate(t)).filter {
    $0.1 == "SentenceTerminator"
}.map {r[$0.0].startIndex}
// Cut from the previous cut point through each terminator, trimming spaces.
var prev = s.startIndex
for ix in ixs {
    let sentenceRange = prev...ix
    result.append(
        s[sentenceRange].stringByTrimmingCharactersInSet(
            NSCharacterSet.whitespaceCharacterSet()))
    prev = advance(ix, 1)
}
```

Here is a Swift 2.0 version (updated for Xcode 7 beta 6):

```swift
import Foundation

let s = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."
// Tag every token with its lexical class; r receives each token's range.
var r = [Range<String.Index>]()
let t = s.linguisticTagsInRange(
    s.characters.indices, scheme: NSLinguisticTagSchemeLexicalClass,
    tokenRanges: &r)
var result = [String]()
// Start index of every token tagged as a sentence terminator.
let ixs = t.enumerate().filter {
    $0.1 == "SentenceTerminator"
}.map {r[$0.0].startIndex}
// Cut from the previous cut point through each terminator, trimming spaces.
var prev = s.startIndex
for ix in ixs {
    let sentenceRange = prev...ix
    result.append(
        s[sentenceRange].stringByTrimmingCharactersInSet(
            NSCharacterSet.whitespaceCharacterSet()))
    prev = ix.advancedBy(1)
}
```

Here it is updated for Swift 3:

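```swift
import Foundation

let s = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."
// Tag every token with its lexical class; r receives each token's range.
var r = [Range<String.Index>]()
let t = s.linguisticTags(
    in: s.startIndex..<s.endIndex,
    scheme: NSLinguisticTagSchemeLexicalClass,
    tokenRanges: &r)
var result = [String]()
// Start index of every token tagged as a sentence terminator.
let ixs = t.enumerated().filter {
    $0.1 == "SentenceTerminator"
}.map {r[$0.0].lowerBound}
// Cut from the previous cut point through each terminator, trimming spaces.
// Note: any text after the last terminator is not appended.
var prev = s.startIndex
for ix in ixs {
    let sentenceRange = prev...ix
    result.append(
        s[sentenceRange].trimmingCharacters(in: .whitespaces))
    prev = s.index(after: ix)
}
```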
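The loop in these versions cuts the string from the previous cut point through each terminator, so text after the last terminator is silently dropped. Below is a minimal modern-Swift sketch of just that slicing step, assuming the terminator indices have already been found (in the answer they come from tokens tagged "SentenceTerminator") and are sorted ascending; unlike the original loop, it keeps a trailing fragment. The names `sliceSentences(in:terminatorIndices:)` and `trimmedString(_:)` are hypothetical, and the demo finds terminators by plain character matching only because its sample text contains no abbreviations.

```swift
// Trim leading/trailing whitespace using only the standard library.
func trimmedString(_ s: Substring) -> String {
    guard let first = s.firstIndex(where: { !$0.isWhitespace }),
          let last = s.lastIndex(where: { !$0.isWhitespace }) else { return "" }
    return String(s[first...last])
}

// Hypothetical helper: cut `text` at the given terminator indices (sorted ascending).
func sliceSentences(in text: String, terminatorIndices: [String.Index]) -> [String] {
    var result: [String] = []
    var start = text.startIndex
    for end in terminatorIndices {
        // Closed range: the terminator stays with its sentence, as in the answer.
        let sentence = trimmedString(text[start...end])
        if !sentence.isEmpty { result.append(sentence) }
        start = text.index(after: end)
    }
    // Keep any trailing text that has no terminator.
    let tail = trimmedString(text[start...])
    if !tail.isEmpty { result.append(tail) }
    return result
}

// Demo only: naive terminator detection, fine for text without abbreviations.
let demo = "One. Two! Three"
let demoIndices = demo.indices.filter { "!?.".contains(demo[$0]) }
print(sliceSentences(in: demo, terminatorIndices: demoIndices))
```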

result is an array of four strings, one per sentence:

["I want to split a paragraph into sentences.", 
"But, there is a problem.",
"My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2.",
"How do i split this."]
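NSLinguisticTagger belongs to Apple's Foundation; on Apple platforms the newer NaturalLanguage framework's `NLTokenizer` with the `.sentence` unit covers the same task. Where neither is available, a rule-based splitter is a rough substitute. The sketch below is a swapped-in heuristic, not how NSLinguisticTagger works: it assumes a sentence ends at ".", "?" or "!" only when that character is followed by whitespace and then an uppercase letter, or when nothing but whitespace follows it. The function names are hypothetical.

```swift
// Trim leading/trailing whitespace using only the standard library.
func trimmedWhitespace(_ s: String) -> String {
    guard let first = s.firstIndex(where: { !$0.isWhitespace }),
          let last = s.lastIndex(where: { !$0.isWhitespace }) else { return "" }
    return String(s[first...last])
}

// Hypothetical heuristic splitter (assumed rule, not NSLinguisticTagger's):
// ".", "?" or "!" ends a sentence if whitespace and an uppercase letter follow,
// or if only whitespace remains after it.
func splitSentencesHeuristically(_ text: String) -> [String] {
    let chars = Array(text)
    var sentences: [String] = []
    var current = ""
    for i in chars.indices {
        let c = chars[i]
        current.append(c)
        guard c == "." || c == "?" || c == "!" else { continue }
        // Skip any whitespace after the terminator.
        var j = i + 1
        while j < chars.count && chars[j].isWhitespace { j += 1 }
        let onlyWhitespaceLeft = j == chars.count
        let followedBySpace = j > i + 1
        let nextIsUppercase = j < chars.count && chars[j].isUppercase
        if onlyWhitespaceLeft || (followedBySpace && nextIsUppercase) {
            let sentence = trimmedWhitespace(current)
            if !sentence.isEmpty { sentences.append(sentence) }
            current = ""
        }
    }
    // Keep a final fragment that has no terminator.
    let tail = trimmedWhitespace(current)
    if !tail.isEmpty { sentences.append(tail) }
    return sentences
}

let paragraph = "I want to split a paragraph into sentences. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.A.E and numbers like 2.2. How do i split this."
for sentence in splitSentencesHeuristically(paragraph) {
    print(sentence)
}
```

For the paragraph in the question this yields the same four strings as the result above, because the periods in "Jan.13", "U.A.E" and "2.2" are immediately followed by another character. It would still split wrongly after an abbreviation such as "Dr." followed by a capitalized name, which is where a linguistic tokenizer is the better choice.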

Regarding swift - Split a paragraph into sentences, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32168581/
