
ios - How to detect the language of a text (string) in iOS?

Reposted. Author: IT王子. Updated: 2023-10-29 05:33:43

For example, given the following strings:

let textEN = "The quick brown fox jumps over the lazy dog"
let textES = "El zorro marrón rápido salta sobre el perro perezoso"
let textAR = "الثعلب البني السريع يقفز فوق الكلب الكسول"
let textDE = "Der schnelle braune Fuchs springt über den faulen Hund"

I want to detect the language used in each of them.

Let's assume the signature of the implemented function is:

func detectedLanguage<T: StringProtocol>(_ forString: T) -> String?

It returns an optional string, which is nil when no language is detected.

So the appropriate results would be:

let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German

Is there an easy way to achieve this?

Best answer

Latest versions (iOS 12+)

Briefly:

You can achieve it by using NLLanguageRecognizer, as follows:

import NaturalLanguage

func detectedLanguage(for string: String) -> String? {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(string)
    guard let languageCode = recognizer.dominantLanguage?.rawValue else { return nil }
    let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLanguage
}
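
When a single answer is not enough, the same recognizer can also report several candidate languages with a probability for each. The following is a minimal sketch of that idea; the helper name rankedLanguages and the default of three candidates are assumptions made for illustration, not part of the original answer.

```swift
import NaturalLanguage

/// Returns candidate languages with their probabilities, most likely first.
/// `rankedLanguages` is a hypothetical helper name used for illustration.
func rankedLanguages(for string: String, maximum: Int = 3) -> [(code: String, probability: Double)] {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(string)
    // languageHypotheses(withMaximum:) returns a dictionary of NLLanguage to probability.
    return recognizer.languageHypotheses(withMaximum: maximum)
        .map { (code: $0.key.rawValue, probability: $0.value) }
        .sorted { $0.probability > $1.probability }
}

for candidate in rankedLanguages(for: "Der schnelle braune Fuchs springt über den faulen Hund") {
    print(candidate.code, candidate.probability)
}
```

This can help with short or mixed-language input, where the dominant language alone may be a weak guess.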

Older versions (iOS 11+)

Briefly:

You can achieve it by using NSLinguisticTagger, as follows:

import Foundation

func detectedLanguage<T: StringProtocol>(for string: T) -> String? {
    // NSLinguisticTagger (iOS 11+) returns a BCP-47 tag such as "en", or nil.
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(string)) else { return nil }
    // Turn the tag into a human-readable name in the user's current locale.
    let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLanguage
}

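Details:

First of all, you should be aware that what you are asking about mainly belongs to the world of Natural language processing (NLP).

Since NLP is about much more than text language detection, the rest of the answer will not contain specific NLP information.

Obviously, implementing such functionality is not easy, especially once you start caring about the details of the process, such as splitting text into sentences and even words, then recognizing names and punctuation, and so on. I bet you are thinking "what a painful process! It isn't even logical to do this by myself". Fortunately, iOS does support NLP (actually, the NLP APIs are available on all Apple platforms, not only iOS), which makes your goal easy to achieve. The core component you will work with is NSLinguisticTagger: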

Analyze natural language text to tag part of speech and lexical class, identify names, perform lemmatization, and determine the language and script.

NSLinguisticTagger provides a uniform interface to a variety of natural language processing functionality with support for many different languages and scripts. You can use this class to segment natural language text into paragraphs, sentences, or words, and tag information about those segments, such as part of speech, lexical class, lemma, script, and language.

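As mentioned in the class documentation, the method you are looking for, listed under the Determining the Dominant Language and Orthography section, is dominantLanguage(for:):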

Returns the dominant language for the specified string.

.

.

Return Value

The BCP-47 tag identifying the dominant language of the string, or the tag "und" if a specific language cannot be determined.

You might notice that NSLinguisticTagger has existed since iOS 5. However, the dominantLanguage(for:) method is only supported on iOS 11 and above, because it was developed on top of the Core ML framework:

. . .

Core ML is the foundation for domain-specific frameworks and functionality. Core ML supports Vision for image analysis, Foundation for natural language processing (for example, the NSLinguisticTagger class), and GameplayKit for evaluating learned decision trees. Core ML itself builds on top of low-level primitives like Accelerate and BNNS, as well as Metal Performance Shaders.


The value returned by calling dominantLanguage(for:) with "The quick brown fox jumps over the lazy dog":

NSLinguisticTagger.dominantLanguage(for: "The quick brown fox jumps over the lazy dog")

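would be the optional string "en". However, that is not yet the desired output; the expectation is to get "English" instead! Well, that is exactly what you get by calling the localizedString(forIdentifier:) method of the Locale structure and passing the obtained language code: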

Locale.current.localizedString(forIdentifier: "en") // English
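
Because Locale.current follows the device settings, the returned name is localized for the user and will not always be in English. Here is a small sketch that pins the display locale instead; the helper name displayName(forLanguageCode:inLocale:) is hypothetical, and "en_US" and "es_ES" are just example locale identifiers.

```swift
import Foundation

/// Looks up the human-readable name of a language code in a specific display locale.
/// `displayName(forLanguageCode:inLocale:)` is a hypothetical helper name.
func displayName(forLanguageCode code: String, inLocale localeIdentifier: String) -> String? {
    let displayLocale = Locale(identifier: localeIdentifier)
    return displayLocale.localizedString(forIdentifier: code)
}

// Same language code, two display locales.
print(displayName(forLanguageCode: "de", inLocale: "en_US") ?? "nil") // German
print(displayName(forLanguageCode: "de", inLocale: "es_ES") ?? "nil") // the Spanish name for German
```

Pinning the locale is mostly useful for logging or tests; for user-facing text, Locale.current is usually the better choice.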

Putting it all together:

As shown in the "Briefly" code snippet, the function would be:

func detectedLanguage<T: StringProtocol>(_ forString: T) -> String? {
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(forString)) else {
        return nil
    }

    let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)

    return detectedLanguage
}

Output:

As expected:

let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German

Note:

There are still cases where no language name can be obtained for a given string, for example:

let textUND = "SdsOE"
let undefinedDetectedLanguage = detectedLanguage(textUND) // => Unknown language

Or it could even be nil:

let rubbish = "000747322"
let rubbishDetectedLanguage = detectedLanguage(rubbish) // => nil

Still, I think this is a decent result for providing useful output...
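
If callers would rather get nil than the string "Unknown language", the "und" tag can be filtered out before localizing. The following is only a sketch: languageNameOrNil(forCode:in:) is a hypothetical helper, and it takes the language code as a parameter so it can sit behind either NSLinguisticTagger.dominantLanguage(for:) or NLLanguageRecognizer.

```swift
import Foundation

/// Maps a detected BCP-47 code to a language name, treating "und" (undetermined)
/// and a missing code as "no language detected".
/// `languageNameOrNil(forCode:in:)` is a hypothetical helper name.
func languageNameOrNil(forCode code: String?, in locale: Locale = .current) -> String? {
    guard let code = code, code != "und" else { return nil }
    return locale.localizedString(forIdentifier: code)
}

// Usage with fixed inputs; in the answer's function the code would come from the tagger.
let fixedLocale = Locale(identifier: "en_US")
print(languageNameOrNil(forCode: "es", in: fixedLocale) ?? "nil")  // Spanish
print(languageNameOrNil(forCode: "und", in: fixedLocale) ?? "nil") // nil
```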


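Furthermore:

About NSLinguisticTagger:

Although I won't dive deep into NSLinguisticTagger usage, I would like to point out that it has several really cool features beyond simply detecting the language of a given text. As a pretty simple example: using the lemma when enumerating tags is very helpful when working with Information retrieval, since you would be able to recognize the word "drive" when the word "driving" is passed.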
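
As a rough sketch of the lemma idea, the snippet below enumerates the words of a sentence with the lemma tag scheme. The helper name wordsAndLemmas(in:) and the sample sentence are assumptions for illustration, and a lemma may come back as nil when none is available for a word or language.

```swift
import Foundation

/// Pairs each word of `text` with its lemma (nil when no lemma is available).
/// `wordsAndLemmas(in:)` is a hypothetical helper name.
func wordsAndLemmas(in text: String) -> [(word: String, lemma: String?)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text
    // NSLinguisticTagger works with NSRange, so measure the text in UTF-16 units.
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]
    var result: [(word: String, lemma: String?)] = []
    tagger.enumerateTags(in: fullRange, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, _ in
        let word = (text as NSString).substring(with: tokenRange)
        result.append((word: word, lemma: tag?.rawValue))
    }
    return result
}

for pair in wordsAndLemmas(in: "She was driving to work") {
    print(pair.word, "->", pair.lemma ?? "(no lemma)")
}
```

Indexing documents by lemma rather than by surface form is what lets a search for one form of a word match the others.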

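Official resources

Apple video sessions:

Also, to get familiar with Core ML:

Regarding "ios - How to detect the language of a text (string) in iOS?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47890747/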
