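I'm using Ruby to calculate the Gunning Fog Index of some content I have, and I can successfully implement the algorithm described here:
I'm using the following method to count the number of syllables in each word: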
# A run of one to three vowels is treated as one syllable
Tokenizer = /([aeiouy]{1,3})/

def count_syllables(word)
  len = 0
  # Count a trailing "ing" as its own syllable and strip it
  if word[-3..-1] == 'ing'
    len += 1
    word = word[0...-3]
  end
  got = word.scan(Tokenizer)
  len += got.size
  # Drop a silent final "e", except after "l" (as in "-le")
  if got.size > 1 && got[-1] == ['e'] &&
     word[-1].chr == 'e' &&
     word[-2].chr != 'l'
    len -= 1
  end
  len
end
It sometimes counts words that have only 2 syllables as having 3. Can anyone offer advice, or does anyone know a better approach?
text = "The word logorrhoea is often used pejoratively to describe prose that is highly abstract and contains little concrete language. Since abstract writing is hard to visualize, it often seems as though it makes no sense and all the words are excessive. Writers in academic fields that concern themselves mostly with the abstract, such as philosophy and especially postmodernism, often fail to include extensive concrete examples of their ideas, and so a superficial examination of their work might lead one to believe that it is all nonsense."
# used to get rid of any punctuation
text = text.gsub(/\W+/, ' ')
word_array = text.split(' ')
word_array.each do |word|
  puts word if count_syllables(word) > 2
end
"themselves" is counted as 3, but it only has 2.
Best answer
The function I gave you before is based on these simple rules outlined here:
Each vowel (a, e, i, o, u, y) in a word counts as one syllable subject to the following sub-rules:
- Ignore final -ES, -ED, -E (except for -LE)
- Words of three letters or less count as one syllable
- Consecutive vowels count as one syllable.
The code looks like this:
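def new_count(word)
  # Note: modifies the string passed in (downcase!, sub!)
  word.downcase!
  return 1 if word.length <= 3
  # Ignore final -es, -ed, -e (but keep -le and -les)
  word.sub!(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  # A leading "y" acts as a consonant
  word.sub!(/^y/, '')
  # Each run of one or two vowels counts as one syllable
  word.scan(/[aeiouy]{1,2}/).size
end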
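Obviously this isn't perfect either, but a heuristic is all you're going to get for something like this.
I changed the code slightly to handle a leading "y" and fixed the regex to handle "les" endings better (as in "candles").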
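Since the result is only a heuristic, it can help to check it against a few words whose syllable counts are known. The sketch below is not part of the original answer: the name `estimate_syllables` and the word list are illustrative assumptions. It uses the same regular expressions as `new_count`, but works on a copy of the string so the caller does not need `dup`.

```ruby
# Hypothetical non-mutating variant of new_count (same regexes).
def estimate_syllables(word)
  w = word.downcase
  return 1 if w.length <= 3
  w = w.sub(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  w = w.sub(/^y/, '')
  w.scan(/[aeiouy]{1,2}/).size
end

# Hand-picked words with their real syllable counts (illustrative data).
known_counts = {
  'themselves' => 2,
  'candles'    => 2,
  'makes'      => 1,
  'yellow'     => 2,
  'wanted'     => 2
}

# Keep only the words where the heuristic disagrees with the known count.
misses = known_counts.reject do |word, expected|
  estimate_syllables(word) == expected
end

misses.each do |word, expected|
  puts "#{word}: expected #{expected}, got #{estimate_syllables(word)}"
end
```

With this list the only mismatch is "wanted", which the heuristic reports as 1 because it always strips a final "-ed".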
Here's a comparison using the text from the question:
# used to get rid of any punctuation
text = text.gsub(/\W+/, ' ')
words = text.split(' ')
words.each do |word|
  # new_count modifies its argument, so pass copies
  old = count_syllables(word.dup)
  new = new_count(word.dup)
  puts "#{word}: \t#{old}\t#{new}" if old != new
end
The output is:
logorrhoea: 3 4
used: 2 1
makes: 2 1
themselves: 3 2
So it looks like an improvement.
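To tie this back to the original goal, here is a rough sketch of how such a syllable count could feed a Gunning Fog calculation, using the commonly cited formula 0.4 × (average words per sentence + percentage of complex words). Everything here is an assumption rather than the asker's implementation: the names `estimate_syllables` and `gunning_fog` are made up, sentences are split naively on `.`, `!` and `?`, and any word with three or more estimated syllables is treated as complex, with no special handling of proper nouns or compound words.

```ruby
# Hypothetical non-mutating copy of the new_count heuristic.
def estimate_syllables(word)
  w = word.downcase
  return 1 if w.length <= 3
  w = w.sub(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  w = w.sub(/^y/, '')
  w.scan(/[aeiouy]{1,2}/).size
end

# Sketch of the Gunning Fog Index under the assumptions stated above.
def gunning_fog(text)
  # Naive sentence split: any run of . ! or ? ends a sentence.
  sentences = text.split(/[.!?]+/).map(&:strip).reject(&:empty?)
  # Words are runs of letters and apostrophes.
  words = text.scan(/[A-Za-z']+/)
  return 0.0 if sentences.empty? || words.empty?

  # "Complex" here means three or more estimated syllables.
  complex = words.count { |w| estimate_syllables(w) >= 3 }
  0.4 * (words.size.to_f / sentences.size + 100.0 * complex / words.size)
end

sample = "Abstract writing is hard to visualize. It often seems excessive."
puts gunning_fog(sample)
```

Unlike the snippets above, this version needs the punctuation left in place so it can find sentence boundaries, so call it before any `gsub(/\W+/, ' ')` step.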
Source: the Stack Overflow question on counting syllables in Ruby: https://stackoverflow.com/questions/1271918/