Ruby, count syllables

Reposted. Author: 数据小太阳. Updated: 2023-10-29 06:52:35
I am using Ruby to calculate the Gunning Fog Index of some content I have, and I have successfully implemented the algorithm described here:

Gunning Fog Index
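
For context, the index is commonly given as 0.4 * (words per sentence + 100 * complex words / words), where a complex word has three or more syllables. The following is only a minimal sketch of that formula, not the asker's implementation: the name `gunning_fog`, the sentence and word splitting rules, and the stand-in `naive` counter are all assumptions made for illustration, and the usual exclusions (proper nouns, compound words, common suffixes) are ignored.

```ruby
# Sketch of the Gunning Fog formula (hypothetical helper, not from the question).
# syllable_counter is any callable that maps a lowercase word to a syllable count.
def gunning_fog(text, syllable_counter)
  # Assumption: sentences end at runs of ".", "!" or "?"
  sentences = text.split(/[.!?]+/).map(&:strip).reject(&:empty?)
  # Assumption: a word is a run of letters and apostrophes
  words = text.scan(/[A-Za-z']+/)
  return 0.0 if sentences.empty? || words.empty?

  # "Complex" words are those with three or more syllables
  complex = words.count { |w| syllable_counter.call(w.downcase) >= 3 }

  0.4 * (words.size.to_f / sentences.size + 100.0 * complex / words.size)
end

# Crude stand-in counter: one syllable per run of vowels
naive = ->(w) { w.scan(/[aeiouy]+/).size }

puts gunning_fog("The cat sat. The dog ran.", naive)
```

Because the complex-word term depends entirely on the syllable counter, any over-counting there pushes the index up, which is why the accuracy of the counter matters.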

I am using the following method to count the number of syllables in each word:

Tokenizer = /([aeiouy]{1,3})/

def count_syllables(word)
  len = 0

  # Count a trailing "ing" as one syllable and strip it
  if word[-3..-1] == 'ing'
    len += 1
    word = word[0...-3]
  end

  # Each run of 1 to 3 vowels counts as one syllable
  got = word.scan(Tokenizer)
  len += got.size

  # Discount a silent final "e", except after "l" (as in "candle")
  if got.size > 1 && got[-1] == ['e'] &&
     word[-1] == 'e' &&
     word[-2] != 'l'
    len -= 1
  end

  len
end

It sometimes identifies words that have only 2 syllables as having 3. Can anyone offer advice or suggest a better approach?

text = "The word logorrhoea is often used pejoratively to describe prose that is highly abstract and contains little concrete language. Since abstract writing is hard to visualize, it often seems as though it makes no sense and all the words are excessive. Writers in academic fields that concern themselves mostly with the abstract, such as philosophy and especially postmodernism, often fail to include extensive concrete examples of their ideas, and so a superficial examination of their work might lead one to believe that it is all nonsense."

# used to get rid of any punctuation
# (gsub rather than gsub!, which returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')

word_array = text.split(' ')

word_array.each do |word|
  puts word if count_syllables(word) > 2
end

"themselves" is counted as 3 syllables, but it has only 2.
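
To see where the extra syllable comes from, it can help to print the vowel groups the tokenizer finds and check whether the silent-e rule fires. This is a small diagnostic sketch only; `VOWEL_GROUPS` and `explain_count` are hypothetical names introduced here, using the same pattern and condition as the method above.

```ruby
# Same pattern as the question's Tokenizer, renamed so it cannot clash with it
VOWEL_GROUPS = /([aeiouy]{1,3})/

# Hypothetical helper: reports the vowel groups and whether the
# question's silent-e discount would apply to this word
def explain_count(word)
  groups = word.scan(VOWEL_GROUPS).flatten
  silent_e_applies = groups.size > 1 && groups.last == 'e' &&
                     word[-1] == 'e' && word[-2] != 'l'
  { groups: groups, silent_e_applies: silent_e_applies }
end

p explain_count("themselves") # three "e" groups; word ends in "s", so no discount
p explain_count("makes")      # groups "a" and "e"; word ends in "s", so no discount
p explain_count("make")       # groups "a" and "e"; discount applies
```

The discount only applies when the word itself ends in "e", so plural and third-person forms ending in "es" keep their silent vowel in the count.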

Best answer

The function I gave you before is based on the simple rules outlined here:

Each vowel (a, e, i, o, u, y) in a word counts as one syllable subject to the following sub-rules:

  • Ignore final -ES, -ED, -E (except for -LE)
  • Words of three letters or less count as one syllable
  • Consecutive vowels count as one syllable.

Here is the code:

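def new_count(word)
  # Work on a lowercase copy so the caller's string is not modified
  word = word.downcase
  return 1 if word.length <= 3
  # Ignore final -es, -ed, -e (but keep -le and -les)
  word = word.sub(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  # A leading "y" acts as a consonant
  word = word.sub(/^y/, '')
  # Each run of one or two vowels counts as one syllable
  word.scan(/[aeiouy]{1,2}/).size
end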

Obviously this is not perfect either, but a heuristic is all you will ever get for this.

Edit:

I changed the code slightly to handle a leading "y" and fixed the regex to better handle "les" endings (as in "candles").
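
As a rough illustration of what those two changes do, the sketch below applies only the stripping steps and shows what is left for the vowel scan. The names `ENDING`, `LEADING_Y` and `stripped` are hypothetical; the patterns are copied from `new_count`, and the three-letter shortcut is deliberately left out.

```ruby
# Patterns copied from new_count; the constant names are illustrative only
ENDING = /(?:[^laeiouy]es|ed|[^laeiouy]e)$/
LEADING_Y = /^y/

# Hypothetical helper: returns the part of the word that gets scanned for vowels
def stripped(word)
  word.downcase.sub(ENDING, '').sub(LEADING_Y, '')
end

%w[candles makes themselves year].each do |w|
  core = stripped(w)
  puts "#{w}: #{core} -> #{core.scan(/[aeiouy]{1,2}/).size}"
end
# prints:
# candles: candles -> 2
# makes: ma -> 1
# themselves: themsel -> 2
# year: ear -> 1
```

Excluding "l" before "es" keeps the second syllable of "candles", while dropping the leading "y" stops "year" from being split into "ye" and "a".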

Here is a comparison using the text from the question:

# used to get rid of any punctuation
text = text.gsub(/\W+/, ' ')

words = text.split(' ')

words.each do |word|
  old = count_syllables(word.dup)
  new = new_count(word.dup)
  puts "#{word}: \t#{old}\t#{new}" if old != new
end

The output is:

logorrhoea:  3  4
used:        2  1
makes:       2  1
themselves:  3  2

So this seems to be an improvement.

Regarding "Ruby, count syllables", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1271918/
