Ruby, count syllables

Reposted by: 数据小太阳. Updated: 2023-10-29 06:52:35

I am using Ruby to calculate the Gunning Fog Index of some content I have, and I have successfully implemented the algorithm described here:

Gunning Fog Index
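
For context, the Gunning Fog Index is usually given as 0.4 × (average words per sentence + percentage of complex words), where a complex word has three or more syllables. A minimal Ruby sketch under that definition might look like the following. The names `rough_syllables` and `gunning_fog` are hypothetical (they do not come from the question), the sentence splitting is naive, and the usual refinements such as skipping proper nouns or discounting common suffixes are left out.

```ruby
# Crude stand-in syllable counter (an assumption for this sketch):
# counts runs of vowels and never returns less than 1.
def rough_syllables(word)
  count = word.downcase.scan(/[aeiouy]+/).size
  count.zero? ? 1 : count
end

# Gunning Fog Index = 0.4 * (words per sentence + percentage of complex words),
# where a "complex" word has three or more syllables.
# Any callable that maps a word to a syllable count can be passed as counter.
def gunning_fog(text, counter = method(:rough_syllables))
  sentences = text.split(/[.!?]+/).map(&:strip).reject(&:empty?)
  words = text.scan(/[A-Za-z']+/)
  return 0.0 if sentences.empty? || words.empty?

  complex = words.count { |w| counter.call(w) >= 3 }
  0.4 * (words.size.to_f / sentences.size + 100.0 * complex / words.size)
end

puts gunning_fog("The cat sat. The dog ran.").round(1) # prints 1.2
```

Because the syllable counter is passed in, a better heuristic can be swapped in without touching the index calculation.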

I am using the following method to count the number of syllables in each word:

Tokenizer = /([aeiouy]{1,3})/

def count_syllables(word)

  len = 0

  if word[-3..-1] == 'ing' then
    len += 1
    word = word[0...-3]
  end

  got = word.scan(Tokenizer)
  len += got.size()

  if got.size() > 1 and got[-1] == ['e'] and
      word[-1].chr() == 'e' and
      word[-2].chr() != 'l' then
    len -= 1
  end

  return len

end

It sometimes reports words that have only 2 syllables as having 3. Can anyone offer advice, or does anyone know of a better method?

text = "The word logorrhoea is often used pejoratively to describe prose that is highly abstract and contains little concrete language. Since abstract writing is hard to visualize, it often seems as though it makes no sense and all the words are excessive. Writers in academic fields that concern themselves mostly with the abstract, such as philosophy and especially postmodernism, often fail to include extensive concrete examples of their ideas, and so a superficial examination of their work might lead one to believe that it is all nonsense."

# used to get rid of any punctuation
# (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')

word_array = text.split(' ')

word_array.each do |word|
    puts word if count_syllables(word) > 2
end

"themselves" is counted as 3, but it has only 2.

Best answer

The function I gave you before is based on these simple rules, outlined here:

Each vowel (a, e, i, o, u, y) in a word counts as one syllable subject to the following sub-rules:

Ignore final -ES, -ED, -E (except for -LE)

Words of three letters or less count as one syllable

Consecutive vowels count as one syllable.

Here is the code:

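def new_count(word)
  # downcase! and sub! modify the caller's string; pass word.dup to keep the original.
  word.downcase!
  # Words of three letters or fewer count as one syllable.
  return 1 if word.length <= 3
  # Drop a final -ed, and a final -es or -e that follows a consonant other than "l".
  word.sub!(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  # Drop a leading "y" so it is not counted as a vowel.
  word.sub!(/^y/, '')
  # Each run of one or two vowels counts as one syllable.
  word.scan(/[aeiouy]{1,2}/).size
end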

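Obviously this is not perfect either, but a heuristic is all you will ever get for a problem like this.

EDIT:

I changed the code slightly to handle a leading "y" and fixed the regex to handle "les" endings better (as in "candles").

Here is a comparison using the text from the question: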

# used to get rid of any punctuation
# (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')

words = text.split(' ')

words.each do |word|
  old = count_syllables(word.dup)
  new = new_count(word.dup)
  puts "#{word}: \t#{old}\t#{new}" if old != new
end

The output is:

logorrhoea:     3   4
used:   2   1
makes:  2   1
themselves:     3   2

So it seems to be an improvement.
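
Since the result is only a heuristic, one common way to patch the remaining misses is a small lookup table that is consulted before the rules run. The sketch below assumes that approach. `SYLLABLE_EXCEPTIONS`, `heuristic_count` and `syllables` are hypothetical names, and the three entries are illustrative examples of words the rules miscount, not a vetted list. `heuristic_count` applies the same regexes as `new_count` but works on a copy, so the caller's string is left alone.

```ruby
# Hand-maintained overrides for words the heuristic gets wrong.
# Illustrative assumptions only; a real list would be built from test data.
SYLLABLE_EXCEPTIONS = {
  'being'   => 2,
  'created' => 3,
  'idea'    => 3
}.freeze

# Same rules as new_count, but non-destructive: downcase and sub return copies.
def heuristic_count(word)
  w = word.downcase
  return 1 if w.length <= 3

  w = w.sub(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '').sub(/^y/, '')
  w.scan(/[aeiouy]{1,2}/).size
end

# Look the word up in the exception table first, then fall back to the rules.
def syllables(word)
  SYLLABLE_EXCEPTIONS.fetch(word.downcase) { heuristic_count(word) }
end

puts heuristic_count('being')  # prints 1 (the rules alone undercount)
puts syllables('being')        # prints 2 (taken from the exception table)
puts syllables('themselves')   # prints 2 (falls through to the rules)
```

The table only needs entries for words that actually show up in your content and are miscounted, so it can stay small.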

Regarding "Ruby, count syllables", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/1271918/
