
ruby - How to write a method that counts the most common substring in a string in Ruby?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:21:10

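My program has a class DNA. The program counts the most frequent k-mer in a string; in other words, it looks for the most common substring of length k in the string.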

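For example, create a dna1 object with the string AACCAATCCG. The count_kmer method looks for substrings of length k and outputs the most common ones. So if we set k = 1, then 'A' and 'C' are the most frequent substrings, because each of them occurs four times. See the example below: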

>> dna1 = DNA.new('AACCAATCCG')
=> AACCAATCCG
>> dna1.count_kmer(1)
=> [#<Set: {"A", "C"}>, 4]
>> dna1.count_kmer(2)
=> [#<Set: {"AA", "CC"}>, 2]

Here is my DNA class:

class DNA
  def initialize(nucleotide)
    @nucleotide = nucleotide
  end

  def length
    @nucleotide.length
  end

  protected

  attr_reader :nucleotide
end

Here is the count_kmer method I tried to implement:

  # I have k as my only parameter because I want to pass the nucleotide string in the method
  def count_kmer(k)

    # I created an array as it seems like a good way to split up the nucleotide string.
    counts = []

    # this tries to count how many kmers of length k there are
    num_kmers = self.nucleotide.length- k + 1

    # this should try and look over the kmer start positions
    for i in num_kmers

      # Slice the string, so that way we can get the kmer
      kmer = self.nucleotide.split('')
    end

    # add kmer if it's not present
    if !kmer = counts
      counts[kmer] = 0

      # increment the count for kmer
      counts[kmer] +=1
    end

    # return the final count
    return counts
  end

# end DNA class
end

I'm not sure where my method goes wrong.

Best answer

Something like this?

require 'set'

def count_kmer(k)
  max_kmers = kmers(k)
              .each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
              .group_by { |_, v| v }
              .max
  [Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
end

def kmers(k)
  nucleotide.chars.each_cons(k).map(&:join)
end
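
To see what each step of that chain contributes, here is a sketch that runs the same calls one at a time on the sample string from the question. The local variable names (`sequence`, `counts`, `by_count`, and so on) are illustrative and are not part of the original class.

```ruby
require 'set'

sequence = 'AACCAATCCG' # sample string from the question
k = 2

# 1. Every window of k consecutive characters, joined back into a string.
kmers = sequence.chars.each_cons(k).map(&:join)
# => ["AA", "AC", "CC", "CA", "AA", "AT", "TC", "CC", "CG"]

# 2. Count occurrences; Hash.new(0) makes missing keys start at 0.
counts = kmers.each_with_object(Hash.new(0)) { |kmer, tally| tally[kmer] += 1 }
# "AA" and "CC" map to 2, every other k-mer maps to 1

# 3. Group the [kmer, count] pairs by their count.
by_count = counts.group_by { |_kmer, count| count }
# key 2 holds [["AA", 2], ["CC", 2]], key 1 holds the remaining pairs

# 4. Each entry is [count, pairs]; the counts are distinct hash keys,
#    so max picks the entry with the highest count.
best_count, best_pairs = by_count.max

# 5. Keep only the k-mer names and pair them with the winning count.
result = [Set.new(best_pairs.map(&:first)), best_count]
```

Here `result` holds the set of "AA" and "CC" together with the count 2, which matches what `count_kmer(2)` returns.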

Edit: here is the full text of the class:

require 'set'

class DNA
  def initialize(nucleotide)
    @nucleotide = nucleotide
  end

  def length
    @nucleotide.length
  end

  def count_kmer(k)
    max_kmers = kmers(k)
                .each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
                .group_by { |_, v| v }
                .max
    [Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
  end

  def kmers(k)
    nucleotide.chars.each_cons(k).map(&:join)
  end

  protected

  attr_reader :nucleotide
end

This produces the following output, using Ruby 2.2.1 with the class and methods you specified:

>> dna1 = DNA.new('AACCAATCCG')
=> #<DNA:0x007fe15205bc30 @nucleotide="AACCAATCCG">
>> dna1.count_kmer(1)
=> [#<Set: {"A", "C"}>, 4]
>> dna1.count_kmer(2)
=> [#<Set: {"AA", "CC"}>, 2]
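
That session only uses values of k that fit inside the string. With the method as written, a k longer than the sequence leaves nothing to count, so `.max` returns nil and the following line raises NoMethodError; k = 0 makes `each_cons` raise ArgumentError. If those inputs matter, one possible guard is sketched below. The standalone helper `most_frequent_kmers` is hypothetical, returning an empty set with a count of 0 is an assumed convention, and `tally` assumes Ruby 2.7 or later.

```ruby
require 'set'

# Hypothetical standalone variant (not part of the original DNA class).
# Assumes Ruby 2.7+ for Enumerable#tally.
def most_frequent_kmers(sequence, k)
  # Assumed convention: an out-of-range k yields an empty set and a count of 0.
  return [Set.new, 0] unless k.between?(1, sequence.length)

  # Count every window of k consecutive characters.
  counts = sequence.each_char.each_cons(k).map(&:join).tally
  best = counts.values.max

  # Keep every k-mer that reaches the highest count.
  [Set.new(counts.select { |_kmer, count| count == best }.keys), best]
end
```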

As a bonus, you can also do:

>> dna1.kmers(2)
=> ["AA", "AC", "CC", "CA", "AA", "AT", "TC", "CC", "CG"]
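
For comparison with the loop in the question, the attempt there fails for a few separate reasons: `for i in num_kmers` tries to iterate over an Integer, `split('')` breaks the string into single characters instead of slicing out a k-mer, `!kmer = counts` is an assignment and not a membership test, and `counts` is an Array where a Hash keyed by k-mer is needed. A minimal sketch of the same loop-based idea with those points addressed might look like this. The helper name `count_kmer_loop` and its `sequence` parameter are hypothetical, and it assumes 1 <= k <= sequence.length.

```ruby
require 'set'

# Hypothetical standalone helper (not part of the original DNA class).
# Assumes 1 <= k <= sequence.length.
def count_kmer_loop(sequence, k)
  counts = Hash.new(0)                # a Hash, so k-mers can serve as keys
  num_kmers = sequence.length - k + 1 # number of k-mer start positions

  (0...num_kmers).each do |i|         # iterate over a Range, not an Integer
    kmer = sequence[i, k]             # slice k characters starting at index i
    counts[kmer] += 1                 # missing keys default to 0
  end

  best = counts.values.max
  winners = counts.select { |_kmer, count| count == best }.keys
  [Set.new(winners), best]
end
```

Inside the DNA class the same body could read from `nucleotide` instead of taking the string as a parameter.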

Regarding "ruby - How to write a method that counts the most common substring in a string in Ruby?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40412559/
