
python - Find the shortest substring whose replacement makes every character occur equally often in the string


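I have a string of length n composed of the letters A, G, C and T. The string is steady if it contains an equal number of A, G, C and T (n/4 of each). I need to find the minimum length of a substring that, when replaced, makes the string steady. Here is a link to the full description of the problem.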

Suppose s1=AAGAAGAA

Since n=8, it should ideally have 2 As, 2 Ts, 2 Gs and 2 Cs. It has 4 As too many, so we need a substring that contains at least 4 As.
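The arithmetic above can be sketched as a small helper. This is only an illustration; the function name `excess_counts` is hypothetical and not part of the original question.

```python
from collections import Counter

def excess_counts(gene):
    """Return {char: surplus} for every char occurring more than len(gene) // 4 times."""
    target = len(gene) // 4          # ideal count of each letter
    counts = Counter(gene)
    return {ch: counts[ch] - target for ch in "ACGT" if counts[ch] > target}

print(excess_counts("AAGAAGAA"))  # {'A': 4}
```

The sum of the surpluses is a lower bound on the length of the substring that has to be replaced.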

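I start by taking 4-character substrings from the left, and if none qualifies I increment a variable mnum (i.e. look for 5-character substrings, and so on).

We get AAGAA as the answer. But it is too slow.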

from collections import Counter
import sys
n=int(input()) #length of string
s1=input()
s=Counter(s1)
le=int(n/4) #ideal length of each element
comp={'A':le,'G':le,'C':le,'T':le} #dictionary containing equal number of all elements
s.subtract(comp) #Finding by how much each element ('A','G'...) is in excess or loss
a=[]
b=[]
for x in s.values(): #storing frequency(s.values--[4,2]) of elements which are in excess
    if(x>0):
        a.append(x)
for x in s.keys(): #storing corresponding elements(s.keys--['A','G'])
    if(s[x]>0):
        b.append(x)
mnum=sum(a) #minimum substring length to start with
if(mnum==0):
    print(0)
    sys.exit() #the parentheses are required, a bare sys.exit does nothing
while(mnum<=n): #(when length 4 substring with all the A's and G's is not found increasing to 5 and so on)
    for i in range(n-mnum+1): #Finding substrings with length mnum in s1
        flag=1
        for j in range(len(a)): #Checking if all of excess elements are present
            if(s1[i:i+mnum].count(b[j])<a[j]): #one missing element rules the substring out
                flag=0
                break

        if(flag==1):
            print(mnum)
            sys.exit()
    mnum+=1

Best answer

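The minimum substring can be found in O(N) time and O(N) space.

First, count the frequency fr[i] of each character in the input of length n. The most important thing to realize is that a substring is a valid candidate if and only if it contains every excessive character at least fr[i] - n/4 times. Otherwise it would be impossible to replace the missing characters. So our task is to go through every such substring and pick the one with the smallest length.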
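As a rough illustration of that condition, the check for a single candidate substring could look like the sketch below. The helper name `is_valid_candidate` is an assumption made for this example, and it recounts the window on every call, so it only demonstrates the test and is not efficient.

```python
from collections import Counter

def is_valid_candidate(gene, left, right):
    """True if gene[left:right] holds at least the surplus of every over-represented char."""
    target = len(gene) // 4
    total = Counter(gene)                  # fr[i] for the whole string
    window = Counter(gene[left:right])     # frequencies inside the candidate
    # For chars that are not excessive, total[ch] - target <= 0, so the test passes trivially.
    return all(window[ch] >= total[ch] - target for ch in "ACGT")

print(is_valid_candidate("AAGAAGAA", 0, 5))  # True  ('AAGAA' holds 4 As)
print(is_valid_candidate("AAGAAGAA", 0, 4))  # False ('AAGA' holds only 3 As)
```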

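But how do we find all of them efficiently?

Initially, minLength is n. We introduce two pointer indices, left and right (both initially 0), which delimit the substring from left to right in the original string str. We then increment right until the frequency of every excessive character in str[left:right] is at least fr[i] - n/4. That is not all, though: str[left:right] may contain unnecessary characters on its left side (for example, characters that are not excessive and can therefore be dropped). So, as long as str[left:right] still contains enough excessive characters, we increment left. Once done, we update minLength if it is greater than right - left. We repeat the process until right >= n.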

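Let's consider an example. Let GAAAAAAA be the input string. The algorithm then proceeds as follows:

1. Count the frequency of each character:

['G'] = 1, ['A'] = 7, ['T'] = 0, ['C'] = 0 ('A' is excessive here)

2. Now iterate over the original string: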

Step#1: |G|AAAAAAA
substr = 'G' - no excessive chars (left = 0, right = 0)
Step#2: |GA|AAAAAA
substr = 'GA' - 1 excessive char, we need 5 (left = 0, right = 1)
Step#3: |GAA|AAAAA
substr = 'GAA' - 2 excessive chars, we need 5 (left = 0, right = 2)
Step#4: |GAAA|AAAA
substr = 'GAAA' - 3 excessive chars, we need 5 (left = 0, right = 3)
Step#5: |GAAAA|AAA
substr = 'GAAAA' - 4 excessive chars, we need 5 (left = 0, right = 4)
Step#6: |GAAAAA|AA
substr = 'GAAAAA' - 5 excessive chars, nice but can we remove something from left? 'G' is not excessive anyways. (left = 0, right = 5)
Step#7: G|AAAAA|AA
substr = 'AAAAA' - 5 excessive chars, wow, it's smaller now. minLength = 5 (left = 1, right = 5)
Step#8: G|AAAAAA|A
substr = 'AAAAAA' - 6 excessive chars, nice, but can we reduce the substr? There's a redundant 'A'(left = 1, right = 6)
Step#9: GA|AAAAA|A
substr = 'AAAAA' - 5 excessive chars, nice, minLen = 5 (left = 2, right = 6)
Step#10: GA|AAAAAA|
substr = 'AAAAAA' - 6 excessive chars, nice, but can we reduce the substr? There's a redundant 'A'(left = 2, right = 7)
Step#11: GAA|AAAAA|
substr = 'AAAAA' - 5 excessive chars, nice, minLen = 5 (left = 3, right = 7)
Step#12: That's it as right >= 8

Or the full code below:

from collections import Counter

n = int(input())
gene = input().strip()
# frequency of every character in the whole string
char_counts = Counter(gene)

n_by_4 = n // 4  # target count of each character
min_length = n
left = 0
right = 0

substring_counts = Counter()  # frequencies inside gene[left:right]
while right < n:
    substring_counts[gene[right]] += 1
    right += 1

    has_enough_excessive_chars = True
    for ch in "ACTG":
        diff = char_counts[ch] - n_by_4
        # the char cannot be used to replace other items
        if (diff > 0) and (substring_counts[ch] < diff):
            has_enough_excessive_chars = False
            break

    if has_enough_excessive_chars:
        # drop characters from the left while the window stays valid
        while left < right and substring_counts[gene[left]] > (char_counts[gene[left]] - n_by_4):
            substring_counts[gene[left]] -= 1
            left += 1

        min_length = min(min_length, right - left)

print(min_length)
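For testing without stdin, the same sliding-window idea can be wrapped in a function and compared against a brute-force search on small inputs. The sketch below is a variant written for this purpose, not the answer's exact code: the names `min_replacement_length` and `brute_force` are hypothetical, and the window is shrunk as long as it stays valid.

```python
from collections import Counter

def min_replacement_length(gene):
    """Sliding-window variant: shortest window holding the surplus of every char."""
    n = len(gene)
    target = n // 4
    # Surplus of each over-represented character.
    need = {ch: c - target for ch, c in Counter(gene).items() if c > target}
    if not need:
        return 0                      # already steady
    window = Counter()
    best = n
    left = 0
    for right, ch in enumerate(gene):
        window[ch] += 1
        # Shrink from the left while the window still covers every surplus.
        while all(window[c] >= k for c, k in need.items()):
            best = min(best, right - left + 1)
            window[gene[left]] -= 1
            left += 1
    return best

def brute_force(gene):
    """Try every substring, shortest first (cubic time, only for checking small inputs)."""
    n = len(gene)
    target = n // 4
    total = Counter(gene)
    for length in range(n + 1):
        for start in range(n - length + 1):
            window = Counter(gene[start:start + length])
            if all(window[c] >= total[c] - target for c in "ACGT"):
                return length
    return n

print(min_replacement_length("AAGAAGAA"))  # 5
print(min_replacement_length("GAAAAAAA"))  # 5
```

Both functions assume the input consists only of A, C, G and T and that its length is a multiple of 4.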

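Regarding python - Find the shortest substring whose replacement makes every character occur equally often in the string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37579917/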
