python - Finding the shortest substring containing specific characters in linear time

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:02:53

Goal: implementing an algorithm that, given strings a and b, returns the shortest substring of a containing all characters of b. The string b can contain duplicates.

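The algorithm is basically this one:
http://www.geeksforgeeks.org/find-the-smallest-window-in-a-string-containing-all-characters-of-another-string/

In the linked article the algorithm only finds the length of the shortest substring, but returning the substring itself is a small change.

Here is my implementation: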

import collections


def issubset(c1, c2):
    '''Return True if c1 is a subset of c2, False otherwise.'''
    return not c1 - (c1 & c2)


def min_idx(seq, target):
    '''Least index of seq such that seq[idx] is contained in target.'''
    for idx, elem in enumerate(seq):
        if elem in target:
            return idx


def minsub(a, b):
    target_hist = collections.Counter(b)
    current_hist = collections.Counter()
    # Skip all the useless characters
    idx = min_idx(a, target_hist)
    if idx is None:
        return []
    a = a[idx:]
    # Build a base substring
    i = iter(a)
    current = []
    while not issubset(target_hist, current_hist):
        t = next(i)
        current.append(t)
        current_hist[t] += 1
    minlen = len(current)
    shortest = current
    for t in i:
        current.append(t)
        # Shorten the substring from the front as much as possible
        if t == current[0]:
            idx = min_idx(current[1:], target_hist) + 1
            current = current[idx:]
        if len(current) < minlen:
            minlen = len(current)
            shortest = current
    return current

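Unfortunately, it doesn't work. For example,

>>> minsub('this is a test string', 'tist')
['s', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't', ' ', 's', 't', 'r', 'i', 'n', 'g']

What am I missing?
Side note: I'm not sure my implementation is O(n), but that is a different question. For now, I'm only looking to fix my implementation.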
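
On the O(n) side note: the implementation above copies part of the list on every `current[idx:]` slice, so its running time can grow faster than linearly. As a point of comparison, here is a minimal sketch of the usual two-pointer form of the linked algorithm. It is not the asker's code: the name `min_window` and all variable names are assumptions of this sketch, and it returns a string rather than a list. Each index of `a` is visited at most once by `right` and at most once by `left`.

```python
import collections


def min_window(a, b):
    """Return the shortest substring of a that contains every character of b
    (counting duplicates), or '' if no such substring exists."""
    if not b:
        return ''
    need = collections.Counter(b)  # characters still owed; negative = surplus
    missing = len(b)               # total number of characters still owed
    best = None                    # (left, right) of the best window so far
    left = 0
    for right, ch in enumerate(a):
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        if missing == 0:
            # Drop surplus characters from the front of the window.
            while need[a[left]] < 0:
                need[a[left]] += 1
                left += 1
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            # Give up the leftmost required character and keep scanning.
            need[a[left]] += 1
            missing += 1
            left += 1
    return '' if best is None else a[best[0]:best[1] + 1]


print(min_window('this is a test string', 'tist'))  # t stri
```

Tracking a single `missing` counter avoids comparing two histograms at every step, which is what keeps the per-character work constant.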

EDIT: A solution that seems to work:

import collections


def issubset(c1, c2):
    '''Return True if c1 is a subset of c2, False otherwise.'''
    return not c1 - (c1 & c2)


def min_idx(seq, target):
    '''Least index of seq such that seq[idx] is contained in target.'''
    for idx, elem in enumerate(seq):
        if elem in target:
            return idx


def minsub(a, b):
    target_hist = collections.Counter(b)
    current_hist = collections.Counter()
    # Skip all the useless characters
    idx = min_idx(a, target_hist)
    if idx is None:
        return []
    a = a[idx:]
    # Build a base substring
    i = iter(a)
    current = []
    while not issubset(target_hist, current_hist):
        t = next(i)
        current.append(t)
        current_hist[t] += 1
    minlen = len(current)
    shortest = current[:]
    for t in i:
        current.append(t)
        # Shorten the substring from the front as much as possible
        if t == current[0]:
            current_hist = collections.Counter(current)
            for idx, elem in enumerate(current[1:], 1):
                if not current_hist[elem] - target_hist[elem]:
                    break
                current_hist[elem] -= 1
            current = current[idx:]
        if len(current) < minlen:
            minlen = len(current)
            shortest = current[:]
    return shortest
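
One way to check whether a solution that "seems to work" really does is to compare it with an obviously correct but slow reference. The following is a minimal sketch of such a reference; the names `covers` and `minsub_bruteforce` are assumptions of this sketch and do not appear in the question. It tries every window, shortest first, so it is only suitable for short test strings.

```python
import collections


def covers(window, target_hist):
    """True if window contains every character of target_hist often enough."""
    window_hist = collections.Counter(window)
    return all(window_hist[ch] >= n for ch, n in target_hist.items())


def minsub_bruteforce(a, b):
    """Return the first shortest window of a that covers b, as a list of
    characters (the same shape minsub returns), or [] if there is none."""
    target_hist = collections.Counter(b)
    for length in range(len(b), len(a) + 1):
        for start in range(len(a) - length + 1):
            window = a[start:start + length]
            if covers(window, target_hist):
                return list(window)
    return []


print(minsub_bruteforce('this is a test string', 'tist'))
# ['t', ' ', 's', 't', 'r', 'i']
```

When comparing against `minsub` on random inputs, it is safer to compare lengths rather than contents, because several windows of the same minimal length may exist. The comparison should also be restricted to inputs where a covering window exists, since the base-substring loop in `minsub` calls `next(i)` without a guard.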

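Best answer

The problem is in this step, when we add a character to current and it matches the first character:

remove the leftmost character and all other extra characters after left most character.

This value of idx

            idx = min_idx(current[1:], target_hist) + 1

is sometimes lower than it should be: idx should keep increasing for as long as target_hist is still a subset of current_hist. So we need to keep current_hist up to date in order to compute the correct value of idx. Also, minsub should return shortest rather than current: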

# issubset and min_idx are unchanged from the question
def minsub(a, b):
    target_hist = collections.Counter(b)
    current_hist = collections.Counter()
    # Skip all the useless characters
    idx = min_idx(a, target_hist)
    if idx is None:
        return []
    a = a[idx:]
    # Build a base substring
    i = iter(a)
    current = []
    while not issubset(target_hist, current_hist):
        t = next(i)
        current.append(t)
        if t in target_hist:
            current_hist[t] += 1
    minlen = len(current)
    # Copy, so that later appends to current do not alter shortest
    shortest = current[:]
    for t in i:
        current.append(t)
        current_hist[t] += 1
        # Shorten the substring from the front as much as possible
        if t == current[0]:
            #idx = min_idx(current[1:], target_hist) + 1
            idx = 0
            # Remove characters from the front while the window still covers b
            while issubset(target_hist, current_hist):
                u = current[idx]
                current_hist[u] -= 1
                idx += 1
            # The loop removed one character too many; put it back
            idx -= 1
            u = current[idx]
            current_hist[u] += 1
            current = current[idx:]
        if len(current) < minlen:
            minlen = len(current)
            shortest = current[:]
    return shortest

In [9]: minsub('this is a test string', 'tist')
Out[9]: ['t', ' ', 's', 't', 'r', 'i']
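
The difference between the two trimming rules can be seen in isolation. Below is a minimal sketch, under the assumption that the window already covers the target; `trim_front` is a hypothetical helper name, not part of the answer's code. It drops a front character only while the window holds more copies of it than the target requires, which is the rule the answer describes.

```python
import collections


def trim_front(current, target_hist):
    """Drop characters from the front of current for as long as the rest
    still contains every character of target_hist often enough."""
    current_hist = collections.Counter(current)
    idx = 0
    while idx < len(current):
        ch = current[idx]
        # ch may go only if the window has more of it than the target needs
        if current_hist[ch] <= target_hist[ch]:
            break
        current_hist[ch] -= 1
        idx += 1
    return current[idx:]


target_hist = collections.Counter('tist')
window = list('this is a test')
print(''.join(trim_front(window, target_hist)))  # is a test
```

For the same window, the original `min_idx(current[1:], target_hist) + 1` rule stops at the first character that occurs in the target and leaves `is is a test`, three characters longer than necessary.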

Regarding python - finding the shortest substring containing specific characters in linear time, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31608133/
