gpt4 book ai didi

python - 查找公共(public)列表序列

转载 作者:太空狗 更新时间:2023-10-30 00:01:17 26 4
gpt4 key购买 nike

我有一本列表字典。每个列表都是一个数字序列。没有两个列表是相等的,但两个或更多列表可能以相同的数字序列开头(请参阅下面的示例输入)。我想做的是找到这些常见的序列,并使它们成为字典中的新元素。

示例输入:

sequences = {
18: [1, 3, 5, 6, 8, 12, 15, 17, 18],
19: [1, 3, 5, 6, 9, 13, 14, 16, 19],
25: [1, 3, 5, 6, 9, 13, 14, 20, 25],
11: [0, 2, 4, 7, 11],
20: [0, 2, 4, 10, 20],
26: [21, 23, 26],
}

示例输出:

expected_output = {
6: [1, 3, 5, 6],
18: [8, 12, 15, 17, 18],
14: [9, 13, 14],
19: [16, 19],
25: [20, 25],
4: [0, 2, 4],
11: [7, 11],
20: [10, 20],
26: [21, 23, 26],
}

每个列表的键是它的最后一个元素。顺序无关紧要。

我有一个工作代码。但是,它非常困惑。有人可以建议更简单/更清洁的解决方案吗?

from collections import Counter

def split_lists(sequences):
# get first elem from each sequence
firsts = list(map(lambda s: s[0], sequences))

# get non-duplicate first elements
not_duplicates = list(map(lambda c: c[0], filter(lambda c: c[1] == 1, Counter(firsts).items())))

# start the new_sequences with the non-duplicate lists
new_sequences = dict(map(lambda s: (s[-1], s), filter(lambda s: s[0] in not_duplicates, sequences)))

# get duplicate first elements
duplicates = list(map(lambda c: c[0], filter(lambda c: c[1] > 1, Counter(firsts).items())))
for duplicate in duplicates:
# get all lists that start with the duplicate element
duplicate_lists = list(filter(lambda s: s[0] == duplicate, sequences))

# get the common elements from the duplicate lists and make it a new
# list to add to our new_sequences dict
repeated_sequence = sorted(list(set.intersection(*list(map(set, duplicate_lists)))))
new_sequences[repeated_sequence[-1]] = repeated_sequence

# get lists from where I left of
i = len(repeated_sequence)
sub_lists = list(filter(lambda s: len(s) > 0, map(lambda s: s[i:], duplicate_lists)))

# recursively split them and store them in new_sequences
new_sequences.update(split_lists(sub_lists))

return new_sequences

另外,你能帮我弄清楚我的算法的复杂性吗?递归让我头晕。我最好的猜测是 O(n*m),其中 n 是列表的数量,m 是最长列表的长度。

最佳答案

将其分解为逻辑函数:

  • 找出哪些序列以相同元素开头
  • 找到共同的元素

同样的开始:

可以使用 defaultdict 轻松完成

from collections import defaultdict
def same_start(sequences):
same_start = defaultdict(list)
for seq in sequences:
same_start[seq[0]].append(seq)
return same_start.values()
list(same_start(sequences.values()))
[[[1, 3, 5, 6, 8, 12, 15, 17, 18],
[1, 3, 5, 6, 9, 13, 14, 16, 19],
[1, 3, 5, 6, 9, 13, 14, 20, 25]],
[[0, 2, 4, 7, 11], [0, 2, 4, 10, 20]],
[[21, 23, 26]]]

找到共同的元素:

一个简单的生成器,只要它们都相同就生成值

def get_beginning(sequences):
for values in zip(*sequences):
v0 = values[0]
if not all(i == v0 for i in values):
return
yield v0

聚合

def aggregate(same_start):
for seq in same_start:
if len(seq) < 2:
yield seq[0]
continue
start = list(get_beginning(seq))
yield start
yield from (i[len(start):] for i in seq)
list(aggregate(same_start(sequences.values())))
[[1, 3, 5, 6],
[8, 12, 15, 17, 18],
[9, 13, 14, 16, 19],
[9, 13, 14, 20, 25],
[0, 2, 4],
[7, 11],
[10, 20],
[21, 23, 26]]

进一步

如果你想组合序列 1825,那么你可以这样做

def combine(sequences):
while True:
s = same_start(sequences)
if all(len(i) == 1 for i in s):
return sequences
sequences = tuple(aggregate(s))
{i[-1]: i for i in combine(sequences.values())}
{4: [0, 2, 4],
6: [1, 3, 5, 6],
11: [7, 11],
14: [9, 13, 14],
18: [8, 12, 15, 17, 18],
19: [16, 19],
20: [10, 20],
25: [20, 25],
26: [21, 23, 26]}

关于python - 查找公共(public)列表序列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48881297/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com