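Given: n iterators, and a function that gets the key of an item for each iterator.
Assumptions:
- The iterators yield their items sorted by key.
- The keys from any one iterator are unique.
I want to iterate over them joined by key. For example, given the following 2 lists:
[('a', {type:'x', mtime:Datetime()}), ('b', {type:'y', mtime:Datetime()})]
[('b', Datetime()), ('c', Datetime())]
Using the first item of each tuple as the key, I want to get: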
(('a', {type:'x', mtime:Datetime()}), None)
(('b', {type:'y', mtime:Datetime()}), ('b', Datetime()),)
(None, ('c', Datetime()),)
So I hacked together this method:
def iter_join(*iterables_and_key_funcs):
    iterables_len = len(iterables_and_key_funcs)
    keys_funcs = tuple(key_func for iterable, key_func in iterables_and_key_funcs)
    iters = tuple(iter(iterable) for iterable, key_func in iterables_and_key_funcs)
    current_values = [None] * iterables_len
    current_keys = [None] * iterables_len
    iters_stoped = [False] * iterables_len

    def get_nexts(iters_needing_fetch):
        # Advance only the iterators flagged for fetching that are not exhausted.
        for i, fetch in enumerate(iters_needing_fetch):
            if fetch and not iters_stoped[i]:
                try:
                    current_values[i] = next(iters[i])
                    current_keys[i] = keys_funcs[i](current_values[i])
                except StopIteration:
                    iters_stoped[i] = True
                    current_values[i] = None
                    current_keys[i] = None

    get_nexts([True] * iterables_len)
    while not all(iters_stoped):
        # The smallest key among the iterators that still have a current item.
        min_key = min(key
                      for key, iter_stoped in zip(current_keys, iters_stoped)
                      if not iter_stoped)
        keys_equal_to_min = tuple(key == min_key for key in current_keys)
        yield tuple(value if key_eq_min else None
                    for key_eq_min, value in zip(keys_equal_to_min, current_values))
        get_nexts(keys_equal_to_min)
And tested it:
key_is_value = lambda v: v
a = ( 2, 3, 4, )
b = (1, )
c = ( 5,)
d = (1, 3, 5,)
l = list(iter_join(
    (a, key_is_value),
    (b, key_is_value),
    (c, key_is_value),
    (d, key_is_value),
))
import pprint; pprint.pprint(l)
Which outputs:
[(None, 1, None, 1),
(2, None, None, None),
(3, None, None, 3),
(4, None, None, None),
(None, None, 5, 5)]
Is there a ready-made way to do this? I checked itertools but could not find anything.
Is there any way to improve my method? Make it simpler, faster, etc.
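One possible direction, as far as I can tell, is to combine heapq.merge with itertools.groupby: merge the sorted inputs into one sorted stream, then group neighbouring items that share a key. The following is only a sketch under some assumptions: it needs Python 3.5+ (for the key argument of heapq.merge), the keys must be mutually comparable, and iter_join_merge and tagged are names made up for this illustration. Like the version above, it cannot tell a missing item apart from an item that is itself None.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def iter_join_merge(*iterables_and_key_funcs):
    """Join key-sorted iterables by key (hypothetical helper, not a library function)."""
    n = len(iterables_and_key_funcs)

    def tagged(index, iterable, key_func):
        # Decorate every item as (key, source index, item).
        for item in iterable:
            yield key_func(item), index, item

    streams = [tagged(i, iterable, key_func)
               for i, (iterable, key_func) in enumerate(iterables_and_key_funcs)]
    # Merge on (key, index) only, so the items themselves are never compared
    # (they may be dicts or other unorderable objects).
    merged = heapq.merge(*streams, key=itemgetter(0, 1))
    for _, group in groupby(merged, key=itemgetter(0)):
        row = [None] * n
        for _, index, item in group:
            row[index] = item
        yield tuple(row)


# Usage with the test data from the question.
key_is_value = lambda v: v
a, b, c, d = (2, 3, 4), (1,), (5,), (1, 3, 5)
for row in iter_join_merge((a, key_is_value), (b, key_is_value),
                           (c, key_is_value), (d, key_is_value)):
    print(row)
```

The merge stays lazy, so only one pending item per input is held at a time, which matches the behaviour of the hand-written loop.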
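Update: the solution I used
I decided to simplify the contract of this function by requiring the iterators to yield tuple(key, value) or tuple(key, *values). Using agf's answer as a starting point, I came up with this: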
def join_items(*iterables):
    iters = tuple(iter(iterable) for iterable in iterables)
    current_items = [next(itr, None) for itr in iters]
    while True:
        try:
            # min() raises ValueError once every iterator is exhausted.
            key = min(item[0] for item in current_items if item is not None)
        except ValueError:
            break
        yield tuple(item if item is not None and item[0] == key else None
                    for item in current_items)
        # Advance only the iterators whose current item was just emitted.
        for i, (item, itr) in enumerate(zip(current_items, iters)):
            if item is not None and item[0] == key:
                current_items[i] = next(itr, None)
a = ( (2,), (3,), (4,), )
b = ((1,), )
c = ( (5,),)
d = ((1,), (3,), (5,),)
e = ( )
import pprint; pprint.pprint(list(join_items(a, b, c, d, e)))
[(None, (1,), None, (1,), None),
((2,), None, None, None, None),
((3,), None, None, (3,), None),
((4,), None, None, None, None),
(None, None, (5,), (5,), None)]
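If an input does not already yield (key, ...) tuples, a small adapter can decorate it first so that it meets the simplified contract. This is just a sketch; keyed is a hypothetical helper name, not something from a library.

```python
def keyed(iterable, key_func):
    """Yield (key, item) pairs so that each item carries its key in position 0."""
    for item in iterable:
        yield key_func(item), item


# Plain values become (key, value) tuples; the identity key is used here.
pairs = list(keyed([2, 3, 4], lambda v: v))
print(pairs)  # [(2, 2), (3, 3), (4, 4)]
```

Wrapped inputs such as keyed(a, key_func) could then be passed to join_items in place of pre-built tuples.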
The example at the beginning of the question is different from the one at the end.
For the first example, I would do this:
x = [('a', {}), ('b', {})]
y = [('b', {}), ('c', {})]
xd, yd = dict(x), dict(y)
combined = []
# Union of the keys from both dicts, in sorted order.
for k in sorted(set(xd) | set(yd)):
    row = []
    for d in (xd, yd):
        row.append((k, d[k]) if k in d else None)
    combined.append(tuple(row))
for row in combined:
    print(row)
which gives
(('a', {}), None)
(('b', {}), ('b', {}))
(None, ('c', {}))
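The same dict-based idea extends to any number of lists. A minimal sketch, assuming every input is a list of (key, value) pairs with hashable, sortable keys; join_via_dicts is a name made up for this illustration. Unlike a generator-based join, it loads all inputs into memory and does not rely on them being pre-sorted.

```python
def join_via_dicts(*pair_lists):
    """Join lists of (key, value) pairs by key, one output row per distinct key."""
    dicts = [dict(pairs) for pairs in pair_lists]
    # Union of all keys across the inputs, in sorted order.
    all_keys = sorted(set().union(*dicts))
    return [tuple((k, d[k]) if k in d else None for d in dicts)
            for k in all_keys]


x = [('a', {}), ('b', {})]
y = [('b', {}), ('c', {})]
for row in join_via_dicts(x, y):
    print(row)
```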
For the second example:
a = ( 2, 3, 4, )
b = (1, )
c = ( 5,)
d = (1, 3, 5,)
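# Build a list of sets (not a one-shot map iterator) so it can be scanned once per value.
abcd = [set(t) for t in (a, b, c, d)]
values = sorted(set(a+b+c+d))
print([tuple(v if v in row else None for row in abcd) for v in values])
which gives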
[(None, 1, None, 1),
(2, None, None, None),
(3, None, None, 3),
(4, None, None, None),
(None, None, 5, 5)]
But what are you trying to accomplish? Maybe you need a different data structure.