
python - Lazily splitting an Iterable on a condition

Reposted. Author: 行者123. Updated: 2023-12-04 08:24:43

I am having trouble writing a function that lazily splits an iterable into sub-iterables according to a given condition.
Here is what I have so far:

import typing as t

T = t.TypeVar("T")


def split_on_condition(
    seq: t.Iterable[T], condition: t.Callable[[T], bool]
) -> t.Iterable[t.Iterable[T]]:
    curr = []
    for i in seq:
        if condition(i):
            curr.append(i)
        else:
            yield curr
            curr = []
    yield curr


foo = ('a', 'b', 'c', '', '...rest')
print(list(split_on_condition(foo, bool)))

# [['a', 'b', 'c'], ['...rest']]
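So this works, which is good; the problem is that curr is a list, whereas I want it to be lazily evaluated so that I can use arbitrarily large sequences as input.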
from itertools import repeat

for _ in split_on_condition(repeat('oops, infinite loop'), bool):
    pass
# Program will crash here :(

Best answer

For subscriptable iterables that implement __getitem__, store the index range and return a generator that iterates over that range.

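def split_on_condition_gen(seq, condition):
    def inner_gen(range_: range):
        # Read items from seq by index only when the chunk is iterated.
        for n in range_:
            yield seq[n]

    last = 0
    current = 0

    for item in seq:
        if condition(item):
            current += 1
        else:
            # Only the index range is stored, not the items themselves.
            yield inner_gen(range(last, current))
            last = current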
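Because it scans ahead until the condition fails, it is not completely lazy, but at least it can handle large inputs, since it consumes no extra memory until you actually need the items.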
>>> from this import s as seq_
~~ ZEN OF PYTHON GOES HERE ~~

>>> condition = lambda x: x != ','
>>> output = split_on_condition_gen(seq_, condition)
>>> output
<generator object split_on_condition_gen at 0x000002602C1ECEB0>

>>> list_output = list(output)
>>> list_output[2]
<generator object split_on_condition_gen.<locals>.inner_gen at 0x000002602C1ECEB0>

>>> list(list_output[2])
['l', ',', ' ', 'e', 'r', 's', 'h', 'f', 'r', ' ', ...]
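
In the index-based generator above, the counter only advances on items that satisfy the condition, so chunks after the first are offset by the separators already seen, and the items after the last separator are never yielded. A minimal sketch of the same idea that tracks the separator position with enumerate is shown here. The name split_indexable is hypothetical and not part of the original answer, and the sketch assumes seq supports seq[n] with integer indexes.

```python
def split_indexable(seq, condition):
    """Yield one lazy generator per chunk of a subscriptable sequence."""
    def inner_gen(range_):
        # Items are looked up only when the chunk is iterated.
        for n in range_:
            yield seq[n]

    start = 0
    idx = -1  # stays -1 when seq is empty, so the final range is empty too
    for idx, item in enumerate(seq):
        if not condition(item):
            # The separator at position idx is excluded from the chunk.
            yield inner_gen(range(start, idx))
            start = idx + 1
    # Trailing chunk after the last separator.
    yield inner_gen(range(start, idx + 1))


foo = ('a', 'b', 'c', '', '...rest')
print([list(chunk) for chunk in split_indexable(foo, bool)])
# [['a', 'b', 'c'], ['...rest']]
```

As with the original, the outer loop still scans to the next separator before handing out a chunk, so it is lazy in memory but not in time.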

However, for iterables that do not implement __getitem__, such as this one:
class NonIndexAble:
    def __init__(self, initial: str):
        self.source = initial
        self.list_ = list(initial)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return self.list_.pop(0)
        except IndexError:
            raise StopIteration()
it raises an error like this:
>>> non_index_able = NonIndexAble('aaabaaaaba')
>>> condition = lambda x: x == "b"

>>> for part in split_on_condition_gen(non_index_able, condition):
...     for item in part:
...         pass

Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "<input>", line 11, in inner_gen
TypeError: 'NonIndexAble' object is not subscriptable
At first I thought we could use itertools.islice, but it turns out that it consumes the iterator passed to it, as shown here:
    except StopIteration:
        # Consume to *stop*.
        for i, element in zip(range(i + 1, stop), iterable):
            pass
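
A small illustration of that behaviour, as a sketch that is not taken from the original answer: slicing a shared iterator with islice advances the iterator itself, so the skipped items cannot be read again afterwards.

```python
from itertools import islice

it = iter(range(10))

# Positions 0 and 1 are read and discarded to reach the slice.
print(list(islice(it, 2, 4)))  # [2, 3]

# The underlying iterator has moved on; 0 and 1 are gone for good.
remaining = list(it)
print(remaining)  # [4, 5, 6, 7, 8, 9]
```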
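Instead, since most use cases for this generator are nested for loops, the inner generators will be run in the order in which they were created.
In that case we can keep a single generator instance together with a counter variable that holds the "cursor" position. Each inner generator skips from counter to start, yields the values from start to end, and otherwise leaves the underlying generator as it is. Unlike islice, it does not consume the generator past its own chunk, so the next "chunk" can continue from the same generator instance.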
counter = 0

def inner_gen(seq_, start, end):
    nonlocal counter

    for _ in range(counter, start):  # Skip from counter to start
        next(seq_)

    for _ in range(start, end):  # yield from start to end
        yield next(seq_)

    counter = end
Full implementation:
from itertools import tee


def split_on_condition_gen(seq, condition):
    seq_copy, seq_current = tee(seq)
    counter = 0

    def inner_gen(seq_, start, end):
        nonlocal counter

        for _ in range(counter, start):  # Skip from counter to start
            next(seq_)

        for _ in range(start, end):  # yield from start to end
            yield next(seq_)

        counter = end

    last = 0
    idx = -1  # keeps the final yield valid when seq is empty

    for idx, item in enumerate(seq_current):
        if condition(item):
            continue

        yield inner_gen(seq_copy, last, idx)
        last = idx + 1

    yield inner_gen(seq_copy, last, idx + 1)
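
The implementation relies on itertools.tee to obtain two independent iterators over one underlying iterator: one scans ahead for separators, the other replays the items into the chunks. A short sketch of that behaviour follows; the variable names lookahead and replay are illustrative only.

```python
from itertools import tee

source = iter("abc")
lookahead, replay = tee(source)

# Advancing one iterator does not advance the other.
print(next(lookahead), next(lookahead))  # a b
print(next(replay))                      # a
```

tee buffers every item that one iterator has seen and the other has not, so the memory used here grows with the distance between the scanning iterator and the replaying one, which is roughly one chunk when the chunks are consumed as they are produced.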
Test:
>>> non_index_able = NonIndexAble('asdf adsfa ds sdfdf adf')
>>> condition = lambda x: x != ' '

>>> for part in split_on_condition_gen(non_index_able, condition):
...     print("Part start")
...     for item in part:
...         print(item, end=' ')
...     print("\nEnd")
Part start
a s d f
End
Part start
a d s f a
End
Part start
d s
End
Part start
s d f d f
End
Part start
a d f
End
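
For comparison, a different technique from the one in the answer: itertools.groupby can produce the chunks without scanning ahead to the next separator, so it also copes with the endless input from the question. This is a hedged sketch, and the name split_lazily is hypothetical. It behaves differently in two ways: consecutive separators do not produce empty chunks, and each chunk must be consumed before the next one is requested, because groupby shares a single underlying iterator.

```python
from itertools import groupby, islice, repeat


def split_lazily(seq, condition):
    """Yield lazy groups of consecutive items for which condition is true."""
    for matches, group in groupby(seq, key=condition):
        if matches:
            yield group


foo = ('a', 'b', 'c', '', '...rest')
print([list(chunk) for chunk in split_lazily(foo, bool)])
# [['a', 'b', 'c'], ['...rest']]

# Endless input: only the items actually requested are read.
chunks = split_lazily(repeat('x'), bool)
first_chunk = next(chunks)
print(list(islice(first_chunk, 3)))
# ['x', 'x', 'x']
```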

Regarding "python - Lazily splitting an Iterable on a condition", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65324776/
