Is it OK to use the yield statement in an instance method of a class? For example:
# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi
Python doesn't complain, and simple cases seem to work. However, I've only seen examples that use yield in regular functions.
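I start having problems when I try to use it with itertools functions. For example, suppose I have two large data streams X and Y stored across multiple files, and I want to compute both their sum and their difference in a single pass over the data. I could use itertools.tee and itertools.izip as in the following diagram.
In code it would look something like this (sorry, it's long):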
from itertools import izip_longest, izip, tee
import random

def add(x,y):
    for xi,yi in izip(x,y):
        yield xi + yi

def sub(x,y):
    for xi,yi in izip(x,y):
        yield xi - yi

class NthSumDiff(object):
    def __init__(self, n):
        self.nthsum = Nth(n)
        self.nthdiff = Nth(n)

    def itervalues(self, x, y):
        xadd, xsub = tee(x)
        yadd, ysub = tee(y)
        gen_sum = self.nthsum.itervalues(add(xadd, yadd))
        gen_diff = self.nthdiff.itervalues(sub(xsub, ysub))
        # Have to use izip_longest here, but why?
        #for (i,nthsum), (j,nthdiff) in izip_longest(gen_sum, gen_diff):
        for (i,nthsum), (j,nthdiff) in izip(gen_sum, gen_diff):
            assert i==j, "sum row %d != diff row %d" % (i,j)
            yield nthsum, nthdiff

nskip = 12
ns = Nth(nskip)
nd = Nth(nskip)
nsd = NthSumDiff(nskip)
nfiles = 10
for i in range(nfiles):
    # Generate some data.
    # If the block length is a multiple of nskip there's no problem.
    #n = random.randint(5000, 10000) * nskip
    n = random.randint(50000, 100000)
    print 'file %d n=%d' % (i, n)
    x = range(n)
    y = range(100,n+100)
    # Independent processing is no problem but requires two loops.
    for i, nthsum in ns.itervalues(add(x,y)):
        pass
    for j, nthdiff in nd.itervalues(sub(x,y)):
        pass
    assert i==j
    # Trying to do both with one loop causes problems.
    for nthsum, nthdiff in nsd.itervalues(x,y):
        # If izip_longest is necessary, why don't I ever get a fillvalue?
        assert nthsum is not None
        assert nthdiff is not None
    # After each block of data the two iterators should have the same state.
    assert nsd.nthsum.nout == nsd.nthdiff.nout, \
        "sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
But this fails unless I replace itertools.izip with itertools.izip_longest, even though the iterators have the same length. It's the last assert that gets hit, with output like
file 0 n=58581
file 1 n=87978
Traceback (most recent call last):
File "test.py", line 71, in <module>
"sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
AssertionError: sum nout 12213 != diff nout 12212
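Edit: I guess it isn't obvious from the example I wrote, but the input data X and Y are only available in blocks (in my real problem they are chunked in files). This matters because I need to maintain state between blocks. In the toy example above, this means Nth needs to yield the equivalent of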
>>> x1 = range(0,10)
>>> x2 = range(10,20)
>>> (x1 + x2)[::3]
[0, 3, 6, 9, 12, 15, 18]
rather than
>>> x1[::3] + x2[::3]
[0, 3, 6, 9, 10, 13, 16, 19]
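Equivalently, I could use itertools.chain to join the blocks ahead of time and then make a single call to Nth.itervalues, but I would like to understand what is wrong with keeping state in the Nth class between calls (my real application is image processing involving more saved state, not a simple Nth/add/subtract).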
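As a rough check of that equivalence, here is a minimal sketch. It assumes Python 3 (so `range` is wrapped in `list`), repeats the Nth class so that it runs on its own, and uses a step of 3 purely for illustration. Note that Nth yields the 3rd, 6th, ... items, so the values are offset from the `[::3]` slices above.

```python
from itertools import chain

class Nth(object):
    """Copy of the Nth class from the question."""
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi

x1 = list(range(0, 10))
x2 = list(range(10, 20))

# One call over the pre-joined blocks.
chained = [xi for _, xi in Nth(3).itervalues(chain(x1, x2))]

# One call per block on a single instance, so self.i carries over.
# Each generator is run to exhaustion by extend().
nth = Nth(3)
per_block = []
for block in (x1, x2):
    per_block.extend(xi for _, xi in nth.itervalues(block))

print(chained)    # [2, 5, 8, 11, 14, 17]
print(per_block)  # [2, 5, 8, 11, 14, 17]
```

The two results agree here because every per-block generator is fully consumed before the next block starts; the sketch says nothing about what happens when a generator is abandoned early.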
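I don't understand how my Nth instances can end up in different states when their inputs have the same length. For example, if I give izip two strings of equal length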
>>> [''.join(x) for x in izip('ABCD','abcd')]
['Aa', 'Bb', 'Cc', 'Dd']
I get a result of the same length; so why do my Nth.itervalues generators seem to receive unequal numbers of next() calls, even though each one yields the same number of results?
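A smaller experiment that may explain the mismatch is sketched below. It assumes Python 3, where the built-in `zip` and `itertools.zip_longest` stand in for `izip` and `izip_longest`, and it uses a step of 3 and ten items purely for illustration. `zip` pulls from its arguments left to right and stops as soon as the first one raises StopIteration, so the first generator runs its loop over the leftover items while the second one stays suspended at its last yield.

```python
from itertools import zip_longest

class Nth(object):
    """Copy of the Nth class from the question."""
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi

data = list(range(10))  # 10 items: not a multiple of 3, so one item is left over

# zip: the first generator consumes the leftover item and finishes;
# the second is never resumed after its third yield.
a, b = Nth(3), Nth(3)
pairs = list(zip(a.itervalues(data), b.itervalues(data)))
print(len(pairs), a.i, b.i)  # 3 1 0

# zip_longest: every generator is driven until it is exhausted, so both
# see the leftover item. They finish in the same round, so no fill value
# is ever produced.
c, d = Nth(3), Nth(3)
pairs_longest = list(zip_longest(c.itervalues(data), d.itervalues(data)))
print(len(pairs_longest), c.i, d.i)  # 3 1 1
```

If this is what happens in the full program, the two counters drift apart after the first block whose length is not a multiple of n, which would match the observation that block lengths that are multiples of nskip cause no problem.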
To answer the title question: yes, using 'yield' in an instance method is fine. It is actually the simplest Pythonic way to implement '__iter__' for custom 'Iterable' types. – ShadowRanger
Couldn't you replace 'class Nth' with 'def Nth(x, n): return enumerate(x[::n])'? Oh, or do you need to slice 'x' as an iterator, for performance reasons? – Harvey
'def Nth(x, n): return enumerate(xi for i, xi in enumerate(x) if i % n == 0)' – Harvey
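As a small illustration of ShadowRanger's comment about '__iter__', here is a minimal sketch. The class name `Countdown` is hypothetical and not part of the question; the point is only that a generator method can serve as '__iter__', and that each call creates a fresh generator.

```python
class Countdown(object):
    """Hypothetical iterable that counts down from start to 1."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Calling __iter__ returns a new generator each time, so the
        # object can be iterated more than once.
        n = self.start
        while n > 0:
            yield n
            n -= 1

c = Countdown(3)
first = list(c)
second = list(c)
print(first, second)  # [3, 2, 1] [3, 2, 1]
```

Unlike Nth, this sketch keeps its loop counter in a local variable rather than on `self`, so separate iterations do not share progress.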