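Comparing every item in one list with every item in another list (or in the same list) is known in mathematics as taking the Cartesian product. Python has a built-in function for this, itertools.product, which is equivalent to nested for loops.
Suppose A and B are lists: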
for x in A:
    for y in B:
        print(x, y)
This can be written with a generator expression as:
for pair in ((x, y) for x in A for y in B):
    print(pair)
or, more concisely:
from itertools import product
for pair in product(A, B):
    print(pair)
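As a quick check of the equivalence described above, here is a minimal sketch that builds the pairs all three ways and compares the results. The sample lists A and B are made up for illustration and are not from the question.

```python
from itertools import product

# Hypothetical sample data, for illustration only.
A = [1, 2, 3]
B = ["a", "b"]

# 1. Nested for loops.
nested = []
for x in A:
    for y in B:
        nested.append((x, y))

# 2. Generator expression, materialized into a list.
generated = list((x, y) for x in A for y in B)

# 3. itertools.product, which yields tuples in the same order.
from_product = list(product(A, B))

print(nested == generated == from_product)  # True
```

All three produce the pairs in the same order because product advances its rightmost argument fastest, just like the inner loop.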
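In your case you are comparing every item in the list with every item in that same list, so you could write product(texts, texts). For this situation, however, product has an optional keyword argument, repeat: product(A, repeat=4) means the same as product(A, A, A, A).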
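To illustrate the repeat argument, the short sketch below compares product(A, repeat=2) with product(A, A). The two-element list is a made-up example chosen to keep the output small.

```python
from itertools import product

# Hypothetical sample list, for illustration only.
A = ["x", "y"]

with_repeat = list(product(A, repeat=2))
spelled_out = list(product(A, A))

print(with_repeat)
# [('x', 'x'), ('x', 'y'), ('y', 'x'), ('y', 'y')]
print(with_repeat == spelled_out)
# True
```

Note that each item is also paired with itself, and that both (x, y) and (y, x) appear.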
You can now rewrite your code like this:
from itertools import product
caesar = """BOOK I
I.--All Gaul is divided into three parts, one of which the Belgae
inhabit, the Aquitani another, those who in their own language are
called Celts, in ours Gauls, the third. All these differ from each other
in language, customs and laws."""
hamlet = """Who's there?"
"Nay, answer me. Stand and unfold yourself."
"Long live the King!"
"Barnardo!"
"He." (I.i.1-5)"""
macbeth = """ACT I SCENE I A desert place. Thunder and lightning.
[Thunder and lightning. Enter three Witches]
First Witch When shall we three meet again
In thunder, lightning, or in rain?
Second Witch When the hurlyburly's done,
When the battle's lost and won."""
texts = [caesar, hamlet, macbeth]
def similarity(x, y):
    """similarity based on length of the text,
    substitute with similarity function from Natural Language Toolkit"""
    return float(len(x))/len(y)
for pair in product(texts, repeat=2):
    print("{}".format(similarity(*pair)))
Thank you very much. I really appreciate this! I'm using round() because my similarity() function returns a float. – 2012-02-28 14:48:06
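@Adam_G: I understand why you're using round(), but as noted above, round() is not meant for output formatting. For more on output formatting, see the [Fancier Output Formatting](http://docs.python.org/tutorial/inputoutput.html#fancier-output-formatting) section of the Python tutorial, and see [Floating Point Arithmetic: Issues and Limitations](http://docs.python.org/tutorial/floatingpoint.html) for why using round() for this purpose is a bad idea. – 2012-02-28 14:55:46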
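To make the point about round() concrete, here is a small sketch that contrasts rounding a float with formatting it for display. The value below is a made-up stand-in for a similarity score.

```python
# Hypothetical stand-in for a similarity score.
value = 2.0 / 3.0

# A format specification produces a string with exactly two decimals.
formatted = "{:.2f}".format(value)
print(formatted)  # 0.67

# round() returns another float, so trailing zeros are not preserved,
# while the format specification always shows the requested digits.
print(round(1.0, 2))          # 1.0
print("{:.2f}".format(1.0))   # 1.00
```

Formatting changes only how the number is displayed; the underlying float keeps its full precision for further calculations.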