I have this code. How do I increment a tuple value and search for a string inside a Python loop?
arfffile = []
inputed = raw_input("Enter Evaluation for name including file extension...")
reader = open(inputed, 'r')
verses = []

for line in reader:
    verses.append(line)

for line in verses:
    if line.split('@') == "@":
        verses.pop(line)

numclusters = int(raw_input("Enter the number of clusters"))
clusters = {}
for i in range(1,numclusters+1):
    clusters["cluster"+str(i)] = 0
print clusters

# If verse belongs to a cluster, increment the cluster count by one in the dictionary value.
for verse in verses:
    for k in clusters:
        if k in verse:
            clusters[k] += 1
        else:
            print "not in"
print clusters

yeslist = []
for verse in verses:
    for k in clusters:
        if k not in yeslist:
            yeslist.append((k,0))
        elif k in yeslist:
            print "already in" + k

for verse in verses:
    for k in clusters:
        if k in verse and "Yes" in verse:
            yeslist.append(yeslist.index(k), +1)

# iterate through dictionary and iterate through the lines
# need to read in file line by line,
# if "yes" and cluster x increment cluster
# need to work out percentage of positive verses in each cluster.
An example of the ARFF file is:
@relation tester999.arff_clustered
@attribute Instance_number numeric
@attribute allah numeric
@attribute day numeric
@attribute lord numeric
@attribute people numeric
@attribute earth numeric
@attribute men numeric
@attribute truth numeric
@attribute verily numeric
@attribute chapter numeric
@attribute verse numeric
@attribute CLASS {Yes,No}
@attribute Cluster {cluster1,cluster2,cluster3}
@data
0,1,0,0,0,0,0,0,0,1,1,No,cluster3
1,1,0,0,0,0,0,0,0,1,2,No,cluster3
2,0,0,0,0,0,0,0,0,1,3,No,cluster3
3,0,1,0,0,0,1,0,0,1,4,No,cluster3
4,0,0,0,0,0,0,0,0,1,5,No,cluster3
5,0,0,0,0,0,0,0,0,1,6,No,cluster3
6,0,0,0,0,0,0,0,0,1,7,No,cluster3
7,0,0,0,0,0,0,0,0,2,1,No,cluster3
8,1,0,0,0,0,0,0,0,2,2,No,cluster3
9,0,0,0,0,0,0,0,0,2,3,No,cluster3
10,0,0,0,0,0,0,0,0,2,4,No,cluster3
11,0,0,1,0,0,0,0,0,2,5,No,cluster2
So the program reads in the data lines, for example:
0,1,0,0,0,0,0,0,0,1,1,No,cluster3
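and I have built a dictionary that records how many clusters there are in the data file. In this example there are 3: cluster1, cluster2 and cluster3. The code adds each cluster to the dictionary "clusters" as a key, represented as a string. I then iterate over all the verses and, for each line, work out which cluster it belongs to and increment that cluster's count.
My next step is to count, for each cluster, the number of lines in which "Yes" appears. So if, say, a cluster has 10 lines in the data containing the string "Yes", the code should be able to count those occurrences.
The code I have so far is here: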
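A minimal sketch of the per-cluster counting step described above, under a few assumptions: the `@` header lines are skipped, and the cluster name is taken from the last comma-separated field rather than found with a substring test. That matters because the `@attribute Cluster {cluster1,cluster2,cluster3}` header contains every cluster name, so `k in verse` would count it once for each cluster. The function name `count_rows_per_cluster` and the `sample_lines` list are made up for illustration, and the code is written so that it also runs on Python 3.

```python
# Hypothetical sample standing in for the lines read from the ARFF file.
sample_lines = [
    "@relation tester999.arff_clustered",
    "@attribute Cluster {cluster1,cluster2,cluster3}",
    "@data",
    "0,1,0,0,0,0,0,0,0,1,1,No,cluster3",
    "11,0,0,1,0,0,0,0,0,2,5,No,cluster2",
]


def count_rows_per_cluster(lines, numclusters):
    """Count data rows per cluster, skipping '@' header lines and blank lines."""
    clusters = {}
    for i in range(1, numclusters + 1):
        clusters["cluster" + str(i)] = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("@"):
            continue  # header or empty line, not a data row
        # Assumption: the cluster label is the last comma-separated field.
        label = line.split(",")[-1]
        if label in clusters:
            clusters[label] += 1
    return clusters


cluster_counts = count_rows_per_cluster(sample_lines, 3)
print(cluster_counts)
```

Comparing the whole last field also avoids a substring pitfall: with ten or more clusters, `"cluster1" in verse` would match rows labelled `cluster10` as well.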
for verse in verses:
    for k in clusters:
        if k in verse and "Yes" in verse:
            yeslist.append(yeslist.index(k), +1)
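I have basically created a list of tuples called "yeslist", with values like this: [(cluster1, 0), (cluster2, 3)]
So for each line (represented as a string), I want to check whether "Yes" is in it and, if it is, check which cluster the line belongs to and add 1 to that tuple's value.
I'm struggling to work out the logic for how to do this... can anyone help?
Thanks.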
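One possible way to sketch the "Yes" count and the percentage per cluster is to keep the counts in dictionaries instead of a list of tuples. This assumes, as in the ARFF example, that CLASS is the second-to-last field and Cluster is the last field of each data row. The function names and the `sample_rows` data are hypothetical (the example file above happens to contain only "No" rows), and the code also runs on Python 3.

```python
# Hypothetical data rows: CLASS is the second-to-last field, Cluster the last.
sample_rows = [
    "@data",
    "0,1,0,0,0,0,0,0,0,1,1,Yes,cluster1",
    "1,0,0,0,0,0,0,0,0,1,2,No,cluster1",
    "2,0,0,0,0,0,0,0,0,1,3,Yes,cluster2",
    "3,0,0,0,0,0,0,0,0,1,4,Yes,cluster2",
]


def count_yes_per_cluster(lines, cluster_names):
    """Return (totals, yes_counts): rows per cluster and "Yes" rows per cluster."""
    totals = dict((name, 0) for name in cluster_names)
    yes_counts = dict((name, 0) for name in cluster_names)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.split(",")
        if len(fields) < 2:
            continue  # not a well-formed data row
        label, cluster = fields[-2], fields[-1]
        if cluster not in totals:
            continue  # unknown cluster name
        totals[cluster] += 1
        if label == "Yes":
            yes_counts[cluster] += 1
    return totals, yes_counts


def yes_percentages(totals, yes_counts):
    """Percentage of "Yes" rows per cluster; 0.0 for clusters with no rows."""
    result = {}
    for name in totals:
        if totals[name]:
            result[name] = 100.0 * yes_counts[name] / totals[name]
        else:
            result[name] = 0.0
    return result


names = ["cluster1", "cluster2", "cluster3"]
totals, yes_counts = count_yes_per_cluster(sample_rows, names)
percentages = yes_percentages(totals, yes_counts)
# If a list of (cluster, count) tuples is still wanted, build it at the end.
yeslist = sorted(yes_counts.items())
print(yeslist)
print(percentages)
```

Building the list of tuples once, after counting, sidesteps the need to modify a tuple inside the loop.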
And what is the short version of the question? – 2011-04-11 17:45:58
I'm pretty sure tuples are immutable. – DTing 2011-04-11 18:22:43
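To illustrate the immutability point: assigning to an element of a tuple raises a `TypeError`, so a count stored in a `(name, count)` tuple can only be "incremented" by replacing the whole tuple in the list. The sketch below shows this; the helper name `increment` is made up for illustration, and the starting values are the ones from the question.

```python
yeslist = [("cluster1", 0), ("cluster2", 3)]

# Trying to change a tuple element in place fails.
try:
    yeslist[0][1] += 1
    in_place_worked = True
except TypeError:
    in_place_worked = False


def increment(pairs, key):
    """Replace the (key, count) tuple with (key, count + 1).

    Returns True if the key was found, False otherwise.
    """
    for i, (name, count) in enumerate(pairs):
        if name == key:
            pairs[i] = (name, count + 1)  # new tuple, same list slot
            return True
    return False


found = increment(yeslist, "cluster2")
print(yeslist)
```

A dictionary keyed by cluster name avoids the linear search and the tuple replacement altogether, which is why it is usually the more natural structure for counters.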