2010-08-29 73 views
1

Hi, this follows on from an earlier post about finding the common elements of a list.

Consider the following list:

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']

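I want to count how many times each word that begins with a capital letter occurs, and display the top 3.

I am not interested in words that do not start with a capital.

If a word appears multiple times, sometimes starting with a capital and sometimes not, only count the times it starts with a capital.

This is what my code looks like now: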

words = "" 
for word in open('novel.txt', 'rU'): 
     words += word 
words = words.split(' ') 
words= list(words) 
words = ('\n'.join(words)).split('\n') 

word_counter = {} 

for word in words: 

     if word in word_counter: 
      word_counter[word] += 1 
     else: 
      word_counter[word] = 1  
popular_words = sorted(word_counter, key = word_counter.get, reverse = True) 
top_3 = popular_words[:3] 

matches = [] 

for i in range(3): 

     print word_counter[top_3[i]], top_3[i] 
+0

Why use a counter? (By the way, please accept an answer if it is the one that helped you most.) – kennytm 2010-08-29 12:41:42

+1

Is this homework? – Johnsyweb 2010-08-29 21:44:37

+0

If the words are read from a file, the Python list at the top of this question is irrelevant. – Johnsyweb 2010-08-29 21:45:48

Answers

1

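In general, word[0].isupper() will tell you whether a word starts with a capital letter. Combine this into a list comprehension (or your loop):

[x for x in my_list if x[0].isupper()] 

(assuming there are no empty strings)

and you will get all the words that start with a capital letter.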
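As a minimal sketch of how this filter could slot into a counting loop, assuming Python 3 and a short sample list standing in for the file contents (the names `sample`, `capitalised` and `counts` are illustrative, not from the question). The slice `word[:1]` is used instead of `word[0]` so that empty strings are skipped rather than raising an IndexError:

```python
# Sample data standing in for the words read from the file (assumption).
sample = ['Jellicle', 'Cats', 'are', 'black', 'And', 'Jellicle', 'and', '']

# word[:1] is '' for an empty string, and ''.isupper() is False,
# so empty strings are filtered out instead of raising IndexError.
capitalised = [word for word in sample if word[:1].isupper()]

# Count only the capitalised words with a plain dict.
counts = {}
for word in capitalised:
    counts[word] = counts.get(word, 0) + 1

print(capitalised)
print(counts)
```

Because 'and' and 'And' are different strings, only the capitalised occurrences are counted, which matches the requirement in the question.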

+0

I'm not sure how to add this to my program to make it work – user434180 2010-08-29 13:08:51

+0

@user434180: What have you tried? – Johnsyweb 2010-08-29 23:37:16

7
#uncomment to produce the word file 
##words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
##open('novel.txt','w').write('\n'.join(words)) 

import string 
cap_words = [word.strip(string.punctuation) for word in open('novel.txt').read().split() if word.istitle()] 
##print(cap_words) # debug 
try: 
    from collections import Counter # Python >= 2.7 
    print('Counter') 
    print(Counter(cap_words).most_common(3)) 
except ImportError: 
    print('Normal dict') 
    wordcount = dict() 
    for word in cap_words: 
        wordcount[word] = (wordcount[word] + 1 
                           if word in wordcount 
                           else 1) 
    print(sorted(wordcount.items(), key = lambda x: x[1], reverse = True)[:3]) 

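I don't understand why you would want 'rU' mode for dealing with different kinds of line endings. As in the edited code above, I would normally just open the file in the default mode. Edit: your words have punctuation attached, so clean those up with strip().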
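To illustrate the punctuation clean-up, here is a small sketch, assuming Python 3; the helper name `clean` and the sample list `raw` are made up for this example. Note that `str.strip` only removes characters from the two ends of the word, so an apostrophe inside a word is left alone:

```python
import string

def clean(word):
    """Remove leading and trailing punctuation; inner punctuation is kept."""
    return word.strip(string.punctuation)

# A few words in the style of the question's list (illustrative sample).
raw = ['white,', 'small;', 'caterwaul.', "don't", 'Cats']
cleaned = [clean(word) for word in raw]
print(cleaned)
```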

+0

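When I try this I get the error: Traceback (most recent call last): File "C:/Users/Adam/Desktop/Adam's work/2010/IST/python compt/f.py", line 1, in <module> from collections import Counter ImportError: cannot import name Counter – user434180 2010-08-29 12:59:32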

+2

As mentioned, you need Python 2.7 for collections.Counter to work – 2010-08-29 13:29:34

2

Here are some additional comments.

This:

words = "" 
for word in open('novel.txt', 'rU'): 
     words += word 
words = words.split(' ') 
words= list(words) 
words = ('\n'.join(words)).split('\n') 

can be replaced with:

text = open('novel.txt', 'rU').read() # read everything 
wordlist = text.split() # split on all whitespace 

But you are not applying your "must start with a capital letter" requirement. Time to add it:

capwordlist = (word for word in wordlist if word.istitle()) 

For a single word, istitle() roughly means word[0].isupper() and word[1:].islower(). That means 'SO'.istitle() -> False

That may suit you, but perhaps you just want word[0].isupper() instead.
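The difference between the two tests is easiest to see side by side. The sketch below assumes Python 3; `starts_upper` is a hypothetical helper written for this comparison, and it slices with `word[:1]` so the empty string is handled without an IndexError:

```python
def starts_upper(word):
    """True if the first character is an uppercase letter (hypothetical helper)."""
    return word[:1].isupper()

examples = ['Cats', 'SO', 'McDonald', 'cats', '']
for word in examples:
    # Columns: the word, the istitle() result, the first-letter test result.
    print(repr(word), word.istitle(), starts_upper(word))
```

Words such as 'SO' and 'McDonald' pass the first-letter test but fail istitle(), so the choice depends on whether acronyms and mixed-case names should be counted.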


This part is fine if you can't use collections.Counter (new in 2.7):

word_counter = {} 

for word in capwordlist: 

     if word in word_counter: 
      word_counter[word] += 1 
     else: 
      word_counter[word] = 1  
popular_words = sorted(word_counter, key = word_counter.get, reverse = True) 
top_3 = popular_words[:3] 

Otherwise this simply becomes:

from collections import Counter 

word_counter = Counter(capwordlist) 
top_3 = word_counter.most_common(3) # gives `word, count` pairs! 

This:

for i in range(3): 
     print word_counter[top_3[i]], top_3[i] 

can become this:

for word in top_3: 
    print word_counter[word], word 
+0

'istitle()' is nice, but 'isupper()' seems to match the OP's requirement. Judging from the previous question, it seems Python 2.6 is all that is available (hence no Counter). – Johnsyweb 2010-08-29 21:43:53

0

Since you are not using Python 2.7 and don't have Counter:

from collections import defaultdict 
counter = defaultdict(int) 
words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
for word in (word for word in words if word[:1].isupper()):  # word[:1] is safe for the empty string '' 
    counter[word]+=1 
print counter 
3
print "\n".join(sorted(["%d %s" % (lst.count(i), i) \ 
      for i in set(lst) if i.istitle()])[-3:]) 
2 And 
5 Cats 
6 Jellicle 
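One thing to watch in the one-liner above: it sorts the formatted strings, so the order is lexicographic and a line such as "10 Cats" would sort before "9 And". A possible variation, assuming Python 3 and a shortened `lst` made up for this example, sorts (count, word) tuples so that counts are compared as numbers:

```python
# Shortened word list for illustration (assumption, not the full poem).
lst = ['Jellicle', 'Cats', 'are', 'And', 'Jellicle', 'Cats', 'Jellicle', 'and', '']

# Build (count, word) tuples so that sorting compares counts numerically.
pairs = sorted((lst.count(word), word) for word in set(lst) if word.istitle())

top_3 = pairs[-3:]  # ascending order, so the most frequent word is last
for count, word in top_3:
    print(count, word)
```

Like the original, this calls `lst.count` once per distinct word, which is fine for a short list but scans the whole list each time.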
2

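One thing I would avoid is reading all the words in before processing them. It will work, but IMHO it is better not to if you don't need to, and you don't. Here is my solution (with elements generously stolen from the previous ones!), written using 2.6.2: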

import sys 

# a generator function which iterates over the words in a file 
def words(f): 
    for line in f: 
        for word in line.split(): 
            yield word 

# returns a generator expression filtering an iterator down to titlecase words 
def titles(s): 
    return (word for word in s if word.istitle()) 

# count the titlecase words in the file 
count = {} 
for word in titles(words(file(sys.argv[1]))): 
    count[word] = count.get(word, 0) + 1 

# build a list of tuples with the count for each word 
countsAndWords = [(kv[1], kv[0]) for kv in count.iteritems()] 

# put them in decreasing order 
countsAndWords.sort() 
countsAndWords.reverse() 

# print the top three 
for n, word in countsAndWords[:3]:  # use n so the count dict is not rebound 
    print word, n 

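I sorted by count using decorate-sort-undecorate, rather than a sort whose comparison does a lookup in the count dictionary; it is less elegant, but I believe it will be faster. This may be a sinful thing to do.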
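For comparison, here is a sketch of the two sorting approaches side by side, assuming Python 3. The `count` dict is filled in by hand with the capitalised-word counts from the list in the question; the names `decorated`, `top_dsu` and `top_key` are illustrative. No claim is made here about which one is faster:

```python
# Counts of the capitalised words in the question's list, entered by hand.
count = {'Jellicle': 6, 'Cats': 5, 'And': 2, 'They': 1, 'Moon': 1}

# Decorate-sort-undecorate: sort (count, word) tuples, then read the words back.
decorated = sorted(((n, word) for word, n in count.items()), reverse=True)
top_dsu = [word for n, word in decorated[:3]]

# Key-function alternative: sort the words, using the dict lookup as the sort key.
top_key = sorted(count, key=count.get, reverse=True)[:3]

print(top_dsu)
print(top_key)
```

The two agree here because the top three counts are distinct. With tied counts they can order the tied words differently, since the tuple sort falls back to comparing the words themselves.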

0

You can use itertools:

import itertools 

words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
capwords = (word for word in words if len(word) > 1 and word[0].isupper()) 
capwordssorted = sorted(capwords) 
wordswithcounts = ((k,len(list(g))) for (k,g) in itertools.groupby(capwordssorted)) 
print sorted(wordswithcounts,key=lambda x:x[1],reverse=True)[:3]
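The sort before `groupby` matters because `itertools.groupby` only groups consecutive equal items. A small sketch, assuming Python 3 and a shortened word list made up for this example, shows what happens with and without it:

```python
import itertools

# Shortened word list for illustration (assumption).
words = ['Cats', 'Jellicle', 'Cats', 'Jellicle', 'Jellicle']

# Without sorting, groupby starts a new group every time the value changes.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(words)]

# After sorting, equal words are adjacent, so each word forms exactly one group.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(words))]

print(unsorted_groups)
print(sorted_groups)
```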