Finding the most common elements of a list

Consider the following list:

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']

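I want to count how many times each word that starts with a capital letter appears, and display the top 3.

I am not interested in words that do not start with a capital letter.

If a word appears several times, sometimes starting with a capital letter and sometimes not, count only the occurrences that start with a capital letter.

This is what my code looks like at the moment: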

words = ""
for word in open('novel.txt', 'rU'):
    words += word
words = words.split(' ')
words = list(words)
words = ('\n'.join(words)).split('\n')

word_counter = {}

for word in words:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1
popular_words = sorted(word_counter, key=word_counter.get, reverse=True)
top_3 = popular_words[:3]

matches = []

for i in range(3):
    print word_counter[top_3[i]], top_3[i]

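Source

2010-08-29 user434180

Why are you using a counter? (By the way, please accept an answer if it is the one that helped you most.) – kennytm 2010-08-29 12:41:42

Is this homework? – Johnsyweb 2010-08-29 21:44:37

If you are reading the words from a file, the Python list at the top of this question is irrelevant. – Johnsyweb 2010-08-29 21:45:48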

In general, word[0].isupper() will tell you whether a word starts with a capital letter. Combine this into a list comprehension (or your loop):

[x for x in my_list if x[0].isupper()]

(assuming there are no empty strings)

and you will get all the words that start with a capital letter.
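The list in the question ends with an empty string, for which x[0] raises IndexError. Here is a minimal sketch of one way around that, written for Python 3 (an assumption, since the thread itself uses Python 2). The shortened `sample` list and the helper name `starts_upper` are illustrative only. Slicing with `word[:1]` gives `''` for an empty string, and `''.isupper()` is False.

```python
# Hypothetical, shortened sample; note the trailing empty string.
sample = ['Jellicle', 'Cats', 'are', 'black', 'And', 'rise.', '']


def starts_upper(word):
    """Return True if word begins with an uppercase letter; safe for ''."""
    # word[:1] is '' for an empty string and ''.isupper() is False,
    # so no IndexError is raised.
    return word[:1].isupper()


capitalised = [w for w in sample if starts_upper(w)]
print(capitalised)
```

An equivalent guard is `if x and x[0].isupper()`; the slice just keeps the condition in one expression.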

Source

2010-08-29 12:38:26

I'm not sure how to add this to my program to make it work – user434180 2010-08-29 13:08:51

@user434180: What have you tried? – Johnsyweb 2010-08-29 23:37:16

#uncomment to produce the word file 
##words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
##open('novel.txt','w').write('\n'.join(words)) 

import string 
cap_words = [word.strip(string.punctuation) for word in open('novel.txt').read().split() if word.istitle()] 
##print(cap_words) # debug 
try: 
    from collections import Counter # Python >= 2.7 
    print('Counter') 
    print(Counter(cap_words).most_common(3)) 
except ImportError: 
    print('Normal dict') 
    wordcount = dict()
    for word in cap_words:
        wordcount[word] = (wordcount[word] + 1
                           if word in wordcount
                           else 1)
    print(sorted(wordcount.items(), key = lambda x: x[1], reverse = True)[:3])

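I don't see why you would need 'rU' mode to deal with different kinds of line endings. As in the edited code above, I would normally just open the file the usual way. Edit: your words have punctuation attached, so clean those up with strip().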
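To illustrate why the punctuation matters for counting, here is a small sketch in Python 3 (an assumption; `Counter` needs 2.7 or later in any case). The `raw` list is a made-up sample, not data from the question. Without stripping, 'Cats' and 'Cats,' are counted as different words.

```python
import string
from collections import Counter

# Hypothetical sample in which the same word appears with and
# without trailing punctuation.
raw = ['Cats', 'Cats,', 'Moon.', 'Moon', 'Cats;']

# Counting the raw strings treats every variant as a separate word.
without_strip = Counter(raw)

# strip(string.punctuation) removes punctuation from both ends only,
# so the variants collapse into one key each.
with_strip = Counter(word.strip(string.punctuation) for word in raw)

print(without_strip)
print(with_strip)
```

Note that strip() only touches the ends of a word, so an apostrophe inside a word is left alone.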

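Source

2010-08-29 12:53:25

When I try this I get the error: Traceback (most recent call last): File "C:/Users/Adam/Desktop/Adam's work/2010/IST/python compt/f.py", line 1, in <module> from collections import Counter ImportError: cannot import name Counter – user434180 2010-08-29 12:59:32

As mentioned, you need Python 2.7 for collections.Counter to work – 2010-08-29 13:29:34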

Here are some additional comments.

These two lines:

text = open('novel.txt', 'rU').read() # read everything
wordlist = text.split() # split on all whitespace

can replace all of this:

words = ""
for word in open('novel.txt', 'rU'):
    words += word
words = words.split(' ')
words = list(words)
words = ('\n'.join(words)).split('\n')

But you are not applying your "must start with a capital letter" requirement yet. Time to add it:

capwordlist = (word for word in wordlist if word.istitle())

For a single word, istitle() means roughly word[0].isupper() and word[1:].islower(). That means 'SO'.istitle() -> False.

That may suit you, but perhaps you just want word[0].isupper() instead.
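A quick way to see the difference between the two tests is to run both on a few words. This is a sketch in Python 3 (an assumption; the thread uses Python 2), and the `samples` list is made up for illustration.

```python
# Hypothetical sample words chosen to show where the two tests disagree.
samples = ['Jellicle', 'SO', 'McDonald', 'white,', 'A']

for word in samples:
    # istitle(): uppercase letters may only follow uncased characters,
    # lowercase letters may only follow cased ones.
    # word[:1].isupper(): only looks at the first character.
    print(word, word.istitle(), word[:1].isupper())

# Words that start with a capital but are not title-cased.
only_first_upper = [w for w in samples
                    if w[:1].isupper() and not w.istitle()]
print(only_first_upper)
```

Words such as 'SO' and 'McDonald' start with a capital letter but fail istitle(), so which test is right depends on whether those should be counted.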

This part is fine if you cannot use collections.Counter (new in 2.7):

word_counter = {}

for word in capwordlist:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1
popular_words = sorted(word_counter, key=word_counter.get, reverse=True)
top_3 = popular_words[:3]

Otherwise this simply becomes:

from collections import Counter

word_counter = Counter(capwordlist)  # the generator defined above
top_3 = word_counter.most_common(3) # gives `word, count` pairs!

This:

for i in range(3):
    print word_counter[top_3[i]], top_3[i]

can be written like this:

for word in top_3: 
    print word_counter[word], word
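Putting these pieces together, the Counter variant might look like the sketch below. It is written for Python 3 (an assumption; the thread targets Python 2.6/2.7), uses word[:1].isupper() rather than istitle(), and takes the poem from a string reconstructed from the question's list instead of novel.txt. Because most_common() returns (word, count) pairs, the final loop unpacks them instead of looking each word up again.

```python
from collections import Counter

# The poem from the question, rebuilt as one string for this sketch.
text = (
    "Jellicle Cats are black and white, Jellicle Cats are rather small; "
    "Jellicle Cats are merry and bright, And pleasant to hear when they "
    "caterwaul. Jellicle Cats have cheerful faces, Jellicle Cats have "
    "bright black eyes; They like to practise their airs and graces "
    "And wait for the Jellicle Moon to rise."
)

wordlist = text.split()  # split on all whitespace

# Keep only words whose first character is uppercase.
capwordlist = (word for word in wordlist if word[:1].isupper())

word_counter = Counter(capwordlist)
top_3 = word_counter.most_common(3)  # list of (word, count) pairs

for word, count in top_3:
    print(count, word)
```

For this text the three pairs are ('Jellicle', 6), ('Cats', 5) and ('And', 2).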

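Source

2010-08-29 13:09:21

'istitle()' is nice, but 'isupper()' seems to match the OP's requirement. Judging from a previous question, it seems Python 2.6 is all that is available (hence no Counter). – Johnsyweb 2010-08-29 21:43:53

Since you are not using Python 2.7 and do not have Counter: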

硅NCE不使用Python2.7并没有Counter

from collections import defaultdict 
counter = defaultdict(int) 
words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
for word in (word for word in words if word[:1].isupper()):  # word[:1] avoids an IndexError on the empty string ''
    counter[word]+=1 
print counter
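The snippet above prints every count rather than the top three. One possible way to finish it is sketched below in Python 3 (an assumption; the thread uses Python 2), using heapq.nlargest from the standard library. The shortened `words` list is made up for illustration, and `top_3` is a name introduced here.

```python
import heapq
from collections import defaultdict
from operator import itemgetter

# Hypothetical, shortened word list with a trailing empty string.
words = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'and', 'And',
         'Jellicle', 'Cats', 'And', 'Jellicle', 'Moon', '']

counter = defaultdict(int)
for word in words:
    if word[:1].isupper():  # safe for '' as well
        counter[word] += 1

# nlargest returns the n items with the largest key, in descending order.
top_3 = heapq.nlargest(3, counter.items(), key=itemgetter(1))
print(top_3)
```

sorted(counter.items(), key=itemgetter(1), reverse=True)[:3] gives the same result; nlargest simply avoids sorting the whole dictionary.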

Source

2010-08-29 13:15:34

# lst is the list of words from the question
print "\n".join(sorted(["%d %s" % (lst.count(i), i)
                        for i in set(lst) if i.istitle()])[-3:])
2 And 
5 Cats 
6 Jellicle
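One caveat with the one-liner above: it sorts the formatted strings, so the order is only right while every count has a single digit. The sketch below (Python 3, an assumption; the `counts` dictionary is made up) shows the difference between sorting strings and sorting (count, word) tuples.

```python
# Hypothetical counts, including one with two digits.
counts = {'Jellicle': 12, 'Cats': 9, 'And': 2, 'Moon': 1}

# Sorting the formatted strings compares them character by character,
# so '12 Jellicle' sorts before '2 And'.
as_strings = sorted("%d %s" % (n, w) for w, n in counts.items())

# Sorting (count, word) tuples compares the counts numerically.
as_tuples = sorted((n, w) for w, n in counts.items())

print(as_strings[-3:])
print(as_tuples[-3:])
```

Formatting the lines only after the numeric sort keeps the output of the one-liner while avoiding the ordering problem.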

Source

2010-08-29 19:02:06 killown

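One thing I would avoid is reading all the words in before processing them. It will work, but IMHO it is better not to do that if you don't need to, and you don't. Here is my solution (with elements generously stolen from the previous answers!), written with 2.6.2: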

import sys 

# a generator function which iterates over the words in a file 
def words(f): 
    for line in f:
        for word in line.split():
            yield word

# returns a generator expression filtering an iterator down to titlecase words 
def titles(s): 
    return (word for word in s if word.istitle()) 

# count the titlecase words in the file 
count = {} 
for word in titles(words(file(sys.argv[1]))): 
    count[word] = count.get(word, 0) + 1 

# build a list of tuples with the count for each word 
countsAndWords = [(kv[1], kv[0]) for kv in count.iteritems()] 

# put them in decreasing order 
countsAndWords.sort() 
countsAndWords.reverse() 

# print the top three 
for count, word in countsAndWords[:3]: 
    print word, count

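I sorted on the counts with a decorate-sort-undecorate, rather than with a sort whose comparison looks each word up in the count dictionary; it is less elegant, but I believe it will be faster. That may be a sinful thing to do.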
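For comparison, here are the two approaches side by side, as a sketch in Python 3 (an assumption; the answer itself targets 2.6.2). The `count` dictionary holds the totals for the list in the question, and `top_dsu` and `top_key` are names introduced here. No timing claim is made; the point is only that both give the same top three.

```python
from operator import itemgetter

# Counts of capitalised words in the question's list.
count = {'Jellicle': 6, 'Cats': 5, 'And': 2, 'They': 1, 'Moon': 1}

# Decorate-sort-undecorate: build (count, word) tuples, sort them,
# then swap back to (word, count).
decorated = sorted(((n, w) for w, n in count.items()), reverse=True)
top_dsu = [(w, n) for n, w in decorated[:3]]

# Key function: sort the (word, count) items directly on the count.
top_key = sorted(count.items(), key=itemgetter(1), reverse=True)[:3]

print(top_dsu)
print(top_key)
```

The two can differ in how ties are ordered: the tuple sort breaks ties on the word itself, while the key-based sort keeps tied items in their original order.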

Source

2010-08-29 22:16:20

You could use itertools:

import itertools 

words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', ''] 
capwords = (word for word in words if len(word) > 1 and word[0].isupper()) 
capwordssorted = sorted(capwords) 
wordswithcounts = ((k,len(list(g))) for (k,g) in itertools.groupby(capwordssorted)) 
print sorted(wordswithcounts,key=lambda x:x[1],reverse=True)[:3]
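The sorted() call before groupby matters, because groupby only merges equal items that are adjacent. A small sketch in Python 3 (an assumption; the `words` list here is a made-up sample) shows what happens with and without it.

```python
from itertools import groupby

# Hypothetical sample in which equal words are not all adjacent.
words = ['Cats', 'Jellicle', 'Cats', 'Jellicle', 'Jellicle']

# Without sorting, each run of equal neighbours becomes its own group.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(words)]

# After sorting, equal words are adjacent and each appears once.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(words))]

print(unsorted_groups)
print(sorted_groups)
```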

Source

2010-08-30 11:39:59 user196636
