Selecting text using multiple splits

I have started learning Python and am stuck on an assignment about manipulating text data. Here is an example of the text lines I need to process:

From [email protected] Sat Jan 5 09:14:16 2008

I need to extract the hour from each line (in this case, 09) and then find the hours at which emails were sent most often.

Basically, what I need to do is build a for loop that splits each line on the colon

split(':')

and then splits the result on whitespace:

split()

I have been trying for several hours but can't figure it out. My code so far looks like this:

name = raw_input("Enter file:") 
if len(name) < 1 : name = "mbox-short.txt" 
handle = open(name) 
counts = dict() 
lst = list() 
temp = list() 
for line in handle: 
    if not "From " in line: continue 
    words = line.split(':') 
    for word in words: 
     counts[word] = counts.get(word,0) + 1 

for key, val in counts.items(): 
    lst.append((val, key)) 
lst.sort(reverse = True) 

for val, key in lst: 
    print key, val

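The code above only does one split, and I have kept trying various ways to split the text a second time. I keep getting an attribute error saying "'list' object has no attribute 'split'". Any help would be appreciated. Thanks again.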

Source

2016-09-26 Dick Thompson

'line.split(":")[0].split(" ")[-1]'? – L3viathan
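Spelled out step by step, the chained expression in this comment might look like the sketch below. It also shows where the "'list' object has no attribute 'split'" error comes from: split returns a list, so a second split has to be called on one element of that list, not on the list itself. The variable names are only illustrative, the address is the redacted placeholder exactly as it appears in the question, and the snippet is written in Python 3 syntax (the question uses Python 2).

```python
line = "From [email protected] Sat Jan 5 09:14:16 2008"

# str.split returns a list of strings, not a string.
parts = line.split(":")
# parts is ['From [email protected] Sat Jan 5 09', '14', '16 2008']

# parts.split(" ") would raise AttributeError here, because lists
# have no split method. Pick one element of the list first.
before_first_colon = parts[0]

# Split that string on spaces and take the last piece: the hour.
hour = before_first_colon.split(" ")[-1]
print(hour)
```

Indexing with [0] and [-1] is what turns each intermediate list back into a string that can be split or counted.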

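In general, for development and especially for sharing code, put the sample data into the program itself. Then others can run and modify your code. In this case, 'handle = <list of lines>' takes just a few lines. FWIW, I believe @L3viathan's snippet will solve your particular problem. –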
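A minimal sketch of what that suggestion could look like, assuming a plain list of strings stands in for the open file. The second and third lines are made-up variations of the example line in the question, not real data from mbox-short.txt, and the code uses Python 3 syntax.

```python
# Sample lines embedded in the script instead of reading a file.
# Only the first line comes from the question; the others are
# invented variations used for illustration.
handle = [
    "From [email protected] Sat Jan 5 09:14:16 2008",
    "From [email protected] Sat Jan 5 12:14:16 2008",
    "From [email protected] Sat Jan 5 09:14:16 2008",
]

counts = dict()
for line in handle:
    if "From " not in line:
        continue
    # Text before the first colon, then its last space-separated word.
    hour = line.split(":")[0].split(" ")[-1]
    counts[hour] = counts.get(hour, 0) + 1

print(counts)
```

Because a for loop iterates over a list of strings the same way it iterates over the lines of a file, the rest of the program does not need to change when the real file is swapped back in.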

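Thanks for the help! However, for some reason the code only outputs single digits, which makes the digits 1 and 0 show up a lot in the counts (since they are the first digits). How do I count the two-digit hours? I tried changing it to 'line.split(":")[0].split(" ")(0:2)', but that gave an error –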
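One possible explanation, and this is a guess since the updated code is not shown: the result of the chained split is a single string such as "09", and if the original 'for word in words' loop is still wrapped around it, the loop visits one character at a time. The sketch below (Python 3 syntax, illustrative names) contrasts that with counting the whole string, and shows that slicing takes square brackets.

```python
line = "From [email protected] Sat Jan 5 09:14:16 2008"
hour = line.split(":")[0].split(" ")[-1]  # the string "09"

# Looping over a string yields its characters one by one,
# which would explain counts for "0" and "9" instead of "09".
chars = [ch for ch in hour]

# Count the whole string once instead of looping over it.
counts = {}
counts[hour] = counts.get(hour, 0) + 1

# Slices use square brackets; hour(0:2) is not valid syntax.
first_two = hour[0:2]

print(chars, counts, first_two)
```

Since the hour is already two characters long, the slice is not actually needed here; it is included only to show the bracket syntax.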

First,

import re

then replace

words = line.split(':') 
for word in words: 
    counts[word] = counts.get(word,0) + 1

with

line = re.search("[0-9]{2}:[0-9]{2}:[0-9]{2}", line).group(0) 
words = line.split(':') 
hour = words[0] 
counts[hour] = counts.get(hour, 0) + 1
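For anyone who wants to try the regular-expression approach in isolation, here is a self-contained sketch in Python 3 syntax. Two things are additions of mine and not part of the answer: the inline list of lines (in the format of the example in the question, with a made-up last line that has no timestamp) and the check for a failed match, since re.search returns None when the pattern is not found and calling .group on None raises an AttributeError.

```python
import re

# Illustrative input; the last line is invented to show the guard.
lines = [
    "From [email protected] Sat Jan 5 09:14:16 2008",
    "From [email protected] Sat Jan 5 12:14:16 2008",
    "From [email protected] Sat Jan 5 09:14:16 2008",
    "From nobody in particular",
]

# Same pattern as in the answer: HH:MM:SS.
time_pattern = re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")

counts = {}
for line in lines:
    if "From " not in line:
        continue
    match = time_pattern.search(line)
    if match is None:
        # No timestamp on this line, so there is nothing to count.
        continue
    hour = match.group(0).split(":")[0]
    counts[hour] = counts.get(hour, 0) + 1

# Print hours from most to least frequent.
for hour, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(hour, n)
```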

Input:

From [email protected] Sat Jan 5 09:14:16 2008 
From [email protected] Sat Jan 5 12:14:16 2008 
From [email protected] Sat Jan 5 09:14:16 2008 
From [email protected] Sat Jan 5 09:14:16 2008 
From [email protected] Sat Jan 5 15:14:16 2008 
From [email protected] Sat Jan 5 12:14:16 2008 
From [email protected] Sat Jan 5 09:14:16 2008 
From [email protected] Sat Jan 5 13:14:16 2008 
From [email protected] Sat Jan 5 12:14:16 2008

Output:

Source

2016-09-26 00:44:26

Using the same test file as Marcel Jacques Machado:

>>> from collections import Counter 
>>> Counter(line.split(' ')[-2].split(':')[0] for line in open('input')).items() 
[('12', 3), ('09', 4), ('15', 1), ('13', 1)]

This shows that 09 occurred 4 times, while 13 occurred only once.

If we want prettier output, we can do some formatting. This shows the hours from most common to least common, along with their counts:

>>> print('\n'.join('{} {}'.format(hh, n) for hh,n in Counter(line.split(' ')[-2].split(':')[0] for line in open('input')).most_common())) 
09 4 
12 3 
15 1 
13 1
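The one-liner above can be hard to read, so here is the same idea unpacked into a few statements, as a sketch in Python 3 syntax. The helper name hour_of and the inline list (standing in for the 'input' file, with the same nine lines as the test data) are my own choices for illustration.

```python
from collections import Counter

# The nine test lines, embedded here instead of read from a file.
lines = [
    "From [email protected] Sat Jan 5 09:14:16 2008",
    "From [email protected] Sat Jan 5 12:14:16 2008",
    "From [email protected] Sat Jan 5 09:14:16 2008",
    "From [email protected] Sat Jan 5 09:14:16 2008",
    "From [email protected] Sat Jan 5 15:14:16 2008",
    "From [email protected] Sat Jan 5 12:14:16 2008",
    "From [email protected] Sat Jan 5 09:14:16 2008",
    "From [email protected] Sat Jan 5 13:14:16 2008",
    "From [email protected] Sat Jan 5 12:14:16 2008",
]


def hour_of(line):
    # The second-to-last space-separated field is HH:MM:SS;
    # the part before its first colon is the hour.
    return line.split(" ")[-2].split(":")[0]


hours = Counter(hour_of(line) for line in lines)

# most_common() returns (hour, count) pairs, highest count first.
for hh, n in hours.most_common():
    print("{} {}".format(hh, n))
```

To read from a real file instead, the list could be replaced with the file object from 'with open("input") as f:', which also closes the file when the block ends. Note that this approach relies on every line having the timestamp as its second-to-last field, so other lines would need to be filtered out first.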

Source

2016-09-26 00:57:16 John1024
