2015-02-24 124 views
-3

Removing duplicate users from a list with set(): I am trying to remove duplicate users from a list in Python using set(). The problem is that it does not remove the duplicate users:

with open('live.txt') as file: 
    for line in file.readlines(): 
        word = line.split() 
        users = (word[word.index('user')+1]) 
        l = users.split() 
        l = set(l) 
        l = sorted(l) 
        print " ".join(l) 

Here are the contents of live.txt:

Sep 15 04:34:24 li146-252 sshd[13320]: Failed password for invalid user ronda from 212.58.111.170 port 42201 ssh2 
Sep 15 04:34:26 li146-252 sshd[13322]: Failed password for invalid user ronda from 212.58.111.170 port 42330 ssh2 
Sep 15 04:34:28 li146-252 sshd[13324]: Failed password for invalid user ronda from 212.58.111.170 port 42454 ssh2 
Sep 15 04:34:31 li146-252 sshd[13326]: Failed password for invalid user ronda from 212.58.111.170 port 42579 ssh2 
Sep 15 04:34:33 li146-252 sshd[13328]: Failed password for invalid user romero from 212.58.111.170 port 42715 ssh2 
Sep 15 04:34:36 li146-252 sshd[13330]: Failed password for invalid user romero from 212.58.111.170 port 42838 ssh2 
+1

This should be a one-shot activity. There should be no need for a loop – vks 2015-02-24 08:47:42

+0

Please add sample values for 'users'! – 2015-02-24 08:47:43

+1

Would you mind adding your users here? What output do you expect – 2015-02-24 08:52:39

Answers

1

Here is the code you want:

with open('live.txt') as file: 
    users = [] 
    for line in file.readlines(): 
        word = line.split() 
        users.append(word[word.index('user') + 1]) 
    unique_users = list(set(users)) 
print " ".join(unique_users) 

Output:

romero ronda 
+0

How would it look if you wanted to use a dictionary to count user occurrences? – user3270211 2015-02-24 09:42:36

+0

@user3270211: please use 'for line in file' instead of 'for line in file.readlines()'. By the way, 'word' is misleading - it should be 'words'. You don't need the list() call here. – jfs 2015-02-24 10:04:56
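Applying the comment's suggestions to the accepted answer might look like the sketch below (the function name and the inlined sample lines are hypothetical; the log lines are taken from the question's live.txt):

```python
# Hypothetical cleaned-up version following the comment's advice:
# iterate over the file/iterable directly (no readlines()), use the
# plural name 'words', and build a set instead of calling list(set(...)).
def unique_users_from_log(lines):
    users = set()
    for line in lines:                       # works for a file object too
        words = line.split()
        users.add(words[words.index('user') + 1])
    return sorted(users)

log = [
    "Sep 15 04:34:24 li146-252 sshd[13320]: Failed password for invalid user ronda from 212.58.111.170 port 42201 ssh2",
    "Sep 15 04:34:33 li146-252 sshd[13328]: Failed password for invalid user romero from 212.58.111.170 port 42715 ssh2",
]
print(unique_users_from_log(log))  # -> ['romero', 'ronda']
```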

+0

@user3270211 You can check whether the user is already in the dictionary. If they are not in the dictionary, add them with the value 1. If they are already in the dictionary, change the value to dict[user] + 1. – Noyan 2015-02-24 12:14:59
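A minimal sketch of the dict-counting approach described in the comment above (the function name and sample lines are illustrative, not from the thread):

```python
# Count how often each user appears, using a plain dict as described:
# add the user with value 1 if absent, otherwise increment the value.
def count_users(lines):
    counts = {}
    for line in lines:
        words = line.split()
        user = words[words.index('user') + 1]
        if user in counts:
            counts[user] = counts[user] + 1
        else:
            counts[user] = 1
    return counts

sample = [
    "Failed password for invalid user ronda from 212.58.111.170",
    "Failed password for invalid user ronda from 212.58.111.170",
    "Failed password for invalid user romero from 212.58.111.170",
]
print(count_users(sample))  # -> {'ronda': 2, 'romero': 1}
```

The same result can be had with `collections.Counter`, which handles the "absent key starts at zero" logic for you.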

1

You could try a simpler way

list(set(<Your user list>)) 

This returns the list without duplicates. Python has the data type set, which is a collection of unique elements. So simply type-casting your list to set automatically removes the duplicates

Example:

>>> users = ['john', 'mike', 'ross', 'john','obama','mike'] 
>>> list(set(users)) 
['mike', 'john', 'obama', 'ross'] 
>>> 

I hope this will solve your problem:

import re 

def remove_me(): 
    all_users = [] 
    # compile the pattern once, outside the loop; a raw string avoids
    # escape-sequence warnings for \s
    pattern = re.compile(r'(.*user\s*)([a-zA-Z0-9]*)') 
    with open('live.txt') as file: 
        for line in file.readlines(): 
            stmt = pattern.match(line) 
            all_users.append(stmt.groups()[1]) 
    unique_users = list(set(all_users)) 
    print unique_users 

if __name__ == "__main__": 
    remove_me() 
+0

This is what I get back: ['a','e','k','m','3','p','s','t'] ['a','e','k','m','3','p','s','t'] ['a','e','k','m','3','','s','t'] ['a','e','k','m','3','p','s','t'] ['a','e','k','m','3','p','s','t'] – user3270211 2015-02-24 08:49:49

+0

@user3270211 good grief, _say_ what your input data is! – 2015-02-24 08:51:24

+0

I forgot to mention that each user is in its own list. – user3270211 2015-02-24 09:01:43

0

If the duplicate user lines are consecutive, you can use itertools.groupby() to remove the duplicates:

#!/usr/bin/env python 
from itertools import groupby 
from operator import itemgetter 

def extract_user(line): 
    return line.partition('user')[2].partition('from')[0].strip() 

with open('live.txt') as file: 
    print(" ".join(map(itemgetter(0), groupby(file, key=extract_user)))) 
    # -> ronda romero
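Note that groupby() only collapses runs of *adjacent* equal keys, which is why this answer is conditioned on the duplicate lines being consecutive. A small demonstration on a hypothetical list of usernames:

```python
from itertools import groupby

# groupby() merges consecutive equal items only, so a duplicate that
# reappears later in the sequence survives deduplication.
names = ['ronda', 'ronda', 'romero', 'ronda']
print([key for key, _ in groupby(names)])  # -> ['ronda', 'romero', 'ronda']
```

If the input is not already grouped, sort it first (or use set() as in the other answers).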