Identifying false alarms using Python MapReduce

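Can someone help me with the following problem? I am trying to analyze security logs to find false alarms. An alert is a false alarm when the line contains "TXT was not created", and a true alarm when it contains "txt was not created". How can I extract the specific "txt was not created" lines from the data source (sample input data is given below)?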

from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every whitespace-separated token in the line.
        # On Python 3 the line is already a str, so unicode() is not needed.
        for word in line.split():
            yield word, 1

    def reducer(self, key, values):
        # Sum the per-word counts emitted by the mapper.
        yield key, sum(values)

if __name__ == '__main__':
    MRWordFrequencyCount.run()

Sample input is given here:

Mon Feb 1 12:13:59 EST 2016 virtual user etransactiondev started to upload file 
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.TXT 
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.txt was not created

Source

2016-04-24 Shiv

> "TXT was not created", and true when "txt was not created". Is that a mistake, or is the only difference really the case of 'TXT' versus 'txt'? – DAXaholic

Could you just check the first word?

word = word.split(' ')
if word[0] == 'TXT':
    pass  # do something...

Source

2016-04-28 06:20:38 kermitvomit
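In the sample input the extension sits at the end of the file path rather than being a word of its own, so an alternative to checking the first word is to match on how the line ends. The following is only a minimal sketch. It assumes that an alert line ends with "<path>.TXT was not created" or "<path>.txt was not created", and that the case of the extension is the only thing separating a false alarm from a true one, as the question describes. The names `ALERT_RE`, `classify_alert` and `count_alerts` are hypothetical, and `count_alerts` is a plain-Python stand-in for the map and reduce steps rather than mrjob code.

```python
import re
from collections import Counter

# Assumption: an alert line ends with "<path>.<ext> was not created" and
# only the case of the extension distinguishes the two kinds of alert.
ALERT_RE = re.compile(r"\.(txt|TXT) was not created\s*$")


def classify_alert(line):
    """Return 'false_alarm', 'true_alarm', or None for a non-alert line."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    # Upper-case "TXT" is treated as a false alarm, per the question.
    return "false_alarm" if match.group(1) == "TXT" else "true_alarm"


def count_alerts(lines):
    """Count each kind of alert (stand-in for the mapper and reducer)."""
    counts = Counter()
    for line in lines:
        label = classify_alert(line)
        if label is not None:
            counts[label] += 1
    return dict(counts)
```

Inside an MRJob, the mapper could call `classify_alert(line)` and yield `(label, 1)` whenever the label is not `None`, leaving the reducer from the question unchanged.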

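Thanks for the answer, kermitvomit. At the moment I am trying to extract the user name from the input file. Can you help me extract the user name from an input line like: Mon Feb 1 12:13:59 EST 2016 virtual user etransactiondev started to upload file. I need to extract etransactiondev – Shiv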
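One way to pull out the user name is a regular expression anchored on the fixed wording around it. This is a sketch only, assuming every upload line has the form "... virtual user <name> started to upload file" as in the sample input, and that the user name contains no whitespace. The names `USER_RE` and `extract_username` are hypothetical.

```python
import re

# Assumption: the user name is the single token between
# "virtual user" and "started to upload file".
USER_RE = re.compile(r"virtual user (\S+) started to upload file")


def extract_username(line):
    """Return the user name from an upload line, or None if absent."""
    match = USER_RE.search(line)
    return match.group(1) if match else None


sample = ("Mon Feb 1 12:13:59 EST 2016 virtual user etransactiondev "
          "started to upload file")
print(extract_username(sample))  # prints: etransactiondev
```

In a mapper, yielding `(extract_username(line), 1)` for the lines where the result is not `None` would count uploads per user.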
