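Can someone help me with the following problem? I am trying to analyze security logs to find false alerts. A false alert is one that contains "TXT was not created", and an alert is true when it contains "txt was not created". How can I extract the specific "txt was not created" lines from the data source (sample input data given below)? I want to identify the false alerts using Python MapReduce.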
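```python
from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        # Split each log line on whitespace and emit (word, 1) per word.
        words = line.split()
        for word in words:
            # Python 2 only: unicode() does not exist in Python 3.
            word = unicode(word, "utf-8", errors="ignore")
            yield word, 1

    def reducer(self, key, values):
        # Sum the counts for each distinct word.
        yield key, sum(values)

if __name__ == '__main__':
    MRWordFrequencyCount.run()
```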
Sample input:
```
Mon Feb 1 12:13:59 EST 2016 virtual user etransactiondev started to upload file
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.TXT
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.txt was not created
```
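A minimal sketch of the extraction step, assuming the rule is that a line ending in `.TXT was not created` (uppercase extension) is a false alert and one ending in `.txt was not created` (lowercase) is a true alert. That rule, the regex, and the names `classify_line`, `mapper`, `reducer` and `run_local` are hypothetical. It uses only the standard library and simulates the MapReduce shuffle locally with `sorted` and `itertools.groupby`. The `mapper`/`reducer` bodies keep the same `(key, value)` generator shape as the mrjob job above.

```python
import re
from itertools import groupby
from operator import itemgetter

# Hypothetical pattern: a path ending in ".txt" followed by "was not created"
# at the end of the line. The whole pattern is case-insensitive, but the
# extension is captured exactly as written so its case can be checked later.
NOT_CREATED = re.compile(
    r"^(?P<path>\S+\.(?P<ext>txt))\s+was not created\s*$", re.IGNORECASE
)

def classify_line(line):
    """Return (label, path) for a 'was not created' line, else None.

    Assumed rule (not confirmed by the log format): uppercase 'TXT' means
    a false alert, lowercase 'txt' means a true alert.
    """
    match = NOT_CREATED.match(line.strip())
    if match is None:
        return None
    ext = match.group("ext")
    if ext == "TXT":
        label = "false_alert"
    elif ext == "txt":
        label = "true_alert"
    else:
        label = "unknown"  # mixed case such as "Txt"
    return label, match.group("path")

def mapper(line):
    # Emit one (label, path) pair per matching line; other lines emit nothing.
    result = classify_line(line)
    if result is not None:
        yield result

def reducer(label, paths):
    # Count how many files fall under each label.
    yield label, sum(1 for _ in paths)

def run_local(lines):
    # Local stand-in for the Hadoop/mrjob shuffle: sort by key, then group.
    pairs = sorted(pair for line in lines for pair in mapper(line))
    counts = {}
    for label, group in groupby(pairs, key=itemgetter(0)):
        for key, count in reducer(label, (path for _, path in group)):
            counts[key] = count
    return counts
```

Run on the three sample lines, `run_local` returns `{'true_alert': 1}`: only the last line contains "was not created", and its extension is lowercase. The `.TXT` path on the second line is not counted because it carries no "was not created" message. To run this under mrjob, the same bodies would go into `mapper(self, _, line)` and `reducer(self, key, values)` of an `MRJob` subclass.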
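> contains "TXT was not created", and an alert is true when it contains "txt was not created"

Is there a typo, or is the difference really just the case of the words 'TXT' and 'txt'? – DAXaholic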