regex
  • apache
  • parsing
  • logging
  • 2015-06-20
    2

    Hi, I'm trying to parse Apache access-log requests with a Python regular expression and assign the parts to separate variables.

    ACCESS_LOG_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)\s*" (\d{3}) (\S+)'
    
    logLine='127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1839' 
    

    I then parse the line and assign the groups to the following variables:

    import re

    match = re.search(ACCESS_LOG_PATTERN, logLine)

    host          = match.group(1)
    client_identd = match.group(2)
    user_id       = match.group(3)
    date_time     = match.group(4)
    method        = match.group(5)
    endpoint      = match.group(6)
    protocol      = match.group(7)
    response_code = int(match.group(8))
    content_size  = match.group(9)
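    For comparison, here is a minimal sketch of the same assignment done with tuple unpacking over match.groups(), plus a guard for lines the pattern does not match: re.search returns None in that case, and calling .group() on None raises AttributeError. The function name parse_request is hypothetical.

```python
import re

# Same pattern as in the question, written as a raw string so the
# backslashes reach the regex engine without escape-sequence warnings.
ACCESS_LOG_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)\s*" (\d{3}) (\S+)'


def parse_request(log_line):
    """Return the nine fields as a tuple, or None if the line does not match."""
    match = re.search(ACCESS_LOG_PATTERN, log_line)
    if match is None:
        return None
    (host, client_identd, user_id, date_time,
     method, endpoint, protocol, response_code, content_size) = match.groups()
    return (host, client_identd, user_id, date_time,
            method, endpoint, protocol, int(response_code), content_size)


fields = parse_request('127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1839')
print(fields)
```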
    

    The pattern above works fine for that log line, but the match fails for the following lines:

    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /" 200 1839' 
    
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET/" 200 1839' 
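    A quick way to confirm where it breaks is to run the question's pattern against all three lines. For "GET /" the protocol group (\S+) must consume at least one non-space character before the closing quote, and for "GET/" the pattern additionally requires a literal space after the method, so re.search returns None for both.

```python
import re

ACCESS_LOG_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)\s*" (\d{3}) (\S+)'

lines = [
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1839',
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /" 200 1839',
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET/" 200 1839',
]

# True where the pattern matches, False where re.search returns None.
matched = [re.search(ACCESS_LOG_PATTERN, line) is not None for line in lines]
print(matched)  # [True, False, False]
```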
    

    Can anyone help? Please suggest a solution :)

    +0

    Possible duplicate of [Parsing apache log files](http://stackoverflow.com/questions/12544510/parsing-apache-log-files) – Wolph

    +0

    No, what I need is specific to the request part of the log line; it is not general log parsing. –

    Answers

    1

    You need to make group 7 optional by adding ? after it. Use the following regex:

    ^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)?\s*" (\d{3}) (\S+)
                                                                     ↑
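    A small check of this pattern, assuming the three lines from the question: with the ? in place, "GET /" matches and group 7 comes back as None, while the original line still yields HTTP/1.0 as the protocol. The "GET/" line still does not match, because the pattern keeps the literal space after group 5.

```python
import re

# The answer's pattern: group 7 (the protocol) is made optional with "?".
ANSWER_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)?\s*" (\d{3}) (\S+)'

full = re.search(ANSWER_PATTERN, '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1839')
print(full.group(5, 6, 7))  # ('GET', '/images/launch-logo.gif', 'HTTP/1.0')

m = re.search(ANSWER_PATTERN, '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /" 200 1839')
print(m.group(5), m.group(6), m.group(7))  # GET / None

# No space between method and path: the literal space after group 5 is missing.
no_space = re.search(ANSWER_PATTERN, '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET/" 200 1839')
print(no_space)  # None
```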
    

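    If the "GET/" line (no space between method and path) must be accepted as well, one possible variant, a sketch that is not part of the answer above, restricts the method to uppercase letters, makes the whitespace after it optional, makes the endpoint lazy, and turns the protocol into an optional space-prefixed group. The name REQUEST_TOLERANT_PATTERN is hypothetical, and the [A-Z]+ method assumption may not fit every log.

```python
import re

# Hypothetical variant: [A-Z]+ method, optional whitespace after it,
# lazy endpoint (\S+?), and an optional " protocol" group.
REQUEST_TOLERANT_PATTERN = (
    r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] '
    r'"([A-Z]+)\s*(\S+?)(?:\s+(\S+))?\s*" (\d{3}) (\S+)'
)

lines = [
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1839',
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /" 200 1839',
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET/" 200 1839',
]

# (method, endpoint, protocol) for each line; protocol is None when absent.
results = [re.search(REQUEST_TOLERANT_PATTERN, line).group(5, 6, 7) for line in lines]
for r in results:
    print(r)
```

    The lazy \S+? stops the endpoint as soon as the rest of the request (optional protocol, then the closing quote) can match, which is what lets "GET/" split into GET and /.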

    0
    import re


    HOST = r'^(?P<host>.*?)'
    SPACE = r'\s'
    IDENTITY = r'\S+'
    USER = r'\S+'
    TIME = r'(?P<time>\[.*?\])'
    REQUEST = r'\"(?P<request>.*?)\"'
    STATUS = r'(?P<status>\d{3})'
    SIZE = r'(?P<size>\S+)'

    # No trailing SPACE: it would make lines that end right after the size
    # field (like the ones in the question) fail to match.
    REGEX = HOST+SPACE+IDENTITY+SPACE+USER+SPACE+TIME+SPACE+REQUEST+SPACE+STATUS+SPACE+SIZE

    def parser(log_line):
        match = re.search(REGEX, log_line)
        if match is None:  # the line does not look like an access-log entry
            return None
        return (match.group('host'),
                match.group('time'),
                match.group('request'),
                match.group('status'),
                match.group('size'))


    logLine = '180.76.15.30 - - [24/Mar/2017:19:37:57 +0000] "GET /shop/page/32/?count=15&orderby=title&add_to_wishlist=4846 HTTP/1.1" 404 10202 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"'
    result = parser(logLine)
    print(result)
    

    Result

    ('180.76.15.30', '[24/Mar/2017:19:37:57 +0000]', 'GET /shop/page/32/?count=15&orderby=title&add_to_wishlist=4846 HTTP/1.1', '404', '10202') 
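    Because this approach uses named groups, match.groupdict() can return every field at once. The sketch below rebuilds the same pattern in one expression (without the trailing whitespace, so lines ending at the size field still match), applies it to one of the question's failing lines, and then splits the request into the question's method, endpoint and protocol variables. The split assumes the request parts are separated by spaces; missing parts become None.

```python
import re

# Same building blocks as the answer above, joined into one pattern.
REGEX = (r'^(?P<host>.*?)\s\S+\s\S+\s(?P<time>\[.*?\])\s'
         r'\"(?P<request>.*?)\"\s(?P<status>\d{3})\s(?P<size>\S+)')

match = re.search(REGEX, '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /" 200 1839')
fields = match.groupdict()

# Pad with None so a request without a protocol still unpacks into three names.
parts = fields['request'].split() + [None, None, None]
method, endpoint, protocol = parts[:3]

print(fields)
print(method, endpoint, protocol)  # GET / None
```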
    