2015-09-27 62 views

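Parse and aggregate Google Analytics browser version CSV data

Google Analytics treats incremental browser versions as distinct versions, so my reports can't be used to draw any useful conclusions. For example, Chrome 45.0.2454.93 is treated as a different browser from 45.0.2454.85.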

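I want to write a Python 2 application that takes a Google Analytics CSV export and aggregates the session counts by major browser version.

I'm new to Python, but here is my attempt...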

from __future__ import division 
import csv 
from collections import defaultdict 

RAWFile = 'somefile.csv' 

def default_val(): 
    return [0, 0] 

def aggregateaway():
    with open(RAWFile, 'r') as inf:
        has_header = csv.Sniffer().has_header(inf.read(1024))
        inf.seek(0) # rewind
        incsv = csv.reader(inf)
        if has_header:
            next(incsv) # skip header row

    reader = csv.DictReader(incsv, 'r')

    BrowserVersion = defaultdict(default_val)
    for row in reader:
        Sessions = int(row["Sessions"])
        BrowserVersion[row["BrowserVersion"]][0] += Sessions

    writer = csv.writer(open('out.csv', 'w'))
    writer.writerow(["BrowserVersion", "Sessions"])
    writer.writerows([BrowserVersion] + BrowserVersion[BrowserVersion] for BrowserVersion in BrowserVersion)

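I have two problems that I know of:

  1. I get ValueError('I/O operation on closed file',) - I think this is caused by the logic I use to skip the lines that precede the data.
  2. I'm not sure how to group by major browser version programmatically. Is it left(BrowserVersion, 2)? Even then, that would be flawed because other browsers follow different versioning rules. Maybe I could search for the first . and then take the x characters to its left. How would I add that to the code above?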
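
A minimal sketch of both points, written for Python 3 as an assumption (the question targets Python 2). The helper names `major_version` and `read_rows` are hypothetical. The error in item 1 is what Python raises when a file object is used after its `with` block has exited, so the sketch consumes the reader while the file is still open. For item 2 it takes everything before the first `.` instead of a fixed number of characters.

```python
import csv
import io


def major_version(version):
    """Return the text before the first '.', e.g. '45.0.2454.93' -> '45'."""
    # str.partition never raises: a version without a dot is returned whole.
    return version.partition(".")[0]


def read_rows(fileobj):
    """Return all CSV rows as a list, skipping '#' comment lines and blanks."""
    data_lines = (line for line in fileobj
                  if line.strip() and not line.startswith("#"))
    # list() consumes the reader now, while the file is still open.
    return list(csv.DictReader(data_lines))


# In-memory stand-in for the exported file (hypothetical content).
sample = io.StringIO(
    "# My Site\n"
    "\n"
    "Browser,Browser Version,Sessions\n"
    "Chrome,45.0.2454.85,10\n"
)
with sample as inf:
    rows = read_rows(inf)  # everything is read before the block exits

print(major_version(rows[0]["Browser Version"]))  # prints 45
```

Splitting on the first dot also copes with short versions such as `8.0`, which a fixed-width `left(..., 2)` would not.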

Edit: some sample CSV data:

# ---------------------------------------- 
# My Site 
# Web Browsers 
# 20150828-20150927 
# ---------------------------------------- 

Browser,Operating System,Browser Version,Sessions,Bounce Rate 
Safari,iOS,8.0,"1,681",68.91% 
Chrome,Windows,45.0.2454.85,"1,200",40.98% 
Chrome,Windows,45.0.2454.93,"2,273",40.98% 
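
The sample has two quirks worth handling explicitly: the `#` banner and blank line that precede the header row, and the thousands separators inside the quoted `Sessions` values. A minimal sketch of reading it, assuming Python 3 and using the hypothetical helper names `to_int` and `parse_export`:

```python
import csv
import io

SAMPLE = """\
# ----------------------------------------
# My Site
# Web Browsers
# 20150828-20150927
# ----------------------------------------

Browser,Operating System,Browser Version,Sessions,Bounce Rate
Safari,iOS,8.0,"1,681",68.91%
Chrome,Windows,45.0.2454.85,"1,200",40.98%
Chrome,Windows,45.0.2454.93,"2,273",40.98%
"""


def to_int(text):
    """Convert a count such as '1,681' to the integer 1681."""
    return int(text.replace(",", ""))


def parse_export(fileobj):
    """Yield one dict per data row, ignoring '#' lines and blank lines."""
    lines = (ln for ln in fileobj if ln.strip() and not ln.startswith("#"))
    for row in csv.DictReader(lines):
        row["Sessions"] = to_int(row["Sessions"])
        yield row


parsed = list(parse_export(io.StringIO(SAMPLE)))
for row in parsed:
    print(row["Browser"], row["Browser Version"], row["Sessions"])
```

Filtering the lines before they reach `csv.DictReader` means the first line it sees is the real header, so no header sniffing is needed.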

Answer


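This is what I ended up using, with a lot of help from a colleague. Hopefully Google decides to add this feature (to Analytics) sometime soon :)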

#!/usr/bin/env python 
import csv 
import operator 
import pprint 

inputfilename = 'input.csv' 
outputfilename = 'output.csv' 

values = [] 
with open(inputfilename, 'rb') as csvfile: # Open the input file
    reader = csv.DictReader(filter(lambda row: row[0] != '#', csvfile)) # Skip lines starting with '#'
    header = reader.next().values()[0] # List of field names
    for rows in reader:
        row = rows.values()[0]
        values.append({header[i]: row[i] for i in range(len(header))}) # Build a list of dicts, one per CSV row

report = {} # Empty dictionary to aggregate data into

for value in values:
    browserstring = value["Operating System"] + " - " + value["Browser"] + " - " + value["Browser Version"].split('.')[0] # Split the browser version on '.' to get the major release
    if value["Browser"] != '': # Skip GA column totals (rows with a blank browser value)
        sessions = int(value["Sessions"].replace(',', '')) # Remove thousands separators
        if browserstring in report:
            report[browserstring] += sessions # Add to the existing total
        else:
            report[browserstring] = sessions # Add a new record if it does not exist yet

sorted_report = sorted(report.items(), reverse=True, key=operator.itemgetter(1)) # List of (label, sessions) tuples, sorted by sessions in descending order

#pprint.pprint(sorted_report) #for debugging 

with open(outputfilename, 'w') as out: # Write the report to a file
    csv_out = csv.writer(out)
    csv_out.writerow(['Aggregated Browser Version - Major']) # Title
    csv_out.writerow(['Browser', 'Sessions']) # Column headers
    for row in sorted_report: # Data from the ordered tuple list
        csv_out.writerow(row)

Example output CSV (two rows):

[image: screenshot of the output CSV]
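
As an illustration of what the aggregated output looks like, the sketch below runs the same grouping on the three sample rows from the question. It assumes Python 3, swaps the answer's plain dictionary for `collections.Counter`, and writes to an in-memory buffer instead of a file. The names `SAMPLE_ROWS` and `aggregate` are hypothetical.

```python
import csv
import io
from collections import Counter

# Rows taken from the question's sample data: (browser, os, version, sessions).
SAMPLE_ROWS = [
    ("Safari", "iOS", "8.0", 1681),
    ("Chrome", "Windows", "45.0.2454.85", 1200),
    ("Chrome", "Windows", "45.0.2454.93", 2273),
]


def aggregate(rows):
    """Sum sessions per 'OS - Browser - major version' label."""
    totals = Counter()
    for browser, os_name, version, sessions in rows:
        label = " - ".join([os_name, browser, version.split(".")[0]])
        totals[label] += sessions
    # most_common() returns (label, total) pairs, largest total first.
    return totals.most_common()


report = aggregate(SAMPLE_ROWS)

out = io.StringIO()  # stands in for the output file
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Browser", "Sessions"])
writer.writerows(report)
print(out.getvalue())
```

For these three input rows the report holds `Windows - Chrome - 45` with 3473 sessions, followed by `iOS - Safari - 8` with 1681.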