2014-10-29 135 views
0

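I have a CSV file with 100 columns. I want to compute the sum of each column from the 4th through the nth. I can produce the sum for a single column, but it fails when I try to do it for all of the columns. Here is what I have so far for summing a specific column in the CSV file: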

import decimal 
import numpy as np 
import os as os 
import csv as csv 
import re as re 
import sys 

col=10 
values=[] 
with open('test.csv', 'r') as f: 
    reader = csv.reader(f) 
    headers = reader.next() 
    for line in reader: 
        #print line
        line = [int(i) for i in line]
    col_totals = [sum(result) for result in zip(*line)]
    print col_totals
    #values.append(int(line[col]))
    #csum=sum(values)
    #print csum

Thanks,

+0

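Do you want to compute the sum of columns 4-10 for each row? Or do you want the sum of column 4 over all rows, the sum of column 5 over all rows, and so on? – inspectorG4dget 2014-10-29 19:39:40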

+0

Have you tried `reduce`, as explained [here](https://docs.python.org/2/library/functions.html#reduce)? – 2014-10-29 19:47:30

+1

Yes, I want the sum of column 4 over all rows, the sum of column 5 over all rows, and so on. – learningcurve 2014-10-29 21:02:09

Answers

0

If you want to sum over consecutive columns, this will do:

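import csv

i, j = 3, 5  # 0-based, half-open range: column indices i up to, but not including, j

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    headers = next(reader)  # skip the header row
    table = list(reader)
    # csv.reader yields strings, so convert each cell with float() before summing
    sums = [sum(float(elt) for elt in col) for col in list(zip(*table))[i:j]]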

For an arbitrary set of columns, you could also try the following:

requested = [4, 7, 12, 13, 21, 81]  # 0-based column indices

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    headers = next(reader)  # skip the header row
    table = list(reader)
    sums = [sum(float(elt) for elt in col) for i, col in enumerate(zip(*table)) if i in requested]
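
For the case in the question (every column from the 4th onward), a row-by-row variant avoids holding the whole table in memory. This is only a minimal sketch in Python 3: the function name `sum_columns_from` and the in-memory sample data are made up for illustration, and it assumes every cell from the start column onward parses as a number.

```python
import csv
import io


def sum_columns_from(lines, start):
    """Sum each column from 0-based index `start` onward, skipping the header row."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    totals = None
    for row in reader:
        # csv.reader yields strings, so convert before adding
        values = [float(cell) for cell in row[start:]]
        if totals is None:
            totals = values
        else:
            totals = [t + v for t, v in zip(totals, values)]
    return totals or []


# Hypothetical sample standing in for test.csv
sample = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n")
print(sum_columns_from(sample, 3))  # prints [13.0, 15.0]
```

With a real file, you would pass the open file object instead of the `io.StringIO` sample.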
+0

Thanks for your reply. However, when I try this I get TypeError: unsupported operand type(s) for +: 'int' and 'str' – learningcurve 2014-10-29 21:05:43

+0

Ugh, editing... – gboffi 2014-10-29 21:19:04

+0

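@learnincurve I had tested my solution with a _synthetic_ table before posting, happily forgetting the sad truth that `csv.reader` only returns strings. Oh, my bad! One of these days I'll learn `pandas`... – gboffi 2014-10-29 21:34:24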

1

This is very, very easy in pandas:

import pandas as pd
df = pd.read_csv(filename)
# Sum each column from 0-based position 4 (the fifth column) onward
df[df.columns[4:]].sum()

And if you want the sum across those columns for each row, that would be:

df[df.columns[4:]].sum(axis=1)
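
To see both forms side by side, here is a small self-contained sketch. The sample data is invented for illustration, and it selects by position with `iloc` starting at 0-based position 3, on the assumption that "4th column" in the question is counted from 1. Adjust the slice if your counting differs.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file
csv_text = "a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n"
df = pd.read_csv(io.StringIO(csv_text))

# Columns from the 4th one onward (0-based position 3)
tail = df.iloc[:, 3:]

col_totals = tail.sum()        # one total per column
row_totals = tail.sum(axis=1)  # one total per row

print(col_totals.to_dict())
print(row_totals.tolist())
```

Here `col_totals` is a Series indexed by column name and `row_totals` is a Series indexed by row.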
+0

Thanks, that worked. – learningcurve 2014-10-29 21:50:30

+0

Thanks, mind accepting the answer? – acushner 2014-10-30 18:21:34