Averaging a list of tuples from a CSV file

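For a personal project, I have a fairly large .CSV file of Apple's past stock data. I have used the csv module to read this data, and I have a function that prints out the date and the closing price.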

Here is an example of the tuple format:

('2012-03-24' , '122.10')

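I am now looking to average the data for each month and regenerate a list of tuples.

Does anyone have any suggestions? I am a beginning Python student.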

import csv

def get_list_data(file_obj, column_number):
    # Note: file_obj is currently unused; the file name is hard-coded.
    list_of_tuples = []
    with open("table.csv", "r") as f:
        reader = csv.reader(f, delimiter=',')
        for row in reader:
            data = (row[0], row[column_number])  # date and column data
            list_of_tuples.append(data)  # collect every row, not just the last
            print(data)

    return list_of_tuples

def average_data(list_of_tuples):  # This is where I am stuck
    pass

Source

2013-02-25 j_3459

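Welcome to Stack Overflow! It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you have written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors, whatever is applicable). The more detail you provide, the more answers you are likely to receive. – 2013-02-25 21:12:07

My apologies, I will post what I have so far. – 2013-02-25 21:13:03

I strongly recommend looking at the [pandas](http://pandas.pydata.org/pandas-docs/stable/timeseries.html) library for quick and efficient aggregation of time series. – Trevor 2013-02-25 21:16:46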

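You need to do the following:

First, convert the second part of each tuple (the string, e.g. '122.10') to a float. You can do this with the built-in float() function. Second, use the sum() function and a list comprehension to compute the sum of all the second parts of the tuples.
Finally, divide that sum by the length of the list, as returned by len().

Code example: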

def average_data(list_of_tuples): 

    stock_data = [float(t[1]) for t in list_of_tuples] 
    stock_sum = sum(stock_data) 
    return stock_sum/len(list_of_tuples)

Example:

list_of_tuples = [('2012-03-24', '122.10'), ('2012-03-25', '117.30'), ('2012-03-26', '126.9')]

print(average_data(list_of_tuples))
# prints 122.1 (possibly with floating-point noise in the last digits)
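The function above returns a single average for the entire list, whereas the question asks for one average per month. A minimal sketch of that grouping step, assuming the dates are always `YYYY-MM-DD` strings as in the example tuple. The name `average_by_month` and the April values are made up for illustration:

```python
from collections import defaultdict

def average_by_month(list_of_tuples):
    """Return [('YYYY-MM', mean_price), ...] sorted by month."""
    prices_by_month = defaultdict(list)
    for date_str, price_str in list_of_tuples:
        month = date_str[:7]  # '2012-03-24' -> '2012-03'
        prices_by_month[month].append(float(price_str))
    # One (month, mean) tuple per month, in chronological order.
    return [(month, sum(prices) / len(prices))
            for month, prices in sorted(prices_by_month.items())]

# The March values come from the example above; the April rows are invented.
data = [('2012-03-24', '122.10'), ('2012-03-25', '117.30'),
        ('2012-04-02', '126.90'), ('2012-04-03', '120.00')]

for month, mean_price in average_by_month(data):
    print(month, round(mean_price, 2))
```

Sorting the `'YYYY-MM'` keys as strings is enough here because that format sorts chronologically.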

Source

2013-02-25 22:22:17

Suzana, I really appreciate your answer. Thank you very much! – 2013-02-26 02:46:03

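If you are teaching yourself Python, go ahead and implement your own reader with csv, then work out the averaging calculation yourself. It is a good exercise.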
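As a starting point for that exercise, here is a small sketch of the reading step. It assumes the file has a header row with `Date` and `Close` columns (an assumption about the file layout), and it reads from an in-memory string with two sample rows so that it runs without a real file. The helper name `read_column` is hypothetical:

```python
import csv
import io

def read_column(file_obj, column_name):
    """Return [(date, value), ...] as strings for one named column."""
    reader = csv.DictReader(file_obj)  # uses the header row for field names
    return [(row["Date"], row[column_name]) for row in reader]

# Two sample rows standing in for the real file.
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2013-02-25,453.85,455.12,442.57,442.80,13276100,442.80
2013-02-22,449.25,451.60,446.60,450.81,11798600,450.81
"""

rows = read_column(io.StringIO(SAMPLE), "Close")
print(rows)
```

With a real file, the same function can be called as `read_column(f, "Close")` inside a `with open(...) as f:` block.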

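However, if you would rather write less code and spend more time on analysis, use a library such as pandas (or at least numpy). The pandas library excels at this type of data analysis.

The following ipython session shows how easy these kinds of calculations are with pandas. (If you do not use ipython, it is another tool I highly recommend you learn.) In this session, I read a CSV file containing Apple stock data. The data file 'aapl.csv' looks like this: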

Date,Open,High,Low,Close,Volume,Adj Close 
2013-02-25,453.85,455.12,442.57,442.80,13276100,442.80 
2013-02-22,449.25,451.60,446.60,450.81,11798600,450.81 
2013-02-21,446.00,449.17,442.82,446.06,15970800,446.06 
2013-02-20,457.69,457.69,448.80,448.85,17010800,448.85 
2013-02-19,461.10,462.73,453.85,459.99,15563700,459.99 
2013-02-15,468.85,470.16,459.92,460.16,13990900,460.16 
2013-02-14,464.52,471.64,464.02,466.59,12688400,466.59 
... 
1984-09-14,27.62,28.50,27.62,27.87,8826400,3.13 
1984-09-13,27.50,27.62,27.50,27.50,7429600,3.09 
1984-09-12,26.87,27.00,26.12,26.12,4773600,2.94 
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.02 
1984-09-10,26.50,26.62,25.87,26.37,2346400,2.97 
1984-09-07,26.50,26.87,26.25,26.50,2981600,2.98

Import the pandas library:

In [1]: import pandas as pd

Read the data into a DataFrame, using the 'Date' column as the index:

In [2]: aapl = pd.read_csv('aapl.csv', index_col=0, parse_dates=True)

Sort the index in ascending order:

In [3]: aapl = aapl.sort_index()

Look at the first few records:

In [4]: aapl.head()
Out[4]:
             Open   High    Low  Close   Volume  Adj Close
Date
1984-09-07  26.50  26.87  26.25  26.50  2981600       2.98
1984-09-10  26.50  26.62  25.87  26.37  2346400       2.97
1984-09-11  26.62  27.37  26.62  26.87  5444000       3.02
1984-09-12  26.87  27.00  26.12  26.12  4773600       2.94
1984-09-13  27.50  27.62  27.50  27.50  7429600       3.09

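Resample the data to monthly frequency. In the pandas version used in this session, the mean of the daily values is used by default: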

In [5]: monthly = aapl.resample('1M') 

In [6]: monthly.head()
Out[6]:
                 Open       High        Low      Close           Volume  Adj Close
Date
1984-09-30  26.981250  27.333125  26.606250  26.738750   4807300.000000   3.007500
1984-10-31  25.035652  25.313478  24.780435  24.806957   5559408.695652   2.788696
1984-11-30  24.545238  24.782857  24.188095  24.236190   5749561.904762   2.724286
1984-12-31  27.060000  27.378500  26.841000  26.947500   6195360.000000   3.031500
1985-01-31  29.520000  29.855909  29.140000  29.253182  10353818.181818   3.289091
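On recent pandas releases, `resample` no longer aggregates on its own: it returns an intermediate object, and the mean has to be requested explicitly (for example with `.mean()`). Below is a hedged sketch of an equivalent monthly average that groups by calendar month instead. It assumes a reasonably current pandas, reads a handful of rows copied from the sample file above through `io.StringIO` rather than the real `aapl.csv`, and labels each month as a period (`1984-09`) rather than by its month-end date:

```python
import io
import pandas as pd

# A few rows copied from the sample file shown earlier (not the full data set).
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2013-02-25,453.85,455.12,442.57,442.80,13276100,442.80
2013-02-22,449.25,451.60,446.60,450.81,11798600,450.81
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.02
1984-09-10,26.50,26.62,25.87,26.37,2346400,2.97
1984-09-07,26.50,26.87,26.25,26.50,2981600,2.98
"""

aapl = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, parse_dates=True)
aapl = aapl.sort_index()

# Group the rows by calendar month and take the mean of each column.
monthly = aapl.groupby(aapl.index.to_period("M")).mean()
print(monthly["Close"])

# Back to a list of (month, average close) tuples, as the question asked.
monthly_tuples = [(str(month), close) for month, close in monthly["Close"].items()]
print(monthly_tuples)
```

The final list comprehension is only there to match the tuple format from the question; for further analysis it is usually simpler to keep working with the DataFrame.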

Plot the 'Close' column of the monthly data:

In [7]: monthly.plot(y='Close') 
Out[7]: <matplotlib.axes.AxesSubplot at 0x45ff4d0>

Look at the 'Close' column:

In [8]: monthly['Close'] 
Out[8]: 
Date 
1984-09-30 26.738750 
1984-10-31 24.806957 
1984-11-30 24.236190 
1984-12-31 26.947500 
1985-01-31 29.253182 
1985-02-28 28.089474 
1985-03-31 22.741429 
1985-04-30 21.425238 
1985-05-31 19.656818 
1985-06-30 16.399000 
1985-07-31 17.185455 
1985-08-31 15.098636 
1985-09-30 15.738500 
1985-10-31 16.940000 
1985-11-30 19.460000 
... 
2011-12-31 392.930476 
2012-01-31 428.578000 
2012-02-29 497.571000 
2012-03-31 577.507727 
2012-04-30 606.003000 
2012-05-31 564.673182 
2012-06-30 574.562381 
2012-07-31 601.068095 
2012-08-31 642.696087 
2012-09-30 681.568421 
2012-10-31 634.714286 
2012-11-30 564.345714 
2012-12-31 532.055000 
2013-01-31 497.822381 
2013-02-28 459.026875 
Freq: M, Name: Close, Length: 342

Here is the plot generated by the plot method: [Figure: Monthly average of 'Close']

Source

2013-02-26 02:52:12

Thanks, Warren! I think I will try your suggestion after I finish this project. – 2013-02-26 03:06:06
