2015-08-19 11 views
-1

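CSV manipulation in Python: formulating an origin-destination matrix

I have a CSV that contains the names and lat/lng locations of London Underground stations. It looks like this: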

Station Lat Lng 
Abbey Road 51.53195199 0.003737786 
Abbey Wood 51.49078408 0.120286371 
Acton 51.51688696 -0.267675543 
Acton Central 51.50875781 -0.263415792 
Acton Town 51.50307148 -0.280288296 

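I want to convert this CSV into an origin-destination matrix containing every possible combination of these stations. There are 270 stations, so there are 72,900 (270 × 270) possible combinations, counting each station paired with itself.

Ultimately I want to write this matrix to a CSV in the following format: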

O_Station O_lat O_lng D_Station D_lat D_lng 
Abbey Road 51.53195199 0.003737786 Abbey Wood 51.49078408 0.120286371 
Abbey Road 51.53195199 0.003737786 Acton 51.51688696 -0.267675543 
Abbey Road 51.53195199 0.003737786 Acton Central 51.50875781 -0.263415792 
Abbey Wood 51.49078408 0.120286371 Abbey Road 51.53195199 0.003737786 
Abbey Wood 51.49078408 0.120286371 Acton 51.51688696 -0.267675543 
Abbey Wood 51.49078408 0.120286371 Acton Central 51.50875781 -0.263415792 
Acton 51.51688696 -0.267675543 Abbey Road 51.53195199 0.003737786 
Acton 51.51688696 -0.267675543 Abbey Wood 51.49078408 0.120286371 
Acton 51.51688696 -0.267675543 Acton Central 51.50875781 -0.263415792 

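The first step is to use a loop to pair each station with every other possible station. I then need to remove the combinations where the origin and the destination are the same station.

I tried using the NumPy function column_stack. However, this gives a strange result.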
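The two steps described above (pair every station with every other one, then drop the self-pairs) can be sketched with itertools.permutations from the standard library, which never pairs an element with itself. This is only an illustration under assumptions: the `stations` list is a hand-typed stand-in for the CSV rows, and `od_rows` is a hypothetical helper name, not something from the original post.

```python
from itertools import permutations

# Stand-in for the rows of the CSV (assumption: name, lat, lng per station).
stations = [
    ("Abbey Road", 51.53195199, 0.003737786),
    ("Abbey Wood", 51.49078408, 0.120286371),
    ("Acton", 51.51688696, -0.267675543),
]


def od_rows(rows):
    """Return one flat (origin + destination) tuple per ordered pair of distinct rows."""
    return [origin + destination for origin, destination in permutations(rows, 2)]


pairs = od_rows(stations)
print(len(pairs))  # n * (n - 1) ordered pairs
```

For n stations this yields n × (n − 1) rows, so 270 stations would give 72,630 rows once the 270 self-pairs are left out of the 72,900 total.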

import csv 
import numpy 
from pprint import pprint 
numpy.set_printoptions(threshold='nan') 

with open('./London stations.csv', 'rU') as csvfile: 
    reader = csv.DictReader(csvfile) 
    Stations = ['{O_Station}'.format(**row) for row in reader] 
print(Stations) 
O_D = numpy.column_stack(([Stations],[Stations])) 
pprint(O_D) 

Output

Stations =

['Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town'] 

O_D =

array([['Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town', 
     'Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town']], 
     dtype='|S13') 

Ideally I am looking for a more suitable function, and I am finding it hard to locate one in the NumPy manual.

Answers
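A likely explanation for the result above: wrapping `Stations` in an extra list turns each input into a 1 × 5 array, and column_stack then joins the two side by side into a single 1 × 10 row. One possible NumPy approach to get every ordered pair instead is to combine numpy.repeat and numpy.tile. The sketch below uses a three-station stand-in array rather than the real file.

```python
import numpy as np

stations = np.array(["Abbey Road", "Abbey Wood", "Acton"])  # stand-in data
n = len(stations)

origins = np.repeat(stations, n)      # A, A, A, B, B, B, C, C, C
destinations = np.tile(stations, n)   # A, B, C, A, B, C, A, B, C

# Two columns (origin, destination), one row per ordered pair
od = np.column_stack((origins, destinations))

# Drop rows where origin and destination are the same station
od = od[od[:, 0] != od[:, 1]]
print(od.shape)
```

Passing 1-D arrays to column_stack is what makes each input a column; the latitude and longitude values could be repeated and tiled in the same way.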

0

This is an incomplete answer, but I would skip NumPy and head straight into pandas.

csv_file = '''Station Lat Lng 
Abbey Road 51.53195199 0.003737786 
Abbey Wood 51.49078408 0.120286371 
Acton 51.51688696 -0.267675543 
Acton Central 51.50875781 -0.263415792 
Acton Town 51.50307148 -0.280288296''' 

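This is tricky because the data isn't really comma-separated; otherwise we could just call pandas.read_csv().

import pandas as pd

# Split the raw text into lines and drop the header row
stations = csv_file.strip().splitlines()[1:]

# The last two whitespace-separated tokens are lat and lng; the rest is the name.
# Values are kept as strings here; use df.astype(float) if numbers are needed.
names = [' '.join(x.split()[:-2]) for x in stations]
lats = [x.split()[-2] for x in stations]
lons = [x.split()[-1] for x in stations]

stations_dict = {name: (lat, lon) for name, lat, lon in zip(names, lats, lons)}

df = pd.DataFrame(stations_dict).T  # Transpose so each station is a row
df.columns = ['Lat', 'Lng']
df.index.name = 'Station'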

So we end up with df.head() producing:

                       Lat           Lng
Station
Abbey Road     51.53195199   0.003737786
Abbey Wood     51.49078408   0.120286371
Acton          51.51688696  -0.267675543
Acton Central  51.50875781  -0.263415792
Acton Town     51.50307148  -0.280288296

Getting the permutations might mean we don't need the station as the index... not sure yet. Hope this helps!
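One way the permutation step might be finished in pandas is a cross join of the table with itself. This is a sketch under assumptions: it uses `merge(how="cross")`, which needs pandas 1.2 or newer, and it treats Station as a regular column (calling `df.reset_index()` on the frame above would do that). The small DataFrame is stand-in data.

```python
import pandas as pd

# Stand-in for the parsed station table (assumption: Station is a regular column)
df = pd.DataFrame({
    "Station": ["Abbey Road", "Abbey Wood", "Acton"],
    "Lat": [51.53195199, 51.49078408, 51.51688696],
    "Lng": [0.003737786, 0.120286371, -0.267675543],
})

origins = df.rename(columns={"Station": "O_Station", "Lat": "O_lat", "Lng": "O_lng"})
destinations = df.rename(columns={"Station": "D_Station", "Lat": "D_lat", "Lng": "D_lng"})

# Cross join: every origin row paired with every destination row
od = origins.merge(destinations, how="cross")

# Remove the pairs where origin and destination are the same station
od = od[od["O_Station"] != od["D_Station"]].reset_index(drop=True)
print(od.head())
```

Renaming the columns before the merge avoids pandas' automatic `_x`/`_y` suffixes and gives the column names requested in the question.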

0

When working with tabular data like this, I prefer to use pandas. It makes it easy to control your data structures.

import pandas as pd

# Read in the CSV, using the station name as the index
stations = pd.read_csv('london stations.csv', index_col=0)

# Create the empty output DataFrame
O_D = pd.DataFrame(columns=['O_Station', 'O_lat', 'O_lng', 'D_Station', 'D_lat', 'D_lng'])

# Pair every origin station with every destination station
new_index = 0
for o_station in stations.index:
    for d_station in stations.index:
        ls = [o_station, stations.Lat.loc[o_station], stations.Lng.loc[o_station],
              d_station, stations.Lat.loc[d_station], stations.Lng.loc[d_station]]
        O_D.loc[new_index] = ls
        new_index += 1

# Remove rows where origin and destination are the same station
O_D = O_D[O_D.O_Station != O_D.D_Station]

This should get your data into the shape you need.
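Growing a DataFrame one row at a time with `.loc` can get slow for tens of thousands of rows. A possible variation, sketched here under assumptions, collects plain tuples first and builds the DataFrame once, then writes the CSV the question asks for. The two-station frame stands in for the result of `pd.read_csv('london stations.csv', index_col=0)`, and the in-memory buffer stands in for an output file.

```python
import io

import pandas as pd

# Stand-in for pd.read_csv('london stations.csv', index_col=0)
stations = pd.DataFrame(
    {"Lat": [51.53195199, 51.49078408], "Lng": [0.003737786, 0.120286371]},
    index=pd.Index(["Abbey Road", "Abbey Wood"], name="Station"),
)

# Collect plain tuples first; skipping self-pairs here avoids the later filter
rows = [
    (o.Index, o.Lat, o.Lng, d.Index, d.Lat, d.Lng)
    for o in stations.itertuples()
    for d in stations.itertuples()
    if o.Index != d.Index
]

columns = ["O_Station", "O_lat", "O_lng", "D_Station", "D_lat", "D_lng"]
O_D = pd.DataFrame(rows, columns=columns)

# Write to an in-memory buffer; pass a file path instead to save to disk
buffer = io.StringIO()
O_D.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Here `itertuples()` exposes the index as `Index` and each column as an attribute, so the loop body stays close to the nested loop above.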

+0

Thanks @rgalbo. However, I get an IndexError: "iloc cannot enlarge its target object" on the line "O_D.O_Station.iloc[new_index] = o_station" – LearningSlowly

+1

@LearningSlowly This should work better as a fix – rgalbo

+0

Very nice! Thanks a lot. Now to get my head around how this works. – LearningSlowly