2016-12-16 73 views
1

I'm learning Python, so sorry for the silly question: adding lines when a second CSV has the same first row

I have two files:

list.csv

john 
mary 
joanna 
lucas 
kate 

db.csv

john^chief^portland 
mary^secretary^ny 
joanna^supervisor^washington 

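What I want to achieve is to compare these two files and output every name, sorted alphabetically by the first column, adding None in the second column for names that are not in db, like this: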

output.csv

joanna^supervisor^washington 
john^chief^portland 
kate^None 
lucas^None 
mary^secretary^ny 

I started playing around with this code, which I found on SO:

masterlist = list(reader22) 

for hosts_row in reader21: 
    row = 1 
    found = False 
    # Bind the output row up front so it exists even if masterlist is empty 
    results_row = hosts_row 
    for master_row in masterlist: 
        if hosts_row[0] == master_row[0]: 
            results_row.append('FOUNDTHISLINE in master list (row ' 
                               + str(row) + ')') 
            found = True 
            break 
        row = row + 1 
    if not found: 
        results_row.append('THISLINENOTFOUND in master list') 
    writer23.writerow(results_row) 

Please help me understand the best way to do this.

+0

Which value goes into the third column? – MMF

+0

When you say "same first row", do you mean _column_? –

+0

Sorry, column. MMF: there is no third column – Lucas

Answers

2

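It's easy and efficient to do what you want using only the csv module and a few of Python's built-in data structures, such as lists and dictionaries: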

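import csv 

# Python 2: the csv module expects files opened in binary mode ('rb' / 'wb') 
with open('list.csv', 'rb') as csvfile: 
    # Sorted list of the names in the first (only) column 
    masterlist = sorted(row[0] for row in csv.reader(csvfile)) 

with open('db.csv', 'rb') as csvfile: 
    # Map each name to the rest of its row, e.g. 'john' -> ['chief', 'portland'] 
    db = {row[0]: row[1:] for row in csv.reader(csvfile, delimiter='^')} 

with open('output.csv', 'wb') as csvfile: 
    writer = csv.writer(csvfile, delimiter='^') 
    for name in masterlist: 
        # Known names get their db fields; unknown names get 'None' and an empty field 
        writer.writerow(([name] + db[name]) if name in db else [name, 'None', '']) 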

Contents of the output.csv it creates:

joanna^supervisor^washington 
john^chief^portland 
kate^None^ 
lucas^None^ 
mary^secretary^ny 
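
For readers on Python 3, here is a minimal sketch of the same csv-only approach. It is not part of the original answer: the helper name `merge_names` and the in-memory sample lines are assumptions made so the example runs without any files, and stripping whitespace from each field is an extra precaution rather than something the answer does.

```python
import csv
import io


def merge_names(list_lines, db_lines, missing=("None", "")):
    """Return one row per name, sorted by name.

    Names found in the db keep their db fields; other names get `missing`.
    Both arguments are iterables of text lines (open files also work).
    """
    names = sorted(row[0].strip() for row in csv.reader(list_lines) if row)
    db = {
        row[0].strip(): [field.strip() for field in row[1:]]
        for row in csv.reader(db_lines, delimiter="^")
        if row
    }
    return [[name] + db.get(name, list(missing)) for name in names]


# Sample data taken from the question (hypothetical in-memory stand-ins for the files)
list_lines = ["john", "mary", "joanna", "lucas", "kate"]
db_lines = [
    "john^chief^portland",
    "mary^secretary^ny",
    "joanna^supervisor^washington",
]

out = io.StringIO()
writer = csv.writer(out, delimiter="^", lineterminator="\n")
writer.writerows(merge_names(list_lines, db_lines))
print(out.getvalue(), end="")
```

With real files on Python 3, the csv documentation recommends opening them in text mode with `newline=''` (for example `open('list.csv', newline='')`) instead of the `'rb'` / `'wb'` modes used in Python 2.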
+0

Guys, correct me if I'm wrong, but in Python < 2.7 this line: 'db = {row[0]: row[1:] for row in csv.reader(csvfile, delimiter='^')}' should look like this? 'db = dict((row[0], row[1:]) for row in csv.reader(csvfile, delimiter='^'))' – Lucas

+0

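Lucas: I don't know in which version of Python [dictionary displays](https://docs.python.org/2/reference/expressions.html#dictionary-displays) were introduced, but they have been around for a long time. That said, the [alternative](https://docs.python.org/2/library/stdtypes.html#mapping-types-dict) approach you show, which builds the dictionary from a sequence of 'key', 'value' pairs produced by a [generator expression](https://docs.python.org/2/reference/expressions.html#generator-expressions), should also work in versions 2.4 through 2.7. – martineau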
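
As a quick check of the two spellings discussed in these comments, the sketch below builds the same dictionary both ways. The sample lines are a hypothetical stand-in for db.csv. Dict comprehensions are available in Python 2.7 and 3.x, while `dict()` over a generator of pairs also works on older 2.x releases.

```python
import csv

# Hypothetical in-memory stand-in for db.csv
db_lines = ["john^chief^portland", "mary^secretary^ny"]

# Dict comprehension (Python 2.7+ and 3.x)
db_a = {row[0]: row[1:] for row in csv.reader(db_lines, delimiter="^")}

# dict() fed by a generator expression of (key, value) pairs
db_b = dict((row[0], row[1:]) for row in csv.reader(db_lines, delimiter="^"))

print(db_a == db_b)
```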

2

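This is a perfect use case for the pandas library. I know you're just starting to learn, but check it out for data manipulation (please ignore the numbering :))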

In [36]: import pandas as pd 

In [37]: list_df = pd.read_csv('list.csv', header=None) 

In [38]: db_df = pd.read_csv('db.csv', sep='^', header=None) 

In [51]: db_df 
Out[51]: 
        0           1           2 
0    john       chief    portland 
1    mary   secretary          ny 
2  joanna  supervisor  washington 


In [48]: list_df 
Out[48]: 
        0 
0    john 
1    mary 
2  joanna 
3   lucas 
4    kate 

In [52]: df = list_df.merge(db_df, how='left') 

In [53]: df 
Out[53]: 
        0           1           2 
0    john       chief    portland 
1    mary   secretary          ny 
2  joanna  supervisor  washington 
3   lucas         NaN         NaN 
4    kate         NaN         NaN 

In [54]: df.sort_values(0)  # df.sort(0) in old pandas; DataFrame.sort has since been removed 
Out[54]: 
        0           1           2 
2  joanna  supervisor  washington 
0    john       chief    portland 
4    kate         NaN         NaN 
3   lucas         NaN         NaN 
1    mary   secretary          ny 

From there, you can call the df.to_csv function to get the output you're looking for.

(Writing back) http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
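
To make the write-back step concrete, here is a minimal sketch that is not part of the original answer. It assumes a reasonably recent pandas, reads the question's sample data from in-memory strings instead of files, and fills only column 1 with the string "None" so that names missing from the db come out as `kate^None^`. That last choice is an assumption about the desired format.

```python
import io

import pandas as pd

# In-memory stand-ins for list.csv and db.csv (sample data from the question)
list_csv = "john\nmary\njoanna\nlucas\nkate\n"
db_csv = "john^chief^portland\nmary^secretary^ny\njoanna^supervisor^washington\n"

list_df = pd.read_csv(io.StringIO(list_csv), header=None)
db_df = pd.read_csv(io.StringIO(db_csv), sep="^", header=None)

# Left merge on the shared column 0, then sort by name
df = list_df.merge(db_df, how="left").sort_values(0)

# Put the literal string "None" in the second column for names missing from db
df[1] = df[1].fillna("None")

# With no path argument, to_csv returns the CSV text instead of writing a file
text = df.to_csv(sep="^", header=False, index=False)
print(text, end="")
```

To write a file instead, pass a path as the first argument, for example `df.to_csv('output.csv', sep='^', header=False, index=False)`.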

+0

How do you write it back to the file? – praveenraj

+0

Edited the answer to point to the docs. It's pretty straightforward. – Kelvin
