2016-07-26

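Merging two CSV files by a common column in Python

I want to merge two CSV files on a common id column and write the merged result to a new file. I tried the following, but it gives me an error: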

import csv
from collections import OrderedDict

filenames = "stops.csv", "stops2.csv"
data = OrderedDict()
fieldnames = []
for filename in filenames:
    with open(filename, "rb") as fp:  # python 2
        reader = csv.DictReader(fp)
        fieldnames.extend(reader.fieldnames)
        for row in reader:
            data.setdefault(row["stop_id"], {}).update(row)

fieldnames = list(OrderedDict.fromkeys(fieldnames))
with open("merged.csv", "wb") as fp:
    writer = csv.writer(fp)
    writer.writerow(fieldnames)
    for row in data.itervalues():
        writer.writerow([row.get(field, '') for field in fieldnames])

Both files have a "stop_id" column, but I get this error back: KeyError: 'stop_id'

Any help is much appreciated.

Thanks
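The question doesn't show the files' header rows, so the cause of the KeyError can't be confirmed from the thread. One common cause (an assumption here, not something the asker reported) is a header that is not exactly "stop_id": a UTF-8 byte-order mark at the start of the file, or stray spaces around the name. A minimal Python 3 sketch of the same DictReader merge that cleans header names before looking up the key, and raises an error showing the header it actually read; read_rows and merge_csv are hypothetical helper names.

```python
import csv
import io
from collections import OrderedDict


def read_rows(fp, key):
    """Read a CSV file object into (fieldnames, rows) with cleaned header names.

    A leading UTF-8 byte-order mark and surrounding whitespace are removed from
    each column name, so a BOM-prefixed or space-padded "stop_id" still matches.
    """
    reader = csv.DictReader(fp)
    reader.fieldnames = [n.lstrip("\ufeff").strip() for n in (reader.fieldnames or [])]
    if key not in reader.fieldnames:
        # Report the header that was actually read instead of a bare KeyError.
        raise KeyError("%r not found in header %r" % (key, reader.fieldnames))
    return reader.fieldnames, list(reader)


def merge_csv(sources, key, out):
    """Outer-merge CSV file objects on the `key` column and write CSV to `out`.

    Rows sharing a key are combined; later files overwrite duplicate columns.
    Cells missing from a merged row are written as empty strings.
    """
    data = OrderedDict()  # key value -> merged row dict, in first-seen order
    fieldnames = []
    for fp in sources:
        names, rows = read_rows(fp, key)
        fieldnames.extend(names)
        for row in rows:
            data.setdefault(row[key], {}).update(row)
    fieldnames = list(OrderedDict.fromkeys(fieldnames))  # de-duplicate, keep order
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(data.values())


if __name__ == "__main__":
    # In-memory stand-ins: the first header starts with a BOM, the second has spaces.
    a = io.StringIO("\ufeffstop_id,name\n1,Alpha\n2,Beta\n")
    b = io.StringIO(" stop_id ,zone\n2,Z2\n3,Z3\n")
    out = io.StringIO()
    merge_csv([a, b], "stop_id", out)
    print(out.getvalue())
```

With real files, open the inputs in text mode with newline="" (adding encoding="utf-8-sig" also drops a BOM at read time) and open the output with "w" and newline="", then pass the file objects to merge_csv.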

'data.setdefault(row["stop_id"], {}).update(row)' - why so complicated? – Alleo

Also, merging two tables by a column is done with 'pandas.merge', see http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra – Alleo

I used another Stack Overflow example as a starting point. Could you suggest an alternative? Thanks – sgpbyrne
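Following Alleo's pandas.merge suggestion, a minimal sketch of the same stop_id merge, assuming pandas is installed. The two small inline tables are hypothetical stand-ins for stops.csv and stops2.csv; the second deliberately has a space after "stop_id" in its header to show the column-name cleanup. how="outer" keeps stops that appear in only one file, like the dict-based code in the question.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for stops.csv and stops2.csv.
stops = pd.read_csv(io.StringIO("stop_id,stop_name\n1,Alpha\n2,Beta\n"))
stops2 = pd.read_csv(io.StringIO("stop_id ,zone\n2,Z2\n3,Z3\n"))

# Strip stray whitespace from column names, e.g. "stop_id " -> "stop_id".
for df in (stops, stops2):
    df.columns = df.columns.str.strip()

# Outer join on stop_id: keeps stops found in either file, NaN where absent.
merged = pd.merge(stops, stops2, on="stop_id", how="outer")
print(merged.to_csv(index=False))
```

With real files, pass the file paths to pd.read_csv instead of io.StringIO(...) and write the result with merged.to_csv("merged.csv", index=False).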

Answers

Thanks for all the examples.

This is what worked for me: merging on the first column of each CSV.

import csv
from collections import OrderedDict

with open('stops.csv', 'rb') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}

with open('stops2.csv', 'rb') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('ab_combined.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)
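A Python 3 sketch of the same first-column merge, assuming both files use their first column as the key (dict.iteritems becomes dict.items, and files are opened in text mode with newline="" rather than "rb"/"wb"). merge_on_first_column is a hypothetical name. Like the answer, it simply appends the extra columns, so a key that appears in only one file produces a shorter row; the DictReader and pandas approaches pad such rows instead.

```python
import csv
import io
from collections import OrderedDict


def merge_on_first_column(fp_a, fp_b, out):
    """Join two CSV file objects on their first column and write CSV to `out`.

    Keys keep first-seen order. Values are appended, not aligned: a key that
    appears in only one file gets only that file's columns.
    """
    result = OrderedDict()  # first-column value -> list of remaining cells
    for fp in (fp_a, fp_b):
        for row in csv.reader(fp):
            if not row:
                continue  # csv.reader yields [] for blank lines
            result.setdefault(row[0], []).extend(row[1:])
    writer = csv.writer(out)
    for key, values in result.items():
        writer.writerow([key] + values)


if __name__ == "__main__":
    a = io.StringIO("stop_id,name\n1,Alpha\n2,Beta\n")
    b = io.StringIO("stop_id,zone\n2,Z2\n")
    out = io.StringIO()
    merge_on_first_column(a, b, out)
    print(out.getvalue())
```

Note that the header rows are merged like any other row, because both start with "stop_id".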

Here is an example using pandas:

from StringIO import StringIO  # Python 2; on Python 3 use io.StringIO
import pandas as pd

# Sample data; sep=";" because the fields are semicolon-separated.
TESTDATA = StringIO("""DOB;First;Last
2016-07-26;John;smith
2016-07-27;Mathew;George
2016-07-28;Aryan;Singh
2016-07-29;Ella;Gayau
""")

list1 = pd.read_csv(TESTDATA, sep=";")

TESTDATA = StringIO("""Date of Birth;Patient First Name;Patient Last Name
2016-07-26;John;smith
2016-07-27;Mathew;XXX
2016-07-28;Aryan;Singh
2016-07-20;Ella;Gayau
""")

list2 = pd.read_csv(TESTDATA, sep=";")

print list2
print list1

# Left-join on the three key columns (named differently in each frame), then
# drop rows with missing values, leaving only rows matched in both frames.
common = pd.merge(list1, list2, how='left',
                  left_on=['Last', 'First', 'DOB'],
                  right_on=['Patient Last Name', 'Patient First Name', 'Date of Birth']).dropna()
print common
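Two variations on this answer, sketched for Python 3 with the same sample data. how="inner" keeps only rows whose (last name, first name, date of birth) triple appears in both frames; here it gives the same two rows as the left join followed by dropna(), though dropna() would additionally discard matched rows that have a missing value in some other column. indicator=True on an outer merge adds a "_merge" column that shows which rows failed to match.

```python
import io

import pandas as pd

# Same sample data as the answer above, read from in-memory strings.
list1 = pd.read_csv(io.StringIO(
    "DOB;First;Last\n"
    "2016-07-26;John;smith\n"
    "2016-07-27;Mathew;George\n"
    "2016-07-28;Aryan;Singh\n"
    "2016-07-29;Ella;Gayau\n"
), sep=";")
list2 = pd.read_csv(io.StringIO(
    "Date of Birth;Patient First Name;Patient Last Name\n"
    "2016-07-26;John;smith\n"
    "2016-07-27;Mathew;XXX\n"
    "2016-07-28;Aryan;Singh\n"
    "2016-07-20;Ella;Gayau\n"
), sep=";")

left_keys = ["Last", "First", "DOB"]
right_keys = ["Patient Last Name", "Patient First Name", "Date of Birth"]

# Inner join: only rows whose key triple exists in both frames.
common = pd.merge(list1, list2, how="inner", left_on=left_keys, right_on=right_keys)

# Outer join with indicator=True: "_merge" is "both", "left_only" or
# "right_only", which makes the unmatched rows easy to inspect.
audit = pd.merge(list1, list2, how="outer", left_on=left_keys,
                 right_on=right_keys, indicator=True)

print(common)
print(audit["_merge"].value_counts())
```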