2016-12-27 60 views
0

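Create a new file with combined data using matching column values in two CSV files

I am trying to compare two CSV files in Python 3.6 to check whether the IP address in the first column of file1.csv appears in a row of file2.csv. If the address is in file2, I need the second-column value from that row copied into a new file that is otherwise identical to file1. The two files are laid out as follows: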

File 1:

XX.XXX.XXX.1,Test1 
XX.XXX.XXX.2,Test2 
XX.XXX.XXX.3,Test3 
XX.XXX.XXX.4,Test4 
XX.XXX.XXX.5,Test5 
XX.XXX.XXX.6,Test6 
XX.XXX.XXX.7,Test7 
XX.XXX.XXX.8,Test8 

and so on 

File 2:

XX.XXX.XXX.6, Name6 
XX.XXX.XXX.7, Name7 
XX.XXX.XXX.8, Name8 

I need the result.csv file to look like this:

XX.XXX.XXX.1,Test1, Not found 
XX.XXX.XXX.2,Test2, Not found 
XX.XXX.XXX.3,Test3, Not found 
XX.XXX.XXX.4,Test4, Not found 
XX.XXX.XXX.5,Test5, Not found 
XX.XXX.XXX.6,Test6,Name6 
XX.XXX.XXX.7,Test7,Name7 
XX.XXX.XXX.8,Test8,Name8 

My code so far is as follows:

import csv 

f1 = open('file1.csv', 'r') 
f2 = open('file2.csv', 'r') 
f3 = open('results.csv', 'w') 

c1 = csv.reader(f1) 
c2 = csv.reader(f2) 
c3 = csv.writer(f3) 

file2 = list(c2) 

for file1_row in c1: 
    row = 1 
    found = False 
    for file2_row in file2: 
     results_row = file1_row 
     x = file2_row[3] 
     if file1_row[1] == file2_row[1]: 

     results_row.append('Found. Name: ' + x) 
     found = True 
     break 
    row += 1 
if not found: 
    results_row.append('Not found in File1') 
c3.writerow(results_row) 

f1.close() 
f2.close() 
f3.close() 

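At the moment this code compares whole rows rather than values. That means it never matches anything, because it requires both the IP column and the adjacent column to be identical in the two files. It also compares row 1 with row 1, row 2 with row 2, and so on, but I need it to search one file for matches in the other, not compare rows by index.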

Answers

0

I moved results_row and changed the indentation of row += 1:

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)  # read file2 once so it can be scanned for every file1 row

for file1_row in c1:
    row = 1
    found = False
    results_row = file1_row  # Moved out from nested loop
    for file2_row in file2:
        x = file2_row[1]
        if file1_row[0] == file2_row[0]:  # compare the IP columns
            results_row.append(x)
            found = True
            break
    row += 1
    if not found:
        results_row.append('Not found')
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()
+0

This works perfectly. Thank you!

0

A pandas solution:

import pandas as pd

df1 = pd.read_csv('file1.csv', names=['a', 'b'])
df2 = pd.read_csv('file2.csv', names=['a', 'b'])
merged = pd.merge(df1, df2, on='a', how='outer')
merged.to_csv('results.csv', header=False, index=False, na_rep='Not found')

Contents of results.csv:

XX.XXX.XXX.1,Test1,Not found 
XX.XXX.XXX.2,Test2,Not found 
XX.XXX.XXX.3,Test3,Not found 
XX.XXX.XXX.4,Test4,Not found 
XX.XXX.XXX.5,Test5,Not found 
XX.XXX.XXX.6,Test6, Name6 
XX.XXX.XXX.7,Test7, Name7 
XX.XXX.XXX.8,Test8, Name8 
0

A solution that stays close to what you tried is as follows:

with open('result.csv', 'w') as out:
    with open('file1.csv', 'r') as f1, open('file2.csv', 'r') as f2:
        f2_lines = [line for line in f2.readlines() if len(line) > 1]
        f1_lines = [line for line in f1.readlines() if len(line) > 1]
        for line in f1_lines:
            val = 'Not found'
            b = [line.split(',')[0].strip() in item for item in f2_lines]
            if any(b):
                val = f2_lines[b.index(True)].split(',')[1].strip()
            out.write('{}, {}\n'.format(line.strip(), val))

Output:

XX.XXX.XXX.1,Test1, Not found 
XX.XXX.XXX.2,Test2, Not found 
XX.XXX.XXX.3,Test3, Not found 
XX.XXX.XXX.4,Test4, Not found 
XX.XXX.XXX.5,Test5, Not found 
XX.XXX.XXX.6,Test6, Name6 
XX.XXX.XXX.7,Test7, Name7 
XX.XXX.XXX.8,Test8, Name8 
+0

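I tried this code, and it does produce the correct matches, but it also produces incorrect ones. Some names are matched more than once. The file contents I posted are a small portion of the IP list I have, so it may not show up at that small scale, but the name for the IP ending in .103 also shows up for the IP ending in .1, which has no matching name.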

+0

I think this is a loop problem. Maybe it holds on to a value and then assigns it to the next open slot?
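The false matches described in the comments above look consistent with the substring test `... in item` in that answer rather than with a loop problem: an address ending in .1 is a substring of a line whose address ends in .103. The sketch below contrasts the two checks. The addresses, names, and the helper functions `substring_match` and `exact_match` are made up for illustration and are not part of the original answer.

```python
# Substring test versus exact comparison of the first CSV field.
# The addresses and names below are made-up sample data.
f2_lines = ['10.0.0.103, Name103\n', '10.0.0.6, Name6\n']


def substring_match(ip, lines):
    """Mimics `ip in item`: true if ip occurs anywhere in a line."""
    return [ip in item for item in lines]


def exact_match(ip, lines):
    """Compares ip to the first comma-separated field only."""
    return [ip == item.split(',')[0].strip() for item in lines]


print(substring_match('10.0.0.1', f2_lines))  # [True, False]
print(exact_match('10.0.0.1', f2_lines))      # [False, False]
print(exact_match('10.0.0.6', f2_lines))      # [False, True]
```

If this is indeed the cause, replacing the `in item` test with a comparison against the first field should remove the duplicate names while keeping the rest of that answer unchanged.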

0

Here's a non-pandas solution (assuming you're using Python 3.x):

import csv

present = {}
with open('file2.csv', 'r', newline='') as file2:
    reader = csv.reader(file2, skipinitialspace=True)
    for ip, name in reader:
        present[ip] = name

with open('file1.csv', 'r', newline='') as file1, \
        open('results.csv', 'w', newline='') as results:
    reader = csv.reader(file1, skipinitialspace=True)
    writer = csv.writer(results)
    for ip, name in reader:
        writer.writerow([ip, name, present.get(ip, ' Not found')])

Contents of results.csv:

XX.XXX.XXX.1,Test1, Not found 
XX.XXX.XXX.2,Test2, Not found 
XX.XXX.XXX.3,Test3, Not found 
XX.XXX.XXX.4,Test4, Not found 
XX.XXX.XXX.5,Test5, Not found 
XX.XXX.XXX.6,Test6,Name6 
XX.XXX.XXX.7,Test7,Name7 
XX.XXX.XXX.8,Test8,Name8 
+0

I am using Python 3.6. My apologies, I should have specified that in the question.
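As a rough sketch of how the dictionary lookup in the answer above behaves, here is the same idea run on in-memory text so it can be tried without creating files. The function name `merge_names`, its `missing` parameter, and the sample addresses are hypothetical. Unlike the answer's `for ip, name in reader` unpacking, this version skips blank or short rows and strips stray whitespace, which is an assumption about how real input might look.

```python
import csv
import io


def merge_names(file1_text, file2_text, missing=' Not found'):
    """Return file1 rows with the matching file2 name appended."""
    # Build an ip -> name lookup from file2, ignoring blank or short rows.
    present = {}
    for row in csv.reader(io.StringIO(file2_text), skipinitialspace=True):
        if len(row) >= 2:
            present[row[0].strip()] = row[1].strip()

    # Walk file1 in order and look each IP up by exact key.
    merged = []
    for row in csv.reader(io.StringIO(file1_text), skipinitialspace=True):
        if len(row) >= 2:
            ip, name = row[0].strip(), row[1].strip()
            merged.append([ip, name, present.get(ip, missing)])
    return merged


# Made-up sample data, including a blank line and trailing spaces.
file1_text = "10.0.0.1,Test1 \n\n10.0.0.103,Test103 \n"
file2_text = "10.0.0.103, Name103 \n"
for row in merge_names(file1_text, file2_text):
    print(row)
```

Because the lookup uses the whole address as the dictionary key, an address ending in .1 does not pick up the name belonging to the one ending in .103.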
