2016-12-13 91 views
-1

Remove rows in a csv using Python or C#?

I have a csv file with repeated content like this:

"col1", "col2","col3" 
Integer, Integer, Varchar(50) 
7, 8, 21554 
24, 25, 36544 
"col1", "col2","col3" 
Integer, Integer, Varchar(50) 
7, 8, 21554 
24, 25, 36544 

How can I strip out the repeated part, including the later header row, data type row, and data rows?
I only want this:

"col1", "col2","col3" 
Integer, Integer, Varchar(50) 
7, 8, 21554 
24, 25, 36544 

Answers

1

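We don't even need the csv module for this. We remember what the first line of the file is, then keep writing lines until we see it again, at which point we stop, so the output is truncated there.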

with open('infile.csv', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    first = next(infile)  # remember the header line
    outfile.write(first)
    for line in infile:
        if line == first:  # header seen again: the repeated block starts here
            break
        outfile.write(line)
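The same idea can be sketched as a small function that works on any file-like objects, which makes it easy to try without touching the disk. The function name `copy_until_header_repeats` and the in-memory sample are assumptions made for illustration, not part of the original answer. This variant also compares lines with trailing whitespace removed, on the assumption that stray spaces or a missing final newline should not hide the repeated header.

```python
import io


def copy_until_header_repeats(infile, outfile):
    """Copy lines until the first line shows up again; return lines written."""
    first = next(infile, None)
    if first is None:  # empty input: nothing to copy
        return 0
    outfile.write(first)
    written = 1
    for line in infile:
        # Ignore trailing whitespace when checking for the repeated header.
        if line.rstrip() == first.rstrip():
            break
        outfile.write(line)
        written += 1
    return written


# Hypothetical sample mirroring the file in the question.
block = (
    '"col1", "col2","col3"\n'
    'Integer, Integer, Varchar(50)\n'
    '7, 8, 21554\n'
    '24, 25, 36544\n'
)
out = io.StringIO()
count = copy_until_header_repeats(io.StringIO(block + block), out)
print(count)
```

Note that this stops at the first repeat of the header, so anything after that point is dropped, including rows that might differ from the first block.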
+1

Thanks, Patrick, your code works like a charm, with one small change: I had to remove the second "with" before "open" when running it in Python 3. – B2Y

0

You can do it with the csv module (assuming Python 2.x), like this:

import csv

seen = set()  # rows already written
# Python 2: the csv module expects files opened in binary mode
with open('duplicates.csv', 'rb') as infile, open('cleaned.csv', 'wb') as outfile:
    reader = csv.reader(infile, skipinitialspace=True)
    writer = csv.writer(outfile)
    for row in (tuple(row) for row in reader):
        if row not in seen:  # write each distinct row only once
            writer.writerow(row)
            seen.add(row)

print('done')
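For Python 3, a minimal sketch of the same approach might look like the following. The function name `write_unique_rows` and the in-memory sample are assumptions for illustration. With real files in Python 3, the csv module expects text mode with `newline=''` rather than `'rb'`/`'wb'`.

```python
import csv
import io


def write_unique_rows(infile, outfile):
    """Write each distinct row once, in first-seen order; return the count."""
    seen = set()
    reader = csv.reader(infile, skipinitialspace=True)
    writer = csv.writer(outfile)
    for row in reader:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            writer.writerow(row)
            seen.add(key)
    return len(seen)


# Hypothetical sample mirroring the file in the question.
block = (
    '"col1", "col2","col3"\n'
    'Integer, Integer, Varchar(50)\n'
    '7, 8, 21554\n'
    '24, 25, 36544\n'
)
out = io.StringIO()
unique = write_unique_rows(io.StringIO(block + block), out)
print(unique)
```

Two things to keep in mind with this approach: it drops every repeated row anywhere in the file, so legitimately duplicated data rows are removed as well, and `csv.writer` rewrites the rows with its default quoting and delimiter, so the quotes around the header names and the spaces after the commas are not preserved.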
+0

Thanks, Martineau. Your code works well for me too! – B2Y

+0

You're welcome. I suggest you read [_What should I do when someone answers my question?_](http://stackoverflow.com/help/someone-answers) – martineau