You can consume just the first line of the input file while reading, and write it straight to the output file:
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    matched.write(next(master))  # can't use readline when iterating on the file afterwards
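It seems that you really need the csv module for the rest, though. I'll edit my answer to try to work in that direction.
With the csv module, there is no need for those unsafe split calls. The comma is the default delimiter, and quotes are handled properly too. So I'd just write: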
import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    for row in cr:  # iterate on the rows, already organized as lists
        if any(city in row[5] for city in citys) and \
           any(state in row[6] for state in states) and \
           not any(category in row[2] for category in categorys):
            cw.writerow(row)
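To show why I call split unsafe here, this is a small sketch comparing it with csv.reader on a single line. The sample line is made up for illustration and is not taken from your file:

```python
import csv
import io

# Hypothetical sample line: the second field contains a comma inside quotes.
line = 'Acme,"Portland, OR",retail\n'

# Naive approach: split on every comma, including the quoted one.
naive_fields = line.rstrip("\n").split(",")

# csv.reader accepts any iterable of lines and understands the quoting.
csv_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)  # ['Acme', '"Portland', ' OR"', 'retail']
print(csv_fields)    # ['Acme', 'Portland, OR', 'retail']
```

The naive split yields four fields and leaves the quote characters in place, so every column index after the quoted field is shifted by one.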
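BTW, your filter checks whether city is contained in row[5], but maybe you want an exact match. For instance, "York" would match "New York", which is probably not what you want. So my proposal is to use in to check whether the string is in the list of strings, for each criterion: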
import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    for row in cr:
        if row[5] in citys and row[6] in states and row[2] not in categorys:
            cw.writerow(row)
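The difference between the two tests can be seen in isolation with a tiny sketch. The values in citys and cell below are hypothetical, chosen only to trigger the "York" / "New York" case:

```python
# Hypothetical filter values and cell content, for illustration only.
citys = ["York", "Boston"]
cell = "New York"

# Substring test: true because "York" occurs inside "New York".
substring_match = any(city in cell for city in citys)

# Membership test: false because "New York" is not an element of the list.
exact_match = cell in citys

print(substring_match)  # True
print(exact_match)      # False
```

Which behaviour is right depends on your data. If the cells may carry extra whitespace or differ in case, you would probably need to normalize them before the membership test.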
It can be done even better by using a generator comprehension and writing all the rows at once:
import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    cw.writerows(row for row in cr if row[5] in citys and row[6] in states and row[2] not in categorys)
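If you want to try this without touching your files, here is a self-contained sketch that runs the same logic on in-memory data. The header, the three rows and the filter values are all invented. Only the column positions (2, 5 and 6) come from your question:

```python
import csv
import io

# Hypothetical filter values; the real ones were not given in the question.
citys = ["Boston", "York"]
states = ["MA", "PA"]
categorys = ["spam"]

# Hypothetical stand-in for master.csv.
master = io.StringIO(
    "id,name,category,street,zip,city,state\n"
    '1,"Smith, J",retail,Main St,02101,Boston,MA\n'
    "2,Doe,spam,Elm St,17401,York,PA\n"
    "3,Roe,retail,Oak St,10001,New York,NY\n"
)
matched = io.StringIO()  # stand-in for match.csv

cr = csv.reader(master)
cw = csv.writer(matched)
cw.writerow(next(cr))  # copy title
cw.writerows(
    row for row in cr
    if row[5] in citys and row[6] in states and row[2] not in categorys
)

result_lines = matched.getvalue().splitlines()
for output_line in result_lines:
    print(output_line)
```

Only the title and row 1 should come out: row 2 is dropped for its category, and row 3 because "New York" is not an exact match. When working with real files on Python 3, the csv documentation recommends opening them with newline='' so that line endings are not translated twice.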
Note that citys, states and categorys would be better off as set rather than list, so that lookups are much faster (you didn't provide that information).
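Since I don't know how you build those collections, this is only a sketch of the conversion, with made-up values:

```python
# Hypothetical values; build the lists however you do today.
citys = ["Boston", "York", "Boston"]
states = ["MA", "PA"]
categorys = ["spam"]

# Convert once, before the loop. Duplicates are dropped automatically.
citys = set(citys)
states = set(states)
categorys = set(categorys)

# Membership tests keep the same syntax as with lists.
print("York" in citys)      # True
print("New York" in citys)  # False
```

A membership test on a list scans the elements one by one, while a set uses hashing, so the gain grows with the number of values you filter on. For a handful of values it will hardly matter.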
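What is the need for the single-double comma single-double pattern? Would this ignore commas embedded in quotes? – ScottEdwards2000
Do you need the "city in citys"? You're only running the IF statement on one row at a time, right? – ScottEdwards2000
@ScottEdwards2000 The single-double comma single-double pattern is due to my csv – CFraley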