2016-12-28

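How to copy the header row to a new CSV in Python

I can't seem to figure out how to copy my header row from master to match... I need to grab the first row of the master CSV and write it to match first, then write the remaining rows only if they meet the criteria...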

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    for line in master:
        if (any(city in line.split('","')[5] for city in citys) and
                any(state in line.split('","')[6] for state in states) and
                not any(category in line.split('","')[2] for category in categorys)):
            matched.write(line)

Please help. I'm new to Python and don't know how to use pandas or anything else...


What do you need the quote-comma-quote ('","') pattern for? Does it ignore commas embedded in quotes? – ScottEdwards2000


Do you need "city in citys"? You're only running the if statement on one line at a time, right? – ScottEdwards2000


@ScottEdwards2000 The quote-comma-quote pattern is there because of how my CSV is formatted. – CFraley

Answers


You can just consume the first line of the file by reading it, and write it straight back out to the output file:

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    matched.write(next(master))  # next() consumes the header line and copies it unchanged
    # ...then run the original filtering loop over master; it starts at the second line
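To see the next() idea in isolation, here is a minimal self-contained sketch. It uses io.StringIO in place of the real files, and the helper copy_header_then_filter and its keep predicate are hypothetical names standing in for the question's filter:

```python
import io

def copy_header_then_filter(src, dst, keep):
    """Copy the first line of src unchanged, then only the lines for which keep(line) is true."""
    dst.write(next(src))  # consume and write the header line
    for line in src:      # iteration resumes at the second line
        if keep(line):
            dst.write(line)

# Hypothetical sample data standing in for master.csv
master = io.StringIO("name,city\nA,Austin\nB,Boston\n")
matched = io.StringIO()
copy_header_then_filter(master, matched, lambda line: "Austin" in line)
print(matched.getvalue(), end="")
```

Because a file object is its own iterator, the for loop simply continues from wherever next() left off, so the header is never tested against the filter.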

It seems you really need the csv module for the rest, though. I'll edit my answer to work in that direction.

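With the csv module you don't need those unsafe split calls. Comma is the default delimiter, and quoted fields (including commas inside quotes) are handled correctly. So I would just write: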

import csv

# newline='' is what the Python 3 csv docs recommend for files used with the csv module
with open('master.csv', 'r', newline='') as master, open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy the header row

    for row in cr:  # each row is already split into a list of fields
        if (any(city in row[5] for city in citys) and
                any(state in row[6] for state in states) and
                not any(category in row[2] for category in categorys)):
            cw.writerow(row)
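Here is a small illustration of why splitting on '","' is fragile, using a hypothetical CSV line that has doubled (escaped) quotes in one field and a comma inside another quoted field:

```python
import csv
import io

# Hypothetical CSV line: "" is an escaped quote, and "Food, Drinks" contains a comma
raw = '"1","Joe ""Big"" Diner","Food, Drinks"\n'

by_split = raw.split('","')                   # manual splitting keeps stray quotes and the newline
by_csv = next(csv.reader(io.StringIO(raw)))   # csv.reader strips quoting and unescapes ""

print(by_split)  # ['"1', 'Joe ""Big"" Diner', 'Food, Drinks"\n']
print(by_csv)    # ['1', 'Joe "Big" Diner', 'Food, Drinks']
```

The split version leaves a leading quote on the first field, a trailing quote and newline on the last, and the doubled quotes still escaped; csv.reader returns the clean values.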

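BTW, your filter checks whether city is contained in row[5], but maybe you want an exact match. For example, "York" would match "New York", which is probably not what you want. So my suggestion is to use in to check whether each field is one of the strings in a list, for each criterion: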

import csv

with open('master.csv', 'r', newline='') as master, open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy the header row
    for row in cr:
        # exact matches: the whole field must equal one of the allowed values
        if row[5] in citys and row[6] in states and row[2] not in categorys:
            cw.writerow(row)
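The difference between the two tests fits in a few lines. The filter list and field value below are hypothetical, chosen to reproduce the "York" example:

```python
# Hypothetical filter list and field value
citys = ["York"]
row_city = "New York"

# Substring test from the question: is any filter value contained in the field?
substring_match = any(city in row_city for city in citys)
# Membership test: is the field exactly one of the filter values?
exact_match = row_city in citys

print(substring_match, exact_match)  # True False
```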

You can do even better by using a generator expression and writing all the matching rows at once:

import csv

with open('master.csv', 'r', newline='') as master, open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy the header row
    cw.writerows(row for row in cr
                 if row[5] in citys and row[6] in states and row[2] not in categorys)
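A self-contained sketch of the header-plus-writerows pattern, runnable without any files: io.StringIO stands in for master.csv and match.csv, and the column layout, sample rows, filter sets and the filter_csv helper are all hypothetical (the question does not show its data):

```python
import csv
import io

def filter_csv(src, dst, citys, states, categorys):
    cr = csv.reader(src)
    cw = csv.writer(dst)
    cw.writerow(next(cr))  # copy the header row unchanged
    # generator expression: rows are filtered lazily and written by one writerows() call
    cw.writerows(
        row for row in cr
        if row[5] in citys and row[6] in states and row[2] not in categorys
    )

# Hypothetical input: column 2 is the category, 5 the city, 6 the state
master = io.StringIO(
    "id,name,category,x,y,city,state\n"
    "1,A,Food,,,Austin,TX\n"
    "2,B,Bars,,,Austin,TX\n"
    "3,C,Food,,,Denver,CO\n"
)
matched = io.StringIO()
filter_csv(master, matched, {"Austin", "Boston"}, {"TX", "MA"}, {"Bars"})

result = list(csv.reader(io.StringIO(matched.getvalue())))
print(result)
```

Row 2 is dropped because its category is excluded, and row 3 because Denver is not in the city set; only the header and row 1 survive.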

Note that citys, states and categorys would be better as sets rather than lists, because membership lookups are much faster (you didn't show how they are defined).
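A membership test on a list scans the elements one by one, while a set uses a hash table, so lookups take constant time on average. A rough sketch of the comparison, where the city names are hypothetical and the printed timings will vary by machine:

```python
import timeit

# Hypothetical filter values; the question does not show how citys is built
citys_list = [f"city{i}" for i in range(10_000)]
citys_set = set(citys_list)  # same elements, stored in a hash table

probe = "city9999"  # last element: the worst case for a linear list scan
assert (probe in citys_list) == (probe in citys_set)  # same answer either way

t_list = timeit.timeit(lambda: probe in citys_list, number=1_000)
t_set = timeit.timeit(lambda: probe in citys_set, number=1_000)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```

In the filtering code itself, converting once before the loop (for example citys = set(citys)) is enough; the in checks stay exactly the same.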


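Thanks for your help and advice. After putting in your code I keep getting this error: Traceback (most recent call last): File "yelpscrape.py", line 51, in <module> cw.writerow(next(cr)) # copy title ValueError: I/O operation on closed file – CFraley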


Never mind, I got it working. I hadn't put it in quite right. I believe it's working now. Thanks for your help! – CFraley
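CFraley doesn't say what the fix was. One common way to get "ValueError: I/O operation on closed file" (an assumption about this particular case) is using the reader after the with block has exited, for example when the loop ends up dedented out of it. A minimal reproduction, with io.StringIO standing in for the files and broken and fixed as hypothetical names:

```python
import csv
import io

def broken():
    with io.StringIO("a,b\n1,2\n") as master:
        cr = csv.reader(master)
    # Dedented: the with block has exited, so master is already closed here
    return next(cr)  # raises ValueError: I/O operation on closed file

def fixed():
    with io.StringIO("a,b\n1,2\n") as master:
        cr = csv.reader(master)
        return next(cr)  # still inside the with block, so the file is open

try:
    broken()
except ValueError as exc:
    print("broken():", exc)
print("fixed():", fixed())
```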


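If you don't want to think too hard about how the iterator that produces the lines works, one straightforward way to do this is to treat the first line specially: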

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    first_line = True
    for line in master:
        if first_line or (any(city in line.split('","')[5] for city in citys) and
                          any(state in line.split('","')[6] for state in states) and
                          not any(category in line.split('","')[2] for category in categorys)):
            matched.write(line)
        first_line = False  # every line after the first must pass the filter
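The same "treat the first line specially" idea can also be written with enumerate instead of a manual flag. A minimal sketch, with io.StringIO standing in for the files and a hypothetical keep predicate replacing the question's filter:

```python
import io

def copy_with_header(src, dst, keep):
    """Write line 0 unconditionally, and later lines only if keep(line) is true."""
    for i, line in enumerate(src):
        if i == 0 or keep(line):  # i == 0 plays the role of the first_line flag
            dst.write(line)

master = io.StringIO("name,city\nA,Austin\nB,Boston\n")  # hypothetical data
matched = io.StringIO()
copy_with_header(master, matched, lambda line: "Boston" in line)
print(matched.getvalue(), end="")
```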