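Delete a row if it contains a string from a CSV file

I'm having trouble deleting rows from a text file when one of the columns contains a given string. So far my code reads the text file and saves it as a CSV file split into separate columns, but the rows are not removed.

This is what the values in that column look like: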

Ship To or Bill To 
------------------ 
3000000092-BILL_TO 
3000000092-SHIP_TO 
3000004000_SHIP_TO-INAC-EIM

There are more than 20 columns and 50,000K rows. Basically, I'm trying to delete every row that contains the string 'INAC' or 'EIM'.

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC', 'EIM']

with open(my_file_name, 'r', newline='') as infile, \
        open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        if not any(remove_word in line for remove_word in remove_words):
            writer.writerow(line)

Source

2016-10-04 Cesar

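The problem here is that the csv.reader object returns each row of the file as a list of individual column values, so the `in` test checks whether any single value in that list is equal to remove_word.

A quick fix would be to try

 if not any(remove_word in element for element in line for remove_word in remove_words):

since the any(...) expression is true whenever any field in the row contains any of the remove_words, and the leading `not` then skips that row.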
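For reference, the nested check can be wrapped in a small helper and tried on in-memory data. This is only a sketch: the helper name `row_has_any` and the sample rows (including the second column) are made up for illustration and are not part of the original code.

```python
import csv
import io

remove_words = ['INAC', 'EIM']

def row_has_any(row, words):
    """Return True if any field in row contains any of words as a substring."""
    return any(word in field for field in row for word in words)

# Hypothetical sample data in the same '|'-delimited layout as NVG.txt.
sample = (
    "3000000092-BILL_TO|Alpha\n"
    "3000000092-SHIP_TO|Beta\n"
    "3000004000_SHIP_TO-INAC-EIM|Gamma\n"
)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(io.StringIO(sample), delimiter='|'):
    # Keep only the rows in which no field contains a remove word.
    if not row_has_any(row, remove_words):
        writer.writerow(row)

result = out.getvalue()
print(result)
```

Only the two rows without 'INAC' or 'EIM' should end up in `result`, written with the csv module's default comma delimiter.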

Source

2016-10-04 16:40:16 holdenweb

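Thank you, that worked for me. – Cesar

Each row produced by the CSV reader is a list of strings, not a single string, so your generator expression checks whether 'INAC' or 'EIM' is one of the members of that list, i.e.:

'INAC' in ['3000004000_SHIP_TO-INAC-EIM', ...]

Because `in` looks for an exact match when applied to a list, this is False unless a field is exactly 'INAC' or 'EIM'. If you want to check whether the string occurs anywhere in the line, you don't need a CSV reader; you can iterate over the file opened with plain open():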

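my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC', 'EIM']

with open(my_file_name, 'r', newline='') as infile, \
        open(cleaned_file, 'w', newline='') as outfile:
    for line in infile:
        # Here line is a plain string, so `in` performs a substring test.
        if not any(remove_word in line for remove_word in remove_words):
            # Copy the line through unchanged; it stays '|'-delimited.
            # (Passing a string to csv.writer.writerow would write one
            # character per column.)
            outfile.write(line)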
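If the cleaned file should still be split into columns on '|', one possible variant is to test the raw line first and only then parse the lines that are kept. The sketch below assumes every record fits on a single line (no quoted fields containing line breaks); the function name `clean_lines` and the sample data are hypothetical.

```python
import csv
import io

remove_words = ['INAC', 'EIM']

def clean_lines(infile, outfile, words, delimiter='|'):
    """Write lines that contain none of words to outfile as CSV rows."""
    writer = csv.writer(outfile, lineterminator="\n")
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        # Substring test on the raw text of the whole line.
        if any(word in line for word in words):
            continue
        # Parse the kept line into fields (assumes one record per line).
        row = next(csv.reader([line], delimiter=delimiter))
        writer.writerow(row)

# Hypothetical in-memory data standing in for NVG.txt.
src = io.StringIO("a-BILL_TO|x\nb-INAC|y\nc|EIM-z\nd|w\n")
dst = io.StringIO()
clean_lines(src, dst, remove_words)
print(dst.getvalue())
```

With real files, `src` and `dst` would be the handles returned by open(..., newline=''), and the `lineterminator` argument could be dropped to get the csv module's default line endings.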

Source

2016-10-04 16:39:58

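Oh okay, but I still need the delimiter "|" because all the columns are separated by that character. How would I include that? – Cesar

I'm not sure I understand your question: are you trying to delete the whole row if the string appears anywhere in it, or only to remove a specific column within the row and leave the remaining columns unchanged? –

If the string appears anywhere in the row, delete the whole row. – Cesar

As the other answers have already pointed out, your code doesn't work because each line in csv.reader is actually a list of column values, so remove_word in line checks whether one of those values is exactly equal to one of the remove_words, which obviously is never True.

If you only need to look for the words in one column, there's no reason to check all of them. The following checks only a single column's value, so it should be considerably faster than checking all 20 or more values in every row of the file.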

import csv

my_file_name = "NVG.txt"
cleaned_file_name = "cleanNVG.csv"
ONE_COLUMN = 1  # zero-based index of the column to check
remove_words = ['INAC', 'EIM']

with open(my_file_name, 'r', newline='') as infile, \
        open(cleaned_file_name, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile, delimiter='|'):
        column = row[ONE_COLUMN]
        # Substring test against the single column only.
        if not any(remove_word in column for remove_word in remove_words):
            writer.writerow(row)
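If the column's position isn't known in advance, its index could be looked up from a header row instead of being hard-coded. This is a sketch under the assumption that the file is non-empty and its first row holds the column names, including the "Ship To or Bill To" heading shown in the question; the function name `filter_by_named_column` and the sample data are invented for illustration.

```python
import csv
import io

remove_words = ['INAC', 'EIM']
COLUMN_NAME = "Ship To or Bill To"  # heading taken from the question's sample

def filter_by_named_column(infile, outfile, column_name, words, delimiter='|'):
    """Drop rows whose named column contains any of words."""
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader)  # assumes the first row holds column names
    # Raises ValueError if the column name is not present in the header.
    index = [name.strip() for name in header].index(column_name)
    writer.writerow(header)
    for row in reader:
        # Rows too short to contain the column are kept unchanged.
        if len(row) > index and any(word in row[index] for word in words):
            continue
        writer.writerow(row)

# Hypothetical in-memory data; only the middle column is checked.
src = io.StringIO(
    "ID|Ship To or Bill To|Note\n"
    "1|3000000092-BILL_TO|ok\n"
    "2|3000004000_SHIP_TO-INAC-EIM|old\n"
    "3|3000000092-SHIP_TO|EIM in another column\n"
)
dst = io.StringIO()
filter_by_named_column(src, dst, COLUMN_NAME, remove_words)
print(dst.getvalue())
```

Row 3 is kept even though 'EIM' appears in its last field, because only the named column is tested. That is the trade-off of the single-column approach.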

Source

2016-10-04 18:55:08 martineau
