2017-02-11 139 views
-1

I'm having trouble reading a CSV file in Python.

CSV format: below are two entries from the CSV file:

"1", "one", "<long class=\"like\" >\ 
    <short class=\"over\">\ 
    </short> 
    </long>", "3", "<long class=\"like\" >\ 
    <short class=\"over\">\ 
    </short> 
    </long>", "facebook" 
    "2", "two", "<long class=\"like\" >\ 
    <short class=\"over\">\ 
    </short> 
    </long>", "3", "<long class=\"like\" >\ 
    <short class=\"over\">\ 
    </short> 
    </long>", "facebook" 

How can I read each record of a CSV file like this?

+2

That is strange content for a CSV file. What does the expected result look like? – RomanPerekhrest

+0

Are there supposed to be 4 spaces before each line, or is that a formatting issue? –

+0

That is Stack Overflow formatting; there are no spaces before the lines. – justkid

Answers

0

Assuming two entries from the CSV file look like this:

"1", "one", "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook" 
"2", "two", "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook" 

Consider using the re.findall() function:

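import re

with open('test.csv', 'r') as fh:
    lines = fh.read().split('\n')
    for l in lines:
        if not l:  # skip blank lines, e.g. the empty string after the final newline
            continue
        # capture the numeric id, the word, and the rest of the line
        fields = re.findall(r'^\"(\d+)\", \"(\w+)\", (.+)', l, re.S)
        a, b, c = fields[0]  # unpack the three captured groups
        print(a, b, c, sep='\t')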

Output:

1 one "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook" 
2 two "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook" 
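If every field needs to be separated, not just the first two, one possible variation is a pattern that matches a single double-quoted field and tolerates backslash-escaped characters inside it. This is only a sketch: the names `FIELD` and `split_quoted` and the sample line are hypothetical, and it assumes every field is wrapped in double quotes, as in the entries above.

```python
import re

# One double-quoted field; inside it, either a plain character or a
# backslash followed by any character (so \" does not end the field).
FIELD = re.compile(r'"((?:[^"\\]|\\.)*)"', re.S)

def split_quoted(line):
    """Return the contents of every quoted field, turning \\" back into "."""
    return [re.sub(r'\\(.)', r'\1', raw, flags=re.S)
            for raw in FIELD.findall(line)]

# Hypothetical sample line in the same shape as the entries above,
# including the missing comma before the last field.
line = r'"1", "one", "<long class=\"like\" ></long>", "3" "facebook"'
fields = split_quoted(line)
print(fields)
```

Because the pattern looks only for quoted fields and ignores whatever sits between them, it does not depend on the separators being consistent.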
1

Why not use the csv package?

You can read each row and work with it however you like, for example:

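import csv
with open('prueba.csv', 'r', newline='') as file:
    # ';' is only an example; set the delimiter to match your file
    reader = csv.reader(file, delimiter=';')
    for row in reader:
        print(row)  # replace with whatever you want to do with each row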

But maybe you want to do something different.
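For data shaped like the entries in the question (comma-separated, a space after each comma, backslash-escaped quotes, and line breaks inside quoted fields), the csv module can probably handle it with a few dialect options. The following is a rough sketch under those assumptions: `read_rows` and the sample text are hypothetical, and the sample is simplified to fewer fields per record.

```python
import csv
import io

# Hypothetical sample mirroring the question's layout: quoted fields,
# backslash-escaped quotes, and line breaks inside a quoted field.
sample = (
    '"1", "one", "<long class=\\"like\\" >\n'
    '<short class=\\"over\\">\n'
    '</short>\n'
    '</long>", "3", "facebook"\n'
    '"2", "two", "<long class=\\"like\\" ></long>", "3", "facebook"\n'
)

def read_rows(stream):
    """Parse records whose quotes are escaped with a backslash."""
    reader = csv.reader(
        stream,
        delimiter=',',
        quotechar='"',
        escapechar='\\',        # treat \" as a literal quote
        doublequote=False,      # quotes are not escaped by doubling
        skipinitialspace=True,  # ignore the space after each comma
    )
    return [row for row in reader if row]  # drop empty records

rows = read_rows(io.StringIO(sample))
for row in rows:
    print(len(row), row[0], row[1], row[-1])
```

When reading from a real file instead of `io.StringIO`, open it with `newline=''` so that line breaks inside quoted fields reach the reader unchanged.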