2016-08-02 95 views

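Awk: get a .csv column that contains commas and newlines

I have data in a .csv column that sometimes contains commas and newlines. When a value contains a comma, I've enclosed the whole string in double quotes. How can I extract that column into a .txt file while taking the commas and newlines into account?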

My command doesn't work.

Sample data:

,"This is some text with a , in it.", #data with commas are enclosed in double quotes 

,line 1 of data 
line 2 of data, #data with a couple of newlines 

,"Data that may a have , in it and 
also be on a newline as well.", 

This is what I have so far:

awk -F "\"*,\"*" '{print $4}' file.csv > column_output.txt 
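Two things break that command, and a small test makes both visible. The input lines below are made up for illustration, and `'"*,"*'` is the same regex the shell passes for `-F "\"*,\"*"`. First, the separator regex also matches a comma inside the quotes; second, awk reads one line per record by default, so a quoted value that spans a newline is split into two records before FS is ever applied.

```shell
#!/bin/sh
# 1) The FS regex "*,"* also matches the comma inside the quoted string,
#    so the quoted text is cut into two fields ($2 and $3).
printf ',"a, b",c\n' |
  awk -F '"*,"*' '{ for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }'
# prints:
# $1=<>
# $2=<a>
# $3=< b>
# $4=<c>

# 2) awk reads one line per record, so a quoted value that spans two lines
#    arrives as two separate records before FS is applied.
printf ',"line 1\nline 2",\n' |
  awk -F '"*,"*' '{ printf "record %d: $2=<%s>\n", NR, $2 }'
# prints:
# record 1: $2=<line 1>
# record 2: $2=<>
```

Both problems have to be handled together: fields must be recognised by their quotes, and lines must be joined while a quote is still open.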

Can you have double quotes inside a double-quoted field, and if so, how are they escaped: '"foo \"bar"', '"foo""bar"', or something else?

Answer

$ cat decsv.awk 
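# FPAT (GNU awk only) describes what a field looks like: a run of non-commas,
# or a double-quoted string (escaped quotes are swapped for placeholders below,
# so none remain inside a quoted field when it is split)
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," } 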
{ 
    # create strings that cannot exist in the input to map escaped quotes to 
    gsub(/a/,"aA") 
    gsub(/\\"/,"aB") 
    gsub(/""/,"aC") 

    # prepend previous incomplete record segment if any 
    $0 = prev $0 
    numq = gsub(/"/,"&") 
    if (numq % 2) { 
     # this is inside double quotes so incomplete record 
     prev = $0 RT 
     next 
    } 
    prev = "" 

    for (i=1;i<=NF;i++) { 
     # map the replacement strings back to their original values 
     gsub(/aC/,"\"\"",$i) 
     gsub(/aB/,"\\\"",$i) 
     gsub(/aA/,"a",$i) 
    } 

    printf "Record %d:\n", ++recNr 
    for (i=0;i<=NF;i++) { 
     printf "\t$%d=<%s>\n", i, $i 
    } 
    print "#######" 
} 

$ awk -f decsv.awk file 
Record 1: 
     $0=<,"This is some text with a , in it.", #data with commas are enclosed in double quotes> 
     $1=<> 
     $2=<"This is some text with a , in it."> 
     $3=< #data with commas are enclosed in double quotes> 
####### 
Record 2: 
     $0=<,"line 1 of data 
line 2 of data", #data with a couple of newlines> 
     $1=<> 
     $2=<"line 1 of data 
line 2 of data"> 
     $3=< #data with a couple of newlines> 
####### 
Record 3: 
     $0=<,"Data that may a have , in it and 
also be on a newline as well.",> 
     $1=<> 
     $2=<"Data that may a have , in it and 
also be on a newline as well."> 
     $3=<> 
####### 
Record 4: 
     $0=<,"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well.",> 
     $1=<> 
     $2=<"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well."> 
     $3=<> 
####### 

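The above uses GNU awk for FPAT and RT (FPAT gives a regex describing what a field looks like rather than what separates fields, and RT holds the text that ended the current record, here the newline). I don't know of any CSV format that lets you have a newline in the middle of a field that isn't enclosed in quotes (if there were one, you could never tell where any record ends), so the script doesn't allow for that. The above was run on this input file: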

$ cat file 
,"This is some text with a , in it.", #data with commas are enclosed in double quotes 
,"line 1 of data 
line 2 of data", #data with a couple of newlines 
,"Data that may a have , in it and 
also be on a newline as well.", 
,"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well.",
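If GNU awk isn't available, or you only want one column written to a file as the question asks, the same idea (join lines until the double quotes balance, then split) can be sketched in plain POSIX awk by walking each record one character at a time. This is an illustration, not part of the answer above: `csv_column` is a hypothetical helper name, and it assumes RFC 4180-style quoting, where a literal quote inside a quoted field is written as `""`. It strips the surrounding quotes and unescapes `""`, and it does not handle backslash-escaped quotes such as `\"may\"` in the last sample record (those need the placeholder mapping the answer uses).

```shell
#!/bin/sh
# csv_column COL [FILE...]
# Hypothetical helper: print field COL of every CSV record using POSIX awk only.
# Assumes RFC 4180-style quoting (a literal " inside a quoted field is written "").
# Backslash-escaped quotes (\") are NOT handled.
csv_column() {
  col=$1
  shift
  awk -v col="$col" '
  {
    # Join physical lines until the double quotes balance: an odd count
    # means a quoted field is still open, so the newline belongs to the data.
    rec = (pending == "") ? $0 : pending "\n" $0
    if (gsub(/"/, "\"", rec) % 2) { pending = rec; next }
    pending = ""

    # Walk the logical record one character at a time, tracking quote state.
    n = 1; field = ""; inq = 0
    for (i = 1; i <= length(rec); i++) {
      c = substr(rec, i, 1)
      if (inq) {
        if (c == "\"" && substr(rec, i + 1, 1) == "\"") {
          field = field "\""; i++      # "" inside quotes is one literal quote
        } else if (c == "\"") {
          inq = 0                      # closing quote (not copied)
        } else {
          field = field c              # commas and newlines are data here
        }
      } else if (c == "\"") {
        inq = 1                        # opening quote (not copied)
      } else if (c == ",") {
        if (n == col) break            # the wanted field is complete
        n++; field = ""
      } else {
        field = field c
      }
    }
    # Records with fewer than COL fields produce an empty line.
    if (n == col) print field; else print ""
  }' "$@"
}
```

In the sample file the leading comma makes field 1 empty, so the quoted text is field 2 and something like `csv_column 2 file > column_output.txt` would write one value per record. A value that contained a newline still spans two lines in the output, so pick a different output separator if the .txt file has to be split back into records later.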