$ cat decsv.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
{
    # create strings that cannot exist in the input to map escaped quotes to
    gsub(/a/,"aA")
    gsub(/\\"/,"aB")
    gsub(/""/,"aC")
    # prepend previous incomplete record segment if any
    $0 = prev $0
    numq = gsub(/"/,"&")
    if (numq % 2) {
        # this is inside double quotes so incomplete record
        prev = $0 RT
        next
    }
    prev = ""
    for (i=1;i<=NF;i++) {
        # map the replacement strings back to their original values
        gsub(/aC/,"\"\"",$i)
        gsub(/aB/,"\\\"",$i)
        gsub(/aA/,"a",$i)
    }
    printf "Record %d:\n", ++recNr
    for (i=0;i<=NF;i++) {
        printf "\t$%d=<%s>\n", i, $i
    }
    print "#######"
}
$ awk -f decsv.awk file
Record 1:
$0=<,"This is some text with a , in it.", #data with commas are enclosed in double quotes>
$1=<>
$2=<"This is some text with a , in it.">
$3=< #data with commas are enclosed in double quotes>
#######
Record 2:
$0=<,"line 1 of data
line 2 of data", #data with a couple of newlines>
$1=<>
$2=<"line 1 of data
line 2 of data">
$3=< #data with a couple of newlines>
#######
Record 3:
$0=<,"Data that may a have , in it and
also be on a newline as well.",>
$1=<>
$2=<"Data that may a have , in it and
also be on a newline as well.">
$3=<>
#######
Record 4:
$0=<,"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.",>
$1=<>
$2=<"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.">
$3=<>
#######
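The above uses GNU awk for FPAT and RT. I don't know of any CSV format that allows a newline in the middle of a field that is not enclosed in quotes (if one did, you could never tell where a record ends), so the script doesn't allow for that. The above was run on this input file: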
$ cat file
,"This is some text with a , in it.", #data with commas are enclosed in double quotes
,"line 1 of data
line 2 of data", #data with a couple of newlines
,"Data that may a have , in it and
also be on a newline as well.",
,"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.",
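The record-joining step can also be tried on its own. What follows is only a minimal sketch, not the script above: it assumes a portable awk without RT, so it re-inserts a plain "\n" between joined lines, and instead of the placeholder strings it strips backslash-escaped quotes from a scratch copy before counting. The function name join_records and the <NL> marker are made up for illustration.

```shell
# Join physical lines into logical CSV records by counting double quotes.
# Sketch for any POSIX awk: "\n" stands in for gawk's RT.
join_records() {
    awk '
    {
        line = prev $0
        tmp = line
        gsub(/\\"/, "", tmp)            # backslash-escaped quotes must not be counted
        if (gsub(/"/, "", tmp) % 2) {   # odd count: still inside a quoted field
            prev = line "\n"
            next
        }
        prev = ""
        n++
        gsub(/\n/, "<NL>", line)        # make embedded newlines visible
        printf "%d: %s\n", n, line
    }'
}

printf '%s\n' 'a,"x' 'y",b' 'c,d' | join_records
```

Doubled quotes ("") need no special handling here because they add two to the count and leave its parity unchanged.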
Can you have double quotes inside double-quote-delimited fields, and if so, how are they escaped? '"foo \"bar"' or '"foo ""bar"' or something else?