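RegExp works with String.match, but not with String.split
I have a type of CSV file that I need to parse. The sample below shows the conditions I need to account for (missing column headers, line feeds inside quotes, missing data, etc.):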
ID,NAME,TITLE,DESCRIPTION,,
PRO1234,"JOHN SMITH",ENGINEER,"JOHN HAS BEEN WORKING
HARD ON BEING A GOOD
SERVENT."
PRO1235,"KEITH SMITH",ENGINEER,"keith has been working
hard on being a good
servent."
PRO1235,"KENNY SMITH",,"keith has been working
hard on being a good
servent."
PRO1235,"RICK SMITH",,,
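You'll notice that there are line feeds inside the descriptions as well as the line feeds that mark new data rows.
I wrote this regular expression to find the line feeds outside of quotes, and it works great when tested on its own.
Here is the code that uses it in Node.js: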
var fs = require('fs');
function parseCSV(filename) {
    // Intended to match only the line feeds that fall outside double quotes.
    var rx = new RegExp(/\n(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g);
    var strFile = fs.readFileSync(filename).toString();
    // Count the row-separating line feeds with match().
    console.log("line feed count via match: " + strFile.match(rx).length);
    // Split on the same regex; one array item per data row is expected.
    var csv = strFile.split(rx);
    console.log("csv length: " + csv.length);
    console.log("csv items ###############################");
    csv.forEach(function(e, i, a) {
        console.log("item e: " + e);
    });
}
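When I run this, you'll see that the line feed count (the line feeds found via match) is correct, i.e. 4. However, when I use the same RegExp with String.split(), the resulting array is erratic: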
line feed count via match: 4
csv length: 17
csv items ###############################
item e: ID,NAME,TITLE,DESCRIPTION,,
item e:
PRO1235,"RICK SMITH"
item e: "RICK SMITH"
item e: undefined
item e: PRO1234,"JOHN SMITH",ENGINEER,"JOHN HAS BEEN WORKING
HARD ON BEING A GOOD
SERVENT."
item e:
PRO1235,"RICK SMITH"
item e: "RICK SMITH"
item e: undefined
item e: PRO1235,"KEITH SMITH",ENGINEER,"keith has been working
hard on being a good
servent."
item e:
PRO1235,"RICK SMITH"
item e: "RICK SMITH"
item e: undefined
item e: PRO1235,"KENNY SMITH",,"keith has been working
hard on being a good
servent."
item e: PRO1235,"RICK SMITH"
item e: "RICK SMITH"
item e: undefined
item e: PRO1235,"RICK SMITH",,,
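What am I doing wrong with split? My thinking is that if match() identifies the 4 line feeds perfectly, then the same RegExp should give split() the positions at which to break the string.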
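One likely explanation, offered as a sketch rather than a confirmed diagnosis: String.prototype.split() splices the contents of every capturing group into the result array after each separator it finds. The RegExp above contains three capturing groups, so 4 matches would yield 5 pieces plus 4 × 3 captured values, which is the 17 items shown in the output. The snippet below reproduces the effect on a small made-up string (`sample` is an assumption, not the real file) and compares it with the same pattern rewritten with non-capturing `(?: ... )` groups.

```javascript
// Made-up sample: a header row plus two data rows, one with a quoted line feed.
var sample = 'ID,NAME\nP1,"A\nB"\nP2,"C"';

// The RegExp from the question: three capturing groups inside the lookahead.
var capturing = /\n(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g;

// The same pattern with every group made non-capturing.
var nonCapturing = /\n(?=(?:[^"\\]*(?:\\.|"(?:[^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g;

var matches = sample.match(capturing).length;
var withCaptures = sample.split(capturing);
var rows = sample.split(nonCapturing);

console.log(matches);             // 2 line feeds outside quotes
console.log(withCaptures.length); // 9 = 3 pieces + 2 matches * 3 groups
console.log(rows.length);         // 3 rows, as intended
```

The positions found by match() and split() are the same in both cases; only the extra captured values differ.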
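A classic case of reinventing the wheel. [Why not use a dedicated CSV parser?](https://code.google.com/p/jquery-csv/) – anubhava 2014-09-23 16:46:19
First of all, you can't start parsing a string from the middle. – sln 2014-09-23 17:01:17
sln - can you explain your comment? If I call string.split(regExp), how am I parsing from the middle of the string? – neoRiley 2014-09-23 17:10:59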
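One way to read sln's remark (an interpretation, not their stated answer): whether a line feed is inside quotes depends on everything that came before it, so a scan that starts at the first character and tracks the quote state avoids guessing from the middle of the string. Below is a minimal sketch of that idea. `splitRows` is a hypothetical helper, not part of the original post or of any library, and it assumes quotes inside a field are escaped by doubling (`""`) rather than with a backslash.

```javascript
// Hypothetical helper: split CSV text into rows by scanning from the start
// and tracking whether the current position is inside a quoted field.
// Assumes "" is the escape for a literal quote; \r is not treated specially.
function splitRows(text) {
    var rows = [];
    var current = '';
    var inQuotes = false;
    for (var i = 0; i < text.length; i++) {
        var ch = text[i];
        if (ch === '"') {
            // A doubled "" toggles twice, so the quote state is unchanged.
            inQuotes = !inQuotes;
            current += ch;
        } else if (ch === '\n' && !inQuotes) {
            // Line feed outside quotes: the current row is complete.
            rows.push(current);
            current = '';
        } else {
            current += ch;
        }
    }
    // Push whatever follows the last row separator (empty if the text ends in \n).
    rows.push(current);
    return rows;
}

var demoRows = splitRows('ID,NAME\nP1,"A\nB"\nP2,"C"');
console.log(demoRows.length); // 3
```

Splitting each row into fields would need a second pass with the same quote tracking, which is the kind of work a dedicated CSV parser already handles.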