2017-02-08 34 views

I'm working with org.apache.commons-csv 1.4, and this week one of our JUnit tests turned up this strange behaviour, which looks like a bug in Apache Commons CSV:

    // NOTE: CSVReader is not a class in Apache Commons CSV; this
    // (Reader, separator, quote, skipLines) constructor appears to be opencsv's.
    List<String[]> linesCsv = new ArrayList<>();

    // try-with-resources closes the reader, and the streams it wraps,
    // even when reading fails part-way through.
    try (CSVReader reader = new CSVReader(
            new InputStreamReader(new FileInputStream(file), "ISO-8859-1"), ',', '"', 0)) {

        String[] record;
        while ((record = reader.readNext()) != null) {
            linesCsv.add(record);
        }

    } catch (Exception e) {
        logger.error("Error in ", e);
    }

* Error case

input.csv

DAR_123451     ,"XXXXX Hello World "Hello World XXX " 
DAR_123452     ,"XXXXX Hello World "Hello World XXX " 

Java KO:

[0.0] DAR_123451
[0.1] XXXXX Hello World "Hello World XXX \nDAR_123456,XXXXX Hello World "Hello World XXX


* Correct case

input.csv

DAR_123451     ,"XXXXX Hello World "Hello World" XXX " 
DAR_123452     ,"XXXXX Hello World "Hello World" XXX " 

Java OK:

[0.0] DAR_123451 [0.1] XXXXX Hello World "Hello World" XXX

[1.0] DAR_123452 [1.1] XXXXX Hello World "Hello World" XXX

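I can't configure the Commons CSV library to handle this correctly, and it now looks like a bug to me. How can we correctly read a quoted string that itself contains quote characters?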

Check the line ending at the end of the first line in input.csv. – user1516873

Answer

When a value is enclosed in quotes, the CSV format usually represents a double quote inside the text as two consecutive double quotes.

With the latest version of commons-csv I even get an exception for your input (IOException: (line 1) invalid char between encapsulated token and delimiter).
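To make it clearer why a parser rejects that input, here is a minimal sketch of a strict single-line parser. It is not Commons CSV's actual implementation; the class `StrictCsvLine` and its `parse` method are hypothetical names made up for illustration, and it assumes a comma delimiter and no line breaks inside fields. Inside a quoted field, `""` is read as a literal quote, and a single `"` closes the field, so whatever follows must be a delimiter or the end of the line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class StrictCsvLine {
    private StrictCsvLine() {}

    /** Splits one CSV line on commas, honouring double-quoted fields. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        int n = line.length();
        while (true) {
            field.setLength(0);
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                boolean closed = false;
                while (i < n) {
                    char c = line.charAt(i);
                    if (c != '"') {
                        field.append(c);
                        i++;
                    } else if (i + 1 < n && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" is an escaped quote
                        i += 2;
                    } else {
                        i++; // a lone quote closes the field
                        closed = true;
                        break;
                    }
                }
                if (!closed) {
                    throw new IllegalArgumentException("unterminated quoted field");
                }
                // After the closing quote only a delimiter or end of line is allowed.
                if (i < n && line.charAt(i) != ',') {
                    throw new IllegalArgumentException(
                            "invalid char after closing quote at index " + i);
                }
            } else {
                // Unquoted field: read up to the next delimiter.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(field.toString());
            if (i >= n) {
                break;
            }
            i++; // skip the delimiter
        }
        return fields;
    }
}
```

Fed the first line of your error case, this sketch treats the quote before the second `Hello` as the end of the field and then fails on the `H` that follows, which is the same kind of complaint as the exception above.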

So, to include the double quotes correctly, you need to write the input like this:

DAR_123451     ,"XXXXX Hello World ""Hello World"" XXX " 
DAR_123452     ,"XXXXX Hello World ""Hello World"" XXX " 
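If you generate the file yourself, the doubling can be applied when writing each field. The following is only a sketch of that rule using plain string handling; `CsvQuoting`, `quoteField` and `toLine` are hypothetical names, not Commons CSV API, and it assumes every field is quoted and the delimiter is a comma.

```java
// Hypothetical helper for illustration only; not part of Commons CSV.
final class CsvQuoting {
    private CsvQuoting() {}

    /** Wraps a value in double quotes and doubles every embedded double quote. */
    static String quoteField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins the given fields with commas, quoting each one. */
    static String toLine(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quoteField(fields[i]));
        }
        return sb.toString();
    }
}
```

For example, `CsvQuoting.quoteField("XXXXX Hello World \"Hello World\" XXX ")` yields the second column exactly as written in the lines above.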

The test case then works as expected:

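    import java.io.Reader;
    import java.io.StringReader;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVRecord;

    // Each embedded double quote is doubled ("") inside the quoted field.
    Reader in = new StringReader(
            "DAR_123451     ,\"XXXXX Hello World \"\"Hello World XXX\"\" \"\n" +
            "DAR_123452     ,\"XXXXX Hello World \"\"Hello World XXX\"\" \"");
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
    for (CSVRecord record : records) {
        // Print every column of the record with its index.
        for (int i = 0; i < record.size(); i++) {
            System.out.println("At " + i + ": " + record.get(i));
        }
    }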

Output:

At 0: DAR_123451     
At 1: XXXXX Hello World "Hello World XXX" 
At 0: DAR_123452     
At 1: XXXXX Hello World "Hello World XXX" 

For details, see https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality