2015-11-26 92 views
0

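Convert a comma-separated CSV file to tab-separated using Java

I am trying to convert a comma-separated CSV file into a tab-separated file using Java. However, a few values inside the file contain commas themselves. Please refer to the example below: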

Direct - House,Bayer House Advertiser,537121661,,160 x 600,Bayer US Publisher,537121625,Bayer.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000 

Direct - House,Bayer House Advertiser,537121661,,160 x 600,Bayer US Publisher,537121625,Bayer.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000 

Can anyone help me with how to handle these values?

Thanks.

+0

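So, what is the pattern? You mentioned that a few values contain commas; are those values only numeric? Is this all on a single line, or on multiple lines? – Raf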

+0

@Raf: I have now updated the records. There are 2 records above. Also, the values causing the problem are numeric, for example "8,455,844". – user1496783

Answers

2

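I think the best approach is to rely on a pattern that does not change. You mentioned that the problem is with numbers that use commas as thousands separators. I see in your post that these numbers are enclosed in double quotes. Based on the following assumptions:

  1. The numbers are enclosed in double quotes
  2. There is only one such number in each line (if there is more than one, find all pairs of double quotes, store their indexes in an array or list, and check that a comma's index does not fall within the range of any pair)

what you then do is the following:

  1. Get the first index of a double quote, e.g. 154
  2. Get the second/last index of a double quote, e.g. 159
  3. Replace every comma with \t, provided the comma's index is less than the index of the first double quote or greater than the index of the last double quote (this skips the commas inside the number)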

The code below does exactly that for you:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

public class CsvToTabConvertor {
    public static void main(String[] args) {
        File file = new File("C:\\test_java\\csvtotab.txt");
        File outFile = new File("C:\\test_java\\csvtotab_processed.txt");
        List<String> processedLines = new ArrayList<String>();

        // try-with-resources closes the reader and the writer even if an error occurs
        try (BufferedReader br = new BufferedReader(new FileReader(file));
             PrintWriter writer = new PrintWriter(outFile)) {
            String line;
            while ((line = br.readLine()) != null) {
                StringBuilder builder = new StringBuilder(line);

                // find the number in double quotes - assuming there is only one quoted number per line
                // (both indexes are -1 when the line has no quotes, so every comma gets replaced)
                int doubleQuoteIndexStart = builder.indexOf("\"");
                int doubleQuoteIndexLast = builder.lastIndexOf("\"");

                // for each line, visit every comma
                int index = builder.indexOf(",");
                while (index >= 0) {
                    // replace only the commas outside the quoted number; consecutive commas
                    // (empty fields) each become a tab, so the columns stay aligned
                    if (index < doubleQuoteIndexStart || index > doubleQuoteIndexLast) {
                        builder.setCharAt(index, '\t');
                    }

                    // get next index of comma
                    index = builder.indexOf(",", index + 1);
                }

                // add the line to the list of lines for later storage to file
                processedLines.add(builder.toString());
            }

            // write all the lines to the output file
            for (String processedLine : processedLines) {
                writer.println(processedLine);
            }
        } catch (IOException ex) {
            // report the failure instead of silently swallowing it
            ex.printStackTrace();
        }
    }
}

The input file contains the following lines:

Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000 
Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000 

The processed output file looks as follows (the tab characters are rendered as spaces here):

Direct - House eBay House Advertiser 537121661  160 x 600 eBay US Publisher 537121625 eBay.com 537224178 160x600_MyeBay_US 538146889 2015-11-18 "8,455,844" 0 0 0 0.000000 USD 0.000000 0.000000 0.000000 
Direct - House eBay House Advertiser 537121661  160 x 600 eBay US Publisher 537121625 eBay.com 537224178 160x600_Search_SLR 538146895 2015-11-18 "20,175,240" 30 0 0 0.000000 USD 0.000000 0.000000 0.000000 

Modify the above code and its logic further to suit your needs.
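
If a line can contain more than one quoted value (the case that assumption 2 sets aside), one option is to track whether the current character is inside quotes instead of storing index pairs. The following is only a minimal sketch and not part of the code above: the class name `QuoteAwareCsvToTab` and the method `commasToTabs` are made up for illustration, and it assumes that a quoted value never spans more than one line.

```java
class QuoteAwareCsvToTab {

    /** Replaces every comma outside double quotes with a tab; quoted text is copied unchanged. */
    static String commasToTabs(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean insideQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // each double quote either opens or closes a quoted value
                insideQuotes = !insideQuotes;
            }
            // only commas outside quotes are field separators
            out.append(c == ',' && !insideQuotes ? '\t' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // hypothetical sample line with an empty field and two quoted numbers
        String line = "a,,\"8,455,844\",b,\"20,175,240\",0";
        System.out.println(commasToTabs(line));
    }
}
```

Like the code above, this keeps the double quotes in the output. Calling `commasToTabs(line)` inside the `readLine()` loop would take the place of the index-based replacement.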

+0

Thanks a lot @Raf :) – user1496783

+0

@user1496783 You're welcome. If the answer helped solve your problem, you can choose to mark it as **accepted**; you can read more here: http://stackoverflow.com/help/accepted-answer – Raf

+0

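Just used this for my own problem and it worked great! Cheers @Raf. I wanted to replace commas with tabs and then full stops with commas, so I just copied and pasted the code and ran it twice in the same main file. The first output is the second input. Not the most elegant code, but it did the job. –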