How to efficiently count the rows of a CSV file in Java

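I wrote some code that opens a CSV file and counts rows with a for loop, but I feel this approach is inefficient and it causes a lot of delay. How can I count the rows of a CSV file efficiently in Java?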

TargetFile.mdb has 120 rows
report.csv has 11000 rows

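With this approach, the code has to run 120 * 11000 = 1,320,000 times to compute the count for every resource. Here is my code:

Here is the new, working code, courtesy of Xavier Delamotte, which counts the rows efficiently: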

import java.io.File; 
import java.io.FileReader; 
import java.io.IOException; 
import java.sql.SQLException; 
import java.util.HashMap; 
import java.util.List; 
import java.util.Map; 

import au.com.bytecode.opencsv.CSVReader; 

import com.healthmarketscience.jackcess.Database; 
import com.healthmarketscience.jackcess.Table; 

public class newcount { 

    public static class ValueKey{ 
     String mdmId; 
     String pgName; 

     @Override 
     public int hashCode() { 
      final int prime = 31; 
      int result = 1; 
      result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode()); 
      result = prime * result 
       + ((pgName == null) ? 0 : pgName.hashCode()); 
      return result; 
     } 
     @Override 
     public boolean equals(Object obj) { 
      if (this == obj) 
       return true; 
      if (obj == null) 
       return false; 
      if (getClass() != obj.getClass()) 
       return false; 
      ValueKey other = (ValueKey) obj; 
      if (mdmId == null) { 
       if (other.mdmId != null) 
        return false; 
      } else if (!mdmId.equals(other.mdmId)) 
       return false; 
      if (pgName == null) { 
       if (other.pgName != null) 
        return false; 
      } else if (!pgName.equals(other.pgName)) 
       return false; 
      return true; 
     } 
     public ValueKey(String mdmId, String pgName) { 
      super(); 
      this.mdmId = mdmId; 
      this.pgName = pgName; 
     } 
    } 

    public static void main(String[] args) throws IOException, SQLException,Throwable{ 


     Integer count; 

     String MDMID,NAME,PGNAME,PGTARGET,TEAM; 

     Table RESOURCES = Database.open(new File("C:/STATS/TargetFile.mdb")).getTable("RESOURCES"); 
     int pcount = RESOURCES.getRowCount(); 


     String csvFilename = "C:\\MDMSTATS\\APEX\\report.csv"; 
     CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
     List<String[]> content = csvReader.readAll(); 
     Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>(); 
     for (String[] rowcsv : content) { 
      ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]); 
      count = csvValuesCount.get(key); 
      csvValuesCount.put(key,count == null ? 1: count + 1); 

     } 

     //int count = 0; 
     // Taking 1st resource data 
     for (int i = 0; i < pcount-25; i++) { 
      Map<String, Object> row = RESOURCES.getNextRow(); 
      TEAM = row.get("TEAM").toString(); 
      MDMID = row.get("MDM ID").toString(); 
      NAME = row.get("RESOURCE NAME").toString(); 
      PGNAME = row.get("PG NAME").toString(); 
      PGTARGET = row.get("PG TARGET").toString(); 
      int PGTARGETI = Integer.parseInt(PGTARGET); 
      Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME)); 
      count = countInteger == null ? 0: countInteger; 
      System.out.println(NAME+"\t"+PGNAME+"\t"+count); 

     } 
    } 
}

2013-04-06 H4SN

All I want to do is calculate the resource count using a CSV file and an SQL query. – H4SN 2013-04-06 11:38:50

I suggest reading the CSV file only once and counting the occurrences of each key made up of mdmId and pgName.

If you have Guava, you can use a Multiset<ValueKey> (http://guava-libraries.googlecode.com/svn-history/r8/trunk/javadoc/com/google/common/collect/Multiset.html) instead of a Map<ValueKey, Integer>.
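As a rough sketch of the single-pass count, the example below uses only the standard library (Map.merge, Java 8+) rather than Guava. The class name KeyCounter, the nested Key class and the sample rows are made up for illustration; the column positions (6 for the MDM ID, 1 for the PG name) are taken from the code in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class KeyCounter {

    /** Hypothetical composite key; equals/hashCode let equal field pairs hit the same map entry. */
    static final class Key {
        final String mdmId;
        final String pgName;

        Key(String mdmId, String pgName) {
            this.mdmId = mdmId;
            this.pgName = pgName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return Objects.equals(mdmId, other.mdmId) && Objects.equals(pgName, other.pgName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(mdmId, pgName);
        }
    }

    /** One pass over the rows: column 6 is read as the MDM ID, column 1 as the PG name. */
    static Map<Key, Integer> countByKey(List<String[]> rows) {
        Map<Key, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(new Key(row[6], row[1]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the parsed contents of report.csv
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"", "PG-A", "", "", "", "", "M1"});
        rows.add(new String[] {"", "PG-A", "", "", "", "", "M1"});
        rows.add(new String[] {"", "PG-B", "", "", "", "", "M1"});

        Map<Key, Integer> counts = countByKey(rows);
        System.out.println(counts.getOrDefault(new Key("M1", "PG-A"), 0)); // prints 2
        System.out.println(counts.getOrDefault(new Key("M2", "PG-A"), 0)); // prints 0
    }
}
```

With 11000 CSV rows and 120 resources this does roughly 11000 map updates plus 120 lookups, instead of 1,320,000 comparisons. With Guava, a Multiset<ValueKey> should serve the same purpose through its add and count methods.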

Edit: to use it, you need to put the ValueKey class in a separate file or declare it static.
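The reason behind that note can be sketched as follows. The names NestedKeyDemo, StaticKey and InnerKey are hypothetical: a non-static inner class is tied to an instance of its enclosing class, so it cannot be created directly from a static method such as main, whereas a static nested class can.

```java
public class NestedKeyDemo {

    // 'static' means no NestedKeyDemo instance is needed to create a StaticKey.
    static class StaticKey {
        final String id;

        StaticKey(String id) {
            this.id = id;
        }
    }

    // Without 'static', every InnerKey belongs to an enclosing NestedKeyDemo instance.
    class InnerKey {
        final String id;

        InnerKey(String id) {
            this.id = id;
        }
    }

    static String describe() {
        StaticKey a = new StaticKey("M1"); // fine in a static context
        // InnerKey b = new InnerKey("M1"); // would not compile here: no enclosing instance
        InnerKey b = new NestedKeyDemo().new InnerKey("M1"); // needs an outer instance first
        return a.id + "," + b.id;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints M1,M1
    }
}
```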

The ValueKey class:

public static class ValueKey{ 
     String mdmId; 
     String pgName; 
     @Override 
     public int hashCode() { 
      final int prime = 31; 
      int result = 1; 
      result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode()); 
      result = prime * result 
        + ((pgName == null) ? 0 : pgName.hashCode()); 
      return result; 
     } 
     @Override 
     public boolean equals(Object obj) { 
      if (this == obj) 
       return true; 
      if (obj == null) 
       return false; 
      if (getClass() != obj.getClass()) 
       return false; 
      ValueKey other = (ValueKey) obj; 
      if (mdmId == null) { 
       if (other.mdmId != null) 
        return false; 
      } else if (!mdmId.equals(other.mdmId)) 
       return false; 
      if (pgName == null) { 
       if (other.pgName != null) 
        return false; 
      } else if (!pgName.equals(other.pgName)) 
       return false; 
      return true; 
     } 
     public ValueKey(String mdmId, String pgName) { 
      super(); 
      this.mdmId = mdmId; 
      this.pgName = pgName; 
     } 
    }

Your method:

Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES"); 
    int pcount = RESOURCES.getRowCount(); 

    String csvFilename = "C:\\STATS\\APEX\\report.csv"; 
    CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
    List<String[]> content = csvReader.readAll(); 
    Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>(); 
    for (String[] rowcsv : content) { 
     ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]); 
     Integer count = csvValuesCount.get(key); 
     csvValuesCount.put(key,count == null ? 1: count + 1); 

    } 

    int count = 0; 
    // Taking 1st resource data 
    for (int i = 0; i < pcount; i++) { 
     Map<String, Object> row = RESOURCES.getNextRow(); 
     TEAM = row.get("TEAM").toString(); 
     MDMID = row.get("MDM ID").toString(); 
     NAME = row.get("RESOURCE NAME").toString(); 
     PGNAME = row.get("PG NAME").toString(); 
     PGTARGET = row.get("PG TARGET").toString(); 
     int PGTARGETI = Integer.parseInt(PGTARGET); 
     Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME)); 
     count = countInteger == null ? 0: countInteger; 
    }

2013-04-06 11:40:07

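Without a total count, this code only holds the counts; it also has two loops, and [for (String[] rowcsv : content)] would also run 11000 times for each resource. With the updated code, the CSV file is now read only once. – H4SN 2013-04-06 11:43:40

It looks like it will work; I'll let you know after I add it to my full code :) – H4SN 2013-04-06 12:15:37

See the updated question; I get 1 error in the code. – H4SN 2013-04-07 15:28:41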

Dear friend, I suggest you use OpenCSV.

I think it can meet your requirements ;)

2013-04-06 11:32:35

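I am already using OpenCSV; look at the code again. Yes, it can do the job, but I don't think running the code 1,320,000 times is a good idea; it takes a long time. – H4SN 2013-04-06 11:36:42

Dear H4SN, it is not clear from the code that you are using OpenCSV. Anyway, Xavier Delamotte's solution is good, give it a try ;) – 2013-04-06 11:57:17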

Read the CSV first, build a set of the field 6 values, then use it to update the count. This should be quite fast.

// open csv and make lookup set
Set<String> mdmids = new HashSet<String>();
String csvFilename = "C:\\STATS\\APEX\\report.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List<String[]> content = csvReader.readAll();

for (String[] rowcsv : content) {
    mdmids.add(rowcsv[6]); // column 6 holds the MDM ID
}
Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES");
int pcount = RESOURCES.getRowCount();
int count = 0;
// Taking 1st resource data
// (TEAM, MDMID, NAME, PGNAME and PGTARGET are the String variables declared in the question's code)
for (int i = 0; i < pcount; i++) {
    Map<String, Object> row = RESOURCES.getNextRow();
    TEAM = row.get("TEAM").toString();
    MDMID = row.get("MDM ID").toString();
    NAME = row.get("RESOURCE NAME").toString();
    PGNAME = row.get("PG NAME").toString();
    PGTARGET = row.get("PG TARGET").toString();
    int PGTARGETI = Integer.parseInt(PGTARGET);

    // use lookup set
    if (mdmids.contains(MDMID)) {
        count++;
    }
}

2013-04-06 11:37:18 rongenre

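How will count return the number of rows for the first MDMID? – H4SN 2013-04-06 11:56:57

The CSV is opened and a set of the mdmids in the CSV (column 6) is built. The CSV is then finished with and can be garbage collected. Going through the database rows, it uses that set to check whether a matching mdmid exists. That is a hash lookup, and with only 11000 entries it is quite fast. – rongenre 2013-04-06 12:03:26

if (mdmids.contains(MDMID)) for a specific MDMID – H4SN 2013-04-07 03:14:08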
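The point raised in these comments can be illustrated with a small sketch, using made-up IDs and hypothetical names (SetVersusMap, resourcesPresent, rowsPerId). A set only records whether an MDM ID occurs in the CSV at all, so counting contains() hits gives the number of resources that are present; a per-ID row count needs a map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SetVersusMap {

    /** Membership only: how many resource IDs occur at least once among the CSV IDs. */
    static int resourcesPresent(List<String> csvIds, List<String> resourceIds) {
        Set<String> seen = new HashSet<>(csvIds);
        int present = 0;
        for (String id : resourceIds) {
            if (seen.contains(id)) {
                present++;
            }
        }
        return present;
    }

    /** Per-ID counts: how many CSV rows carry each ID. */
    static Map<String, Integer> rowsPerId(List<String> csvIds) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : csvIds) {
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up data: M1 appears in two CSV rows, M2 in one, M3 in none
        List<String> csvIds = Arrays.asList("M1", "M1", "M2");
        List<String> resourceIds = Arrays.asList("M1", "M3");

        System.out.println(resourcesPresent(csvIds, resourceIds));     // prints 1
        System.out.println(rowsPerId(csvIds).getOrDefault("M1", 0));   // prints 2
        System.out.println(rowsPerId(csvIds).getOrDefault("M3", 0));   // prints 0
    }
}
```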
