2013-04-06 162 views
0

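I wrote code that opens a CSV file and counts its rows with a for loop, but this approach is inefficient and causes repeated delays. How can I count the rows of a CSV file efficiently in Java?

  • TargetFile.mdb has 120 rows
  • report.csv has 11,000 rows

With this approach my code has to run 120 * 11000 = 1,320,000 iterations to compute the count for every resource. Here is my code:

Here is the new, working code, based on Xavier Delamotte's answer, which counts the rows efficiently: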

import java.io.File; 
import java.io.FileReader; 
import java.io.IOException; 
import java.sql.SQLException; 
import java.util.HashMap; 
import java.util.List; 
import java.util.Map; 

import au.com.bytecode.opencsv.CSVReader; 

import com.healthmarketscience.jackcess.Database; 
import com.healthmarketscience.jackcess.Table; 

public class newcount { 

    public static class ValueKey{ 
     String mdmId; 
     String pgName; 

     @Override 
     public int hashCode() { 
      final int prime = 31; 
      int result = 1; 
      result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode()); 
      result = prime * result 
       + ((pgName == null) ? 0 : pgName.hashCode()); 
      return result; 
     } 
     @Override 
     public boolean equals(Object obj) { 
      if (this == obj) 
       return true; 
      if (obj == null) 
       return false; 
      if (getClass() != obj.getClass()) 
       return false; 
      ValueKey other = (ValueKey) obj; 
      if (mdmId == null) { 
       if (other.mdmId != null) 
        return false; 
      } else if (!mdmId.equals(other.mdmId)) 
       return false; 
      if (pgName == null) { 
       if (other.pgName != null) 
        return false; 
      } else if (!pgName.equals(other.pgName)) 
       return false; 
      return true; 
     } 
     public ValueKey(String mdmId, String pgName) { 
      super(); 
      this.mdmId = mdmId; 
      this.pgName = pgName; 
     } 
    } 

    public static void main(String[] args) throws IOException, SQLException,Throwable{ 


     Integer count; 

     String MDMID,NAME,PGNAME,PGTARGET,TEAM; 

     Table RESOURCES = Database.open(new File("C:/STATS/TargetFile.mdb")).getTable("RESOURCES"); 
     int pcount = RESOURCES.getRowCount(); 


     String csvFilename = "C:\\MDMSTATS\\APEX\\report.csv"; 
     CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
     List<String[]> content = csvReader.readAll(); 
     Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>(); 
     for (String[] rowcsv : content) { 
      ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]); 
      count = csvValuesCount.get(key); 
      csvValuesCount.put(key,count == null ? 1: count + 1); 
     } 
     csvReader.close(); // all rows are in memory now, so release the file handle

     // For each resource row, look up the count that was precomputed from the CSV
     for (int i = 0; i < pcount-25; i++) { 
      Map<String, Object> row = RESOURCES.getNextRow(); 
      TEAM = row.get("TEAM").toString(); 
      MDMID = row.get("MDM ID").toString(); 
      NAME = row.get("RESOURCE NAME").toString(); 
      PGNAME = row.get("PG NAME").toString(); 
      PGTARGET = row.get("PG TARGET").toString(); 
      int PGTARGETI = Integer.parseInt(PGTARGET); 
      Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME)); 
      count = countInteger == null ? 0: countInteger; 
      System.out.println(NAME+"\t"+PGNAME+"\t"+count); 

     } 
    } 
} 

All I want to do is compute the resource counts using the CSV file and SQL queries – H4SN 2013-04-06 11:38:50

Answers

3

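I suggest reading the CSV file only once and counting the occurrences of each key made up of mdmId and pgName.

If you have Guava, you can use a Multiset<ValueKey> (http://guava-libraries.googlecode.com/svn-history/r8/trunk/javadoc/com/google/common/collect/Multiset.html) instead of a Map<ValueKey, Integer>.

Edit: to use it, you need to put the ValueKey class in its own file or declare it as static.

The ValueKey class: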
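A minimal sketch of that one-pass idea, using only the standard library. The class name `KeyCounter`, the method `countByKey`, and the sample rows are hypothetical; the column positions (index 6 for the MDM ID, index 1 for the PG name) are taken from the code in the question. It assumes Java 8 or later for `Map.merge`, and it uses a `List<String>` as the composite key, which is a substitution for a dedicated key class, not the approach shown in this answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyCounter {

    // Counts how often each (mdmId, pgName) pair occurs, in a single pass over the rows.
    static Map<List<String>, Integer> countByKey(List<String[]> rows) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            // Lists compare by content, so two rows with the same pair map to the same key.
            List<String> key = Arrays.asList(row[6], row[1]);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for report.csv (hypothetical data).
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"r1", "PG-A", "", "", "", "", "M1"});
        rows.add(new String[] {"r2", "PG-A", "", "", "", "", "M1"});
        rows.add(new String[] {"r3", "PG-B", "", "", "", "", "M2"});

        Map<List<String>, Integer> counts = countByKey(rows);
        // Each later lookup is a single hash access instead of a scan of the CSV.
        System.out.println(counts.getOrDefault(Arrays.asList("M1", "PG-A"), 0)); // prints 2
    }
}
```

With 120 resources and 11,000 CSV rows, this does roughly 11,000 map updates plus 120 lookups rather than 1,320,000 comparisons.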

public static class ValueKey{ 
     String mdmId; 
     String pgName; 
     @Override 
     public int hashCode() { 
      final int prime = 31; 
      int result = 1; 
      result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode()); 
      result = prime * result 
        + ((pgName == null) ? 0 : pgName.hashCode()); 
      return result; 
     } 
     @Override 
     public boolean equals(Object obj) { 
      if (this == obj) 
       return true; 
      if (obj == null) 
       return false; 
      if (getClass() != obj.getClass()) 
       return false; 
      ValueKey other = (ValueKey) obj; 
      if (mdmId == null) { 
       if (other.mdmId != null) 
        return false; 
      } else if (!mdmId.equals(other.mdmId)) 
       return false; 
      if (pgName == null) { 
       if (other.pgName != null) 
        return false; 
      } else if (!pgName.equals(other.pgName)) 
       return false; 
      return true; 
     } 
     public ValueKey(String mdmId, String pgName) { 
      super(); 
      this.mdmId = mdmId; 
      this.pgName = pgName; 
     } 
    } 
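The null checks in the class above can be written more compactly with `java.util.Objects` (Java 7 or later). The following is only a sketch of an equivalent key type; the name `CompactKey` is hypothetical, and the fields are made final here as a design choice so that a key cannot change after it has been placed in a map.

```java
import java.util.Objects;

final class CompactKey {
    private final String mdmId;
    private final String pgName;

    CompactKey(String mdmId, String pgName) {
        this.mdmId = mdmId;
        this.pgName = pgName;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        CompactKey other = (CompactKey) obj;
        // Objects.equals treats two nulls as equal and never throws on null.
        return Objects.equals(mdmId, other.mdmId)
                && Objects.equals(pgName, other.pgName);
    }

    @Override
    public int hashCode() {
        // Objects.hash accepts null arguments, so no explicit null checks are needed.
        return Objects.hash(mdmId, pgName);
    }
}
```

Two keys built from the same pair of strings are equal and share a hash code, which is what `HashMap` needs in order to find an existing entry.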

Your method:

Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES"); 
    int pcount = RESOURCES.getRowCount(); 

    String csvFilename = "C:\\STATS\\APEX\\report.csv"; 
    CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
    List<String[]> content = csvReader.readAll(); 
    Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>(); 
    for (String[] rowcsv : content) { 
     ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]); 
     Integer count = csvValuesCount.get(key); 
     csvValuesCount.put(key,count == null ? 1: count + 1); 
    } 
    csvReader.close(); // all rows are in memory now, so release the file handle

    int count = 0; 
    // For each resource row, look up the count that was precomputed from the CSV
    for (int i = 0; i < pcount; i++) { 
     Map<String, Object> row = RESOURCES.getNextRow(); 
     TEAM = row.get("TEAM").toString(); 
     MDMID = row.get("MDM ID").toString(); 
     NAME = row.get("RESOURCE NAME").toString(); 
     PGNAME = row.get("PG NAME").toString(); 
     PGTARGET = row.get("PG TARGET").toString(); 
     int PGTARGETI = Integer.parseInt(PGTARGET); 
     Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME)); 
     count = countInteger == null ? 0: countInteger; 
    } 

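There is no total count, so this code would only hold the count, and it also has two loops; [for (String[] rowcsv : content)] would also run 11,000 times for each resource. With the updated code, the CSV file is now read only once – H4SN 2013-04-06 11:43:40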


It looks like it will work; I'll let you know after adding it to my full code :) – H4SN 2013-04-06 12:15:37


See the updated question; I am getting one error in the code – H4SN 2013-04-07 15:28:41

0

Dear friend, I suggest you use OpenCSV.

I think it can meet your requirements ;)


I am using OpenCSV; look at the code again. Yes, it can do the job, but I think running the code 1,320,000 times is not a good idea; it takes a long time – H4SN 2013-04-06 11:36:42


Dear H4SN, it is not clear from the code that you are using OpenCSV. Anyway, Xavier Delamotte's solution is good, try it ;) – 2013-04-06 11:57:17

0

Read the CSV first, build a set of the values in field 6, and then use it to update the count. This should be quite fast.

// Open the CSV and build a lookup set of the MDM IDs (column index 6)
Set<String> mdmids = new HashSet<String>();
String csvFilename = "C:\\STATS\\APEX\\report.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List<String[]> content = csvReader.readAll();
csvReader.close();

for (String[] rowcsv : content) {
    mdmids.add(rowcsv[6]);
}

Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES");
int pcount = RESOURCES.getRowCount();
int count = 0;
// Walk through the resource rows
for (int i = 0; i < pcount; i++) {
    Map<String, Object> row = RESOURCES.getNextRow();
    String TEAM = row.get("TEAM").toString();
    String MDMID = row.get("MDM ID").toString();
    String NAME = row.get("RESOURCE NAME").toString();
    String PGNAME = row.get("PG NAME").toString();
    String PGTARGET = row.get("PG TARGET").toString();
    int PGTARGETI = Integer.parseInt(PGTARGET);

    // Use the lookup set: this counts resources whose MDM ID appears in the CSV
    if (mdmids.contains(MDMID)) {
        count++;
    }
}

How would count return the number of rows for the first MDMID? – H4SN 2013-04-06 11:56:57


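The CSV is opened and a set of the mdmids from the CSV (column 6) is built. The CSV is then finished with and can be garbage collected. While going through the database rows, the code uses the set to check whether a matching mdmid exists. That is a hash operation, and with only 11,000 entries it is quite fast. – rongenre 2013-04-06 12:03:26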


if (mdmids.contains(MDMID)) for a specific MDMID – H4SN 2013-04-07 03:14:08
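
The exchange above turns on the difference between membership and counting: a `Set` can only say whether an MDM ID occurs in the CSV, while a `Map` from ID to count can say how many rows carry it. A small sketch with made-up IDs (the class name `SetVersusCount`, its methods, and the data are hypothetical, and Java 8 or later is assumed for `Map.merge`):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SetVersusCount {

    // A set keeps one entry per distinct ID, so duplicates are collapsed.
    static Set<String> distinctIds(List<String> ids) {
        return new HashSet<>(ids);
    }

    // A map keeps a running number of rows for every ID.
    static Map<String, Integer> rowsPerId(List<String> ids) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical values of CSV column 6, one per row.
        List<String> csvIds = Arrays.asList("M1", "M2", "M1", "M1");

        System.out.println(distinctIds(csvIds).contains("M1")); // prints true
        System.out.println(rowsPerId(csvIds).get("M1"));        // prints 3
    }
}
```

If the per-resource row count is what is needed, the map form is the one that answers it; the set form only tells whether at least one row exists.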
