算法的优化

我有读取文本文件并创建boolean类型的输入数组[]的代码。它的大小约为100,000-300,000个物品。现在我面临的问题是创建所有那些大小为N，3> = N> = 9的子集，它们具有连续的真值。算法的优化

E.g.对于N = 3，如果所有3个trues都在连续索引中，[true] [true] [true]是所需的子集。

虽然我创建了一个算法，但它非常慢。我需要一个更好的解决方案，快速高效。

请提出一些建议。

public static void createConsecutivePassingDays() 
    {  
     for (String siteName : sitesToBeTestedList.keySet()) 
     { 
      System.out.println("\n*****************Processing for Site--->"+siteName+" ***********************"); 

      LinkedHashMap<String,ArrayList<String>> cellsWithPassedTripletsDates=new LinkedHashMap<String, ArrayList<String>>(); 

      for (String cellName : sitesToBeTestedList.get(siteName)) 
      { 

       System.out.println("\n*****************Processing for Cell--->"+cellName+" ***********************"); 

       boolean failed=false; 

       ArrayList<String> passedDatesTriplets=new ArrayList<String>(); 
       int consecutiveDays=0; 
       String tripletDate=""; 
       String prevDate_day=""; 
       String today_Date=""; 

       for (String date : cellDateKpiMetOrNotMap.get(cellName).keySet()) 
       { 
        System.out.println("\nprocessing for Date-->"+date); 
        if(!(prevDate_day.trim().equals(""))) 
         today_Date=getNextDay(prevDate_day.substring(0, prevDate_day.lastIndexOf('_'))); 

        if(Connection.props.getProperty("INCLUDE_WEEKENDS").equalsIgnoreCase("FALSE")) 
        { 
         if(date.endsWith("SAT") || date.endsWith("SUN") || (!(date.substring(0, date.lastIndexOf('_')).equalsIgnoreCase(today_Date)))) 
         { 
          if(consecutiveDays >= Reader.days) 
          { 
           passedDatesTriplets.add(tripletDate); 
          } 

          tripletDate=""; 
          consecutiveDays=0; 
          prevDate_day=date; 
          continue; 
         } 
        } 


        if(cellDateKpiMetOrNotMap.get(cellName).get(date).equalsIgnoreCase("TRUE")) 
        { 

         if(tripletDate.equals("")) 
          tripletDate=date; 
         else 
          tripletDate+="#"+date; 

         consecutiveDays++; 

        } 
        else 
        { 
         failed=true; 
         if(consecutiveDays >= Reader.days)//kd 
         { 
          System.out.println("Triplet to be added-->"+tripletDate); 
          passedDatesTriplets.add(tripletDate); 
         } 
         tripletDate=""; 
         consecutiveDays=0; 
        } 

        prevDate_day=date; 
       } 

       if(!failed) 
        passedDatesTriplets.add(tripletDate); 
       else 
       { 
        if(tripletDate.trim().split("#").length >= Reader.days) 
        { 
         passedDatesTriplets.add(tripletDate); 
        } 
       } 

       cellsWithPassedTripletsDates.put(cellName, passedDatesTriplets); 

      } 

      siteItsCellsWithPassedDates.put(siteName, cellsWithPassedTripletsDates); 

     } 

     System.out.println("\n************************************************SITES***************************************"); 
     for (String site : siteItsCellsWithPassedDates.keySet()) 
     { 
      System.out.println("\n********************Site="+site+" ***********************"); 
      for (String cellName : siteItsCellsWithPassedDates.get(site).keySet()) 
      { 
       System.out.println("\nCellName="+cellName); 
       System.out.println(siteItsCellsWithPassedDates.get(site).get(cellName)); 
      } 
      System.out.println("***********************************************************"); 
     } 
     System.out.println("********************************************************************************************"); 
    }

来源

2012-10-10 KDjava

你目前的算法是什么？代码请 – RNJ

嘿，代码是.. ,,但我问的问题实际上是非常基本的想法，在代码背后工作，但代码太复杂，无法显示，因为许多其他功能嵌入到它。 – KDjava

算法及其数据结构如何映射到代码？你谈到了一些布尔值，但是在代码中看到大量的字符串和列表... –

首先，我会远离array[boolean] a BitSet是更高的内存效率，我希望它在你的情况下更快。因为它会更好地利用缓存。见boolean[] vs. BitSet: Which is more efficient?

对于算法：通过数据结构

迭代。当您遇到第一个true时，请记住它的位置（start），直到达到false。这是位置end 在这一点上，你有一个true值的连续间隔的开始和结束，这基本上是你的结果。你从start到end - n得到你的子集。

重复，直到你最终数据结构

，你甚至可以通过启动正进程，每个处理阵列的不同部分，该segement开始后开始与第一false值，并继续在parallize这直到第一个false结束。

来源

2012-10-10 10:58:35

我会sugggest你创建一个StringBuilder和追加1添加到布尔数组每一个“真”值，并为每一个“假” 0添加。因此你的stringbuilder将有一个1和0的序列。然后，只需使用indexOf（“111”）来获取三个连续的“true”值的起始索引，它将成为stringbuilder和布尔数组中的起始索引。

来源

2012-10-10 10:33:03

雅听起来不错，但我不认为这会给我正确的重叠trues子集，例如，[true] [true] [true] [true]，实际上应该给你索引为2的true的2个子集， 1,2和1,2,3。 – KDjava

如果您使用上次找到的startindex +1开始下一次搜索，则可以解决此问题 –

最简单的算法是检查从索引x开始的N个值。如果至少存在一个错误，则可以直接进入索引x + N。否则你可以检查索引x + 1; 如果没有有效的序列，那么您将检查大小/ N个单元格。

伪代码：

int max = array.length - N; 
int index = 0; 
boolean valid = true; 
while (index < max) { 
    valid = true; 
    for (check = index; check<index+N; check++){ 
     valid = valid && array[check]; 
    } 
    if (valid) { 
     // you got a continous sequence of true of size N 
     ; 
     index++; 
    } else { 
     index = index + N; 
    }  
}

也，有位集合，而不是你可以使用nextClearByte获得下一个虚假的索引的数组。与前面的假负N的差值表示N真的序列的序数（其中前面的假初始值为-1）。

来源

2012-10-10 11:13:04

回答

相关问题