有效地Scala的集合

与OpenFlow工作，我想计算两个所有组合比较多少双流满足以下条件（给定任意两个流，例如f1和f2）：有效地Scala的集合

f1“ s Src ip必须等于f2的Dst ip。
f1的Dst ip必须等于f2的Src ip。
f1和f2必须具有相同的协议。

这个我认为应该是两个元素(NFlows Choose 2)的组合。由于this question我使用的方法combinations象下面这样：

val pairFlows = matchs.combinations(2).filter{ list => 
    val f1 = list.head 
    val f2 = list.tail.head 

    f1.dl_type == f2.dl_type && 
    f1.nw_dst == f2.nw_src && 
    f1.nw_src == f2.nw_dst 
}

我的问题是，这是执行这种计算的最有效的方法是什么？或者为了跟踪每个src，dst和protocol type，创建一个HashMap会更好吗？

这里是数据的一个示例：

{ 
    1: [ 
     { 
      "actions": [ 
       "OUTPUT:2" 
      ], 
      "idle_timeout": 0, 
      "cookie": 0, 
      "packet_count": 18, 
      "hard_timeout": 0, 
      "byte_count": 1764, 
      "duration_sec": 6884, 
      "duration_nsec": 5000000, 
      "priority": 1, 
      "length": 120, 
      "flags": 0, 
      "table_id": 0, 
      "match": { 
       "dl_type": 2048, 
       "dl_dst": "00:00:00:00:00:02", 
       "nw_src": "10.0.0.1", 
       "in_port": 1, 
       "nw_dst": "10.0.0.2" 
      } 
     }, 
     { 
      "actions": [ 
       "OUTPUT:2" 
      ], 
      "idle_timeout": 0, 
      "cookie": 0, 
      "packet_count": 18, 
      "hard_timeout": 0, 
      "byte_count": 1764, 
      "duration_sec": 1123, 
      "duration_nsec": 181000000, 
      "priority": 1, 
      "length": 120, 
      "flags": 0, 
      "table_id": 0, 
      "match": { 
       "dl_type": 2048, 
       "dl_dst": "00:00:00:00:00:02", 
       "nw_src": "10.0.0.3", 
       "in_port": 3, 
       "nw_dst": "10.0.0.2" 
      } 
     }, 
     { 
      "actions": [ 
       "OUTPUT:1" 
      ], 
      "idle_timeout": 0, 
      "cookie": 0, 
      "packet_count": 10, 
      "hard_timeout": 0, 
      "byte_count": 980, 
      "duration_sec": 1132, 
      "duration_nsec": 253000000, 
      "priority": 1, 
      "length": 120, 
      "flags": 0, 
      "table_id": 0, 
      "match": { 
       "dl_type": 2048, 
       "dl_dst": "00:00:00:00:00:01", 
       "nw_src": "10.0.0.3", 
       "in_port": 3, 
       "nw_dst": "10.0.0.1" 
      } 
     }, 
     { 
      "actions": [ 
       "OUTPUT:1" 
      ], 
      "idle_timeout": 0, 
      "cookie": 0, 
      "packet_count": 16, 
      "hard_timeout": 0, 
      "byte_count": 1568, 
      "duration_sec": 1141, 
      "duration_nsec": 652000000, 
      "priority": 1, 
      "length": 120, 
      "flags": 0, 
      "table_id": 0, 
      "match": { 
       "dl_type": 2048, 
       "dl_dst": "00:00:00:00:00:01", 
       "nw_src": "10.0.0.2", 
       "in_port": 2, 
       "nw_dst": "10.0.0.1" 
      } 
     }, 
     { 
      "actions": [ 
       "OUTPUT:3" 
      ], 
      "idle_timeout": 0, 
      "cookie": 0, 
      "packet_count": 20, 
      "hard_timeout": 0, 
      "byte_count": 1960, 
      "duration_sec": 6883, 
      "duration_nsec": 961000000, 
      "priority": 1, 
      "length": 120, 
      "flags": 0, 
      "table_id": 0, 
      "match": { 
       "dl_type": 2048, 
       "dl_dst": "00:00:00:00:00:03", 
       "nw_src": "10.0.0.2", 
       "in_port": 2, 
       "nw_dst": "10.0.0.3" 
      } 
     }, 
     { 
      "actions": [ 
       "OUTPUT:3" 
      ], 
      "idle_timeout": 0, 
      "cookie": 0, 
      "packet_count": 12, 
      "hard_timeout": 0, 
      "byte_count": 1176, 
      "duration_sec": 6883, 
      "duration_nsec": 984000000, 
      "priority": 1, 
      "length": 120, 
      "flags": 0, 
      "table_id": 0, 
      "match": { 
       "dl_type": 2048, 
       "dl_dst": "00:00:00:00:00:03", 
       "nw_src": "10.0.0.1", 
       "in_port": 1, 
       "nw_dst": "10.0.0.3" 
      } 
     }, 
     { 
      "actions": [ 
       "OUTPUT:CONTROLLER" 
      ], 
      "idle_timeout": 0, 
      "cookie": 0, 
      "packet_count": 0, 
      "hard_timeout": 0, 
      "byte_count": 0, 
      "duration_sec": 9156, 
      "duration_nsec": 252000000, 
      "priority": 65535, 
      "length": 96, 
      "flags": 0, 
      "table_id": 0, 
      "match": { 
       "dl_type": 35020, 
       "dl_dst": "01:80:c2:00:00:0e" 
      } 
     }, 
     { 
      "actions": [ 
       "OUTPUT:CONTROLLER" 
      ], 
      "idle_timeout": 0, 
      "cookie": 0, 
      "packet_count": 22, 
      "hard_timeout": 0, 
      "byte_count": 1260, 
      "duration_sec": 9156, 
      "duration_nsec": 268000000, 
      "priority": 0, 
      "length": 80, 
      "flags": 0, 
      "table_id": 0, 
      "match": {} 
     } 
    ], 
}

在这里有8间流动，其中3个是对。

来源

2017-01-16 elbaulp

与GROUPBY（_。dl_type）开始会给你几个较小的列表，而不是一个巨大的，检查。根据您的数据的外观，这可以帮助很多，一点或根本没有。另外，看着通过h来自n个元素列表的2个元素的所有组合是O（n^2）。如果您保留两个列表，一个按dst排序，另一个按src排序（这是在O（nlogn）中完成的），您可以遍历它们一次并收集所有匹配。这会给你比上面的例子更好的时间复杂度。 – Buhb

@Buhb谢谢，我已经设法在'O（n） – elbaulp

我终于写的代码是O(n)，我保持Map与关键f.nw_src + f.nw_dst + f.dl_type和价值Int表示，如果该键有相应的对流量（这将是一个具有关键f.nw_dst + f.nw_src + f.dl_type下面是最终代码：

val table = flows./:(Map.empty[String,Int]){ case (m,f) => 
    val key = f.nw_src + f.nw_dst + f.dl_type 
    val inverseKey = f.nw_dst + f.nw_src + f.dl_type 
    val haspair = m get inverseKey match { 
    case Some(v) => v + 1 
    case None => 0 
    } 
    m + (key -> haspair) 
} 

val pairs = table.filter(_._2 > 0)

希望这有助于有人找一个类似的问题。

来源

2017-01-17 11:13:01 elbaulp

有效地Scala的集合

回答

相关问题