2016-11-16 85 views
3

我有一组对象(我将称它们为节点)相互连接。每个节点都连接到至少一个其他节点,并且整个节点集合都是一个大块(没有异常值)。节点的数量是三的倍数。将连接的节点分为三组

我想要做的是将节点分成三个连接节点组(组间没有重叠)。


每个节点都有唯一的ID。节点存储在一个文档中,其中每一行都是节点的ID,后面跟着它连接的节点。它看起来是这样的:

30:240 
40:210/240/230/60 
50:80/220/230/70/210/190/200 
60:240/40/230/220 
70:90/200/50/220 
80:50/170/190/210 
90:200/70/180 
100:110/160/190/170/200/180 
110:100/180 
160:170/100 
170:80/160/190/100 
180:90/200/110/100 
190:50/80/200/170/100 
200:90/70/100/50/190/180 
210:50/80/40/230 
220:50/230/70/60 
230:50/210/60/40/220 
240:40/30/60 

我做了一些图像,以帮助它想象:

nodes, ungrouped

我想我的结点分为三组,但它们必须连接。有很多可能的组合节点的组合。一种可能性是以下图像:

nodes, grouped properly

然而,我奋力想出一种算法,可以确保每个节点是一组三个连接的节点(并没有重叠)的英寸为了避免重叠,我在创建组时声称节点,这是我检查节点是否已经属于另一个组的方法。

我想出了下面的代码:

nodes = dict() 
with open("input.txt") as f: 
    for line in f.readlines(): 
     split = line.split(":") 
     node = int(split[0]) 
     nodes[node] = [] 
     for connection in split[1].split("/"): 
      nodes[node].append(int(connection.strip())) 
groups = dict() 
claimed = [] 
for node in nodes: 
    if node in claimed: 
     continue 
    connections = nodes[node] 
    if len(connections) > 1: 
     for i in range(len(connections)): 
      if node in groups and len(groups[node]) == 2: 
       break 
      if not connections[i] in claimed: 
       if not node in groups: 
        groups[node] = [] 
       groups[node].append(connections[i]) 
       claimed.append(connections[i]) 
       if not node in claimed: 
        claimed.append(node) 
f = open("output.txt",'w') 
for group in groups: 
    line = str(group) + ":" 
    for node in groups[group]: 
     line += str(node) + "/" 
    line = line[:-1] + "\n" 
    f.write(line) 
f.close() 

然而,这将产生以下:

160:170/100 
80:190 
230:50/210 
70:90/200 
110:180 
240:40/30 
220:60 

其中可以可视化这样的:

nodes, grouped wrongly

正如你所看到的,有四个三人组(蓝色),和三组两个(橙色)。这是因为我声称节点是由于我正在创建组,导致某些节点不再拥有两个未声明的连接节点。


所以我的问题是:怎样才可以有我的节点要求以这样的方式,其他节点仍然可以形成自己的三个一组连接的节点?

或更基本上:如何将连接的节点分成三个连接节点的组,并且没有重叠?每个节点都需要有三个小组。

我的问题是关于算法比代码更多。我也开放给一个完全不同的算法。

+0

每组连接节点中的节点数总是3的倍数? –

+0

@RiccardoPetraglia每组3个节点,节点总数是3的倍数 – Chronicle

+2

并不总是有解决方案。例如:假设您有9个顶点(a-i),以及以下边:(a,d),(b,d),(c,d)以及e-i上的完整图。然后,a,b和c中的每一个都需要与d相同,但其中只有两个可以。 – Dave

回答

1

data.py

node_string = """ 
30:240 
40:210/240/230/60 
50:80/220/230/70/210/190/200 
60:240/40/230/220 
70:90/200/50/220 
80:50/170/190/210 
90:200/70/180 
100:110/160/190/170/200/180 
110:100/180 
160:170/100 
170:80/160/190/100 
180:90/200/110/100 
190:50/80/200/170/100 
200:90/70/100/50/190/180 
210:50/80/40/230 
220:50/230/70/60 
230:50/210/60/40/220 
240:40/30/60 
     """ 

node_string_1 = """ 
(238,140,141):(208,134,191)/(117,119,222)/(229,134,102)/(167,234,229) 
(176,120,113):(150,201,147)/(111,237,177) 
(113,171,164):(237,128,198)/(159,172,151)/(202,171,225)/(240,168,211)/(111,218,139) 
(208,134,191):(238,140,141)/(229,134,102) 
(131,150,110):(212,209,198)/(155,133,217)/(168,140,187)/(173,204,120)/(147,210,106)/(207,149,101)/(228,172,108) 
(229,134,102):(208,134,191)/(238,140,141)/(167,234,229) 
(212,209,198):(109,129,191)/(155,133,217)/(147,210,106)/(131,150,110) 
(105,199,125):(150,201,147)/(175,118,238)/(240,168,211) 
(175,118,238):(105,199,125)/(240,168,211) 
(114,178,129):(164,233,166)/(227,123,219)/(135,156,161)/(224,183,104)/(228,172,108) 
(147,210,106):(212,209,198)/(173,204,120)/(131,150,110)/(143,147,114) 
(222,128,188):(135,156,161)/(237,128,198)/(159,172,151)/(111,237,177)/(150,201,147) 
(166,218,221):(207,149,101)/(143,114,198)/(159,172,151)/(202,171,225) 
(143,147,114):(173,204,120)/(107,184,168)/(147,210,106) 
(111,218,139):(132,138,234)/(202,171,225)/(113,171,164) 
(117,119,222):(109,129,191)/(155,133,217)/(238,140,141)/(167,234,229) 
(107,184,168):(164,233,166)/(173,204,120)/(143,147,114)/(227,123,219) 
(224,183,104):(164,233,166)/(114,178,129)/(111,237,177)/(135,156,161) 
(228,172,108):(227,123,219)/(173,204,120)/(135,156,161)/(159,172,151)/(207,149,101)/(131,150,110)/(114,178,129) 
(227,123,219):(173,204,120)/(164,233,166)/(107,184,168)/(228,172,108)/(114,178,129) 
(168,140,187):(155,133,217)/(207,149,101)/(176,204,213)/(143,114,198)/(131,150,110)/(159,166,127) 
(207,149,101):(168,140,187)/(159,172,151)/(166,218,221)/(143,114,198)/(131,150,110)/(228,172,108) 
(159,172,151):(207,149,101)/(135,156,161)/(237,128,198)/(202,171,225)/(113,171,164)/(166,218,221)/(222,128,188)/(228,172,108) 
(143,114,198):(168,140,187)/(207,149,101)/(202,171,225)/(166,218,221)/(184,163,168)/(159,166,127) 
(164,233,166):(227,123,219)/(114,178,129)/(107,184,168)/(224,183,104) 
(150,201,147):(176,120,113)/(240,168,211)/(237,128,198)/(105,199,125)/(111,237,177)/(222,128,188) 
(111,237,177):(135,156,161)/(224,183,104)/(222,128,188)/(176,120,113)/(150,201,147) 
(159,166,127):(184,163,168)/(168,140,187)/(176,204,213)/(143,114,198) 
(155,133,217):(212,209,198)/(168,140,187)/(167,234,229)/(176,204,213)/(109,129,191)/(117,119,222)/(131,150,110) 
(109,129,191):(212,209,198)/(117,119,222)/(155,133,217) 
(132,138,234):(184,163,168)/(202,171,225)/(111,218,139) 
(240,168,211):(150,201,147)/(237,128,198)/(105,199,125)/(175,118,238)/(113,171,164) 
(167,234,229):(155,133,217)/(117,119,222)/(238,140,141)/(176,204,213)/(229,134,102) 
(173,204,120):(227,123,219)/(147,210,106)/(143,147,114)/(107,184,168)/(131,150,110)/(228,172,108) 
(135,156,161):(114,178,129)/(224,183,104)/(159,172,151)/(111,237,177)/(222,128,188)/(228,172,108) 
(237,128,198):(150,201,147)/(159,172,151)/(222,128,188)/(240,168,211)/(113,171,164) 
(176,204,213):(155,133,217)/(168,140,187)/(159,166,127)/(167,234,229) 
(202,171,225):(132,138,234)/(184,163,168)/(159,172,151)/(113,171,164)/(166,218,221)/(143,114,198)/(111,218,139) 
(184,163,168):(143,114,198)/(132,138,234)/(159,166,127)/(202,171,225) 
""" 

node_grouper.py

import itertools 
import math 
from collections import deque 
from ast import literal_eval as totuple 

import data 

class NodeGrouper(): 

    finished = False 

    steps = 0 

    nodes = {} 
    groups = {} 
    weights = {} 

    best_group = [] 
    returned_results_cache = [] 

    # parse node string 
    # works with two different conventions: 
    # - node names that are numbers 
    # - node names that are tuples 
    def parse_node(self, string): 
     try: 
      return totuple(string) 
     except: 
      return int(string) 

    # Get a dictionary from the source string. 
    # receives node_string (str): the node string passed in settings 
    # returns {160: [170, 100], 240: [40, 30, 60] ... } 
    def parse_string(self, node_string): 
     nodes = {} 
     for line in node_string.split(): 
      split = line.split(":") 
      node = self.parse_node(split[0]) 
      if node not in nodes: 
       nodes[node] = set() 
      for connection in split[1].split("/"): 
       connection = self.parse_node(connection.strip()) 
       if connection not in nodes: 
        nodes[connection] = set() 
       nodes[node].add(connection) 
       nodes[connection].add(node) 
     self.nodes = nodes 

     for node in self.nodes: 
      self.nodes[node] = sorted(self.nodes[node]) 
     return nodes 


    # Get all possible combinations. 
    # Build a tuple with every possible 3 node connection. 
    # receives node (dict) as created by self.parse_string() 
    # returns ((90, 180, 200), (80, 100, 190), (60, 70, 220) ...) 
    def possible_combinations(self, nodes): 
     possible_links = {} 
     for node, links in nodes.items(): 
      link_possibilities = itertools.combinations(links, self.nodes_per_group - 1) 
      for link_possibility in link_possibilities: 
       if node not in possible_links: 
        possible_links[node] = [] 
       possible_links[node] += [link_possibility] 
     groups = set() 
     for node, possible_groups in possible_links.items(): 
      for possible_group in possible_groups: 
       groups.add(tuple(sorted(possible_group + (node,)))) 
     return tuple(groups) 



    # Handle status messages (if settings['log_status'] == True). 
    def log_status(self): 
     # log how many steps were needed 
     if self.settings['log_status']: 
      print 'Error margin (unused nodes): %s' % (len(self.nodes) -(len(self.best_group) * 3)) 
      print '%s steps' % self.steps 
      print self.format_output(self.best_group) 


    # Create an output string in the format of the Stack Overflow ticket. 
    def format_output(self, group_combination): 

     out_dict = {} 

     for group in group_combination: 
      for node in group: 
       links = set([n for n in group if n != node]) 

       if set(links).issubset(set(self.nodes[node])): 
        if node not in out_dict: 
         out_dict[node] = set() 
        out_dict[node] = links 
        break 

     output = '' 
     for node, links in out_dict.items(): 
      output += '%s:' % str(node) 
      output += '/'.join([str(link) for link in links]) 
      output += '\n' 

     return output 



    # Start with groups with nodes harder to match. 
    def sort_possible_groups(self, queried_group): 
     gid = self.groups.index(queried_group) + 1 

     for queried_group in queried_group: 
      if queried_group in self.preferred_nodes: 
       self.weights[gid - 1] -= 3 

     return self.weights[gid - 1] 


    def initial_weights(self): 
     weights = [] 
     groups_with_this_node = {} 
     for group_id, group in enumerate(self.groups): 
      ordinarity = 0 
      for node in group: 
       if node not in groups_with_this_node: 
        groups_with_this_node[node] = sum(x.count(node) for x in self.groups) 
       ordinarity += groups_with_this_node[node] * 100 
      weights += [ordinarity] 
     return weights 


    def selected_group_available(self, group_1, group_pool): 
     for group_2 in group_pool: 
      for node in group_1: 

       self.steps += 1 

       if self.settings['log_status']: 

        # log status each 500000 steps, if specified in the settings 
        if self.steps % self.settings['log_period'] == 0: 
         self.log_status() 

       if node in group_2: 
        return False 

     return True         

    def get_results(self, i = None, weights = None): 

     group_length = len(self.groups) 
     group_length_iterator = range(group_length) 

     groups = deque(sorted(self.groups, key = self.sort_possible_groups)) 

     output = [groups[i]] 
     for j in group_length_iterator: 

      if i == j: 
       continue 

      selected_group = tuple(groups[j]) 

      if self.selected_group_available(selected_group, output): 
       output += [selected_group] 


     match = sorted(tuple(output)) 

     if len(output) > len(self.best_group): 
      self.best_group = output 


     if len(output) >= self.maximum_groups or \ 
      (
       self.settings['maximum_steps'] > 0 and \ 
       self.steps > self.settings['maximum_steps'] 
      ): 
      self.finished = True 


     # print len(output) 
     if match not in self.returned_results_cache: 
      self.returned_results_cache += [match] 
      self.preferred_nodes = [n for n in self.nodes] 
      for g in match: 
       for n in g: 
        if n in self.preferred_nodes: 
         self.preferred_nodes.remove(n) 

     return len(output) 




    def __init__(self, **settings): 

     # save settings in a property 
     self.settings = {} 
     for key, value in settings.iteritems(): 
      self.settings[key] = value 

     self.nodes_per_group = 3 
     self.nodes = self.parse_string(self.settings['node_string']) 
     self.preferred_nodes = [node for node in self.nodes] 
     self.groups = self.possible_combinations(self.nodes)   


     self.groupable_nodes = list(set(sum(self.groups,()))) 
     self.groupable_nodes_total = len(self.groupable_nodes) 

     self.maximum_groups = int(math.floor(self.groupable_nodes_total/self.nodes_per_group)) 

     self.weights = self.initial_weights() 


    def calculate(self): 

     for i in range(len(ng.groups)): 
      self.get_results(i) 
      if self.finished: 
       break 


if __name__ == "__main__": 

    ng = NodeGrouper(

     # 0 is for unlimitted 
     # (try until all nodes are used) 
     maximum_steps = 10000000, 

     # Log number of iterations 
     log_status = False, 
     # Log each x iterations 
     log_period = 5000000, 

     node_string = data.node_string_3, 

    ) 

    ng.calculate() 

    print ng.format_output(ng.best_group) 

它在12.85亿个步骤处理本例中:

enter image description here

+1

你的实现工作很好,但我发现自从尝试其他实现后,这个问题在计算上非常复杂。就像使用搜索树方法一样,在大型数据集上计算只需要很长时间。也许在20年后更好的电脑......! – Chronicle

+0

当我运行'itertools.combinations'时,我认为当我们有一个大的数据集时,我的脚本就会成为一个严重的问题。任何方式我都可以掌握你的数据,看看我能不能解决它?谢谢! –

+0

当然,我可以在一个小时左右的时间内将它放在pastebin之类的东西上。然而,当我运行脚本时,'possible_combinations'在不到一秒钟内完成。瓶颈是'sub_tuple_combinations' – Chronicle

2

我不知道它是否工作在每一个边缘的情况下,但如果你可以通过通过每个节点通过一次图形找到一个路径,你可以组路径分为三个路段。根据图的大小,你可以用一个简单的DFS做到这一点。

编辑: 为了这个工作,你需要修剪所有的尾巴(不归路径),但这意味着当你将尾部相邻的尾部分组时, 。

0

更大的问题是您的搜索中有多少个节点。 如果#个节点保持低位,您可以随时查看所有可能的组合。 并重申每种可能的解决方案。

但是,如果你有100个节点,这将变得不可能。那么你应该喜欢为你工作的AI代码。

+0

我有18669个节点,但没有连接到超过11个其他节点。 – Chronicle

+0

也许你应该看看K Maens [链接] https://www.youtube.com/watch?v=RD0nNK51Fp8。也许不是解决方案,但它应该有助于给出一个方向。 –

1

根据连接的节点创建如下的邻接列表。

90 - > 180,200,70

70 - > 90,200,220

180 - > 90,110,100,200

110 - > 180,100

100→110,180,200,190,160,170

导航应该如下。

  1. 转到90并分配到组1.将标记90完成。
  2. 转到180,因为它是第一个连接的节点。分配给组1并将180标记为完成。
  3. 前往180清单
  4. 第一个是已经标记的90,所以跳过它。
  5. 转到同一列表中的下一个节点。这是110.分配给组1并将110标记为完成。
  6. 转到110列表。

该组已满后递增组编号。

等等......

一旦条件(组数* 3)==节点的总数为满足时,跳出循环。