2014-08-29

3D data mining performance

I have some data consisting of 3D positions.

# max size of grid (x, y, z) 
grid_size = (999, 999, 99) 

class MyObject(object):
    def __init__(self, id):
        self.id = id
        self.trace = []

[...] 
# objects have some positions in their "trace" 
print(myobject1.trace) 
[(65, 128, 12), (66, 128, 12), (66, 129, 12)] 
print(myobject2.trace) 
[(456, 255, 75), (456, 254, 75), (456, 254, 74)] 

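I need to build a map containing the positions of all these objects. The goal is to find the fastest way to look up objects in that map. For example, given a list of coordinates: which objects are at those coordinates?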
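A minimal sketch of building all four maps from the objects' traces, assuming each object has the `id` and `trace` attributes shown above. `build_maps` is a hypothetical helper, not taken from the original test code. If two objects occupy the same position, the object added last overwrites the earlier one in every map.

```python
class MyObject(object):
    def __init__(self, id):
        self.id = id
        self.trace = []


def build_maps(objects):
    """Build the string-key, int-key, tuple-key and nested (3D) maps."""
    map_str, map_int, map_tuple, map_3d = {}, {}, {}, {}
    for obj in objects:
        for (x, y, z) in obj.trace:
            map_str[str(x) + '.' + str(y) + '.' + str(z)] = obj
            map_int[x * 100000 + y * 100 + z] = obj
            map_tuple[(x, y, z)] = obj
            # create the inner dicts on demand
            map_3d.setdefault(x, {}).setdefault(y, {})[z] = obj
    return map_str, map_int, map_tuple, map_3d


myobject1 = MyObject(1)
myobject1.trace = [(65, 128, 12), (66, 128, 12), (66, 129, 12)]
myobject2 = MyObject(2)
myobject2.trace = [(456, 255, 75), (456, 254, 75), (456, 254, 74)]

map_str, map_int, map_tuple, map_3d = build_maps([myobject1, myobject2])
```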

So I came up with four strategies:

One-dimensional dict with string keys

{'65.128.12':myobject1, '66.128.12':myobject1, '66.129.12':myobject1, 
'456.255.75':myobject2, '456.254.75':myobject2, '456.254.74':myobject2} 

def find_in_str_map(search_points, map_str, myobject=None):
    found_objects = []
    for trace_point in search_points:
        # build the 'x.y.z' string key
        key = str(trace_point[0]) + '.' + str(trace_point[1]) + '.' + str(trace_point[2])
        if key in map_str:
            # skip the object doing the search, if one is given
            if myobject is None or map_str[key].id != myobject.id:
                found_objects.append(map_str[key])
    return found_objects
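The `'.'` separator in the string key is what keeps it unambiguous: without it, different coordinates can produce the same string. A small check, where `naive_key` is a hypothetical counterexample rather than anything from the question:

```python
def str_key(point):
    # '.'-separated key, built the same way as in find_in_str_map
    return str(point[0]) + '.' + str(point[1]) + '.' + str(point[2])


def naive_key(point):
    # hypothetical key without separators, shown only as a counterexample
    return str(point[0]) + str(point[1]) + str(point[2])


a, b = (1, 23, 4), (12, 3, 4)
print(naive_key(a) == naive_key(b))  # True: both are '1234'
print(str_key(a) == str_key(b))      # False: '1.23.4' vs '12.3.4'
```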

One-dimensional dict with int keys

{6512812:myobject1, 6612812:myobject1, 6612912:myobject1, 
45625575:myobject2, 45625475:myobject2, 45625474:myobject2} 

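def find_in_int_map(search_points, map_int, myobject=None):
    found_objects = []
    for trace_point in search_points:
        # pack (x, y, z) into one integer: x*100000 + y*100 + z
        key = trace_point[0] * 100000 + trace_point[1] * 100 + trace_point[2]
        if key in map_int:
            # skip the object doing the search, if one is given
            if myobject is None or map_int[key].id != myobject.id:
                found_objects.append(map_int[key])
    return found_objects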
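The integer key `x*100000 + y*100 + z` is collision-free only while `0 <= z < 100` and `0 <= y < 1000`, which the question's `grid_size = (999, 999, 99)` satisfies. A sketch that packs and unpacks the key with `divmod` to make that bound explicit; `int_key`, `int_key_decode` and the constant names are hypothetical.

```python
X_MULT = 100000  # leaves room for y*100 + z, which is at most 99999
Y_MULT = 100     # leaves room for z, which must stay below 100


def int_key(x, y, z):
    # same packing as find_in_int_map; the range check is for illustration
    # only and would be dropped in a hot lookup loop
    assert 0 <= y < 1000 and 0 <= z < 100, "coordinate out of range for this packing"
    return x * X_MULT + y * Y_MULT + z


def int_key_decode(key):
    # invert the packing: peel off x, then split the remainder into y and z
    x, rest = divmod(key, X_MULT)
    y, z = divmod(rest, Y_MULT)
    return (x, y, z)


print(int_key(65, 128, 12))      # 6512812
print(int_key_decode(45625474))  # (456, 254, 74)
```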

One-dimensional dict with tuple (coordinate) keys

{(65, 128, 12):myobject1, (66, 128, 12):myobject1, (66, 129, 12):myobject1, 
(456, 255, 75):myobject2, (456, 254, 75):myobject2, (456, 254, 74):myobject2} 

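def find_in_tuple_map(search_points, map_tuple, myobject=None):
    found_objects = []
    for trace_point in search_points:
        # the coordinate tuple is itself the key
        if trace_point in map_tuple:
            # skip the object doing the search, if one is given
            if myobject is None or map_tuple[trace_point].id != myobject.id:
                found_objects.append(map_tuple[trace_point])
    return found_objects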
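The same tuple lookup can also be written as a single list comprehension, which avoids the repeated `append` method calls of the explicit loop. This sketch has not been benchmarked here; whether it beats the loop would need to be measured with the same timeit setup. `find_in_tuple_map_lc` and the small `Obj` class are hypothetical.

```python
class Obj(object):
    # minimal stand-in for MyObject
    def __init__(self, id):
        self.id = id


def find_in_tuple_map_lc(search_points, map_tuple, myobject=None):
    # id to exclude; None means "exclude nothing" (assumes real ids are never None)
    exclude = None if myobject is None else myobject.id
    return [map_tuple[p] for p in search_points
            if p in map_tuple and map_tuple[p].id != exclude]


o1, o2 = Obj(1), Obj(2)
map_tuple = {(65, 128, 12): o1, (456, 254, 74): o2}
found = find_in_tuple_map_lc([(65, 128, 12), (0, 0, 0), (456, 254, 74)], map_tuple, myobject=o1)
print([o.id for o in found])  # [2]
```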

Three-dimensional (nested) dict

{456: {254: {74: myobject2, 75: myobject2}, 255: {75: myobject2}}, 65: {128: {12: myobject1}}, 66: {128: {12: myobject1}, 129: {12: myobject1}}} 

def find_in_3d_map(search_points, map_3d):
    found_objects = []
    for trace_point in search_points:
        x, y, z = trace_point
        # descend one dict level per coordinate
        if x in map_3d:
            if y in map_3d[x]:
                if z in map_3d[x][y]:
                    found_objects.append(map_3d[x][y][z])
    return found_objects

So I measured the performance of these strategies with timeit (using a large number of objects):

print('str', timeit.timeit('find_in_str_map(bugs, map_str)', number=10, [...] 
print('int', timeit.timeit('find_in_int_map(bugs, map_int)', number=10, [...] 
print('3d ', timeit.timeit('find_in_3d_map(bugs, map_3d)', number=10, [...] 
print('tup', timeit.timeit('find_in_tuple_map(bugs, map_tuple)', number=10, [...] 

(Testable code: http://pastebin.com/FfkeEw9U)
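Since the pastebin code is external, here is a self-contained sketch of a comparable benchmark. It generates random traces inside the grid (the object count, trace length, number of search points and seeds are arbitrary assumptions, not the values used in the original test) and times simplified versions of the four lookups without the self-exclusion check. No timings are quoted, because they depend on the machine and interpreter.

```python
import random
import timeit

GRID_SIZE = (999, 999, 99)


class MyObject(object):
    def __init__(self, id):
        self.id = id
        self.trace = []


def make_objects(n_objects, trace_len, rng):
    # random traces inside the grid
    objects = []
    for i in range(n_objects):
        obj = MyObject(i)
        obj.trace = [tuple(rng.randrange(s) for s in GRID_SIZE) for _ in range(trace_len)]
        objects.append(obj)
    return objects


def build_maps(objects):
    maps = {'str': {}, 'int': {}, 'tup': {}, '3d': {}}
    for obj in objects:
        for x, y, z in obj.trace:
            maps['str'][str(x) + '.' + str(y) + '.' + str(z)] = obj
            maps['int'][x * 100000 + y * 100 + z] = obj
            maps['tup'][(x, y, z)] = obj
            maps['3d'].setdefault(x, {}).setdefault(y, {})[z] = obj
    return maps


def find_str(points, m):
    out = []
    for x, y, z in points:
        key = str(x) + '.' + str(y) + '.' + str(z)
        if key in m:
            out.append(m[key])
    return out


def find_int(points, m):
    out = []
    for x, y, z in points:
        key = x * 100000 + y * 100 + z
        if key in m:
            out.append(m[key])
    return out


def find_tup(points, m):
    out = []
    for p in points:
        if p in m:
            out.append(m[p])
    return out


def find_3d(points, m):
    out = []
    for x, y, z in points:
        if x in m and y in m[x] and z in m[x][y]:
            out.append(m[x][y][z])
    return out


FINDERS = {'str': find_str, 'int': find_int, 'tup': find_tup, '3d': find_3d}


def run(n_objects=1000, trace_len=10, n_search=10000, number=10, seed=0):
    rng = random.Random(seed)
    objects = make_objects(n_objects, trace_len, rng)
    maps = build_maps(objects)
    known = [p for obj in objects for p in obj.trace]
    # every odd-indexed search point is a known position; the rest are random (mostly misses)
    points = [rng.choice(known) if i % 2 else tuple(rng.randrange(s) for s in GRID_SIZE)
              for i in range(n_search)]
    for name in ('str', 'int', '3d', 'tup'):
        finder, m = FINDERS[name], maps[name]
        seconds = timeit.timeit(lambda: finder(points, m), number=number)
        print(name, seconds)
    return maps, points


if __name__ == '__main__':
    run()
```

All four finders should return the same objects in the same order, which is a useful sanity check before comparing timings.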

The results (total seconds for number=10) are:

Python 2.7:

('str', 8.213999032974243) 
('int', 5.6337010860443115) 
('3d ', 6.18729305267334) 
('tup', 5.0934319496154785) 

Python 3.3:

str 10.11169655699996 
int 5.984578157000215 
3d 6.448565245998907 
tup 5.139268291999542 

Are there other strategies for storing and mapping a collection of 3D coordinates? Can the strategies I've presented be optimized?


Your trace points are tuples, so why not use those tuples directly as keys? Much simpler, and it seems a bit faster too. – 2014-08-29 08:35:07


I forgot to test that, argh. I'll add it to the tested strategies. – bux 2014-08-29 09:06:22

Answer


The simplest approach is to use your coordinate tuples as the keys of your map.

{(65,128,12):myobject1, (66,128,12):myobject1, (66,129,12):myobject1, 
(456,255,75):myobject2, (456,254,75):myobject2, (456,254,74):myobject2}  

def find_collisions_tuple_map(bugs, map_tuple):
    collisions_bugs = []
    for bug in bugs:
        # get_possibles_future_trace_point() is a method of the bug objects (not shown here)
        for trace_point in bug.get_possibles_future_trace_point():
            if trace_point in map_tuple:
                collisions_bugs.append(map_tuple[trace_point])
    return collisions_bugs
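A further variant of the tuple-key idea, not part of the answer above: in Python 3, `dict.keys()` returns a set-like view, so `map_tuple.keys() & points` yields the coordinates present in the map in one expression (in Python 2.7 the equivalent view is `map_tuple.viewkeys()`). Because the result is a set, duplicate search points are collapsed and the original order is lost, so this is only a drop-in replacement when that doesn't matter; whether it is faster would need to be checked with the same timeit setup. `find_collisions_keys_view` is a hypothetical name.

```python
def find_collisions_keys_view(points, map_tuple):
    # set intersection of the map's keys with the search points (Python 3 keys view)
    hits = map_tuple.keys() & points
    return [map_tuple[p] for p in hits]


map_tuple = {(65, 128, 12): 'myobject1', (66, 128, 12): 'myobject1',
             (456, 254, 74): 'myobject2'}
points = [(65, 128, 12), (1, 2, 3), (456, 254, 74), (65, 128, 12)]
print(sorted(find_collisions_keys_view(points, map_tuple)))
# ['myobject1', 'myobject2']
```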

On my computer it's slightly faster:

('str', 10.188277582443057) 
('int', 7.133011876243648) 
('3d ', 7.486879201843017) 
('tuple ', 6.406966607422291)