3D data mining performance
I have some data with 3D positions:
# max size of grid (x, y, z)
grid_size = (999, 999, 99)
class MyObject(object):
    def __init__(self, id):
        self.id = id
        self.trace = []
[...]
# objects have some positions in their "trace"
print(myobject1.trace)
[(65, 128, 12), (66, 128, 12), (66, 129, 12)]
print(myobject2.trace)
[(456, 255, 75), (456, 254, 75), (456, 254, 74)]
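I need to build a map containing the positions of all these objects. The goal is to find the most performant way to look up objects in that map. For example, I have a list of X coordinates: which objects are at those coordinates?
So I came up with four strategies:
One-dimensional dictionary with string keys: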
{'65.128.12':myobject1, '66.128.12':myobject1, '66.129.12':myobject1,
'456.255.75':myobject2, '456.254.75':myobject2, '456.254.74':myobject2}
def find_in_str_map(search_points, map_str):
    found_objects = []
    for trace_point in search_points:
        key = str(trace_point[0])+'.'+str(trace_point[1])+'.'+str(trace_point[2])
        if key in map_str:
            # skip the searching object itself (myobject must be defined in the enclosing scope)
            if map_str[key].id != myobject.id:
                found_objects.append(map_str[key])
    return found_objects
One-dimensional dictionary with int keys:
{6512812:myobject1, 6612812:myobject1, 6612912:myobject1,
45625575:myobject2, 45625475:myobject2, 45625474:myobject2}
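def find_in_int_map(search_points, map_int):
    found_myobjects = []
    for trace_point in search_points:
        # pack (x, y, z) into one int: y < 1000 and z < 100, so the digits never overlap
        key = trace_point[0]*100000+trace_point[1]*100+trace_point[2]
        if key in map_int:
            # skip the searching object itself (myobject must be defined in the enclosing scope)
            if map_int[key].id != myobject.id:
                found_myobjects.append(map_int[key])
    return found_myobjects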
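To make the int key scheme explicit, here is a minimal sketch of the packing and its inverse. The names `encode_point` and `decode_key` are hypothetical helpers, not part of the tested code, and the decoding only holds under the assumption that y stays below 1000 and z below 100, as in `grid_size` above.

```python
GRID_SIZE = (999, 999, 99)  # max (x, y, z), same values as grid_size above


def encode_point(point):
    """Pack (x, y, z) into one int: z takes 2 decimal digits, y takes 3."""
    x, y, z = point
    return x * 100000 + y * 100 + z


def decode_key(key):
    """Inverse of encode_point (hypothetical helper, only useful for checking)."""
    xy, z = divmod(key, 100)   # strip the two z digits
    x, y = divmod(xy, 1000)    # strip the three y digits
    return (x, y, z)


print(encode_point((65, 128, 12)))              # 6512812, as in the map above
print(decode_key(encode_point(GRID_SIZE)) == GRID_SIZE)  # True
```

Because every coordinate fits in its own digit range, two different points in the grid can never produce the same key.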
One-dimensional dictionary with tuple (coordinate) keys:
{(65, 128, 12):myobject1, (66, 128, 12):myobject1, (66, 129, 12):myobject1,
(456, 255, 75):myobject2, (456, 254, 75):myobject2, (456, 254, 74):myobject2}
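def find_in_tuple_map(search_points, map_tuple):
    found_myobjects = []
    for trace_point in search_points:
        if trace_point in map_tuple:
            # skip the searching object itself (myobject must be defined in the enclosing scope)
            if map_tuple[trace_point].id != myobject.id:
                found_myobjects.append(map_tuple[trace_point])
    return found_myobjects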
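For reference, a tuple-keyed map like this could be built from the objects' traces roughly as follows. This is only a sketch: `build_tuple_map` is a hypothetical helper name, and it assumes at most one object per position (a later object would overwrite an earlier one).

```python
class MyObject(object):
    def __init__(self, id):
        self.id = id
        self.trace = []


def build_tuple_map(objects):
    """Map every traced (x, y, z) position to the object occupying it.

    Assumption: at most one object per position; later objects overwrite.
    """
    map_tuple = {}
    for obj in objects:
        for point in obj.trace:
            map_tuple[point] = obj
    return map_tuple


myobject1 = MyObject(1)
myobject1.trace = [(65, 128, 12), (66, 128, 12), (66, 129, 12)]
myobject2 = MyObject(2)
myobject2.trace = [(456, 255, 75), (456, 254, 75), (456, 254, 74)]

map_tuple = build_tuple_map([myobject1, myobject2])
print(len(map_tuple))  # 6 positions in total
```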
Three-dimensional dictionary:
{456: {254: {74: myobject2, 75: myobject2}, 255: {75: myobject2}}, 65: {128: {12: myobject1}}, 66: {128: {12: myobject1}, 129: {12: myobject1}}}
def find_in_3d_map(search_points, map_3d):
    founds_myobjects = []
    for trace_point in search_points:
        x = trace_point[0]
        y = trace_point[1]
        z = trace_point[2]
        if x in map_3d:
            if y in map_3d[x]:
                if z in map_3d[x][y]:
                    founds_myobjects.append(map_3d[x][y][z])
    return founds_myobjects
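As a sketch of how such a nested dictionary might be built and queried with fewer repeated lookups, the variant below uses `dict.setdefault` and chained `dict.get`. The names `build_3d_map`, `find_in_3d_map_get` and the stand-in class `Obj` are hypothetical, and I have not measured whether this is actually faster than the version above.

```python
class Obj(object):
    """Minimal stand-in for MyObject, for illustration only."""
    def __init__(self, id, trace):
        self.id = id
        self.trace = trace


def build_3d_map(objects):
    """Build {x: {y: {z: obj}}} from every traced position."""
    map_3d = {}
    for obj in objects:
        for x, y, z in obj.trace:
            map_3d.setdefault(x, {}).setdefault(y, {})[z] = obj
    return map_3d


def find_in_3d_map_get(search_points, map_3d):
    """Look up each point with chained .get calls instead of nested 'in' tests."""
    found = []
    empty = {}  # shared fallback so a missing level does not raise
    for x, y, z in search_points:
        obj = map_3d.get(x, empty).get(y, empty).get(z)
        if obj is not None:
            found.append(obj)
    return found


o1 = Obj(1, [(65, 128, 12), (66, 128, 12), (66, 129, 12)])
o2 = Obj(2, [(456, 255, 75), (456, 254, 75), (456, 254, 74)])
map_3d = build_3d_map([o1, o2])
hits = find_in_3d_map_get([(66, 128, 12), (0, 0, 0), (456, 254, 74)], map_3d)
print([obj.id for obj in hits])  # [1, 2]
```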
So I tested the performance of these strategies with timeit (and a large number of objects):
print('str', timeit.timeit('find_in_str_map(bugs, map_str)', number=10, [...]
print('int', timeit.timeit('find_in_int_map(bugs, map_int)', number=10, [...]
print('3d ', timeit.timeit('find_in_3d_map(bugs, map_3d)', number=10, [...]
print('tup', timeit.timeit('find_in_tuple_map(bugs, map_tuple)', number=10, [...]
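Since the setup part of the calls above is elided, here is a self-contained sketch of the same kind of measurement for one strategy. It does not reproduce the original setup: the data is hypothetical stand-in data, the id check is left out, and the callable is passed to `timeit.timeit` directly instead of a statement string.

```python
import timeit

# Hypothetical stand-in data: 1000 occupied positions with plain labels as values.
map_tuple = {(x, x, x % 100): 'obj%d' % x for x in range(1000)}
# 1000 search points, half of which are present in the map.
search_points = [(x, x, x % 100) for x in range(0, 2000, 2)]


def find_in_tuple_map(search_points, map_tuple):
    found = []
    for point in search_points:
        if point in map_tuple:
            found.append(map_tuple[point])
    return found


# timeit accepts a zero-argument callable; the result is total seconds for 10 runs.
elapsed = timeit.timeit(lambda: find_in_tuple_map(search_points, map_tuple),
                        number=10)
print('tup', elapsed)
```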
(Testable code: http://pastebin.com/FfkeEw9U)
The results (in seconds, for 10 runs each) are:
python2.7:
('str', 8.213999032974243)
('int', 5.6337010860443115)
('3d ', 6.18729305267334)
('tup', 5.0934319496154785)
python3.3:
str 10.11169655699996
int 5.984578157000215
3d 6.448565245998907
tup 5.139268291999542
Are there other strategies for storing and searching a map of 3D coordinates? Can the strategies I presented be optimized?
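Your trace points are tuples, so why not use those tuples directly as keys? It is much simpler, and it seems to be a bit faster too. – 2014-08-29 08:35:07
I forgot to test that, argh. I'll add it to the tested strategies. – bux 2014-08-29 09:06:22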