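I have tuples called states and actions, and I want to compute their "binary features". The function that computes the features of a state and an action is shown below. Note that this is only toy code. I have about 700,000 state-action combinations, and I need the features as a numpy array / scipy sparse matrix. Should I store the values in a dictionary or compute them on the fly?
The problem is that I have to compute the features of states and actions millions of times. I have two options.
One option is to compute the binary features of all ~700,000 combinations beforehand with the function below and store them in a dictionary, where the key is (state, action) and the value is the binary features.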
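Option 1 could look roughly like the following minimal pure-Python sketch. It is not the asker's code: `one_hot_columns`, `precompute`, `COORD_SIZE` and `MIN_ELEM` are hypothetical names, and the toy sizes (2, 2, 3) with a minimum value of 0 per coordinate are assumptions. It stores only the column positions of the ones, which is all that is needed to rebuild a sparse row later.

```python
from itertools import product

# Hypothetical toy setup: per-coordinate sizes and assumed minimum feature values.
COORD_SIZE = (2, 2, 3)
MIN_ELEM = (0, 0, 0)


def one_hot_columns(state, action, coord_size=COORD_SIZE, min_elem=MIN_ELEM):
    """Return the column positions of the ones for the features state - action."""
    cols = []
    offset = 0  # start column of the current coordinate's block
    for s, a, size, lo in zip(state, action, coord_size, min_elem):
        cols.append(offset + (s - a) - lo)
        offset += size
    return tuple(cols)


def precompute(pairs):
    """Option 1: compute every (state, action) pair once and keep it in a dict."""
    return {(state, action): one_hot_columns(state, action) for state, action in pairs}


# Hypothetical pairs chosen so that state - action stays inside each coordinate's range.
states = list(product(range(2), range(2), range(3)))
actions = [(0, 0, 0)]
table = precompute(product(states, actions))

print(len(table))                      # 12
print(table[((1, 0, 2), (0, 0, 0))])   # (1, 2, 6)
```

Keeping small tuples of ints per key is likely far lighter than keeping 700,000 separate `coo_matrix` objects, each of which carries its own Python object and three arrays.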
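The other option is to call the function below every time I need the binary features of a state and action.
My goal is good performance while also being memory efficient.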
```python
import numpy as np
from numpy import array
from scipy import sparse


def compute_features(state, action):
    # state and action are 3-tuples of integers.
    # e.g. (1, 2, 3)
    return array(state) - array(action)


def compute_binary_features(state, action, coord_size, min_elem=None):
    # e.g.
    # features = (1, 0, 2)
    # sizes = (2, 2, 3)
    # Meaning, the size of the first coordinate is 2, the second is 2 and the third is 3.
    # It means the first coordinate can only take the integer values 0 to 1.
    #
    # This function turns (1, 0, 2) into binary features.
    # For the first coordinate, the value is 1 and the size is 2, so the binary
    # features of the first coordinate are (0, 1).
    # Second coordinate: the value is 0 and the size is 2. The binary features
    # are (1, 0).
    # Third coordinate: the value is 2 and the size is 3. The binary features are
    # (0, 0, 1).
    #
    # So the binary features of (1, 0, 2) are: (0, 1, 1, 0, 0, 0, 1)
    #
    # This function does not concatenate vectors; it finds the positions of the
    # ones in a binary feature vector of length sum(sizes).
    # min_elem[i] is assumed to be the smallest value coordinate i can take
    # (it was undefined in the original snippet); it defaults to all zeros.
    # Returns a COO sparse 0-1 valued 1 x n matrix.
    features = compute_features(state, action)
    coord_size = array(coord_size)
    if min_elem is None:
        min_elem = np.zeros(len(coord_size), dtype=int)
    col = []
    index = 0  # start column of the current coordinate's block
    for i in range(len(features)):
        col.append(index + features[i] - min_elem[i])
        index = index + coord_size[i]  # advance only after placing this coordinate
    row = [0] * len(col)
    data = [1] * len(col)
    mtx = sparse.coo_matrix((data, (row, col)), shape=(1, sum(coord_size)),
                            dtype=np.uint8)
    return mtx
```
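Because the column of each one is just "block offset + value - minimum", the per-coordinate loop can be vectorized over many pairs at once, which may help when precomputing all ~700,000 combinations. The sketch below is a hedged illustration using only NumPy; `batch_columns`, `to_dense`, `offsets` and the all-zero `min_elem` are hypothetical, and the sizes (2, 2, 3) are taken from the example in the comments.

```python
import numpy as np

# Hypothetical parameters matching the example in the comments above.
coord_size = np.array([2, 2, 3])
min_elem = np.array([0, 0, 0])  # assumed lowest value of each coordinate

# Start column of each coordinate's block: [0, 2, 4]
offsets = np.concatenate(([0], np.cumsum(coord_size)[:-1]))


def batch_columns(states, actions):
    """Column indices of the ones for many (state, action) pairs at once.

    states, actions: integer arrays of shape (n_pairs, n_coords).
    Returns an (n_pairs, n_coords) array; row k holds the one positions for pair k.
    """
    features = np.asarray(states) - np.asarray(actions)
    return features - min_elem + offsets


def to_dense(cols, width):
    """Expand column indices into dense 0/1 rows (for inspection only)."""
    dense = np.zeros((cols.shape[0], width), dtype=np.uint8)
    rows = np.repeat(np.arange(cols.shape[0]), cols.shape[1])
    dense[rows, cols.ravel()] = 1
    return dense


cols = batch_columns([[1, 0, 2], [0, 1, 0]], [[0, 0, 0], [0, 0, 0]])
print(cols)                                  # [[1 2 6]
                                             #  [0 3 4]]
print(to_dense(cols, coord_size.sum())[0])   # [0 1 1 0 0 0 1]
```

With SciPy, the same `rows` and `cols.ravel()` arrays can be passed to `scipy.sparse.coo_matrix((data, (rows, cols)), shape=...)` to get a single n_pairs x n sparse matrix instead of one matrix per pair.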
Python 3.2+ has the [@functools.lru_cache](https://docs.python.org/3.4/library/functools.html#functools.lru_cache) decorator for exactly this purpose. – wwii 2015-02-08 18:28:25
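A minimal sketch of the `lru_cache` suggestion, under the same toy assumptions (hypothetical `COORD_SIZE`, `MIN_ELEM` and `cached_columns`; sizes (2, 2, 3), minimum value 0). It caches a tuple of column positions rather than a sparse matrix, so callers cannot mutate a shared cached object.

```python
from functools import lru_cache

# Hypothetical toy parameters: per-coordinate sizes and assumed minimum values.
COORD_SIZE = (2, 2, 3)
MIN_ELEM = (0, 0, 0)


@lru_cache(maxsize=None)  # unbounded cache; pass an int to cap memory use
def cached_columns(state, action):
    """Column positions of the ones; tuple arguments are hashable cache keys."""
    cols = []
    offset = 0  # start column of the current coordinate's block
    for s, a, size, lo in zip(state, action, COORD_SIZE, MIN_ELEM):
        cols.append(offset + (s - a) - lo)
        offset += size
    # Return an immutable tuple so callers cannot corrupt the cached value.
    return tuple(cols)


cached_columns((1, 0, 2), (0, 0, 0))  # computed
cached_columns((1, 0, 2), (0, 0, 0))  # served from the cache
print(cached_columns.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```

With `maxsize=None` this behaves like the precomputed dictionary but fills lazily on first use; a finite `maxsize` trades some hit rate for a bound on memory.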
Interesting. I'll take a look at it. – 2015-02-09 02:51:40
I went with the first option, and performance improved by almost 400%. – 2015-02-09 15:31:06