2014-10-27

Selecting k corresponding rows from a matrix and a vector

I have a matrix X (shape m×n), a vector y (shape m×1) and a probability vector p (shape m×1).

I want to sample k rows from X, together with the corresponding rows of y, according to the probabilities p.

How can I implement this in Python? (Is there already a built-in function for it?)

Answer


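You need a cumulative distribution function (either from numpy or one you write yourself). Then zip the vectors together so that each row of X stays paired with its entry in y while you sample.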
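With NumPy, the cumulative-distribution approach might look roughly like the sketch below. It assumes NumPy 1.17 or later (for np.random.default_rng) and that X, y and p are array-like with m rows; sample_rows is a hypothetical helper name, not a library function. Sampling row indices instead of rows keeps each row of X paired with its entry in y without an explicit zip.

```python
import numpy as np


def sample_rows(X, y, p, k, rng=None):
    """Hypothetical helper: draw k rows (with replacement) weighted by p via the inverse CDF.

    X has shape (m, n); y and p have shape (m,) or (m, 1).
    Returns the sampled rows of X, the matching entries of y, and the row indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X)
    y = np.asarray(y)
    p = np.asarray(p, dtype=float).ravel()
    cdf = np.cumsum(p)
    cdf /= cdf[-1]                       # normalise so the last bound is exactly 1
    u = rng.random(k)                    # k uniform draws in [0, 1)
    # First index whose cumulative bound exceeds each draw (same rule as bisect.bisect)
    idx = np.searchsorted(cdf, u, side="right")
    idx = np.minimum(idx, len(p) - 1)    # guard against rounding at the top end
    return X[idx], y[idx], idx


X = np.arange(12).reshape(4, 3)
y = np.array([10, 20, 30, 40])
p = np.array([0.0, 0.0, 1.0, 0.0])       # only row 2 can be drawn
Xs, ys, idx = sample_rows(X, y, p, k=5, rng=np.random.default_rng(0))
print(ys)  # [30 30 30 30 30]
```

Zero-probability rows are never selected, because searchsorted with side="right" skips every bucket whose cumulative bound does not exceed the draw.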

Implementation

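import random
from bisect import bisect
from itertools import accumulate


def sample(population, k, prob=None):
    # Materialise the population so it can be indexed (e.g. when a zip object is passed in)
    population = list(population)
    if prob is None:
        # No weights given: uniform sampling without replacement
        return random.sample(population, k)
    # Cumulative distribution: _cumm[i] = prob[0] + ... + prob[i]
    _cumm = list(accumulate(prob))
    total = _cumm[-1]
    picks = []
    for _ in range(k):
        # Inverse-CDF lookup (with replacement): first index whose cumulative
        # bound exceeds the draw; min() guards against rounding at the top end
        i = bisect(_cumm, random.random() * total)
        picks.append(population[min(i, len(population) - 1)])
    return picks


def gen_sample_data(m, n):
    # Random m x n matrix X, length-m vector Y and normalised probability vector P
    X = [random.sample(range(100), n) for _ in range(m)]
    Y = random.sample(range(100), m)
    P = random.sample(range(100), m)
    total = sum(P)
    P = [e / total for e in P]
    return X, Y, P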


>>> import pprint
>>> X, Y, P = gen_sample_data(10, 5) 
>>> pprint.pprint(X) 
[[29, 14, 95, 4, 83], 
[80, 73, 34, 70, 49], 
[67, 25, 94, 46, 83], 
[78, 24, 80, 38, 91], 
[90, 22, 53, 20, 71], 
[91, 0, 64, 90, 59], 
[82, 66, 22, 33, 93], 
[25, 34, 7, 5, 2], 
[87, 0, 91, 8, 78], 
[17, 30, 73, 14, 63]] 
>>> pprint.pprint(Y) 
[83, 61, 62, 59, 41, 72, 56, 23, 36, 97] 
>>> pprint.pprint(P) 
[0.015424164524421594, 
0.002570694087403599, 
0.2544987146529563, 
0.02570694087403599, 
0.10796915167095116, 
0.033419023136246784, 
0.08483290488431877, 
0.20565552699228792, 
0.2236503856041131, 
0.04627249357326478] 
>>> pprint.pprint(list(zip(*sample(zip(X, Y), 5, prob=P))))
[([67, 25, 94, 46, 83], 
    [87, 0, 91, 8, 78], 
    [82, 66, 22, 33, 93], 
    [87, 0, 91, 8, 78], 
    [87, 0, 91, 8, 78]), 
(62, 36, 56, 36, 36)] 
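On the question of whether a built-in exists: NumPy's Generator.choice accepts a p argument, so the inverse-CDF lookup can be delegated to it. The sketch below is an alternative to the code above, not part of the original answer; sample_rows_choice is a hypothetical name and NumPy 1.17 or later is assumed. Like the code above it draws with replacement. Passing replace=False instead draws distinct rows, which requires at least k non-zero entries in p.

```python
import numpy as np


def sample_rows_choice(X, y, p, k, rng=None):
    """Hypothetical helper: pick k rows of X and the matching entries of y, weighted by p."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float).ravel()
    # choice(m, ...) samples from np.arange(m); p must be non-negative and sum to 1
    idx = rng.choice(len(p), size=k, replace=True, p=p / p.sum())
    # Index X and y with the same indices so rows stay paired
    return np.asarray(X)[idx], np.asarray(y)[idx]


X = np.arange(12).reshape(4, 3)
y = np.array([10, 20, 30, 40])
Xs, ys = sample_rows_choice(X, y, [0, 0, 0, 1], 3, rng=np.random.default_rng(0))
print(ys)  # [40 40 40]
```

Sampling indices rather than calling choice on X directly is what keeps X and y aligned, since both arrays are indexed with the same draw.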

Is "for i in range(1, len(P)):" a typo? What is P? – Fraz 2014-10-28 06:22:51


Yes, that is a typo. It should actually be prob. – Abhijit 2014-10-28 06:44:20