Python/NumPy - Filling the gaps between non-consecutive points?

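I am trying to find a vectorized / fast / NumPy-friendly way to convert the values in column A below into column B:

The algorithm that defines column B fills every gap between a 1 and the following -1 with the value 1, skipping the first row of each pair. That is, for ID4-ID7, column B is filled with 1 (given the initial 1 in column A at ID3). Next, ID10-ID14 is filled with 1 (because column A at ID9 is 1).

This is easy to do with a for loop, but I am wondering whether a non-loop solution exists. An O(n) loop-based solution is as follows: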

import numpy as np
import pandas as pd

x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])


def make_y(x, showminus=False):
    y = x * 0
    state = 0  # current state: 1, 0 or -1
    for i, n in enumerate(x):
        if n == 1 and n != state:
            # a new block starts; the fill begins on the next row
            state = n
            if i < len(y) - 1:
                y[i + 1] = state
        elif n == -1 and n != state:
            # the row holding the -1 keeps the state that was active before it
            y[i] = state
            if showminus:
                state = -1
            else:
                state = 0
        else:
            y[i] = state
    return y


y = make_y(x)
print(pd.DataFrame([x, y]).T)

On my machine the function above performs as follows:

%timeit y = make_y(x) 
10000 loops, best of 3: 28 µs per loop

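I suspect there must be a way to make the whole thing faster, since I will eventually need to process arrays that are 10 million+ elements long...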

Asked 2014-09-26 by bazel

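Is the pattern always that once A is 1, the following rows are 1 until a -1 appears in A? That is, do 1 and -1 mark the start and end of a run of consecutive 1s (not including the row where the 1 appears in A)? – EdChum 2014-09-26 12:33:59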

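@EdChum: That is correct. However, you may have noticed that the `make_y` loop function has a parameter that can also track the -1 regions. To keep things simple (initially), I left that part out of the scope of the question. – bazel 2014-09-26 12:52:28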

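This is tricky, and I can't think of a way without iterating. You could use something like `mask = df.loc[(df['A'].shift() == 1) | (df['A'] == -1)]` and then `mask.loc[(mask['A'] == -1) | (mask['A'].shift(-1) != -1)]`, which should show the start and end indices. Then iterate over them, or pull the indices into a list of pairs, and set those ranges to 1. – EdChum 2014-09-26 13:25:34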
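For illustration, here is a pandas sketch related to the `shift()` idea in the comment above, though it does not use the same masks. It assumes the rule confirmed in the comments (a row in B is 1 when the most recent nonzero marker above it in A is a 1) and ignores the `showminus` option. The name `last_marker` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0]})

# Blank out the zeros, then carry the most recent marker (1 or -1) forward.
last_marker = df["A"].where(df["A"] != 0).ffill()

# B looks one row back: it is 1 when the previous row's marker was a 1.
df["B"] = last_marker.shift().eq(1).astype(int)

print(df["B"].tolist())
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
```

The `shift()` is what skips the first row of each pair while still including the row that holds the -1.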

One possible vectorized solution could be as follows:

idx_1s, = np.where(x == -1) # find the positions of the -1's 
idx1s, = np.where(x == 1) # find the positions of the 1's

Find which 1s should become 0s and mark the start of each block of 1s:

idx0s = np.concatenate(([0], np.searchsorted(idx1s, idx_1s[:-1]))) 
idx0s = idx1s[idx0s]
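To make the `searchsorted` step concrete, the following sketch reruns it on the `x` from the question and prints the intermediate values. The name `after_minus` is introduced only for this illustration.

```python
import numpy as np

x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])
idx_1s, = np.where(x == -1)  # positions of the -1s: [6, 13]
idx1s, = np.where(x == 1)    # positions of the 1s: [2, 3, 8, 11]

# For every -1 except the last, find the slot in idx1s of the first 1 after it.
after_minus = np.searchsorted(idx1s, idx_1s[:-1])
print(after_minus)  # [2]

# The first block always starts at the very first 1, hence the leading 0.
idx0s = idx1s[np.concatenate(([0], after_minus))]
print(idx0s)  # [2 8]
```

Positions 2 and 8 are the 1s that open a block; the 1s at positions 3 and 11 fall inside a block and are skipped.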

We now have two arrays of equal length, idx0s and idx_1s, marking the positions of the first and last items of each block, so we can now do:

y = x.copy() 
y[idx0s] = 0 
idx0s += 1 
idx_1s += 1 
mask = np.zeros_like(y, dtype=bool)  # np.bool was removed in recent NumPy releases
mask[idx0s] = True 
mask[idx_1s] = True 
mask = np.logical_xor.accumulate(mask) 
y[mask] = 1

Which produces the desired result:

>>> y 
array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0])
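The filling itself comes from the cumulative XOR: each True in the mask toggles an "inside a block" flag on or off. A minimal sketch of just that step, with the toggle positions for the question's `x` written out by hand (the names `toggles` and `inside` are made up for this example):

```python
import numpy as np

toggles = np.zeros(15, dtype=bool)
# Block starts (3, 9) and one-past-the-end positions (7, 14).
toggles[[3, 7, 9, 14]] = True

# A running XOR flips the flag at every True and holds it in between.
inside = np.logical_xor.accumulate(toggles)
print(inside.astype(int))
# [0 0 0 1 1 1 1 0 0 1 1 1 1 1 0]
```

Because the end markers sit one past the -1, the row holding the -1 is still filled.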

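It may be a little brittle with malformed input, and I don't think it will handle a trailing -1 gracefully. The only non-O(n) operation is the call to searchsorted, but searchsorted is optimized to search faster when the keys are themselves sorted, so it probably won't be noticeable.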

If I time it on your x, it does not beat the loop version, but for larger arrays it probably will.
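One way to sidestep the trailing -1 issue is sketched below. It is a different technique from the searchsorted approach above: a forward fill of the last nonzero marker via `np.maximum.accumulate`. It assumes the behaviour of `make_y` with `showminus=False`, and `fill_gaps` is a hypothetical name, not a NumPy function.

```python
import numpy as np


def fill_gaps(x):
    """Sketch: B[i] is 1 when the last nonzero marker before row i is a 1."""
    x = np.asarray(x)
    n = len(x)
    # Index of the most recent nonzero entry at or before each position
    # (falls back to index 0 while no marker has been seen yet).
    last_nz = np.where(x != 0, np.arange(n), 0)
    np.maximum.accumulate(last_nz, out=last_nz)
    # State after each row: True if the most recent marker was a 1.
    state = x[last_nz] == 1
    # Row i takes the state left behind by row i - 1.
    y = np.zeros(n, dtype=x.dtype)
    y[1:] = state[:-1]
    return y


x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])
print(fill_gaps(x))
# [0 0 0 1 1 1 1 0 0 1 1 1 1 1 0]
print(fill_gaps([1, 0, -1]))  # a trailing -1 needs no special case
# [0 1 1]
```

Every step is a single pass over the array, so the sketch stays O(n) and never indexes past the end.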

Answered 2014-09-26 14:26:40 by Jaime

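This is beautiful, thank you. I timed your solution on an array of 2600+ elements. The original for loop came in at about 500 ms. A Cython-optimized version brought that down to 2 ms. This solution brought it down to 113 µs. Nice one Jaime, thanks again! – bazel 2014-09-26 23:46:19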

This works fine:

A = [0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0]
# initialize column B with the same number of zeros
B = [0] * len(A)
print(A)
for i in range(len(A)):
    # retrieve the indices of the next pair (1 to -1)
    try:
        one_index = A.index(1)
        neg_one_index = A.index(-1)
    except ValueError:
        # no complete pair is left in A
        break
    one_index = one_index + 1
    # replace the zeros in column B by 1 at the correct locations,
    # zeroing the consumed entries of A along the way
    while one_index <= neg_one_index:
        B[one_index] = 1
        A[one_index - 1] = 0
        A[one_index] = 0
        one_index = one_index + 1
print(B)
# output -> [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0] (i.e. correct)

Answered 2014-09-26 16:05:07 by Yogesh

Sorry, this is no faster than the OP's attempt, and the OP is looking for a vectorized solution. – EdChum 2014-09-26 17:05:23
